unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* New GC concept
@ 2021-06-04  3:30 Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Daniel Colascione @ 2021-06-04  3:30 UTC (permalink / raw)
  To: emacs-devel

Emacs has had the same GC for a decent amount of time now (since the 
1980s, really). I spent some time in 2020 rewriting it from scratch. I 
haven't had time to work on the new GC recently, but I figure I'd throw 
it out here to get some feedback on the general concept.

Check out 
https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c, 
specifically the big doc comment on top

The new GC basically replaces alloc.c and a few other things. It has a 
few cool features:

* fully copying and compacting

* special treatment of sxhash to preserve object identify even while we 
move it around in memory

* generational

* contiguous storage of mark bits separately from the data heap

* concurrent (in design, not current implementation): idea is that we do 
concurrent marking and barely pause for sweep

* small string optimization

* bump pointer allocation of new objects

* heap enumeration support

* hard requirement on pdumper

* specialized GC spaces for conses, strings, arrays, and so on: no 
stupid header word for cons cells bloating memory use by 50%!

* cool modern C implementation that relies heavily on compiler inlining 
and constant propagation

The current implementation is deficient in many ways. Honestly, I'm not 
even sure whether that specific revision compiles. But like I said, I 
haven't had time recently to continue work on it.

Still, I'm still curious about what people think of the overall effort. 
It might work nicely with the new native compilation stuff, giving us a 
managed code execution environment kind-of, sort-of on par with the big 
modern managed-code runtimes.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
@ 2021-06-04  8:00 ` Daniel Mendler
  2021-06-04  9:47   ` Daniel Colascione
  2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
  2021-06-07 17:32 ` Matt Armstrong
  2 siblings, 1 reply; 10+ messages in thread
From: Daniel Mendler @ 2021-06-04  8:00 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Interesting, thank you for working on this!

I had hoped that a new GC would surface at some point given the recent
improvements regarding native compilation. As you say this can bring
Emacs on par with modern managed language runtimes. Can you elaborate a
bit more of some of your concepts?

> * fully copying and compacting

How do you ensure that compaction works together with the conservative
stack scanning? You pin memory blocks, which are potentially referenced
by the stack?

> * generational

Do you rely on the OS memory protection as a write barrier to separate
the different generations, similar to how the bdwgc does that?

> * specialized GC spaces for conses, strings, arrays, and so on: no
stupid header word for cons cells bloating memory use by 50%!

Avoiding the headers is a good idea. You are using a first fit strategy
to find the next free space for a new object. How do you use the
headerless approach for objects of different sizes? Isn't it the case
that every block should then contain objects of only a single type and
of a single size? Probably most of the objects fall in a few size
classes, so it may be possible to do better than first fit?

Overall is your design similar to the approach of the bdwgc plus that a
memory/object layout tailored to the needs of Emacs and the compaction?
How well does such a GC hold up to a GC which is precise and does not
rely on the OS facilities for barriers? It appears such a precise GC is
impossible to retrofit on the existing Elisp runtime, so I assume your
approach is the right way to go.

Daniel Mendler



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
@ 2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
  2021-06-07 17:32 ` Matt Armstrong
  2 siblings, 0 replies; 10+ messages in thread
From: Andrea Corallo via Emacs development discussions. @ 2021-06-04  8:56 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> Emacs has had the same GC for a decent amount of time now (since the
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
> haven't had time to work on the new GC recently, but I figure I'd
> throw it out here to get some feedback on the general concept.
>
> Check out
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
> specifically the big doc comment on top
>
> The new GC basically replaces alloc.c and a few other things. It has a
> few cool features:
>
> * fully copying and compacting
>
> * special treatment of sxhash to preserve object identify even while
>   we move it around in memory
>
> * generational
>
> * contiguous storage of mark bits separately from the data heap
>
> * concurrent (in design, not current implementation): idea is that we
>   do concurrent marking and barely pause for sweep
>
> * small string optimization
>
> * bump pointer allocation of new objects
>
> * heap enumeration support
>
> * hard requirement on pdumper
>
> * specialized GC spaces for conses, strings, arrays, and so on: no
>   stupid header word for cons cells bloating memory use by 50%!
>
> * cool modern C implementation that relies heavily on compiler
>   inlining and constant propagation
>
> The current implementation is deficient in many ways. Honestly, I'm
> not even sure whether that specific revision compiles. But like I
> said, I haven't had time recently to continue work on it.
>
> Still, I'm still curious about what people think of the overall
> effort. It might work nicely with the new native compilation stuff,
> giving us a managed code execution environment kind-of, sort-of on par
> with the big modern managed-code runtimes.

Sounds cool!

The only comment I've so far is that IMO *the* important feature for a
new Emacs GC is to have it concurrent (or say concurrent as much as
possible).

Emacs user experience is often dictated by its reactivity, we need to
head towards a GC that is concurrent prioritizing in the design this
feature over others, I wouldn't mind sacrificing some efficiency for
that.

I like the idea of a moving/generational GC but possibily porting what
we have to a tri-color mark and sweep would solve already the problem
with less impact.  This is what I would have tried if I had time.

Thanks for this work!

  Andrea



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  8:00 ` Daniel Mendler
@ 2021-06-04  9:47   ` Daniel Colascione
  2021-06-04 10:50     ` Eli Zaretskii
  2021-06-04 11:06     ` Daniel Mendler
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Colascione @ 2021-06-04  9:47 UTC (permalink / raw)
  To: Daniel Mendler, emacs-devel

On 6/4/21 1:00 AM, Daniel Mendler wrote:

> Interesting, thank you for working on this!
>
> I had hoped that a new GC would surface at some point given the recent
> improvements regarding native compilation. As you say this can bring
> Emacs on par with modern managed language runtimes. Can you elaborate a
> bit more of some of your concepts?

Thanks for taking a look.

>> * fully copying and compacting
> How do you ensure that compaction works together with the conservative
> stack scanning? You pin memory blocks, which are potentially referenced
> by the stack?

Yes, pinning is how we combine conservative stack scanning with a 
copying collector. We don't pin whole memory blocks though. We pin at 
object granularity. (Things like Hosking's "mostly copying collector" 
use block pinning IIRC, but we can do much better these days.)

Just as each object has a mark bit, each object has a pin bit. We pin 
only those specific objects that conservative scanning flags as 
potentially referenced from native code. We still copy pinned objects 
from the from-space to the to-space actually --- it's important copy 
pinned objects because it's during copying that we update all the 
pointers that a pinned object might contain. Pinning just ensures that 
we copy in a specific way such that after we're done with GC and swap 
the to-space and from-space, each pinned object ends up at the same 
virtual address it had before GC started. This way, although we *do* 
copy pinned objects, the mutator never observes a pinned object changing 
position.

The pin bits end up using very little memory because they're stored 
contiguously in side arrays and almost entirely zero, and each zero page 
shares the same backing RAM until something makes it non-zero. Like I 
mentioned in the new alloc.c, address space is abundant.

>> * generational
> Do you rely on the OS memory protection as a write barrier to separate
> the different generations, similar to how the bdwgc does that?

Correct. IMHO, it's not practical to retrofit write barriers or read 
barriers into Emacs.

>> * specialized GC spaces for conses, strings, arrays, and so on: no
> stupid header word for cons cells bloating memory use by 50%!
>
> Avoiding the headers is a good idea. You are using a first fit strategy
> to find the next free space for a new object. How do you use the
> headerless approach for objects of different sizes?

We don't. :-)

In the new GC, the overall Emacs heap is divided into "heaps" for 
various object types; each heap has its own list of blocks and its own 
heap memory format. The heaps for fixed-size objects like cons cells and 
intervals don't have headers. The heaps for variable-sized objects like 
strings and vectorlikes *do* use conventional object headers.

> Isn't it the case
> that every block should then contain objects of only a single type and
> of a single size?

Some heaps (most importantly, the vectorlike heap) do support 
variable-sized objects, and blocks belonging to these heap types contain 
a mixture of object types.

>   Probably most of the objects fall in a few size
> classes, so it may be possible to do better than first fit?

First-fit is better than it sounds in the context of a compacting 
collector. First-fit allocation always (except in two cases described 
below) succeeds on the first try: because each GC pass compacts all the 
objects at the "start" of the heap, and we start first-fit allocation 
from the end of the last compacted object. That's why I wrote that the 
first-fit allocation scheme is equivalent in practice to bump-pointer 
allocation.

The two cases where we fail first-fit allocation are:

1) we're in a heap that supports variable-sized objects and there's not 
enough space in the current block to hold the object we're allocating, and

2) there's a pinned object "in the way" of where we want to place the 
object via first-fit allocation.

#1 isn't a problem in practice: if we're trying to allocate an object 
that's too big to place in the tail of the object's heap's current 
block, we allocate a new block and put the new object there instead. The 
new object is guaranteed to fit in the new block because we allocate 
larger-than-block objects in a separate storage area, as is traditional 
in GCs of this sort. (See the large vector list.) When we move to a new 
block this way, we don't commit the memory of the tail of the previous 
block, so moving to the next block is practically free, modulo page-tail 
wastage.

#2 isn't a problem either: pinned objects are rare, and when we do 
encounter one, we can "skip over" it efficiently using the free-object 
bitmap. Modern machines are really good at streaming analysis of bit 
arrays: we don't even need a freelist embedded in the heap, like Emacs 
currently has for conses. Scanning a bitmap is both simpler and kinder 
to the cache. Because pinned objects are rare, because pins are 
transient, and because each GC pass is a compacting pass, first-fit 
doesn't lead to the fragmentation that it normally causes in things like 
malloc implementations.

> Overall is your design similar to the approach of the bdwgc plus that a
> memory/object layout tailored to the needs of Emacs and the compaction?
> How well does such a GC hold up to a GC which is precise and does not
> rely on the OS facilities for barriers? It appears such a precise GC is
> impossible to retrofit on the existing Elisp runtime, so I assume your
> approach is the right way to go.

Most other GCs use software barriers, true. But even that's changing. 
Relying on OS facilities for barriers has an important advantage: it 
reduces code size. If we used the non-OS-level facility in a native 
compilation world, we'd have to emit a write barrier before *every* 
mutator write (~6 instructions). These barriers add up and bloat the 
generated code. If we use OS memory protection instead, the generated 
code can be a lot smaller. Plus, using OS facilities, we don't have to 
change the rest of the Emacs C core.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  9:47   ` Daniel Colascione
@ 2021-06-04 10:50     ` Eli Zaretskii
  2021-06-04 11:06     ` Daniel Mendler
  1 sibling, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2021-06-04 10:50 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: mail, emacs-devel

> From: Daniel Colascione <dancol@dancol.org>
> Date: Fri, 4 Jun 2021 02:47:32 -0700
> 
> On 6/4/21 1:00 AM, Daniel Mendler wrote:
> 
> > Interesting, thank you for working on this!
> >
> > I had hoped that a new GC would surface at some point given the recent
> > improvements regarding native compilation. As you say this can bring
> > Emacs on par with modern managed language runtimes. Can you elaborate a
> > bit more of some of your concepts?
> 
> Thanks for taking a look.

Seconded.

And I really hope that more than just a look (and a discussion) will
come out of this.  Making our GC more efficient is an important
development goal, of which over the years we've seen several attempts,
but unfortunately little advancement in practice.  I hope interested
individuals will step forward and continue developing this (or any
other) initiative so that we will eventually be able to replace our GC
with a better one.

Thanks in advance for developing this aspect of Emacs.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  9:47   ` Daniel Colascione
  2021-06-04 10:50     ` Eli Zaretskii
@ 2021-06-04 11:06     ` Daniel Mendler
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Mendler @ 2021-06-04 11:06 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Thank you for answering my questions. Did you also consider the approach
of using a non-copying collector, keeping the classical mark and sweep?
Andrea mentioned in his mail that the focus should be on reactivity.
With mark and sweep it is possible to avoid the copying step, which
scales with the live size, such that one can achieve constant pause times.

On the other hand the copying step is probably quick for the expected
Emacs heap sizes. Furthermore with m&s, you have the fragmentation
problem, it is harder to use such a bump-style allocator and it is
harder to separate the generations, which is a requirement for the
hardware write barrier. So I think overall your design is a sound
approach as long as the heap stays at a reasonable size.

Your approach seems to be quite general purpose and does not require
intrusive changes to the Emacs C code; it seems to be relatively
decoupled from the Elisp runtime. Do you think it is realistic to
implement the GC as a "library" behind some abstract interface? Of
course there is some dependence on the object memory layout, but the GC
interface could offer APIs to request objects of different types. It
should then be possible to use different GCs which use a similar
approach (conservative stack scanning, no explicit read/write barriers).

Daniel Mendler




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
  2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
@ 2021-06-07 17:32 ` Matt Armstrong
  2021-06-07 18:03   ` Daniel Colascione
  2 siblings, 1 reply; 10+ messages in thread
From: Matt Armstrong @ 2021-06-07 17:32 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> Emacs has had the same GC for a decent amount of time now (since the 
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I 
> haven't had time to work on the new GC recently, but I figure I'd throw 
> it out here to get some feedback on the general concept.
>
> Check out 
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c, 
> specifically the big doc comment on top

Hey Daniel, I am no GC expert but I'm liking this a lot.  I love the
block comments in your alloc.c -- very clear and easy to understand.
You're in a uniquely good position to work on this.  I hope you
continue!

I'm curious about the answer to one of the unanswered questions in your
alloc.c FAQ: What about systems without virtual memory?  Asked another
way: can we reasonably expect to entirely replace the current GC with
this new one?  Are there platforms Emacs supports today that would be
left behind?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-07 17:32 ` Matt Armstrong
@ 2021-06-07 18:03   ` Daniel Colascione
  2021-06-07 19:51     ` Daniele Nicolodi
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Colascione @ 2021-06-07 18:03 UTC (permalink / raw)
  To: Matt Armstrong, emacs-devel

On 6/7/21 10:32 AM, Matt Armstrong wrote:

> Daniel Colascione <dancol@dancol.org> writes:
>
>> Emacs has had the same GC for a decent amount of time now (since the
>> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
>> haven't had time to work on the new GC recently, but I figure I'd throw
>> it out here to get some feedback on the general concept.
>>
>> Check out
>> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
>> specifically the big doc comment on top
> Hey Daniel, I am no GC expert but I'm liking this a lot.  I love the
> block comments in your alloc.c -- very clear and easy to understand.
> You're in a uniquely good position to work on this.  I hope you
> continue!
Thank you!
> I'm curious about the answer to one of the unanswered questions in your
> alloc.c FAQ: What about systems without virtual memory?  Asked another
> way: can we reasonably expect to entirely replace the current GC with
> this new one?  Are there platforms Emacs supports today that would be
> left behind?

We can definitely replace the existing GC with the new GC everywhere. 
I've designed the new GC to work on systems without virtual memory 
facilities. On these systems, we'll have to run the GC in 
non-concurrent, non-generational mode, but that's no regression from 
what we have today. We'll also probably want to use a smaller block size 
on these systems to reduce fragmentation overhead.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-07 18:03   ` Daniel Colascione
@ 2021-06-07 19:51     ` Daniele Nicolodi
  2021-06-08  2:22       ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Daniele Nicolodi @ 2021-06-07 19:51 UTC (permalink / raw)
  To: emacs-devel

On 07/06/2021 20:03, Daniel Colascione wrote:
> On 6/7/21 10:32 AM, Matt Armstrong wrote:
>> I'm curious about the answer to one of the unanswered questions in your
>> alloc.c FAQ: What about systems without virtual memory?  Asked another
>> way: can we reasonably expect to entirely replace the current GC with
>> this new one?  Are there platforms Emacs supports today that would be
>> left behind?
> 
> We can definitely replace the existing GC with the new GC everywhere. 
> I've designed the new GC to work on systems without virtual memory 
> facilities. On these systems, we'll have to run the GC in 
> non-concurrent, non-generational mode, but that's no regression from 
> what we have today. We'll also probably want to use a smaller block size 
> on these systems to reduce fragmentation overhead.

Isn't DOS the only system in this class? (It is not a rhetorical
question: a while ago I asked which systems are officially supports and
the answer was that all systems that currently run Emacs are supported).

Does it make sense to still support DOS?

Cheers,
Dan



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: New GC concept
  2021-06-07 19:51     ` Daniele Nicolodi
@ 2021-06-08  2:22       ` Eli Zaretskii
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2021-06-08  2:22 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: emacs-devel

> From: Daniele Nicolodi <daniele@grinta.net>
> Date: Mon, 7 Jun 2021 21:51:45 +0200
> 
> > We can definitely replace the existing GC with the new GC everywhere. 
> > I've designed the new GC to work on systems without virtual memory 
> > facilities. On these systems, we'll have to run the GC in 
> > non-concurrent, non-generational mode, but that's no regression from 
> > what we have today. We'll also probably want to use a smaller block size 
> > on these systems to reduce fragmentation overhead.
> 
> Isn't DOS the only system in this class? (It is not a rhetorical
> question: a while ago I asked which systems are officially supports and
> the answer was that all systems that currently run Emacs are supported).
> 
> Does it make sense to still support DOS?

The development environment which is used to build the MS-DOS port of
Emacs (DJGPP) does support virtual memory (IIUC what that means in
this context).



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-06-08  2:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04  3:30 New GC concept Daniel Colascione
2021-06-04  8:00 ` Daniel Mendler
2021-06-04  9:47   ` Daniel Colascione
2021-06-04 10:50     ` Eli Zaretskii
2021-06-04 11:06     ` Daniel Mendler
2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
2021-06-07 17:32 ` Matt Armstrong
2021-06-07 18:03   ` Daniel Colascione
2021-06-07 19:51     ` Daniele Nicolodi
2021-06-08  2:22       ` Eli Zaretskii

unofficial mirror of emacs-devel@gnu.org 

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://yhetil.org/emacs-devel/0 emacs-devel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 emacs-devel emacs-devel/ https://yhetil.org/emacs-devel \
		emacs-devel@gnu.org
	public-inbox-index emacs-devel

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.emacs.devel
	nntp://news.gmane.io/gmane.emacs.devel


code repositories for project(s) associated with this inbox:

	https://git.savannah.gnu.org/cgit/emacs.git

AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git