unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* New GC concept
@ 2021-06-04  3:30 Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Daniel Colascione @ 2021-06-04  3:30 UTC (permalink / raw)
  To: emacs-devel

Emacs has had the same GC for a decent amount of time now (since the 
1980s, really). I spent some time in 2020 rewriting it from scratch. I 
haven't had time to work on the new GC recently, but I figure I'd throw 
it out here to get some feedback on the general concept.

Check out 
https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c, 
specifically the big doc comment on top

The new GC basically replaces alloc.c and a few other things. It has a 
few cool features:

* fully copying and compacting

* special treatment of sxhash to preserve object identify even while we 
move it around in memory

* generational

* contiguous storage of mark bits separately from the data heap

* concurrent (in design, not current implementation): idea is that we do 
concurrent marking and barely pause for sweep

* small string optimization

* bump pointer allocation of new objects

* heap enumeration support

* hard requirement on pdumper

* specialized GC spaces for conses, strings, arrays, and so on: no 
stupid header word for cons cells bloating memory use by 50%!

* cool modern C implementation that relies heavily on compiler inlining 
and constant propagation

The current implementation is deficient in many ways. Honestly, I'm not 
even sure whether that specific revision compiles. But like I said, I 
haven't had time recently to continue work on it.

Still, I'm still curious about what people think of the overall effort. 
It might work nicely with the new native compilation stuff, giving us a 
managed code execution environment kind-of, sort-of on par with the big 
modern managed-code runtimes.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
@ 2021-06-04  8:00 ` Daniel Mendler
  2021-06-04  9:47   ` Daniel Colascione
  2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
  2021-06-07 17:32 ` Matt Armstrong
  2 siblings, 1 reply; 16+ messages in thread
From: Daniel Mendler @ 2021-06-04  8:00 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Interesting, thank you for working on this!

I had hoped that a new GC would surface at some point given the recent
improvements regarding native compilation. As you say this can bring
Emacs on par with modern managed language runtimes. Can you elaborate a
bit more of some of your concepts?

> * fully copying and compacting

How do you ensure that compaction works together with the conservative
stack scanning? You pin memory blocks, which are potentially referenced
by the stack?

> * generational

Do you rely on the OS memory protection as a write barrier to separate
the different generations, similar to how the bdwgc does that?

> * specialized GC spaces for conses, strings, arrays, and so on: no
stupid header word for cons cells bloating memory use by 50%!

Avoiding the headers is a good idea. You are using a first fit strategy
to find the next free space for a new object. How do you use the
headerless approach for objects of different sizes? Isn't it the case
that every block should then contain objects of only a single type and
of a single size? Probably most of the objects fall in a few size
classes, so it may be possible to do better than first fit?

Overall is your design similar to the approach of the bdwgc plus that a
memory/object layout tailored to the needs of Emacs and the compaction?
How well does such a GC hold up to a GC which is precise and does not
rely on the OS facilities for barriers? It appears such a precise GC is
impossible to retrofit on the existing Elisp runtime, so I assume your
approach is the right way to go.

Daniel Mendler



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
@ 2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
  2021-06-07 17:32 ` Matt Armstrong
  2 siblings, 0 replies; 16+ messages in thread
From: Andrea Corallo via Emacs development discussions. @ 2021-06-04  8:56 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> Emacs has had the same GC for a decent amount of time now (since the
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
> haven't had time to work on the new GC recently, but I figure I'd
> throw it out here to get some feedback on the general concept.
>
> Check out
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
> specifically the big doc comment on top
>
> The new GC basically replaces alloc.c and a few other things. It has a
> few cool features:
>
> * fully copying and compacting
>
> * special treatment of sxhash to preserve object identify even while
>   we move it around in memory
>
> * generational
>
> * contiguous storage of mark bits separately from the data heap
>
> * concurrent (in design, not current implementation): idea is that we
>   do concurrent marking and barely pause for sweep
>
> * small string optimization
>
> * bump pointer allocation of new objects
>
> * heap enumeration support
>
> * hard requirement on pdumper
>
> * specialized GC spaces for conses, strings, arrays, and so on: no
>   stupid header word for cons cells bloating memory use by 50%!
>
> * cool modern C implementation that relies heavily on compiler
>   inlining and constant propagation
>
> The current implementation is deficient in many ways. Honestly, I'm
> not even sure whether that specific revision compiles. But like I
> said, I haven't had time recently to continue work on it.
>
> Still, I'm still curious about what people think of the overall
> effort. It might work nicely with the new native compilation stuff,
> giving us a managed code execution environment kind-of, sort-of on par
> with the big modern managed-code runtimes.

Sounds cool!

The only comment I've so far is that IMO *the* important feature for a
new Emacs GC is to have it concurrent (or say concurrent as much as
possible).

Emacs user experience is often dictated by its reactivity, we need to
head towards a GC that is concurrent prioritizing in the design this
feature over others, I wouldn't mind sacrificing some efficiency for
that.

I like the idea of a moving/generational GC but possibily porting what
we have to a tri-color mark and sweep would solve already the problem
with less impact.  This is what I would have tried if I had time.

Thanks for this work!

  Andrea



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  8:00 ` Daniel Mendler
@ 2021-06-04  9:47   ` Daniel Colascione
  2021-06-04 10:50     ` Eli Zaretskii
  2021-06-04 11:06     ` Daniel Mendler
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Colascione @ 2021-06-04  9:47 UTC (permalink / raw)
  To: Daniel Mendler, emacs-devel

On 6/4/21 1:00 AM, Daniel Mendler wrote:

> Interesting, thank you for working on this!
>
> I had hoped that a new GC would surface at some point given the recent
> improvements regarding native compilation. As you say this can bring
> Emacs on par with modern managed language runtimes. Can you elaborate a
> bit more of some of your concepts?

Thanks for taking a look.

>> * fully copying and compacting
> How do you ensure that compaction works together with the conservative
> stack scanning? You pin memory blocks, which are potentially referenced
> by the stack?

Yes, pinning is how we combine conservative stack scanning with a 
copying collector. We don't pin whole memory blocks though. We pin at 
object granularity. (Things like Hosking's "mostly copying collector" 
use block pinning IIRC, but we can do much better these days.)

Just as each object has a mark bit, each object has a pin bit. We pin 
only those specific objects that conservative scanning flags as 
potentially referenced from native code. We still copy pinned objects 
from the from-space to the to-space actually --- it's important copy 
pinned objects because it's during copying that we update all the 
pointers that a pinned object might contain. Pinning just ensures that 
we copy in a specific way such that after we're done with GC and swap 
the to-space and from-space, each pinned object ends up at the same 
virtual address it had before GC started. This way, although we *do* 
copy pinned objects, the mutator never observes a pinned object changing 
position.

The pin bits end up using very little memory because they're stored 
contiguously in side arrays and almost entirely zero, and each zero page 
shares the same backing RAM until something makes it non-zero. Like I 
mentioned in the new alloc.c, address space is abundant.

>> * generational
> Do you rely on the OS memory protection as a write barrier to separate
> the different generations, similar to how the bdwgc does that?

Correct. IMHO, it's not practical to retrofit write barriers or read 
barriers into Emacs.

>> * specialized GC spaces for conses, strings, arrays, and so on: no
> stupid header word for cons cells bloating memory use by 50%!
>
> Avoiding the headers is a good idea. You are using a first fit strategy
> to find the next free space for a new object. How do you use the
> headerless approach for objects of different sizes?

We don't. :-)

In the new GC, the overall Emacs heap is divided into "heaps" for 
various object types; each heap has its own list of blocks and its own 
heap memory format. The heaps for fixed-size objects like cons cells and 
intervals don't have headers. The heaps for variable-sized objects like 
strings and vectorlikes *do* use conventional object headers.

> Isn't it the case
> that every block should then contain objects of only a single type and
> of a single size?

Some heaps (most importantly, the vectorlike heap) do support 
variable-sized objects, and blocks belonging to these heap types contain 
a mixture of object types.

>   Probably most of the objects fall in a few size
> classes, so it may be possible to do better than first fit?

First-fit is better than it sounds in the context of a compacting 
collector. First-fit allocation always (except in two cases described 
below) succeeds on the first try: because each GC pass compacts all the 
objects at the "start" of the heap, and we start first-fit allocation 
from the end of the last compacted object. That's why I wrote that the 
first-fit allocation scheme is equivalent in practice to bump-pointer 
allocation.

The two cases where we fail first-fit allocation are:

1) we're in a heap that supports variable-sized objects and there's not 
enough space in the current block to hold the object we're allocating, and

2) there's a pinned object "in the way" of where we want to place the 
object via first-fit allocation.

#1 isn't a problem in practice: if we're trying to allocate an object 
that's too big to place in the tail of the object's heap's current 
block, we allocate a new block and put the new object there instead. The 
new object is guaranteed to fit in the new block because we allocate 
larger-than-block objects in a separate storage area, as is traditional 
in GCs of this sort. (See the large vector list.) When we move to a new 
block this way, we don't commit the memory of the tail of the previous 
block, so moving to the next block is practically free, modulo page-tail 
wastage.

#2 isn't a problem either: pinned objects are rare, and when we do 
encounter one, we can "skip over" it efficiently using the free-object 
bitmap. Modern machines are really good at streaming analysis of bit 
arrays: we don't even need a freelist embedded in the heap, like Emacs 
currently has for conses. Scanning a bitmap is both simpler and kinder 
to the cache. Because pinned objects are rare, because pins are 
transient, and because each GC pass is a compacting pass, first-fit 
doesn't lead to the fragmentation that it normally causes in things like 
malloc implementations.

> Overall is your design similar to the approach of the bdwgc plus that a
> memory/object layout tailored to the needs of Emacs and the compaction?
> How well does such a GC hold up to a GC which is precise and does not
> rely on the OS facilities for barriers? It appears such a precise GC is
> impossible to retrofit on the existing Elisp runtime, so I assume your
> approach is the right way to go.

Most other GCs use software barriers, true. But even that's changing. 
Relying on OS facilities for barriers has an important advantage: it 
reduces code size. If we used the non-OS-level facility in a native 
compilation world, we'd have to emit a write barrier before *every* 
mutator write (~6 instructions). These barriers add up and bloat the 
generated code. If we use OS memory protection instead, the generated 
code can be a lot smaller. Plus, using OS facilities, we don't have to 
change the rest of the Emacs C core.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  9:47   ` Daniel Colascione
@ 2021-06-04 10:50     ` Eli Zaretskii
  2021-06-21 13:00       ` Fejfighter
  2021-06-04 11:06     ` Daniel Mendler
  1 sibling, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-04 10:50 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: mail, emacs-devel

> From: Daniel Colascione <dancol@dancol.org>
> Date: Fri, 4 Jun 2021 02:47:32 -0700
> 
> On 6/4/21 1:00 AM, Daniel Mendler wrote:
> 
> > Interesting, thank you for working on this!
> >
> > I had hoped that a new GC would surface at some point given the recent
> > improvements regarding native compilation. As you say this can bring
> > Emacs on par with modern managed language runtimes. Can you elaborate a
> > bit more of some of your concepts?
> 
> Thanks for taking a look.

Seconded.

And I really hope that more than just a look (and a discussion) will
come out of this.  Making our GC more efficient is an important
development goal, of which over the years we've seen several attempts,
but unfortunately little advancement in practice.  I hope interested
individuals will step forward and continue developing this (or any
other) initiative so that we will eventually be able to replace our GC
with a better one.

Thanks in advance for developing this aspect of Emacs.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  9:47   ` Daniel Colascione
  2021-06-04 10:50     ` Eli Zaretskii
@ 2021-06-04 11:06     ` Daniel Mendler
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Mendler @ 2021-06-04 11:06 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Thank you for answering my questions. Did you also consider the approach
of using a non-copying collector, keeping the classical mark and sweep?
Andrea mentioned in his mail that the focus should be on reactivity.
With mark and sweep it is possible to avoid the copying step, which
scales with the live size, such that one can achieve constant pause times.

On the other hand the copying step is probably quick for the expected
Emacs heap sizes. Furthermore with m&s, you have the fragmentation
problem, it is harder to use such a bump-style allocator and it is
harder to separate the generations, which is a requirement for the
hardware write barrier. So I think overall your design is a sound
approach as long as the heap stays at a reasonable size.

Your approach seems to be quite general purpose and does not require
intrusive changes to the Emacs C code; it seems to be relatively
decoupled from the Elisp runtime. Do you think it is realistic to
implement the GC as a "library" behind some abstract interface? Of
course there is some dependence on the object memory layout, but the GC
interface could offer APIs to request objects of different types. It
should then be possible to use different GCs which use a similar
approach (conservative stack scanning, no explicit read/write barriers).

Daniel Mendler




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04  3:30 New GC concept Daniel Colascione
  2021-06-04  8:00 ` Daniel Mendler
  2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
@ 2021-06-07 17:32 ` Matt Armstrong
  2021-06-07 18:03   ` Daniel Colascione
  2 siblings, 1 reply; 16+ messages in thread
From: Matt Armstrong @ 2021-06-07 17:32 UTC (permalink / raw)
  To: Daniel Colascione, emacs-devel

Daniel Colascione <dancol@dancol.org> writes:

> Emacs has had the same GC for a decent amount of time now (since the 
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I 
> haven't had time to work on the new GC recently, but I figure I'd throw 
> it out here to get some feedback on the general concept.
>
> Check out 
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c, 
> specifically the big doc comment on top

Hey Daniel, I am no GC expert but I'm liking this a lot.  I love the
block comments in your alloc.c -- very clear and easy to understand.
You're in a uniquely good position to work on this.  I hope you
continue!

I'm curious about the answer to one of the unanswered questions in your
alloc.c FAQ: What about systems without virtual memory?  Asked another
way: can we reasonably expect to entirely replace the current GC with
this new one?  Are there platforms Emacs supports today that would be
left behind?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-07 17:32 ` Matt Armstrong
@ 2021-06-07 18:03   ` Daniel Colascione
  2021-06-07 19:51     ` Daniele Nicolodi
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-07 18:03 UTC (permalink / raw)
  To: Matt Armstrong, emacs-devel

On 6/7/21 10:32 AM, Matt Armstrong wrote:

> Daniel Colascione <dancol@dancol.org> writes:
>
>> Emacs has had the same GC for a decent amount of time now (since the
>> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
>> haven't had time to work on the new GC recently, but I figure I'd throw
>> it out here to get some feedback on the general concept.
>>
>> Check out
>> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
>> specifically the big doc comment on top
> Hey Daniel, I am no GC expert but I'm liking this a lot.  I love the
> block comments in your alloc.c -- very clear and easy to understand.
> You're in a uniquely good position to work on this.  I hope you
> continue!
Thank you!
> I'm curious about the answer to one of the unanswered questions in your
> alloc.c FAQ: What about systems without virtual memory?  Asked another
> way: can we reasonably expect to entirely replace the current GC with
> this new one?  Are there platforms Emacs supports today that would be
> left behind?

We can definitely replace the existing GC with the new GC everywhere. 
I've designed the new GC to work on systems without virtual memory 
facilities. On these systems, we'll have to run the GC in 
non-concurrent, non-generational mode, but that's no regression from 
what we have today. We'll also probably want to use a smaller block size 
on these systems to reduce fragmentation overhead.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-07 18:03   ` Daniel Colascione
@ 2021-06-07 19:51     ` Daniele Nicolodi
  2021-06-08  2:22       ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Daniele Nicolodi @ 2021-06-07 19:51 UTC (permalink / raw)
  To: emacs-devel

On 07/06/2021 20:03, Daniel Colascione wrote:
> On 6/7/21 10:32 AM, Matt Armstrong wrote:
>> I'm curious about the answer to one of the unanswered questions in your
>> alloc.c FAQ: What about systems without virtual memory?  Asked another
>> way: can we reasonably expect to entirely replace the current GC with
>> this new one?  Are there platforms Emacs supports today that would be
>> left behind?
> 
> We can definitely replace the existing GC with the new GC everywhere. 
> I've designed the new GC to work on systems without virtual memory 
> facilities. On these systems, we'll have to run the GC in 
> non-concurrent, non-generational mode, but that's no regression from 
> what we have today. We'll also probably want to use a smaller block size 
> on these systems to reduce fragmentation overhead.

Isn't DOS the only system in this class? (It is not a rhetorical
question: a while ago I asked which systems are officially supports and
the answer was that all systems that currently run Emacs are supported).

Does it make sense to still support DOS?

Cheers,
Dan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-07 19:51     ` Daniele Nicolodi
@ 2021-06-08  2:22       ` Eli Zaretskii
  2021-06-21 22:58         ` Daniel Colascione
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-08  2:22 UTC (permalink / raw)
  To: Daniele Nicolodi; +Cc: emacs-devel

> From: Daniele Nicolodi <daniele@grinta.net>
> Date: Mon, 7 Jun 2021 21:51:45 +0200
> 
> > We can definitely replace the existing GC with the new GC everywhere. 
> > I've designed the new GC to work on systems without virtual memory 
> > facilities. On these systems, we'll have to run the GC in 
> > non-concurrent, non-generational mode, but that's no regression from 
> > what we have today. We'll also probably want to use a smaller block size 
> > on these systems to reduce fragmentation overhead.
> 
> Isn't DOS the only system in this class? (It is not a rhetorical
> question: a while ago I asked which systems are officially supports and
> the answer was that all systems that currently run Emacs are supported).
> 
> Does it make sense to still support DOS?

The development environment which is used to build the MS-DOS port of
Emacs (DJGPP) does support virtual memory (IIUC what that means in
this context).



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-04 10:50     ` Eli Zaretskii
@ 2021-06-21 13:00       ` Fejfighter
  2021-06-21 13:31         ` Eli Zaretskii
  2021-06-21 22:43         ` Daniel Colascione
  0 siblings, 2 replies; 16+ messages in thread
From: Fejfighter @ 2021-06-21 13:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mail, Daniel Colascione, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]

Not wanting to see this drop, I found a little time to bring the changes up
to emacs master with a few notes:

1) now sits on top of a4fb5811f (Do not attempt to write .elc....)
2) The C will compile with a couple of warnings as the native comp cases
are currently not handled.
3) There's dead code, commented code and other atrocities on top of the
current wip
3) It segfaults during the pdumper step in the build, there is not
immediately obvious reason for this, but I suspect a move or free occurs
and it's not tracked
4) I have ignored the comments for large_vector and large_vector_meta for
now, so the meta is kept with the vector struct.

code is up here: https://github.com/fejfighter/emacs/tree/feature/newgc
I'm hoping now that it's a little closer to master and compiling someone
might have a little more luck with the segfault issue I have been facing
but I will keep try in the mean time,

JeffW

On Fri, Jun 4, 2021 at 8:52 PM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Daniel Colascione <dancol@dancol.org>
> > Date: Fri, 4 Jun 2021 02:47:32 -0700
> >
> > On 6/4/21 1:00 AM, Daniel Mendler wrote:
> >
> > > Interesting, thank you for working on this!
> > >
> > > I had hoped that a new GC would surface at some point given the recent
> > > improvements regarding native compilation. As you say this can bring
> > > Emacs on par with modern managed language runtimes. Can you elaborate a
> > > bit more of some of your concepts?
> >
> > Thanks for taking a look.
>
> Seconded.
>
> And I really hope that more than just a look (and a discussion) will
> come out of this.  Making our GC more efficient is an important
> development goal, of which over the years we've seen several attempts,
> but unfortunately little advancement in practice.  I hope interested
> individuals will step forward and continue developing this (or any
> other) initiative so that we will eventually be able to replace our GC
> with a better one.
>
> Thanks in advance for developing this aspect of Emacs.
>
>

[-- Attachment #2: Type: text/html, Size: 2715 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-21 13:00       ` Fejfighter
@ 2021-06-21 13:31         ` Eli Zaretskii
  2021-06-21 22:43         ` Daniel Colascione
  1 sibling, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-21 13:31 UTC (permalink / raw)
  To: Fejfighter; +Cc: mail, dancol, emacs-devel

> From: Fejfighter <fejfighter@gmail.com>
> Date: Mon, 21 Jun 2021 23:00:09 +1000
> Cc: mail@daniel-mendler.de, Daniel Colascione <dancol@dancol.org>,
>  emacs-devel@gnu.org
> 
> Not wanting to see this drop, I found a little time to bring the changes up to emacs master with a few notes:

Thanks for working on this.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-21 13:00       ` Fejfighter
  2021-06-21 13:31         ` Eli Zaretskii
@ 2021-06-21 22:43         ` Daniel Colascione
  2021-07-24 13:39           ` Fejfighter
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-21 22:43 UTC (permalink / raw)
  To: Fejfighter, Eli Zaretskii; +Cc: mail, emacs-devel

On 6/21/21 6:00 AM, Fejfighter wrote:
> Not wanting to see this drop, I found a little time to bring the 
> changes up to emacs master with a few notes:
>
> 1) now sits on top of a4fb5811f (Do not attempt to write .elc....)


Awesome. Thanks!


> 2) The C will compile with a couple of warnings as the native comp 
> cases are currently not handled.
> 3) There's dead code, commented code and other atrocities on top of 
> the current wip

Yep. There were also plenty of atrocities in the original. :-)

> 3) It segfaults during the pdumper step in the build, there is not 
> immediately obvious reason for this, but I suspect a move or free 
> occurs and it's not tracked
> 4) I have ignored the comments for large_vector and large_vector_meta 
> for now, so the meta is kept with the vector struct.

Yeah. That's probably a minor optimization, but we should get around to 
completing it. The next big chunk of work is actually implementing 
concurrent marking.

> code is up here: 
> https://github.com/fejfighter/emacs/tree/feature/newgc 
> <https://github.com/fejfighter/emacs/tree/feature/newgc>
> I'm hoping now that it's a little closer to master and compiling 
> someone might have a little more luck with the segfault issue I have 
> been facing but I will keep try in the mean time,

FWIW, I find rr [1] to be exceptionally useful for diagnosing segfaults 
like this.


[1] https://rr-project.org/




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-08  2:22       ` Eli Zaretskii
@ 2021-06-21 22:58         ` Daniel Colascione
  2021-06-22 12:59           ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-21 22:58 UTC (permalink / raw)
  To: Eli Zaretskii, Daniele Nicolodi; +Cc: emacs-devel

On 6/7/21 7:22 PM, Eli Zaretskii wrote:
>> From: Daniele Nicolodi <daniele@grinta.net>
>> Date: Mon, 7 Jun 2021 21:51:45 +0200
>>
>>> We can definitely replace the existing GC with the new GC everywhere.
>>> I've designed the new GC to work on systems without virtual memory
>>> facilities. On these systems, we'll have to run the GC in
>>> non-concurrent, non-generational mode, but that's no regression from
>>> what we have today. We'll also probably want to use a smaller block size
>>> on these systems to reduce fragmentation overhead.
>> Isn't DOS the only system in this class? (It is not a rhetorical
>> question: a while ago I asked which systems are officially supports and
>> the answer was that all systems that currently run Emacs are supported).
>>
>> Does it make sense to still support DOS?
> The development environment which is used to build the MS-DOS port of
> Emacs (DJGPP) does support virtual memory (IIUC what that means in
> this context).

Oh, right. I completely forgot that we have DPMI.

It's been a very long time since I looked at that. Does DJGPP provide 
DPMI 0.9 or 1.0?

To get generational GC under DJGPP, we'll need something like a SIGSEGV 
handler, a bit of code that we run when the CPU signals a memory 
protection fault. I think we get there by installing an exception 
interrupt handler, as in 
http://www.delorie.com/djgpp/doc/dpmi/ch4.5.html, and I think it'll work 
in both DPMI 0.9 and 1.0. Another thing we need for generational GC is 
the ability to mark a range of pages read-only, as with mprotect. I 
think DPMI gives us the ability to change page permissions, but 0.9 does 
not. See http://www.delorie.com/djgpp/doc/dpmi/api/310507.html

The other thing we get with VM is the ability to swap the from-space and 
the to-space without an additional memory copy. DPMI 1.0 appears to 
provide a shared memory facility that would let us do that (the 
equivalent of mmap/MapViewOfFile of an anonymous segment), but I'm not 
sure that DPMI 0.9 gives us that ability.

Anyway, even if it is theoretically possible to implement the new GC's 
fancy VM stuff in terms of DPMI, I think it should have lower priority 
than the rest of the system. The new GC run without virtual memory use 
at all should still be no worse overall than the current GC, so MS-DOS 
Emacs at least wouldn't see a regression if we switched to a version of 
the new GC that didn't understand DPMI.

But DPMI support for the new GC would definitely be a fun retro 
computing project.




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-21 22:58         ` Daniel Colascione
@ 2021-06-22 12:59           ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-22 12:59 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: daniele, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Jun 2021 15:58:20 -0700
> 
> > The development environment which is used to build the MS-DOS port of
> > Emacs (DJGPP) does support virtual memory (IIUC what that means in
> > this context).
> 
> Oh, right. I completely forgot that we have DPMI.
> 
> It's been a very long time since I looked at that. Does DJGPP provide 
> DPMI 0.9 or 1.0?

The DPMI provider which comes with DJGPP supports DPMI 0.9 with some
extensions.  (If one runs a DJGPP program on MS-Windows, one gets what
Windows provides instead, which is DPMI 0.9.)

> To get generational GC under DJGPP, we'll need something like a SIGSEGV 
> handler, a bit of code that we run when the CPU signals a memory 
> protection fault. I think we get there by installing an exception 
> interrupt handler, as in 
> http://www.delorie.com/djgpp/doc/dpmi/ch4.5.html, and I think it'll work 
> in both DPMI 0.9 and 1.0.

Yes, this is supported.

> Another thing we need for generational GC is 
> the ability to mark a range of pages read-only, as with mprotect. I 
> think DPMI gives us the ability to change page permissions, but 0.9 does 
> not. See http://www.delorie.com/djgpp/doc/dpmi/api/310507.html

DJGPP has mprotect.  It indeed requires DPMI 1.0, but it is also one
of the extensions supported by the DPMI provider that comes with
DJGPP.

> The other thing we get with VM is the ability to swap the from-space and 
> the to-space without an additional memory copy. DPMI 1.0 appears to 
> provide a shared memory facility that would let us do that (the 
> equivalent of mmap/MapViewOfFile of an anonymous segment), but I'm not 
> sure that DPMI 0.9 gives us that ability.

Right, this requires DPMI 1.0.

> Anyway, even if it is theoretically possible to implement the new GC's 
> fancy VM stuff in terms of DPMI, I think it should have lower priority 
> than the rest of the system. The new GC run without virtual memory use 
> at all should still be no worse overall than the current GC, so MS-DOS 
> Emacs at least wouldn't see a regression if we switched to a version of 
> the new GC that didn't understand DPMI.

Yes, definitely.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New GC concept
  2021-06-21 22:43         ` Daniel Colascione
@ 2021-07-24 13:39           ` Fejfighter
  0 siblings, 0 replies; 16+ messages in thread
From: Fejfighter @ 2021-07-24 13:39 UTC (permalink / raw)
  To: Daniel Colascione; +Cc: mail, Eli Zaretskii, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1630 bytes --]

On Tue, Jun 22, 2021 at 8:43 AM Daniel Colascione <dancol@dancol.org> wrote:

> On 6/21/21 6:00 AM, Fejfighter wrote:
>
>
> > code is up here:
> > https://github.com/fejfighter/emacs/tree/feature/newgc
> > <https://github.com/fejfighter/emacs/tree/feature/newgc>
> > I'm hoping now that it's a little closer to master and compiling
> > someone might have a little more luck with the segfault issue I have
> > been facing but I will keep try in the mean time,
>
> FWIW, I find rr [1] to be exceptionally useful for diagnosing segfaults
> like this.
>
>
> [1] https://rr-project.org/
>

While I had no luck with rr, I did trace that particular issue to not
unprotecting memory, which got me through a little further to a point of a
compacting sweep.
This is where I have been spinning my wheels for the last few weeks in the
short bursts of time I get to look at this codebase.

The problem is highlighted by xxx_check_obarray, but will show up in future
reads, where the ob array is not swept and does not get the updated
references.
I feel like it might be a special case where a global vector has
references, because when marking we traverse the array as required, but
this does not occur when sweeping, however, it does not affect other
vector-like things.

I think I will need to update the obarray, but simply calling
`scan_vectorlike(XPNTR(Vobarray), GC_PHASE_SWEEP);`  has bad values at that
point.

I'm hoping that this will either jog your memory and provide some
background or get someone curious that might have a different understanding
of how it all interacts and we can get over this particular hump,

Thanks,
Jeff W

[-- Attachment #2: Type: text/html, Size: 2479 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-24 13:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04  3:30 New GC concept Daniel Colascione
2021-06-04  8:00 ` Daniel Mendler
2021-06-04  9:47   ` Daniel Colascione
2021-06-04 10:50     ` Eli Zaretskii
2021-06-21 13:00       ` Fejfighter
2021-06-21 13:31         ` Eli Zaretskii
2021-06-21 22:43         ` Daniel Colascione
2021-07-24 13:39           ` Fejfighter
2021-06-04 11:06     ` Daniel Mendler
2021-06-04  8:56 ` Andrea Corallo via Emacs development discussions.
2021-06-07 17:32 ` Matt Armstrong
2021-06-07 18:03   ` Daniel Colascione
2021-06-07 19:51     ` Daniele Nicolodi
2021-06-08  2:22       ` Eli Zaretskii
2021-06-21 22:58         ` Daniel Colascione
2021-06-22 12:59           ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).