* New GC concept
@ 2021-06-04 3:30 Daniel Colascione
2021-06-04 8:00 ` Daniel Mendler
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Daniel Colascione @ 2021-06-04 3:30 UTC (permalink / raw)
To: emacs-devel
Emacs has had the same GC for a decent amount of time now (since the
1980s, really). I spent some time in 2020 rewriting it from scratch. I
haven't had time to work on the new GC recently, but I figure I'd throw
it out here to get some feedback on the general concept.
Check out
https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
specifically the big doc comment on top
The new GC basically replaces alloc.c and a few other things. It has a
few cool features:
* fully copying and compacting
* special treatment of sxhash to preserve object identity even while we
move it around in memory
* generational
* contiguous storage of mark bits separately from the data heap
* concurrent (in design, not current implementation): idea is that we do
concurrent marking and barely pause for sweep
* small string optimization
* bump pointer allocation of new objects
* heap enumeration support
* hard requirement on pdumper
* specialized GC spaces for conses, strings, arrays, and so on: no
stupid header word for cons cells bloating memory use by 50%!
* cool modern C implementation that relies heavily on compiler inlining
and constant propagation
The current implementation is deficient in many ways. Honestly, I'm not
even sure whether that specific revision compiles. But like I said, I
haven't had time recently to continue work on it.
Still, I'm curious about what people think of the overall effort.
It might work nicely with the new native compilation stuff, giving us a
managed code execution environment kind-of, sort-of on par with the big
modern managed-code runtimes.
* Re: New GC concept
2021-06-04 3:30 New GC concept Daniel Colascione
@ 2021-06-04 8:00 ` Daniel Mendler
2021-06-04 9:47 ` Daniel Colascione
2021-06-04 8:56 ` Andrea Corallo via Emacs development discussions.
2021-06-07 17:32 ` Matt Armstrong
2 siblings, 1 reply; 16+ messages in thread
From: Daniel Mendler @ 2021-06-04 8:00 UTC (permalink / raw)
To: Daniel Colascione, emacs-devel
Interesting, thank you for working on this!
I had hoped that a new GC would surface at some point given the recent
improvements regarding native compilation. As you say this can bring
Emacs on par with modern managed language runtimes. Can you elaborate a
bit more on some of your concepts?
> * fully copying and compacting
How do you ensure that compaction works together with the conservative
stack scanning? You pin memory blocks, which are potentially referenced
by the stack?
> * generational
Do you rely on the OS memory protection as a write barrier to separate
the different generations, similar to how the bdwgc does that?
> * specialized GC spaces for conses, strings, arrays, and so on: no
> stupid header word for cons cells bloating memory use by 50%!
Avoiding the headers is a good idea. You are using a first fit strategy
to find the next free space for a new object. How do you use the
headerless approach for objects of different sizes? Isn't it the case
that every block should then contain objects of only a single type and
of a single size? Probably most of the objects fall in a few size
classes, so it may be possible to do better than first fit?
Overall, is your design similar to the approach of the bdwgc, plus a
memory/object layout tailored to the needs of Emacs and to compaction?
How well does such a GC hold up to a GC which is precise and does not
rely on the OS facilities for barriers? It appears such a precise GC is
impossible to retrofit on the existing Elisp runtime, so I assume your
approach is the right way to go.
Daniel Mendler
* Re: New GC concept
2021-06-04 3:30 New GC concept Daniel Colascione
2021-06-04 8:00 ` Daniel Mendler
@ 2021-06-04 8:56 ` Andrea Corallo via Emacs development discussions.
2021-06-07 17:32 ` Matt Armstrong
2 siblings, 0 replies; 16+ messages in thread
From: Andrea Corallo via Emacs development discussions. @ 2021-06-04 8:56 UTC (permalink / raw)
To: Daniel Colascione; +Cc: emacs-devel
Daniel Colascione <dancol@dancol.org> writes:
> Emacs has had the same GC for a decent amount of time now (since the
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
> haven't had time to work on the new GC recently, but I figure I'd
> throw it out here to get some feedback on the general concept.
>
> Check out
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
> specifically the big doc comment on top
>
> The new GC basically replaces alloc.c and a few other things. It has a
> few cool features:
>
> * fully copying and compacting
>
> * special treatment of sxhash to preserve object identity even while
> we move it around in memory
>
> * generational
>
> * contiguous storage of mark bits separately from the data heap
>
> * concurrent (in design, not current implementation): idea is that we
> do concurrent marking and barely pause for sweep
>
> * small string optimization
>
> * bump pointer allocation of new objects
>
> * heap enumeration support
>
> * hard requirement on pdumper
>
> * specialized GC spaces for conses, strings, arrays, and so on: no
> stupid header word for cons cells bloating memory use by 50%!
>
> * cool modern C implementation that relies heavily on compiler
> inlining and constant propagation
>
> The current implementation is deficient in many ways. Honestly, I'm
> not even sure whether that specific revision compiles. But like I
> said, I haven't had time recently to continue work on it.
>
> Still, I'm still curious about what people think of the overall
> effort. It might work nicely with the new native compilation stuff,
> giving us a managed code execution environment kind-of, sort-of on par
> with the big modern managed-code runtimes.
Sounds cool!
The only comment I have so far is that IMO *the* important feature for a
new Emacs GC is to have it concurrent (or say concurrent as much as
possible).
Emacs user experience is often dictated by its reactivity. We need to
head towards a concurrent GC, prioritizing this feature over others in
the design; I wouldn't mind sacrificing some efficiency for that.
I like the idea of a moving/generational GC, but possibly porting what
we have to a tri-color mark and sweep would already solve the problem
with less impact. This is what I would have tried if I had time.
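For readers unfamiliar with the term, here is a minimal sketch of
tri-color marking over a toy object graph. It is not taken from Emacs or
from the proposed alloc.c: struct node, tricolor_mark and tricolor_sweep
are hypothetical names, and the sketch is single-threaded. A concurrent
collector would additionally need a barrier so that the mutator never
stores a pointer to a white object into a black one.

```c
#include <assert.h>
#include <stddef.h>

/* WHITE: not yet seen.  GRAY: seen, children not yet scanned.
   BLACK: seen and fully scanned.  */
enum color { WHITE, GRAY, BLACK };

#define MAX_CHILDREN 2

/* Hypothetical toy object; real Lisp objects look nothing like this.  */
struct node
{
  enum color color;
  struct node *children[MAX_CHILDREN];
};

/* Mark everything reachable from ROOT.  STACK is caller-provided storage
   for the gray worklist with room for CAP pointers.  Each node is pushed
   at most once, so CAP == number of nodes is always enough.  Returns the
   number of nodes turned black.  */
static size_t
tricolor_mark (struct node *root, struct node **stack, size_t cap)
{
  size_t top = 0, marked = 0;

  if (root == NULL || root->color != WHITE)
    return 0;
  root->color = GRAY;
  assert (top < cap);
  stack[top++] = root;

  while (top > 0)
    {
      struct node *n = stack[--top];
      for (int i = 0; i < MAX_CHILDREN; i++)
        {
          struct node *c = n->children[i];
          if (c != NULL && c->color == WHITE)
            {
              c->color = GRAY;          /* shade the child gray */
              assert (top < cap);
              stack[top++] = c;
            }
        }
      n->color = BLACK;                 /* all children have been shaded */
      marked++;
    }
  return marked;
}

/* Sweep: anything still white is garbage.  Survivors are reset to white
   for the next cycle.  Returns the number of dead nodes found.  */
static size_t
tricolor_sweep (struct node *nodes, size_t n)
{
  size_t dead = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (nodes[i].color == WHITE)
        dead++;
      else
        nodes[i].color = WHITE;
    }
  return dead;
}
```

The gray set is what makes the scheme incremental: marking can stop and
resume at any point as long as the worklist is preserved.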
Thanks for this work!
Andrea
* Re: New GC concept
2021-06-04 8:00 ` Daniel Mendler
@ 2021-06-04 9:47 ` Daniel Colascione
2021-06-04 10:50 ` Eli Zaretskii
2021-06-04 11:06 ` Daniel Mendler
0 siblings, 2 replies; 16+ messages in thread
From: Daniel Colascione @ 2021-06-04 9:47 UTC (permalink / raw)
To: Daniel Mendler, emacs-devel
On 6/4/21 1:00 AM, Daniel Mendler wrote:
> Interesting, thank you for working on this!
>
> I had hoped that a new GC would surface at some point given the recent
> improvements regarding native compilation. As you say this can bring
> Emacs on par with modern managed language runtimes. Can you elaborate a
> bit more of some of your concepts?
Thanks for taking a look.
>> * fully copying and compacting
> How do you ensure that compaction works together with the conservative
> stack scanning? You pin memory blocks, which are potentially referenced
> by the stack?
Yes, pinning is how we combine conservative stack scanning with a
copying collector. We don't pin whole memory blocks though. We pin at
object granularity. (Things like Hosking's "mostly copying collector"
use block pinning IIRC, but we can do much better these days.)
Just as each object has a mark bit, each object has a pin bit. We pin
only those specific objects that conservative scanning flags as
potentially referenced from native code. We still copy pinned objects
from the from-space to the to-space actually --- it's important to copy
pinned objects because it's during copying that we update all the
pointers that a pinned object might contain. Pinning just ensures that
we copy in a specific way such that after we're done with GC and swap
the to-space and from-space, each pinned object ends up at the same
virtual address it had before GC started. This way, although we *do*
copy pinned objects, the mutator never observes a pinned object changing
position.
The pin bits end up using very little memory because they're stored
contiguously in side arrays and almost entirely zero, and each zero page
shares the same backing RAM until something makes it non-zero. Like I
mentioned in the new alloc.c, address space is abundant.
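To make the per-object pin bit concrete, here is a minimal sketch of a
side bitmap that a conservative scan fills in. It assumes a block of
fixed-size slots; struct block, scan_conservatively and the sizes are
hypothetical and are not the names or layout used in the new alloc.c.
Any scanned word that falls inside an object, including an interior
pointer, pins that one object and nothing else.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE ((size_t) 16)   /* assumed object size, e.g. a cons */
#define NSLOTS ((size_t) 1024)    /* assumed objects per block */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define NWORDS ((NSLOTS + WORD_BITS - 1) / WORD_BITS)

struct block
{
  unsigned char *base;          /* object storage: SLOT_SIZE * NSLOTS bytes */
  unsigned long pins[NWORDS];   /* side bitmap: one pin bit per slot */
};

static void
block_init (struct block *b, unsigned char *storage)
{
  b->base = storage;
  memset (b->pins, 0, sizeof b->pins);   /* nothing pinned initially */
}

static bool
is_pinned (const struct block *b, size_t slot)
{
  assert (slot < NSLOTS);
  return (b->pins[slot / WORD_BITS] >> (slot % WORD_BITS)) & 1UL;
}

/* Treat each of the N words as a possible pointer.  Words that land
   anywhere inside the block pin the slot they land in.  Returns the
   number of slots that became pinned.  */
static size_t
scan_conservatively (struct block *b, const uintptr_t *words, size_t n)
{
  uintptr_t lo = (uintptr_t) b->base;
  size_t newly_pinned = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (words[i] < lo || words[i] - lo >= SLOT_SIZE * NSLOTS)
        continue;                         /* not a pointer into this block */
      size_t slot = (words[i] - lo) / SLOT_SIZE;
      if (!is_pinned (b, slot))
        {
          b->pins[slot / WORD_BITS] |= 1UL << (slot % WORD_BITS);
          newly_pinned++;
        }
    }
  return newly_pinned;
}
```

Because the bitmap lives apart from the object storage, a block with no
pinned objects never dirties its bitmap pages.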
>> * generational
> Do you rely on the OS memory protection as a write barrier to separate
> the different generations, similar to how the bdwgc does that?
Correct. IMHO, it's not practical to retrofit write barriers or read
barriers into Emacs.
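As an illustration of the general technique (not of the code in the new
alloc.c), here is a minimal POSIX sketch of a page-protection write
barrier: the old generation is made read-only, and the first write to
each page traps into a handler that records the page as dirty and
unprotects it. All names (barrier_init, barrier_arm, heap_store, dirty)
are hypothetical. Calling mprotect from a signal handler is common
practice in collectors but is not guaranteed async-signal-safe by POSIX.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES ((size_t) 4)   /* assumed size of the toy old generation */

static unsigned char *heap_base;
static size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];   /* one flag per page */

static void
on_fault (int sig, siginfo_t *info, void *ctx)
{
  uintptr_t addr = (uintptr_t) info->si_addr;
  uintptr_t base = (uintptr_t) heap_base;
  (void) ctx;

  if (addr < base || addr - base >= NPAGES * page_size)
    {
      signal (sig, SIG_DFL);   /* not our heap: let the real crash happen */
      return;
    }
  size_t page = (addr - base) / page_size;
  dirty[page] = 1;             /* remember that this page was written */
  mprotect (heap_base + page * page_size, page_size,
            PROT_READ | PROT_WRITE);
  /* Returning restarts the faulting store, which now succeeds.  */
}

static int
barrier_init (void)
{
  struct sigaction sa;

  page_size = (size_t) sysconf (_SC_PAGESIZE);
  heap_base = mmap (NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (heap_base == MAP_FAILED)
    return -1;

  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = on_fault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset (&sa.sa_mask);
  /* Some systems report protection faults as SIGBUS rather than SIGSEGV.  */
  if (sigaction (SIGSEGV, &sa, NULL) != 0 || sigaction (SIGBUS, &sa, NULL) != 0)
    return -1;
  return 0;
}

/* Call after a collection: forget old dirty bits and write-protect.  */
static int
barrier_arm (void)
{
  for (size_t i = 0; i < NPAGES; i++)
    dirty[i] = 0;
  return mprotect (heap_base, NPAGES * page_size, PROT_READ);
}

/* An ordinary store; no barrier code is emitted at the write site.  */
static void
heap_store (size_t offset, unsigned char value)
{
  ((volatile unsigned char *) heap_base)[offset] = value;
}
```

At the next minor collection, only the pages flagged in dirty need to be
scanned for pointers into the young generation.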
>> * specialized GC spaces for conses, strings, arrays, and so on: no
>> stupid header word for cons cells bloating memory use by 50%!
>
> Avoiding the headers is a good idea. You are using a first fit strategy
> to find the next free space for a new object. How do you use the
> headerless approach for objects of different sizes?
We don't. :-)
In the new GC, the overall Emacs heap is divided into "heaps" for
various object types; each heap has its own list of blocks and its own
heap memory format. The heaps for fixed-size objects like cons cells and
intervals don't have headers. The heaps for variable-sized objects like
strings and vectorlikes *do* use conventional object headers.
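The 50% figure from the feature list follows from simple arithmetic,
sketched below. The struct names, lisp_word and the 64 KiB block size
are assumptions made for the illustration and are not Emacs's actual
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t lisp_word;   /* hypothetical stand-in for Lisp_Object */

/* A cons in a dedicated cons heap: the block it lives in implies its
   type, so the cell needs no header.  */
struct cons_headerless
{
  lisp_word car, cdr;
};

/* The same cons in a mixed heap that needs a per-object header word.  */
struct cons_with_header
{
  lisp_word header;
  lisp_word car, cdr;
};

#define BLOCK_BYTES ((size_t) 64 * 1024)   /* assumed block size */

/* How many cells of CONS_SIZE bytes fit in one block.  */
static size_t
conses_per_block (size_t cons_size)
{
  assert (cons_size > 0);
  return BLOCK_BYTES / cons_size;
}

/* Extra memory the header costs, as a percentage of the headerless cell.  */
static size_t
header_overhead_percent (void)
{
  return (sizeof (struct cons_with_header) - sizeof (struct cons_headerless))
         * 100 / sizeof (struct cons_headerless);
}
```

One header word on top of two payload words is where the 50% comes from.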
> Isn't it the case
> that every block should then contain objects of only a single type and
> of a single size?
Some heaps (most importantly, the vectorlike heap) do support
variable-sized objects, and blocks belonging to these heap types contain
a mixture of object types.
> Probably most of the objects fall in a few size
> classes, so it may be possible to do better than first fit?
First-fit is better than it sounds in the context of a compacting
collector. First-fit allocation always succeeds on the first try (except
in the two cases described below), because each GC pass compacts all the
objects at the "start" of the heap, and we start first-fit allocation
from the end of the last compacted object. That's why I wrote that the
first-fit allocation scheme is equivalent in practice to bump-pointer
allocation.
The two cases where we fail first-fit allocation are:
1) we're in a heap that supports variable-sized objects and there's not
enough space in the current block to hold the object we're allocating, and
2) there's a pinned object "in the way" of where we want to place the
object via first-fit allocation.
#1 isn't a problem in practice: if we're trying to allocate an object
that's too big to place in the tail of the object's heap's current
block, we allocate a new block and put the new object there instead. The
new object is guaranteed to fit in the new block because we allocate
larger-than-block objects in a separate storage area, as is traditional
in GCs of this sort. (See the large vector list.) When we move to a new
block this way, we don't commit the memory of the tail of the previous
block, so moving to the next block is practically free, modulo page-tail
wastage.
#2 isn't a problem either: pinned objects are rare, and when we do
encounter one, we can "skip over" it efficiently using the free-object
bitmap. Modern machines are really good at streaming analysis of bit
arrays: we don't even need a freelist embedded in the heap, like Emacs
currently has for conses. Scanning a bitmap is both simpler and kinder
to the cache. Because pinned objects are rare, because pins are
transient, and because each GC pass is a compacting pass, first-fit
doesn't lead to the fragmentation that it normally causes in things like
malloc implementations.
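A minimal sketch of the allocation path described above, for a heap of
fixed-size slots: the cursor starts just past the compacted objects, and
the occupancy bitmap is consulted only to step over the occasional
pinned slot. The names (struct cons_heap, heap_after_compaction,
heap_alloc) and sizes are hypothetical and are not those used in the new
alloc.c.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NSLOTS ((size_t) 256)   /* assumed slots per block */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define NWORDS ((NSLOTS + WORD_BITS - 1) / WORD_BITS)

struct cons_heap
{
  unsigned long used[NWORDS];   /* 1 bit per slot: occupied or free */
  size_t cursor;                /* next slot to try */
};

static void
mark_used (struct cons_heap *h, size_t slot)
{
  assert (slot < NSLOTS);
  h->used[slot / WORD_BITS] |= 1UL << (slot % WORD_BITS);
}

/* State right after a compacting GC: LIVE objects packed at the start,
   plus NPINNED pinned objects left wherever they were.  */
static void
heap_after_compaction (struct cons_heap *h, size_t live,
                       const size_t *pinned, size_t npinned)
{
  memset (h->used, 0, sizeof h->used);
  for (size_t i = 0; i < live; i++)
    mark_used (h, i);
  for (size_t i = 0; i < npinned; i++)
    mark_used (h, pinned[i]);
  h->cursor = live;
}

/* First-fit from the cursor.  Returns the slot index, or NSLOTS if the
   block is full.  With no pinned slot in the way this is a plain bump.  */
static size_t
heap_alloc (struct cons_heap *h)
{
  size_t i = h->cursor;

  while (i < NSLOTS)
    {
      unsigned long w = h->used[i / WORD_BITS];
      if (i % WORD_BITS == 0 && w == ~0UL)
        {
          i += WORD_BITS;       /* whole bitmap word occupied: skip it */
          continue;
        }
      if (((w >> (i % WORD_BITS)) & 1UL) == 0)
        break;                  /* found a free slot */
      i++;
    }
  if (i >= NSLOTS)
    return NSLOTS;
  mark_used (h, i);
  h->cursor = i + 1;
  return i;
}
```

A production version would likely scan the bitmap a word at a time with
a count-trailing-zeros instruction instead of testing individual bits.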
> Overall is your design similar to the approach of the bdwgc plus that a
> memory/object layout tailored to the needs of Emacs and the compaction?
> How well does such a GC hold up to a GC which is precise and does not
> rely on the OS facilities for barriers? It appears such a precise GC is
> impossible to retrofit on the existing Elisp runtime, so I assume your
> approach is the right way to go.
Most other GCs use software barriers, true. But even that's changing.
Relying on OS facilities for barriers has an important advantage: it
reduces code size. If we used the non-OS-level facility in a native
compilation world, we'd have to emit a write barrier before *every*
mutator write (~6 instructions). These barriers add up and bloat the
generated code. If we use OS memory protection instead, the generated
code can be a lot smaller. Plus, using OS facilities, we don't have to
change the rest of the Emacs C core.
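For contrast, this is roughly what a software barrier looks like. The
sketch uses card marking, one common scheme; the thread does not say
which software barrier was being considered, and the names
(barrier_store, card_of, cards) and sizes are hypothetical. The point is
that the extra bookkeeping is paid at every pointer store in the
mutator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CARD_SHIFT 9                     /* assumed 512-byte cards */
#define HEAP_WORDS ((size_t) 8192)       /* assumed toy heap size */
#define NCARDS ((HEAP_WORDS * sizeof (uintptr_t)) >> CARD_SHIFT)

static uintptr_t heap_words[HEAP_WORDS];
static unsigned char cards[NCARDS];      /* 1 = card holds a recent write */

/* Index of the card that covers SLOT.  */
static size_t
card_of (const uintptr_t *slot)
{
  assert (slot >= heap_words && slot < heap_words + HEAP_WORDS);
  size_t byte_offset = (size_t) ((const unsigned char *) slot
                                 - (const unsigned char *) heap_words);
  return byte_offset >> CARD_SHIFT;
}

/* Every mutator store into the heap must go through this function (or
   an inlined copy of it), which is where the code-size cost comes from.  */
static inline void
barrier_store (uintptr_t *slot, uintptr_t value)
{
  *slot = value;
  cards[card_of (slot)] = 1;
}

/* The collector later scans only the dirty cards.  */
static size_t
count_dirty_cards (void)
{
  size_t n = 0;
  for (size_t i = 0; i < NCARDS; i++)
    n += cards[i];
  return n;
}
```

With page protection the store site stays a plain store, and the cost
moves to the first write to each protected page.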
* Re: New GC concept
2021-06-04 9:47 ` Daniel Colascione
@ 2021-06-04 10:50 ` Eli Zaretskii
2021-06-21 13:00 ` Fejfighter
2021-06-04 11:06 ` Daniel Mendler
1 sibling, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-04 10:50 UTC (permalink / raw)
To: Daniel Colascione; +Cc: mail, emacs-devel
> From: Daniel Colascione <dancol@dancol.org>
> Date: Fri, 4 Jun 2021 02:47:32 -0700
>
> On 6/4/21 1:00 AM, Daniel Mendler wrote:
>
> > Interesting, thank you for working on this!
> >
> > I had hoped that a new GC would surface at some point given the recent
> > improvements regarding native compilation. As you say this can bring
> > Emacs on par with modern managed language runtimes. Can you elaborate a
> > bit more of some of your concepts?
>
> Thanks for taking a look.
Seconded.
And I really hope that more than just a look (and a discussion) will
come out of this. Making our GC more efficient is an important
development goal, of which over the years we've seen several attempts,
but unfortunately little advancement in practice. I hope interested
individuals will step forward and continue developing this (or any
other) initiative so that we will eventually be able to replace our GC
with a better one.
Thanks in advance for developing this aspect of Emacs.
* Re: New GC concept
2021-06-04 9:47 ` Daniel Colascione
2021-06-04 10:50 ` Eli Zaretskii
@ 2021-06-04 11:06 ` Daniel Mendler
1 sibling, 0 replies; 16+ messages in thread
From: Daniel Mendler @ 2021-06-04 11:06 UTC (permalink / raw)
To: Daniel Colascione, emacs-devel
Thank you for answering my questions. Did you also consider the approach
of using a non-copying collector, keeping the classical mark and sweep?
Andrea mentioned in his mail that the focus should be on reactivity.
With mark and sweep it is possible to avoid the copying step, which
scales with the live size, such that one can achieve constant pause times.
On the other hand the copying step is probably quick for the expected
Emacs heap sizes. Furthermore with m&s, you have the fragmentation
problem, it is harder to use such a bump-style allocator and it is
harder to separate the generations, which is a requirement for the
hardware write barrier. So I think overall your design is a sound
approach as long as the heap stays at a reasonable size.
Your approach seems to be quite general purpose and does not require
intrusive changes to the Emacs C code; it seems to be relatively
decoupled from the Elisp runtime. Do you think it is realistic to
implement the GC as a "library" behind some abstract interface? Of
course there is some dependence on the object memory layout, but the GC
interface could offer APIs to request objects of different types. It
should then be possible to use different GCs which use a similar
approach (conservative stack scanning, no explicit read/write barriers).
Daniel Mendler
* Re: New GC concept
2021-06-04 3:30 New GC concept Daniel Colascione
2021-06-04 8:00 ` Daniel Mendler
2021-06-04 8:56 ` Andrea Corallo via Emacs development discussions.
@ 2021-06-07 17:32 ` Matt Armstrong
2021-06-07 18:03 ` Daniel Colascione
2 siblings, 1 reply; 16+ messages in thread
From: Matt Armstrong @ 2021-06-07 17:32 UTC (permalink / raw)
To: Daniel Colascione, emacs-devel
Daniel Colascione <dancol@dancol.org> writes:
> Emacs has had the same GC for a decent amount of time now (since the
> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
> haven't had time to work on the new GC recently, but I figure I'd throw
> it out here to get some feedback on the general concept.
>
> Check out
> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
> specifically the big doc comment on top
Hey Daniel, I am no GC expert but I'm liking this a lot. I love the
block comments in your alloc.c -- very clear and easy to understand.
You're in a uniquely good position to work on this. I hope you
continue!
I'm curious about the answer to one of the unanswered questions in your
alloc.c FAQ: What about systems without virtual memory? Asked another
way: can we reasonably expect to entirely replace the current GC with
this new one? Are there platforms Emacs supports today that would be
left behind?
* Re: New GC concept
2021-06-07 17:32 ` Matt Armstrong
@ 2021-06-07 18:03 ` Daniel Colascione
2021-06-07 19:51 ` Daniele Nicolodi
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-07 18:03 UTC (permalink / raw)
To: Matt Armstrong, emacs-devel
On 6/7/21 10:32 AM, Matt Armstrong wrote:
> Daniel Colascione <dancol@dancol.org> writes:
>
>> Emacs has had the same GC for a decent amount of time now (since the
>> 1980s, really). I spent some time in 2020 rewriting it from scratch. I
>> haven't had time to work on the new GC recently, but I figure I'd throw
>> it out here to get some feedback on the general concept.
>>
>> Check out
>> https://github.com/dcolascione/emacs-1/blob/newgc-wip/src/alloc.c,
>> specifically the big doc comment on top
> Hey Daniel, I am no GC expert but I'm liking this a lot. I love the
> block comments in your alloc.c -- very clear and easy to understand.
> You're in a uniquely good position to work on this. I hope you
> continue!
Thank you!
> I'm curious about the answer to one of the unanswered questions in your
> alloc.c FAQ: What about systems without virtual memory? Asked another
> way: can we reasonably expect to entirely replace the current GC with
> this new one? Are there platforms Emacs supports today that would be
> left behind?
We can definitely replace the existing GC with the new GC everywhere.
I've designed the new GC to work on systems without virtual memory
facilities. On these systems, we'll have to run the GC in
non-concurrent, non-generational mode, but that's no regression from
what we have today. We'll also probably want to use a smaller block size
on these systems to reduce fragmentation overhead.
* Re: New GC concept
2021-06-07 18:03 ` Daniel Colascione
@ 2021-06-07 19:51 ` Daniele Nicolodi
2021-06-08 2:22 ` Eli Zaretskii
0 siblings, 1 reply; 16+ messages in thread
From: Daniele Nicolodi @ 2021-06-07 19:51 UTC (permalink / raw)
To: emacs-devel
On 07/06/2021 20:03, Daniel Colascione wrote:
> On 6/7/21 10:32 AM, Matt Armstrong wrote:
>> I'm curious about the answer to one of the unanswered questions in your
>> alloc.c FAQ: What about systems without virtual memory? Asked another
>> way: can we reasonably expect to entirely replace the current GC with
>> this new one? Are there platforms Emacs supports today that would be
>> left behind?
>
> We can definitely replace the existing GC with the new GC everywhere.
> I've designed the new GC to work on systems without virtual memory
> facilities. On these systems, we'll have to run the GC in
> non-concurrent, non-generational mode, but that's no regression from
> what we have today. We'll also probably want to use a smaller block size
> on these systems to reduce fragmentation overhead.
Isn't DOS the only system in this class? (It is not a rhetorical
question: a while ago I asked which systems are officially supported and
the answer was that all systems that currently run Emacs are supported).
Does it make sense to still support DOS?
Cheers,
Dan
* Re: New GC concept
2021-06-07 19:51 ` Daniele Nicolodi
@ 2021-06-08 2:22 ` Eli Zaretskii
2021-06-21 22:58 ` Daniel Colascione
0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-08 2:22 UTC (permalink / raw)
To: Daniele Nicolodi; +Cc: emacs-devel
> From: Daniele Nicolodi <daniele@grinta.net>
> Date: Mon, 7 Jun 2021 21:51:45 +0200
>
> > We can definitely replace the existing GC with the new GC everywhere.
> > I've designed the new GC to work on systems without virtual memory
> > facilities. On these systems, we'll have to run the GC in
> > non-concurrent, non-generational mode, but that's no regression from
> > what we have today. We'll also probably want to use a smaller block size
> > on these systems to reduce fragmentation overhead.
>
> Isn't DOS the only system in this class? (It is not a rhetorical
> question: a while ago I asked which systems are officially supported and
> the answer was that all systems that currently run Emacs are supported).
>
> Does it make sense to still support DOS?
The development environment which is used to build the MS-DOS port of
Emacs (DJGPP) does support virtual memory (IIUC what that means in
this context).
* Re: New GC concept
2021-06-04 10:50 ` Eli Zaretskii
@ 2021-06-21 13:00 ` Fejfighter
2021-06-21 13:31 ` Eli Zaretskii
2021-06-21 22:43 ` Daniel Colascione
0 siblings, 2 replies; 16+ messages in thread
From: Fejfighter @ 2021-06-21 13:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mail, Daniel Colascione, emacs-devel
Not wanting to see this drop, I found a little time to bring the changes up
to emacs master with a few notes:
1) now sits on top of a4fb5811f (Do not attempt to write .elc....)
2) The C will compile with a couple of warnings as the native comp cases
are currently not handled.
3) There's dead code, commented code and other atrocities on top of the
current wip
4) It segfaults during the pdumper step in the build; there is no
immediately obvious reason for this, but I suspect a move or free occurs
and it's not tracked
5) I have ignored the comments for large_vector and large_vector_meta for
now, so the meta is kept with the vector struct.
Code is up here: https://github.com/fejfighter/emacs/tree/feature/newgc
I'm hoping that, now that it's a little closer to master and compiling,
someone might have a little more luck with the segfault issue I have been
facing, but I will keep trying in the meantime.
JeffW
On Fri, Jun 4, 2021 at 8:52 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Daniel Colascione <dancol@dancol.org>
> > Date: Fri, 4 Jun 2021 02:47:32 -0700
> >
> > On 6/4/21 1:00 AM, Daniel Mendler wrote:
> >
> > > Interesting, thank you for working on this!
> > >
> > > I had hoped that a new GC would surface at some point given the recent
> > > improvements regarding native compilation. As you say this can bring
> > > Emacs on par with modern managed language runtimes. Can you elaborate a
> > > bit more of some of your concepts?
> >
> > Thanks for taking a look.
>
> Seconded.
>
> And I really hope that more than just a look (and a discussion) will
> come out of this. Making our GC more efficient is an important
> development goal, of which over the years we've seen several attempts,
> but unfortunately little advancement in practice. I hope interested
> individuals will step forward and continue developing this (or any
> other) initiative so that we will eventually be able to replace our GC
> with a better one.
>
> Thanks in advance for developing this aspect of Emacs.
>
>
* Re: New GC concept
2021-06-21 13:00 ` Fejfighter
@ 2021-06-21 13:31 ` Eli Zaretskii
2021-06-21 22:43 ` Daniel Colascione
1 sibling, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-21 13:31 UTC (permalink / raw)
To: Fejfighter; +Cc: mail, dancol, emacs-devel
> From: Fejfighter <fejfighter@gmail.com>
> Date: Mon, 21 Jun 2021 23:00:09 +1000
> Cc: mail@daniel-mendler.de, Daniel Colascione <dancol@dancol.org>,
> emacs-devel@gnu.org
>
> Not wanting to see this drop, I found a little time to bring the changes up to emacs master with a few notes:
Thanks for working on this.
* Re: New GC concept
2021-06-21 13:00 ` Fejfighter
2021-06-21 13:31 ` Eli Zaretskii
@ 2021-06-21 22:43 ` Daniel Colascione
2021-07-24 13:39 ` Fejfighter
1 sibling, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-21 22:43 UTC (permalink / raw)
To: Fejfighter, Eli Zaretskii; +Cc: mail, emacs-devel
On 6/21/21 6:00 AM, Fejfighter wrote:
> Not wanting to see this drop, I found a little time to bring the
> changes up to emacs master with a few notes:
>
> 1) now sits on top of a4fb5811f (Do not attempt to write .elc....)
Awesome. Thanks!
> 2) The C will compile with a couple of warnings as the native comp
> cases are currently not handled.
> 3) There's dead code, commented code and other atrocities on top of
> the current wip
Yep. There were also plenty of atrocities in the original. :-)
> 4) It segfaults during the pdumper step in the build; there is no
> immediately obvious reason for this, but I suspect a move or free
> occurs and it's not tracked
> 5) I have ignored the comments for large_vector and large_vector_meta
> for now, so the meta is kept with the vector struct.
Yeah. That's probably a minor optimization, but we should get around to
completing it. The next big chunk of work is actually implementing
concurrent marking.
> code is up here:
> https://github.com/fejfighter/emacs/tree/feature/newgc
> <https://github.com/fejfighter/emacs/tree/feature/newgc>
> I'm hoping now that it's a little closer to master and compiling
> someone might have a little more luck with the segfault issue I have
> been facing but I will keep trying in the meantime,
FWIW, I find rr [1] to be exceptionally useful for diagnosing segfaults
like this.
[1] https://rr-project.org/
* Re: New GC concept
2021-06-08 2:22 ` Eli Zaretskii
@ 2021-06-21 22:58 ` Daniel Colascione
2021-06-22 12:59 ` Eli Zaretskii
0 siblings, 1 reply; 16+ messages in thread
From: Daniel Colascione @ 2021-06-21 22:58 UTC (permalink / raw)
To: Eli Zaretskii, Daniele Nicolodi; +Cc: emacs-devel
On 6/7/21 7:22 PM, Eli Zaretskii wrote:
>> From: Daniele Nicolodi <daniele@grinta.net>
>> Date: Mon, 7 Jun 2021 21:51:45 +0200
>>
>>> We can definitely replace the existing GC with the new GC everywhere.
>>> I've designed the new GC to work on systems without virtual memory
>>> facilities. On these systems, we'll have to run the GC in
>>> non-concurrent, non-generational mode, but that's no regression from
>>> what we have today. We'll also probably want to use a smaller block size
>>> on these systems to reduce fragmentation overhead.
>> Isn't DOS the only system in this class? (It is not a rhetorical
>> question: a while ago I asked which systems are officially supported and
>> the answer was that all systems that currently run Emacs are supported).
>>
>> Does it make sense to still support DOS?
> The development environment which is used to build the MS-DOS port of
> Emacs (DJGPP) does support virtual memory (IIUC what that means in
> this context).
Oh, right. I completely forgot that we have DPMI.
It's been a very long time since I looked at that. Does DJGPP provide
DPMI 0.9 or 1.0?
To get generational GC under DJGPP, we'll need something like a SIGSEGV
handler, a bit of code that we run when the CPU signals a memory
protection fault. I think we get there by installing an exception
interrupt handler, as in
http://www.delorie.com/djgpp/doc/dpmi/ch4.5.html, and I think it'll work
in both DPMI 0.9 and 1.0. Another thing we need for generational GC is
the ability to mark a range of pages read-only, as with mprotect. I
think DPMI 1.0 gives us the ability to change page permissions, but 0.9 does
not. See http://www.delorie.com/djgpp/doc/dpmi/api/310507.html
The other thing we get with VM is the ability to swap the from-space and
the to-space without an additional memory copy. DPMI 1.0 appears to
provide a shared memory facility that would let us do that (the
equivalent of mmap/MapViewOfFile of an anonymous segment), but I'm not
sure that DPMI 0.9 gives us that ability.
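On a POSIX system, the facility in question looks roughly like the
sketch below: one set of physical pages mapped at two virtual addresses,
so data written through one view is visible through the other without a
copy. This only demonstrates the aliasing primitive. How the new GC
would use it to exchange the from-space and the to-space is not spelled
out in this thread, and struct dual_view, dual_view_create and the /tmp
template path are hypothetical.

```c
#define _DEFAULT_SOURCE 1   /* for mkstemp in strict glibc modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct dual_view
{
  unsigned char *a;   /* first mapping of the pages */
  unsigned char *b;   /* second mapping of the same pages */
  size_t len;
};

/* Map LEN bytes of anonymous-like shared memory twice.
   Returns 0 on success, -1 on failure.  */
static int
dual_view_create (struct dual_view *v, size_t len)
{
  char path[] = "/tmp/gc-alias-XXXXXX";   /* assumed writable location */
  int fd;
  void *a, *b;

  assert (len > 0);
  fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                 /* the file lives only as long as FD/maps */
  if (ftruncate (fd, (off_t) len) != 0)
    {
      close (fd);
      return -1;
    }

  a = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  b = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);                    /* the mappings keep the pages alive */
  if (a == MAP_FAILED || b == MAP_FAILED)
    {
      if (a != MAP_FAILED)
        munmap (a, len);
      if (b != MAP_FAILED)
        munmap (b, len);
      return -1;
    }

  v->a = a;
  v->b = b;
  v->len = len;
  return 0;
}

static void
dual_view_destroy (struct dual_view *v)
{
  munmap (v->a, v->len);
  munmap (v->b, v->len);
}
```

A write through view a shows up immediately through view b, which is the
property a remapping-based space swap depends on.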
Anyway, even if it is theoretically possible to implement the new GC's
fancy VM stuff in terms of DPMI, I think it should have lower priority
than the rest of the system. The new GC, run without any use of virtual
memory, should still be no worse overall than the current GC, so MS-DOS
Emacs at least wouldn't see a regression if we switched to a version of
the new GC that didn't understand DPMI.
But DPMI support for the new GC would definitely be a fun retro
computing project.
* Re: New GC concept
2021-06-21 22:58 ` Daniel Colascione
@ 2021-06-22 12:59 ` Eli Zaretskii
0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2021-06-22 12:59 UTC (permalink / raw)
To: Daniel Colascione; +Cc: daniele, emacs-devel
> Cc: emacs-devel@gnu.org
> From: Daniel Colascione <dancol@dancol.org>
> Date: Mon, 21 Jun 2021 15:58:20 -0700
>
> > The development environment which is used to build the MS-DOS port of
> > Emacs (DJGPP) does support virtual memory (IIUC what that means in
> > this context).
>
> Oh, right. I completely forgot that we have DPMI.
>
> It's been a very long time since I looked at that. Does DJGPP provide
> DPMI 0.9 or 1.0?
The DPMI provider which comes with DJGPP supports DPMI 0.9 with some
extensions. (If one runs a DJGPP program on MS-Windows, one gets what
Windows provides instead, which is DPMI 0.9.)
> To get generational GC under DJGPP, we'll need something like a SIGSEGV
> handler, a bit of code that we run when the CPU signals a memory
> protection fault. I think we get there by installing an exception
> interrupt handler, as in
> http://www.delorie.com/djgpp/doc/dpmi/ch4.5.html, and I think it'll work
> in both DPMI 0.9 and 1.0.
Yes, this is supported.
> Another thing we need for generational GC is
> the ability to mark a range of pages read-only, as with mprotect. I
> think DPMI 1.0 gives us the ability to change page permissions, but 0.9 does
> not. See http://www.delorie.com/djgpp/doc/dpmi/api/310507.html
DJGPP has mprotect. It indeed requires DPMI 1.0, but it is also one
of the extensions supported by the DPMI provider that comes with
DJGPP.
> The other thing we get with VM is the ability to swap the from-space and
> the to-space without an additional memory copy. DPMI 1.0 appears to
> provide a shared memory facility that would let us do that (the
> equivalent of mmap/MapViewOfFile of an anonymous segment), but I'm not
> sure that DPMI 0.9 gives us that ability.
Right, this requires DPMI 1.0.
> Anyway, even if it is theoretically possible to implement the new GC's
> fancy VM stuff in terms of DPMI, I think it should have lower priority
> than the rest of the system. The new GC run without virtual memory use
> at all should still be no worse overall than the current GC, so MS-DOS
> Emacs at least wouldn't see a regression if we switched to a version of
> the new GC that didn't understand DPMI.
Yes, definitely.
* Re: New GC concept
2021-06-21 22:43 ` Daniel Colascione
@ 2021-07-24 13:39 ` Fejfighter
0 siblings, 0 replies; 16+ messages in thread
From: Fejfighter @ 2021-07-24 13:39 UTC (permalink / raw)
To: Daniel Colascione; +Cc: mail, Eli Zaretskii, emacs-devel
On Tue, Jun 22, 2021 at 8:43 AM Daniel Colascione <dancol@dancol.org> wrote:
> On 6/21/21 6:00 AM, Fejfighter wrote:
>
>
> > code is up here:
> > https://github.com/fejfighter/emacs/tree/feature/newgc
> > <https://github.com/fejfighter/emacs/tree/feature/newgc>
> > I'm hoping now that it's a little closer to master and compiling
> > someone might have a little more luck with the segfault issue I have
> > been facing but I will keep trying in the meantime,
>
> FWIW, I find rr [1] to be exceptionally useful for diagnosing segfaults
> like this.
>
>
> [1] https://rr-project.org/
>
While I had no luck with rr, I did trace that particular issue to memory
not being unprotected; fixing that got me a little further, to the point
of a compacting sweep.
This is where I have been spinning my wheels for the last few weeks in the
short bursts of time I get to look at this codebase.
The problem is highlighted by xxx_check_obarray, but will show up in future
reads, where the obarray is not swept and does not get the updated
references.
I feel like it might be a special case where a global vector has
references, because when marking we traverse the array as required, but
this does not occur when sweeping. However, it does not affect other
vector-like things.
I think I will need to update the obarray, but simply calling
`scan_vectorlike(XPNTR(Vobarray), GC_PHASE_SWEEP);` yields bad values at
that point.
I'm hoping that this will either jog your memory and provide some
background or get someone curious that might have a different understanding
of how it all interacts and we can get over this particular hump,
Thanks,
Jeff W