* pdumper's performance
From: Stefan Monnier @ 2018-08-29 20:29 UTC
To: Daniel Colascione; +Cc: emacs-devel

I was looking at the pdumper and one thing I was wondering is how the use
of all the <foo>_marked_p and set_<foo>_marked functions impacts GC (and
hence runtime) performance.

I mean the fact that they're functions is perfectly fine, but the fact
that they need to test pdumper_object_p might have a measurable impact,
since I believe these operations are performed very many times per GC.

Also I don't quite understand why this is needed: IIUC the markbits of
pdump'd objects are stored elsewhere, but I don't understand why that
needs to be the case.

        Stefan
* Re: pdumper's performance
From: Daniel Colascione @ 2018-08-29 22:10 UTC
To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

> I was looking at the pdumper and one thing I was wondering is how the use
> of all the <foo>_marked_p and set_<foo>_marked functions impacts GC (and
> hence runtime) performance.
>
> I mean the fact that they're functions is perfectly fine, but the fact
> that they need to test pdumper_object_p might have a measurable impact,
> since I believe these operations are performed very many times per GC.

The cache line that holds the dump region bounds ends up being very hot,
so testing it isn't really that expensive.

You can see for yourself whether there's an impact. Compile an Emacs with
support for both pdumper and unexec, dump it with unexec, and compare its
GC performance to Emacs built without support for pdumper and also dumped
with unexec. As I recall, the difference is minimal.

Besides, you get performance back at the end: the dump region's mark bits
are stored contiguously and we can clear them very quickly at the end of
GC.

> Also I don't quite understand why this is needed: IIUC the markbits of
> pdump'd objects are stored elsewhere, but I don't understand why that
> needs to be the case.

Because we don't store dumped objects in blocks and so the calculations of
the normal locations of their mark bits would be wrong.
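The two points made in the reply above (a cheap range test against the dump
bounds, and contiguous mark bits that can be cleared in one pass) can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the actual pdumper code: struct dump_region, DUMP_ALIGN and
the dump_* helpers are hypothetical names, and one mark bit per 8-byte slot
is an assumed layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of a loaded dump: one contiguous region
   plus a separate bitmap holding one mark bit per aligned slot.  */
enum { DUMP_ALIGN = 8 };             /* assumed object alignment */

struct dump_region
{
  uintptr_t start;                   /* first byte of the dump */
  uintptr_t end;                     /* one past the last byte */
  unsigned char *mark_bits;          /* (end - start) / DUMP_ALIGN bits */
};

/* Range test: two comparisons against bounds that tend to stay cached.  */
static bool
dump_object_p (const struct dump_region *d, const void *obj)
{
  uintptr_t p = (uintptr_t) obj;
  return d->start <= p && p < d->end;
}

/* Index of OBJ's mark bit, derived from its offset into the dump.  */
static size_t
dump_slot (const struct dump_region *d, const void *obj)
{
  assert (dump_object_p (d, obj));
  return ((uintptr_t) obj - d->start) / DUMP_ALIGN;
}

static bool
dump_marked_p (const struct dump_region *d, const void *obj)
{
  size_t i = dump_slot (d, obj);
  return (d->mark_bits[i / 8] >> (i % 8)) & 1;
}

static void
dump_set_marked (struct dump_region *d, const void *obj)
{
  size_t i = dump_slot (d, obj);
  d->mark_bits[i / 8] |= (unsigned char) (1u << (i % 8));
}

/* End of GC: unmark every dumped object with a single memset.  */
static void
dump_clear_marks (struct dump_region *d)
{
  size_t slots = (d->end - d->start) / DUMP_ALIGN;
  memset (d->mark_bits, 0, (slots + 7) / 8);
}
```

In a layout like this, a generic <foo>_marked_p would first call the range
test and fall back to the ordinary per-type mark bit when the object lies
outside the dump.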
* Re: pdumper's performance
From: Stefan Monnier @ 2018-08-30 2:14 UTC
To: Daniel Colascione; +Cc: emacs-devel

Thanks Daniel for your prompt response. I have some further questions,
tho.

> You can see for yourself whether there's an impact. Compile an Emacs with
> support for both pdumper and unexec, dump it with unexec, and compare its
> GC performance to Emacs built without support for pdumper and also dumped
> with unexec.

I was hoping to save myself the time ;-)

[ BTW, part of the reason for those questions is that I'm writing an
  article about the history of Elisp, and I'd like to understand how
  your code works so I can say something intelligent about it.
  Oh and there's not much time left before the deadline.
  Another part of course, is that I'd like to see this feature land
  on master. ]

> As I recall, the difference is minimal.

Do you recall the tests you used and the ballpark of the difference?

>> Also I don't quite understand why this is needed: IIUC the markbits of
>> pdump'd objects are stored elsewhere, but I don't understand why that
>> needs to be the case.
> Because we don't store dumped objects in blocks and so the calculations of
> the normal locations of their mark bits would be wrong.

Hmm... OK that could explain it for conses and floats where we keep the
markbits separately from the objects in bitmaps alongside those blocks,
but you also have those <foo>_marked_p and set_<foo>_marked functions for
all other types of objects where the markbit is normally stored within
the object itself (i.e. it doesn't matter whether they're in blocks or
not).

Why did you choose to use a completely different layout for the objects
loaded from the dump? I naively thought your code would take
cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
keep the same layout, and things like mark_maybe_object don't need to be
changed at all. I understand this would end up writing larger dumps
(since they would include some free objects), but I'd have expected it
would lead to simpler code and a smaller patch.

What am I missing?

        Stefan
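For readers unfamiliar with the block layout mentioned above, here is a
simplified sketch of how a mark bit kept in a bitmap alongside a block can
be located from an object pointer alone. It assumes blocks are allocated at
addresses that are multiples of their size; struct cell, struct cell_block,
BLOCK_SIZE and the helper names are made up for the illustration and do not
reproduce the real definitions in alloc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BLOCK_SIZE = 1024, CELLS_PER_BLOCK = 60 };   /* assumed geometry */

struct cell { void *car, *cdr; };

struct cell_block
{
  struct cell cells[CELLS_PER_BLOCK];                 /* objects first */
  unsigned char mark_bits[(CELLS_PER_BLOCK + 7) / 8]; /* bitmap alongside */
};

_Static_assert (sizeof (struct cell_block) <= BLOCK_SIZE,
                "block must fit in its aligned slot");

/* Blocks are BLOCK_SIZE-aligned, so masking the low bits of any cell
   address yields the address of the block that contains it.  */
static struct cell_block *
block_of (const struct cell *c)
{
  return (struct cell_block *) ((uintptr_t) c & ~(uintptr_t) (BLOCK_SIZE - 1));
}

static size_t
index_of (const struct cell *c)
{
  return (size_t) (c - block_of (c)->cells);
}

static bool
cell_marked_p (const struct cell *c)
{
  size_t i = index_of (c);
  return (block_of (c)->mark_bits[i / 8] >> (i % 8)) & 1;
}

static void
set_cell_marked (const struct cell *c)
{
  size_t i = index_of (c);
  block_of (c)->mark_bits[i / 8] |= (unsigned char) (1u << (i % 8));
}

/* Allocate one zeroed, BLOCK_SIZE-aligned block (NULL on failure).  */
static struct cell_block *
new_block (void)
{
  struct cell_block *b = aligned_alloc (BLOCK_SIZE, BLOCK_SIZE);
  if (b)
    memset (b, 0, sizeof *b);
  return b;
}
```

The masking step only gives a meaningful answer for a cell that really
lives inside such an aligned block. For an object stored anywhere else,
block_of points at unrelated memory, which is the "calculations would be
wrong" problem raised in the quoted reply.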
* Re: pdumper's performance
From: Daniel Colascione @ 2018-08-30 5:19 UTC
To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

> Thanks Daniel for your prompt response. I have some further questions,
> tho.
>
>> You can see for yourself whether there's an impact. Compile an Emacs
>> with support for both pdumper and unexec, dump it with unexec, and
>> compare its GC performance to Emacs built without support for pdumper
>> and also dumped with unexec.
>
> I was hoping to save myself the time ;-)
> [ BTW, part of the reason for those questions is that I'm writing an
>   article about the history of Elisp, and I'd like to understand how
>   your code works so I can say something intelligent about it.
>   Oh and there's not much time left before the deadline.

Cool.

>   Another part of course, is that I'd like to see this feature land
>   on master. ]

Me too. ;-)

>> As I recall, the difference is minimal.
>
> Do you recall the tests you used and the ballpark of the difference?

Exactly the above. IIRC, the difference amounted to a millisecond or two
on an emacs -Q startup plus an immediate (garbage-collect) --- but that's
without the no-relocation optimization below.

>>> Also I don't quite understand why this is needed: IIUC the markbits of
>>> pdump'd objects are stored elsewhere, but I don't understand why that
>>> needs to be the case.
>> Because we don't store dumped objects in blocks and so the calculations
>> of the normal locations of their mark bits would be wrong.
>
> Hmm... OK that could explain it for conses and floats where we keep the
> markbits separately from the objects in bitmaps alongside those blocks,
> but you also have those <foo>_marked_p and set_<foo>_marked functions for
> all other types of objects where the markbit is normally stored within
> the object itself (i.e. it doesn't matter whether they're in blocks or
> not).
>
> Why did you choose to use a completely different layout for the objects
> loaded from the dump?

The objects themselves have the same layout that they do in the normal
heap. (The layout of a cons cell is unchanged, for example.) Dumping
objects individually instead of in blocks both simplifies the
implementation and allows for a more compact dump, as you point out below.

> I naively thought your code would take
> cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
> keep the same layout, and things like mark_maybe_object don't need to be
> changed at all. I understand this would end up writing larger dumps
> (since they would include some free objects), but I'd have expected it
> would lead to simpler code and a smaller patch.

If we keep the mark bits out of the objects, we can avoid modifying the
object pages just for GC. In the non-PIC case, in which in principle we
don't have to relocate the dump, that means that the pages in the dump
stay clean and file-backed, not dirty, COWed, and pagefile-backed as they
would if we banged on them just for the GC. That's an efficiency win.

For a future more-efficient GC, contiguous object storage with external
mark bits is probably the way to go for the entire heap.
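The clean-pages argument above can be illustrated with a small POSIX
sketch: a fake dump is mapped read-only from a file, and marking touches
only a separate side table, so the marking code never writes to the mapped
pages. Everything here (struct mapped_dump, OBJ_SIZE, N_OBJS, the helper
names, the use of a temporary file) is a hypothetical setup chosen for the
example, not how pdumper loads its image. Whether such pages really stay
shared and file-backed in practice depends on the OS and on whether
anything else writes to them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

enum { OBJ_SIZE = 16, N_OBJS = 256 };   /* hypothetical dump geometry */

struct mapped_dump
{
  const unsigned char *base;            /* read-only, file-backed pages */
  unsigned char mark_bits[N_OBJS / 8];  /* private, writable side table */
};

/* Write a fake dump image to a temporary file and map it read-only.
   Returns false on any failure.  */
static bool
map_fake_dump (struct mapped_dump *d)
{
  unsigned char image[OBJ_SIZE * N_OBJS];
  memset (image, 0xAB, sizeof image);

  FILE *f = tmpfile ();
  if (!f)
    return false;
  if (fwrite (image, 1, sizeof image, f) != sizeof image || fflush (f) != 0)
    {
      fclose (f);
      return false;
    }

  void *p = mmap (NULL, sizeof image, PROT_READ, MAP_PRIVATE, fileno (f), 0);
  fclose (f);                 /* the mapping outlives the descriptor */
  if (p == MAP_FAILED)
    return false;

  d->base = p;
  memset (d->mark_bits, 0, sizeof d->mark_bits);
  return true;
}

/* Marking writes only to the side table; the mapped pages are PROT_READ,
   so an accidental store into an object would fault.  */
static void
mark_object (struct mapped_dump *d, size_t i)
{
  assert (i < N_OBJS);
  d->mark_bits[i / 8] |= (unsigned char) (1u << (i % 8));
}

static bool
object_marked_p (const struct mapped_dump *d, size_t i)
{
  assert (i < N_OBJS);
  return (d->mark_bits[i / 8] >> (i % 8)) & 1;
}
```

With an in-object mark bit, the equivalent of mark_object would have to
store into the mapped image, which requires a writable private mapping and
forces a copy of every page that holds a reachable object.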
* Re: pdumper's performance
From: Stefan Monnier @ 2018-09-04 16:26 UTC
To: Daniel Colascione; +Cc: emacs-devel

> Dumping objects individually instead of in blocks both simplifies the
> implementation and allows for a more compact dump, as you point
> out below.

I understand the compactness argument, but I'm surprised it makes the
implementation simpler. I'll trust you on that, tho.

>> I naively thought your code would take
>> cons_blocks, symbol_blocks, ... and write those blocks as-is so objects
>> keep the same layout, and things like mark_maybe_object don't need to be
>> changed at all. I understand this would end up writing larger dumps
>> (since they would include some free objects), but I'd have expected it
>> would lead to simpler code and a smaller patch.
>
> If we keep the mark bits out of the objects, we can avoid modifying the
> object pages just for GC. In the non-PIC case, in which in principle we
> don't have to relocate the dump, that means that the pages in the dump
> stay clean and file-backed, not dirty, COWed, and pagefile-backed as they
> would if we banged on them just for the GC. That's an efficiency win.

I kind of see the benefit, but:
- unlike the purespace, the dumped&restored heap is not read-only, so
  even with the markbits elsewhere the pages will generally not all
  stay clean.
- the benefit you mention only applies to the case where no-relocation
  was needed. And I suspect that in practice keeping those pages "clean
  and file-backed" will very rarely make a noticeable difference anyway.
- I like the idea of keeping markbits separately (XEmacs does the same,
  IIUC, for the needs of their incremental GC where they want to mark
  pages as read-only to catch writes from the mutator, while still
  letting the collector twiddle the mark bits). But I think this should
  be a separate patch and should apply to all objects rather than only
  the pdump'd ones.

> For a future more-efficient GC, contiguous object storage with external
> mark bits is probably the way to go for the entire heap.

Agreed, obviously [ not only because of the text above, but also because
I implemented the separate markbits for floats and cons cells ;-) ]

        Stefan
* Re: pdumper's performance
From: Daniel Colascione @ 2018-09-04 16:42 UTC
To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

> letting the collector twiddle the mark bits). But I think this should
> be a separate patch and should apply to all objects rather than only
> the pdump'd ones.

I'm not interested in replicating the cons block structure in the dump
image, especially when separating markbits is the right thing to do
anyway.
* Re: pdumper's performance
From: Stefan Monnier @ 2018-09-04 19:30 UTC
To: Daniel Colascione; +Cc: emacs-devel

>> letting the collector twiddle the mark bits). But I think this should
>> be a separate patch and should apply to all objects rather than only
>> the pdump'd ones.
> I'm not interested in replicating the cons block structure in the dump
> image, especially when separating markbits is the right thing to do
> anyway.

I'm more concerned about the non-cons non-float objects.

        Stefan
* Re: pdumper's performance
From: Daniel Colascione @ 2018-09-04 19:35 UTC
To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

>>> letting the collector twiddle the mark bits). But I think this
>>> should be a separate patch and should apply to all objects rather
>>> than only the pdump'd ones.
>> I'm not interested in replicating the cons block structure in the dump
>> image, especially when separating markbits is the right thing to do
>> anyway.
>
> I'm more concerned about the non-cons non-float objects.

Try it and see before deciding that there's a performance problem.
* Re: pdumper's performance
From: Stefan Monnier @ 2018-09-04 20:58 UTC
To: Daniel Colascione; +Cc: emacs-devel

> Try it and see before deciding that there's a performance problem.

Hm... there's a misunderstanding: the issue is not one of performance
problem, but of separation of concerns. Moving the markbits elsewhere
should be done in a separate patch (before or after the pdumper patch).

        Stefan
* Re: pdumper's performance
From: Daniel Colascione @ 2018-09-04 21:20 UTC
To: Stefan Monnier; +Cc: Daniel Colascione, emacs-devel

>> Try it and see before deciding that there's a performance problem.
>
> Hm... there's a misunderstanding: the issue is not one of performance
> problem, but of separation of concerns. Moving the markbits elsewhere
> should be done in a separate patch (before or after the pdumper patch).

There is no separation-of-concerns problem. The mark bit thing isn't some
unrelated bonus feature of pdumper. It's how the dump image *works*. To
make pdumper work differently, and worse, merely to create some kind of
fictitious separation between its component parts strikes me as a bad idea
and unnecessary work.
* Re: pdumper's performance
From: Stefan Monnier @ 2018-09-04 21:49 UTC
To: emacs-devel

>> Hm... there's a misunderstanding: the issue is not one of performance
>> problem, but of separation of concerns. Moving the markbits elsewhere
>> should be done in a separate patch (before or after the pdumper patch).
> There is no separation-of-concerns problem. The mark bit thing isn't some
> unrelated bonus feature of pdumper.

Then I don't understand yet: could you explain why pdumper needs to put
the markbits elsewhere for pdump'd objects than for non-pdump'd objects?

Since the layout of the markbits was designed without thinking of the
pdumper case, I can imagine that maybe the current placement of markbits
is inconvenient for the pdumper case (tho I don't actually see which part
is inconvenient). But surely there should be a way to lay out the
markbits such that it's convenient both for pdump'd and the non-pdump'd
objects.

        Stefan
* Re: pdumper's performance
From: Stefan Monnier @ 2018-09-04 21:33 UTC
To: emacs-devel

>> Try it and see before deciding that there's a performance problem.
> Hm... there's a misunderstanding: the issue is not one of performance
> problem, but of separation of concerns. Moving the markbits elsewhere
> should be done in a separate patch (before or after the pdumper patch).

I now realize that indeed, the subject says "performance" (and that I'm
the one who put this word there), but I actually haven't looked at the
resulting performance and since you said there was no perceptible
difference, I assume it's a non-issue.

        Stefan