* GC thread performance @ 2017-11-27 22:40 Hans Åberg 2017-11-27 23:05 ` Stefan Israelsson Tampe 0 siblings, 1 reply; 12+ messages in thread From: Hans Åberg @ 2017-11-27 22:40 UTC (permalink / raw) To: Guile User With the Boehm GC in Guile, does multithread allocations take longer time than the same amount in a single thread? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-11-27 22:40 GC thread performance Hans Åberg @ 2017-11-27 23:05 ` Stefan Israelsson Tampe 2017-11-27 23:13 ` Hans Åberg 0 siblings, 1 reply; 12+ messages in thread From: Stefan Israelsson Tampe @ 2017-11-27 23:05 UTC (permalink / raw) To: Hans Åberg There are lists of free memory kept in each thread so usually small allocation is one multithreaded large allocation and then a bunch of small allocation in thread consuming that large allocation. So for small memory segments I would say that the overhead probably is neglible. On Mon, Nov 27, 2017 at 11:40 PM, Hans Åberg <haberg-1@telia.com <mailto:haberg-1@telia.com>> wrote: With the Boehm GC in Guile, does multithread allocations take longer time than the same amount in a single thread? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-11-27 23:05 ` Stefan Israelsson Tampe @ 2017-11-27 23:13 ` Hans Åberg 2017-11-27 23:23 ` Marko Rauhamaa 0 siblings, 1 reply; 12+ messages in thread From: Hans Åberg @ 2017-11-27 23:13 UTC (permalink / raw) To: Stefan Israelsson Tampe; +Cc: Guile User > On 28 Nov 2017, at 00:05, Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote: > > There are lists of free memory kept in each thread so usually small allocation is one multithreaded large allocation and then > a bunch of small allocation in thread consuming that large allocation. So for small memory segments I would say that the > overhead probably is neglible. I saw overhead also for the small allocations, 20-30% maybe. This is in a program that makes a lot of allocations relative other computations. So that made me wonder about Guile. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-11-27 23:13 ` Hans Åberg @ 2017-11-27 23:23 ` Marko Rauhamaa 2017-11-27 23:44 ` Hans Åberg 0 siblings, 1 reply; 12+ messages in thread From: Marko Rauhamaa @ 2017-11-27 23:23 UTC (permalink / raw) To: Hans Åberg; +Cc: Guile User Hans Åberg <haberg-1@telia.com>: > I saw overhead also for the small allocations, 20-30% maybe. This is > in a program that makes a lot of allocations relative other > computations. So that made me wonder about Guile. I don't have an answer to your question although I would imagine GC cannot be effectively scaled across multiple threads. I generally steer clear of threads and stick to multiple processes where parallelism is needed. Marko ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-11-27 23:23 ` Marko Rauhamaa @ 2017-11-27 23:44 ` Hans Åberg 2017-12-01 19:49 ` Linas Vepstas 0 siblings, 1 reply; 12+ messages in thread From: Hans Åberg @ 2017-11-27 23:44 UTC (permalink / raw) To: Marko Rauhamaa; +Cc: Guile User > On 28 Nov 2017, at 00:23, Marko Rauhamaa <marko@pacujo.net> wrote: > > Hans Åberg <haberg-1@telia.com>: >> I saw overhead also for the small allocations, 20-30% maybe. This is >> in a program that makes a lot of allocations relative other >> computations. So that made me wonder about Guile. > > I don't have an answer to your question although I would imagine GC > cannot be effectively scaled across multiple threads. Or perhaps special implementation methods are required. > I generally steer clear of threads and stick to multiple processes where > parallelism is needed. Indeed, that was I did: just one crucial point. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-11-27 23:44 ` Hans Åberg @ 2017-12-01 19:49 ` Linas Vepstas 2017-12-01 23:23 ` Hans Åberg 2017-12-02 8:50 ` Marko Rauhamaa 0 siblings, 2 replies; 12+ messages in thread From: Linas Vepstas @ 2017-12-01 19:49 UTC (permalink / raw) To: Hans Åberg; +Cc: Guile User On Mon, Nov 27, 2017 at 5:44 PM, Hans Åberg <haberg-1@telia.com> wrote: > > > > On 28 Nov 2017, at 00:23, Marko Rauhamaa <marko@pacujo.net> wrote: > > > > Hans Åberg <haberg-1@telia.com>: > >> I saw overhead also for the small allocations, 20-30% maybe. This is > >> in a program that makes a lot of allocations relative other > >> computations. So that made me wonder about Guile. > I cannot speak to GC, but I freuqently encounter situations in guile where using the parallel constructs e.g. par-for-each, end up running slower than the single-threaded version. For example, using 2 or 3 threads, I get a 2x and 3x speedup, great, but using 4x gives a slowdown, often 10x slower than single-threaded. I try to make sure that the insides of the loop are large and long-running, so that the cost of creating and scheduling threads is inconsequential. I have not attempted to determine the cause of this, but basically, that entire subsystem needs a careful review and measurement. --linas -- *"The problem is not that artificial intelligence will get too smart and take over the world," computer scientist Pedro Domingos writes, "the problem is that it's too stupid and already has." * ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-01 19:49 ` Linas Vepstas @ 2017-12-01 23:23 ` Hans Åberg 2017-12-02 8:50 ` Marko Rauhamaa 1 sibling, 0 replies; 12+ messages in thread From: Hans Åberg @ 2017-12-01 23:23 UTC (permalink / raw) To: linasvepstas; +Cc: Guile User > On 1 Dec 2017, at 20:49, Linas Vepstas <linasvepstas@gmail.com> wrote: > > On Mon, Nov 27, 2017 at 5:44 PM, Hans Åberg <haberg-1@telia.com> wrote: > >> >> >>> On 28 Nov 2017, at 00:23, Marko Rauhamaa <marko@pacujo.net> wrote: >>> >>> Hans Åberg <haberg-1@telia.com>: >>>> I saw overhead also for the small allocations, 20-30% maybe. This is >>>> in a program that makes a lot of allocations relative other >>>> computations. So that made me wonder about Guile. >> > > I cannot speak to GC, but I freuqently encounter situations in guile where > using the parallel constructs e.g. par-for-each, end up running slower than > the single-threaded version. For example, using 2 or 3 threads, I get a 2x > and 3x speedup, great, but using 4x gives a slowdown, often 10x slower than > single-threaded. I try to make sure that the insides of the loop are large > and long-running, so that the cost of creating and scheduling threads is > inconsequential. In the program I tested it depends on what I run, but one pattern is that user time goes up about two times when approaching the hardware concurrency, and then stabilizes. System time jumps up thirty times when going from one to two threads, and then further doubles when approaching the hardware concurrency. > I have not attempted to determine the cause of this, but basically, that > entire subsystem needs a careful review and measurement. The GC also collects outside the main thread, so if that part takes up too much system resources, the program itself might too slow down by that. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-01 19:49 ` Linas Vepstas 2017-12-01 23:23 ` Hans Åberg @ 2017-12-02 8:50 ` Marko Rauhamaa 2017-12-02 14:20 ` Hans Åberg 2017-12-02 14:30 ` Chris Vine 1 sibling, 2 replies; 12+ messages in thread From: Marko Rauhamaa @ 2017-12-02 8:50 UTC (permalink / raw) To: Linas Vepstas; +Cc: Guile User, Hans Åberg Linas Vepstas <linasvepstas@gmail.com>: > I cannot speak to GC, but I freuqently encounter situations in guile > where using the parallel constructs e.g. par-for-each, end up running > slower than the single-threaded version. For example, using 2 or 3 > threads, I get a 2x and 3x speedup, great, but using 4x gives a > slowdown, often 10x slower than single-threaded. I try to make sure > that the insides of the loop are large and long-running, so that the > cost of creating and scheduling threads is inconsequential. > > I have not attempted to determine the cause of this, but basically, > that entire subsystem needs a careful review and measurement. I'll have to speculate, too. Guile guarantees the consistency of the data model under all circumstances. Bad synchronization between threads is allowed to cause unspecified behavior, but it should never trigger a SIGSEGV. In practice, that means excessive locking: all data access needs to take place in a critical section. Linux (at least) has locking primitives that have virtually no overhead when there is no contention. However, when there is contention, the context is switched to the kernel and a memory barrier causes the CPU hardware to flush its memory cache. The same issue hampers C developers as well, although the newest C standards explicitly shift to the coder the responsibility for the consistency of the data model. Thus, in principle, a C programmer can try clever tricks to eke out performance with multithreading. CPython has famously grappled with the same performance issues. The so-called GIL effectively turns every Python program into a single-threaded process. The point of multithreading is not a gain in performance. Rather, it is a programming paradigm to multiplex events. These and other issues with multithreading have caused many of us to reject multithreading as a paradigm altogether. Use event-driven programming for multiplexing and multiprocessing for performance. Marko ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-02 8:50 ` Marko Rauhamaa @ 2017-12-02 14:20 ` Hans Åberg 2017-12-03 22:15 ` Arne Babenhauserheide 2017-12-02 14:30 ` Chris Vine 1 sibling, 1 reply; 12+ messages in thread From: Hans Åberg @ 2017-12-02 14:20 UTC (permalink / raw) To: Marko Rauhamaa; +Cc: Guile User > On 2 Dec 2017, at 09:50, Marko Rauhamaa <marko@pacujo.net> wrote: > > Guile guarantees the consistency of the data model under all > circumstances. Bad synchronization between threads is allowed to cause > unspecified behavior, but it should never trigger a SIGSEGV. In > practice, that means excessive locking: all data access needs to take > place in a critical section. > These and other issues with multithreading have caused many of us to > reject multithreading as a paradigm altogether. Use event-driven > programming for multiplexing and multiprocessing for performance. I see the the expected behavior when turing off the GC altogether, using malloc and free without cleanup: I third of the time on one thread, decreasing up to hardware concurrency. The Boehm GC sets a lock around all allocations and collections [1], so it might be such excessive locking that slows it down. That pages seems old: I could not see a difference when choosing the options at [2]. 1. http://www.hboehm.info/gc/scale.html 2. http://www.hboehm.info/gc/simple_example.html ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-02 14:20 ` Hans Åberg @ 2017-12-03 22:15 ` Arne Babenhauserheide 2017-12-04 9:38 ` Hans Åberg 0 siblings, 1 reply; 12+ messages in thread From: Arne Babenhauserheide @ 2017-12-03 22:15 UTC (permalink / raw) To: Hans Åberg; +Cc: Guile User [-- Attachment #1: Type: text/plain, Size: 576 bytes --] Hans Åberg <haberg-1@telia.com> writes: > I see the the expected behavior when turing off the GC altogether, > using malloc and free without cleanup: I third of the time on one > thread, decreasing up to hardware concurrency. I also saw that - I think we tracked it down to the GC heuristics miscalculating when to run and running either too often or too seldomly. I think we didn't get to actually fixing the underlying issue, though we found that it should be possible. Best wishes, Arne -- Unpolitisch sein heißt politisch sein ohne es zu merken [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-03 22:15 ` Arne Babenhauserheide @ 2017-12-04 9:38 ` Hans Åberg 0 siblings, 0 replies; 12+ messages in thread From: Hans Åberg @ 2017-12-04 9:38 UTC (permalink / raw) To: Arne Babenhauserheide; +Cc: Guile User > On 3 Dec 2017, at 23:15, Arne Babenhauserheide <arne_bab@web.de> wrote: > > Hans Åberg <haberg-1@telia.com> writes: > >> I see the the expected behavior when turing off the GC altogether, >> using malloc and free without cleanup: I third of the time on one >> thread, decreasing up to hardware concurrency. > > I also saw that - I think we tracked it down to the GC heuristics > miscalculating when to run and running either too often or too seldomly. I looked at the GC interface for way to make collections more rare, but could not find that. > I think we didn't get to actually fixing the underlying issue, though we > found that it should be possible. The GC mailing list server is down, and has been that for some time it seems, so it is unclear about the state of the package. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: GC thread performance 2017-12-02 8:50 ` Marko Rauhamaa 2017-12-02 14:20 ` Hans Åberg @ 2017-12-02 14:30 ` Chris Vine 1 sibling, 0 replies; 12+ messages in thread From: Chris Vine @ 2017-12-02 14:30 UTC (permalink / raw) To: guile-user On Sat, 02 Dec 2017 10:50:29 +0200 Marko Rauhamaa <marko@pacujo.net> wrote: > Linas Vepstas <linasvepstas@gmail.com>: > > I cannot speak to GC, but I freuqently encounter situations in guile > > where using the parallel constructs e.g. par-for-each, end up > > running slower than the single-threaded version. For example, using > > 2 or 3 threads, I get a 2x and 3x speedup, great, but using 4x > > gives a slowdown, often 10x slower than single-threaded. I try to > > make sure that the insides of the loop are large and long-running, > > so that the cost of creating and scheduling threads is > > inconsequential. > > > > I have not attempted to determine the cause of this, but basically, > > that entire subsystem needs a careful review and measurement. > > I'll have to speculate, too. > > Guile guarantees the consistency of the data model under all > circumstances. Bad synchronization between threads is allowed to cause > unspecified behavior, but it should never trigger a SIGSEGV. In > practice, that means excessive locking: all data access needs to take > place in a critical section. If you mean that you believe guile carries out significant and excessive locking to maintain the invariants of its containers and other SCM objects, I do not think that is right. I don't think guile generally does use locking for that: instead it uses the fact that SCM objects are stored in a type of size uintptr_t and aligned on 8-byte boundaries to enable pointer tagging (see the file tags.h in the source distribution). Guile can therefore leverage the fact that on any platform supported by guile threads, native pointer/integer types aligned on their natural size boundary are atomic, with in C11 terms relaxed (ie unsynchronized) memory ordering. guile is not going to carry out CAS-style or acquire/release synchronisation for you: a SCM object will be in a valid state but that state may be completely different from the one you expect if you don't synchronize in your own code. Guile may of course lock its own internal global non-fluid data, but that is a different point from the one I think you are making. As I recall, Andy Wingo was reporting in relation to his fibers library that it started to slow with more than about 8 native threads running. As I recall, after 8 native threads, significant improvements in speed failed to occur even with more than 8 processors. You are also likely to get a slow down if you run more native threads than you have processors. Chris ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2017-12-04 9:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-11-27 22:40 GC thread performance Hans Åberg 2017-11-27 23:05 ` Stefan Israelsson Tampe 2017-11-27 23:13 ` Hans Åberg 2017-11-27 23:23 ` Marko Rauhamaa 2017-11-27 23:44 ` Hans Åberg 2017-12-01 19:49 ` Linas Vepstas 2017-12-01 23:23 ` Hans Åberg 2017-12-02 8:50 ` Marko Rauhamaa 2017-12-02 14:20 ` Hans Åberg 2017-12-03 22:15 ` Arne Babenhauserheide 2017-12-04 9:38 ` Hans Åberg 2017-12-02 14:30 ` Chris Vine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).