unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#43564: cuirass: Contention while registering new builds.
@ 2020-09-22 16:52 Mathieu Othacehe
  2020-09-28  7:51 ` Mathieu Othacehe
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-09-22 16:52 UTC (permalink / raw)
  To: 43564

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]


Hello,

The attached screenshot shows that 9 evaluations are currently "In
progress" for "guix-master" specification.

Evaluations 16725 to 16738 are completed, 7 hours ago and 56 minutes ago
respectively. They are still shown as "In progress" because the
"register" phase in "build-packages" is not over. It is also possible to
see that new builds are registered for those evaluations, but very
slowly.

As each evaluation has to register around 50k builds, there might be
some sort of "writing" contention on the database, explaining the long
build registration time.

Thanks,

Mathieu
-- 
https://othacehe.org

[-- Attachment #2: cuirass.png --]
[-- Type: image/png, Size: 61101 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-09-22 16:52 bug#43564: cuirass: Contention while registering new builds Mathieu Othacehe
@ 2020-09-28  7:51 ` Mathieu Othacehe
  2020-09-28  8:17   ` Mathieu Othacehe
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-09-28  7:51 UTC (permalink / raw)
  To: 43564-done


Hello,

> As each evaluation has to register around 50k builds, there might be
> some sort of "writing" contention on the database, explaining the long
> build registration time.

Turns out, once an evaluation is over, new builds are registered. This
registration tries to insert a new build for each derivation returned by
the evaluation phase. If the new build does not add a new output, the
insertion query is then rollbacked. This means that there are at least
as many insertion queries as new derivations.

SQlite allows at most one writer at a time, and even though we are using WAL
mode, performing a lot of insertions will reduce the reading performances.
When multiple evaluations are performed in parallel, the large number of
concurrent insertion queries also causes contention.

Having 10 pending evaluations means that we are probably trying to
insert around 500.000 records concurrently. This is something that
SQLite does not seem to be designed for.

To avoid those issues, we need to check first in the "Outputs" table
which derivations are already registered. This means that most of the
insertion queries will be replaced by reading queries, that are much
less expensive and more suitable for Cuirass concurrent implementation.

In the best case of one pending evaluation, the registration duration is
reduced from 1800 seconds to 320 seconds. I think that the gain is way
larger when there are multiple pending evaluations.

Pushed a fix as 461e07e14e1c8013343c0a2cb26c0e022e10d5e4.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-09-28  7:51 ` Mathieu Othacehe
@ 2020-09-28  8:17   ` Mathieu Othacehe
  2020-09-29 15:02   ` Mathieu Othacehe
  2020-10-05 12:07   ` Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-09-28  8:17 UTC (permalink / raw)
  To: 43564


For future reference, here's a small addition. As I said, we are using
the Cuirass SQLite database in WAL mode. This means that insertion
queries are added to a separate cuirass.db-wal file. Once in a while,
this file is supposed to be merged in the main cuirass.db file.

This mechanism known as checkpointing is supposed to occur automatically
once 4MB of data are added to the WAL file. On berlin, the WAL file of
Cuirass is 9.4M and the Guix WAL file is almost 1G.

As reading from the WAL file is less effective, and checkpointing a
large file will end-up locking the database for some times, we should
definitely try to understand why the Guix WAL file get so big.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-09-28  7:51 ` Mathieu Othacehe
  2020-09-28  8:17   ` Mathieu Othacehe
@ 2020-09-29 15:02   ` Mathieu Othacehe
  2020-10-04 11:12     ` Mathieu Othacehe
  2020-10-05 12:07   ` Ludovic Courtès
  2 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-09-29 15:02 UTC (permalink / raw)
  To: 43564


Hello,

> In the best case of one pending evaluation, the registration duration is
> reduced from 1800 seconds to 320 seconds. I think that the gain is way
> larger when there are multiple pending evaluations.
>
> Pushed a fix as 461e07e14e1c8013343c0a2cb26c0e022e10d5e4.

While this improves the situation, another evaluation contention episode
occurred this afternoon (evaluations 16920 to 16938).

Even though reading queries are cheaper, I guess that when 10 fibers try
to execute 50k read queries, there's some contention in the
communication with the 4 database workers.

I'm trying locally to spawn database workers dedicated to registration,
that execute the evaluation queries in a single batch. Let's see if it
helps.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-09-29 15:02   ` Mathieu Othacehe
@ 2020-10-04 11:12     ` Mathieu Othacehe
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-10-04 11:12 UTC (permalink / raw)
  To: 43564


Hello,

> I'm trying locally to spawn database workers dedicated to registration,
> that execute the evaluation queries in a single batch. Let's see if it
> helps.

I added 4 registration dedicated database workers and removed from
registration all accesses to the store. Now the evaluation returns
objects that contain all required information.

This seem to improve the situation quite a lot. For example, those three
registrations were done simultaneously in ~200 seconds:

--8<---------------cut here---------------start------------->8---
2020-10-04T12:46:19 registration took 205.429772 seconds
2020-10-04T12:46:19 evaluation 17099 registered 156 new derivations
2020-10-04T12:46:19 building 156 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/156)
2020-10-04T12:46:19 registration took 212.892482 seconds
2020-10-04T12:46:19 evaluation 17105 registered 164 new derivations
2020-10-04T12:46:19 building 164 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/164)
2020-10-04T12:46:19 registration took 205.009416 seconds
2020-10-04T12:46:19 evaluation 17102 registered 160 new derivations
2020-10-04T12:46:19 building 160 derivations in batches of 200
2020-10-04T12:46:19 building batch of 200 derivations (0/160)
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-09-28  7:51 ` Mathieu Othacehe
  2020-09-28  8:17   ` Mathieu Othacehe
  2020-09-29 15:02   ` Mathieu Othacehe
@ 2020-10-05 12:07   ` Ludovic Courtès
  2020-10-05 13:09     ` Mathieu Othacehe
  2 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2020-10-05 12:07 UTC (permalink / raw)
  To: 43564; +Cc: othacehe

Hello!

Mathieu Othacehe <othacehe@gnu.org> skribis:

> Turns out, once an evaluation is over, new builds are registered. This
> registration tries to insert a new build for each derivation returned by
> the evaluation phase. If the new build does not add a new output, the
> insertion query is then rollbacked. This means that there are at least
> as many insertion queries as new derivations.

Isn’t that the real problem, that we’re doing one transaction per
derivation?

Ideally, upon evaluation completion, I’d expect us to create a single
transaction that registers all 50K derivations at once, no?

Thanks for the great improvements in this area!

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-10-05 12:07   ` Ludovic Courtès
@ 2020-10-05 13:09     ` Mathieu Othacehe
  2020-10-09  7:59       ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-10-05 13:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 43564


Hey Ludo,

> Isn’t that the real problem, that we’re doing one transaction per
> derivation?

Is it really better in term of performance to send batch of queries
within a single transaction? I haven't tried it yet.

I think that the real bottleneck was having N fibers fighting over 4
workers to execute large number of insertions. Having all those queries
done by workers dedicated to registration in a single pass seems to
improve a lot the situation.

Maybe having those registration workers posting unique transactions
containing all the queries would be even more beneficial.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#43564: cuirass: Contention while registering new builds.
  2020-10-05 13:09     ` Mathieu Othacehe
@ 2020-10-09  7:59       ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2020-10-09  7:59 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43564

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> Isn’t that the real problem, that we’re doing one transaction per
>> derivation?
>
> Is it really better in term of performance to send batch of queries
> within a single transaction? I haven't tried it yet.

I’m definitely not a database expert, but I think there’s overhead for
setting up a transaction, which we’re paying N times here.

> I think that the real bottleneck was having N fibers fighting over 4
> workers to execute large number of insertions. Having all those queries
> done by workers dedicated to registration in a single pass seems to
> improve a lot the situation.

Then that’s great, let’s profit!  :-)

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-09  8:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22 16:52 bug#43564: cuirass: Contention while registering new builds Mathieu Othacehe
2020-09-28  7:51 ` Mathieu Othacehe
2020-09-28  8:17   ` Mathieu Othacehe
2020-09-29 15:02   ` Mathieu Othacehe
2020-10-04 11:12     ` Mathieu Othacehe
2020-10-05 12:07   ` Ludovic Courtès
2020-10-05 13:09     ` Mathieu Othacehe
2020-10-09  7:59       ` Ludovic Courtès

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).