unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#43850: cuirass: inconsistent SQL queries execution time.
@ 2020-10-07 16:04 Mathieu Othacehe
  2020-10-14 16:53 ` Mathieu Othacehe
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-07 16:04 UTC (permalink / raw)
  To: 43850


Hello,

Over the last few weeks I made sure that all Cuirass SQL queries were
using indexes. As the "Builds" and "Outputs" tables can be really large,
having queries covered by indexes is imperative for consistent queries
duration.

However, I observed that some queries have inconsistent duration.

--8<---------------cut here---------------start------------->8---
2020-10-07T17:59:09 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:13 GET /build/3183151/details
2020-10-07T17:59:13 GET /eval/92
2020-10-07T17:59:14 builds request took 3.66e-4 seconds
2020-10-07T17:59:14 Database worker unresponsive for 5 seconds (db-get-builds-max).
2020-10-07T17:59:15 GET /eval/13234
2020-10-07T17:59:16 GET /build/3146487/details
2020-10-07T17:59:19 Database worker unresponsive for 5 seconds (db-get-builds-max).
--8<---------------cut here---------------end--------------->8---

"db-get-builds-max" query has the following query plan:

--8<---------------cut here---------------start------------->8---
QUERY PLAN
|--SEARCH TABLE Builds USING INDEX Builds_stoptime_id (stoptime=?)
`--SCALAR SUBQUERY 1
   `--SEARCH TABLE Builds USING INDEX Builds_evaluation_index (evaluation=?)
--8<---------------cut here---------------end--------------->8---

so it should always reasonably fast. The log seem to indicate that the
worker running one of those queries is busy for more than 10 seconds
here. My understanding of SQLite WAL mode is that writers should not
block readers. So I don't get why this query can be intermittently so
slow.

Thanks,

Mathieu
-- 
https://othacehe.org




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-07 16:04 bug#43850: cuirass: inconsistent SQL queries execution time Mathieu Othacehe
@ 2020-10-14 16:53 ` Mathieu Othacehe
  2020-10-20 11:45   ` Mathieu Othacehe
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-14 16:53 UTC (permalink / raw)
  To: 43850


Hello,

I pushed and deployed several patches that:

- update metrics in a single transaction
- register builds in a single transaction
- use a single write database worker, queuing queries and submitting
them by batches (in a single transaction).
- optimize some SQLite parameters (decrease WAL size, use more RAM
caching)

this should improve the situation, even if I still observe some
inconsistent execution duration.

I still have a few improvement ideas such as pre-allocating a large
database file to decrease fragmentation and running periodic vacuums.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-14 16:53 ` Mathieu Othacehe
@ 2020-10-20 11:45   ` Mathieu Othacehe
  2020-10-22 11:49     ` Mathieu Othacehe
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-20 11:45 UTC (permalink / raw)
  To: 43850


Hello,

> this should improve the situation, even if I still observe some
> inconsistent execution duration.

I tried to use the two following pragma:

--8<---------------cut here---------------start------------->8---
PRAGMA synchronous = OFF
PRAGMA mmap_size = 10737418240
--8<---------------cut here---------------end--------------->8---

hoping that mmaping the Cuirass database and disabling synchronisation
would help, but there were still inconsistent duration.

I have now copied the database to a tmpfs mounted directory to make sure
that those inconsistent duration are only caused by the I/O pressure on
berlin.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-20 11:45   ` Mathieu Othacehe
@ 2020-10-22 11:49     ` Mathieu Othacehe
  2020-10-26 22:59       ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-22 11:49 UTC (permalink / raw)
  To: 43850


Hello,

> I have now copied the database to a tmpfs mounted directory to make sure
> that those inconsistent duration are only caused by the I/O pressure on
> berlin.

This helps a lot. The Cuirass web service has been running smooth since
two days, without any inconsistent query times.

I'm considering using a tmpfs backed database for good. The problem is
that we would need a save/restore mechanism in case Berlin
reboots.

WDYT?

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-22 11:49     ` Mathieu Othacehe
@ 2020-10-26 22:59       ` Ludovic Courtès
  2020-10-27  9:28         ` Christopher Baines
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2020-10-26 22:59 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43850

Hi,

Mathieu Othacehe <othacehe@gnu.org> skribis:

>> I have now copied the database to a tmpfs mounted directory to make sure
>> that those inconsistent duration are only caused by the I/O pressure on
>> berlin.
>
> This helps a lot. The Cuirass web service has been running smooth since
> two days, without any inconsistent query times.

Interesting.

> I'm considering using a tmpfs backed database for good. The problem is
> that we would need a save/restore mechanism in case Berlin
> reboots.

Hmm sounds risky, no?

I wonder if we could instead ensure no I/O-intensive workload runs that
machine.  I’m thinking in particular of the derivations that produce
ISO/qcow images that are not offloaded but maybe should.

WDYT?  Do you think that’d be enough?  Or is tmpfs our only hope?

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-26 22:59       ` Ludovic Courtès
@ 2020-10-27  9:28         ` Christopher Baines
  2020-10-27 14:10           ` Mathieu Othacehe
  0 siblings, 1 reply; 11+ messages in thread
From: Christopher Baines @ 2020-10-27  9:28 UTC (permalink / raw)
  To: Ludovic Courtès, Mathieu Othacehe; +Cc: 43850

[-- Attachment #1: Type: text/plain, Size: 1032 bytes --]


Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>>> I have now copied the database to a tmpfs mounted directory to make sure
>>> that those inconsistent duration are only caused by the I/O pressure on
>>> berlin.
>>
>> This helps a lot. The Cuirass web service has been running smooth since
>> two days, without any inconsistent query times.
>
> Interesting.
>
>> I'm considering using a tmpfs backed database for good. The problem is
>> that we would need a save/restore mechanism in case Berlin
>> reboots.
>
> Hmm sounds risky, no?
>
> I wonder if we could instead ensure no I/O-intensive workload runs that
> machine.  I’m thinking in particular of the derivations that produce
> ISO/qcow images that are not offloaded but maybe should.
>
> WDYT?  Do you think that’d be enough?  Or is tmpfs our only hope?

I think Ricardo mentioned that the machine running Cuirass uses an SSD
for the root filesystem, so moving the database there may help?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-27  9:28         ` Christopher Baines
@ 2020-10-27 14:10           ` Mathieu Othacehe
  2020-10-27 18:11             ` Mathieu Othacehe
  2020-10-31  4:33             ` Maxim Cournoyer
  0 siblings, 2 replies; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-27 14:10 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 43850


Hello Chris,

> I think Ricardo mentioned that the machine running Cuirass uses an SSD
> for the root filesystem, so moving the database there may help?

Looks like the database was already on the SSD before my tmpfs
experiment.

--8<---------------cut here---------------start------------->8---
mathieu@berlin ~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
none             95G     0   95G   0% /dev
/dev/sda1       916G  321G  549G  37% /
/dev/sdb1        37T   34T  2.6T  94% /gnu
tmpfs            95G  8.0K   95G   1% /dev/shm
tmpfs            10G  2.4G  7.7G  24% /var/lib/cuirass_tmpfs
--8<---------------cut here---------------end--------------->8---

I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-27 14:10           ` Mathieu Othacehe
@ 2020-10-27 18:11             ` Mathieu Othacehe
  2021-03-25 12:46               ` Mathieu Othacehe
  2020-10-31  4:33             ` Maxim Cournoyer
  1 sibling, 1 reply; 11+ messages in thread
From: Mathieu Othacehe @ 2020-10-27 18:11 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 43850


> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.

Turns out /tmp is mounted on /dev/sda, so all the building and ISO
production are first written on /dev/sda before being copied to the
store in /dev/sdb.

Reducing the build activity of berlin, as Ludo proposed should help
then.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-27 14:10           ` Mathieu Othacehe
  2020-10-27 18:11             ` Mathieu Othacehe
@ 2020-10-31  4:33             ` Maxim Cournoyer
  2020-11-01 18:23               ` Mathieu Othacehe
  1 sibling, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2020-10-31  4:33 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 43850

Hello,

Mathieu Othacehe <othacehe@gnu.org> writes:

> Hello Chris,
>
>> I think Ricardo mentioned that the machine running Cuirass uses an SSD
>> for the root filesystem, so moving the database there may help?
>
> Looks like the database was already on the SSD before my tmpfs
> experiment.
>
> mathieu@berlin ~$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> none             95G     0   95G   0% /dev
> /dev/sda1       916G  321G  549G  37% /
> /dev/sdb1        37T   34T  2.6T  94% /gnu
> tmpfs            95G  8.0K   95G   1% /dev/shm
> tmpfs            10G  2.4G  7.7G  24% /var/lib/cuirass_tmpfs
>
> I don't really get why I/O pressure on /dev/sdb could impact /dev/sda.
>
> Thanks,
>
> Mathieu

As an aside, running

--8<---------------cut here---------------start------------->8---
sudo sqlite3 /var/guix/db/db.sqlite vacuum
--8<---------------cut here---------------end--------------->8---

shaved off some 40 Mb from my large database file:

-rw-r--r-- 1 root root 468889600 Oct 31 00:16 db.sqlite
-rw-r--r-- 1 root root 510648320 Oct 28 23:36 db.sqlite.bak

Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
key places (where lots of data gets removed from the DB).  There's also
the auto_vacuum PRAGMA, which is not enabled currently:

--8<---------------cut here---------------start------------->8---
sqlite3 /var/guix/db/db.sqlite 'pragma auto_vacuum'
0
--8<---------------cut here---------------end--------------->8---

But the later doesn't necessarily sound like a good idea:

   Note, however, that auto-vacuum only truncates the freelist pages from
   the file. Auto-vacuum does not defragment the database nor repack
   individual database pages the way that the VACUUM command does. In fact,
   because it moves pages around within the file, auto-vacuum can actually
   make fragmentation worse. [0]

[0]:  https://www.sqlite.org/pragma.html#pragma_auto_vacuum

Maxim




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-31  4:33             ` Maxim Cournoyer
@ 2020-11-01 18:23               ` Mathieu Othacehe
  0 siblings, 0 replies; 11+ messages in thread
From: Mathieu Othacehe @ 2020-11-01 18:23 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 43850


Hey Maxim,

> Perhaps we should run 'vacuum' when invoking 'guix gc' or at some other
> key places (where lots of data gets removed from the DB).  There's also
> the auto_vacuum PRAGMA, which is not enabled currently:

Vacuuming periodically seems important, however, it didn't resulted in
noticeable improvements during high I/O pressure when I tested it.

I still think that we should vacuum periodically, even though it means
that all database writes are postponed during the duration of the
vacuuming.

Having a separate fiber in charge of that operation could work I guess.

Thanks,

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

* bug#43850: cuirass: inconsistent SQL queries execution time.
  2020-10-27 18:11             ` Mathieu Othacehe
@ 2021-03-25 12:46               ` Mathieu Othacehe
  0 siblings, 0 replies; 11+ messages in thread
From: Mathieu Othacehe @ 2021-03-25 12:46 UTC (permalink / raw)
  To: 43850-done


Closing as this is not any issue anymore since the switch to PostgreSQL.

Mathieu




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-03-25 12:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-07 16:04 bug#43850: cuirass: inconsistent SQL queries execution time Mathieu Othacehe
2020-10-14 16:53 ` Mathieu Othacehe
2020-10-20 11:45   ` Mathieu Othacehe
2020-10-22 11:49     ` Mathieu Othacehe
2020-10-26 22:59       ` Ludovic Courtès
2020-10-27  9:28         ` Christopher Baines
2020-10-27 14:10           ` Mathieu Othacehe
2020-10-27 18:11             ` Mathieu Othacehe
2021-03-25 12:46               ` Mathieu Othacehe
2020-10-31  4:33             ` Maxim Cournoyer
2020-11-01 18:23               ` Mathieu Othacehe

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).