unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [Cuirass] Missing database indexes?
@ 2018-11-10 17:33 Ludovic Courtès
  2018-11-10 20:11 ` Björn Höfling
  2018-11-19  9:47 ` swedebugia
  0 siblings, 2 replies; 16+ messages in thread
From: Ludovic Courtès @ 2018-11-10 17:33 UTC (permalink / raw)
  To: Danny Milosavljevic, clement; +Cc: guix-devel

Hello!

I was investigating the slowness of our /api/latestbuilds requests on
berlin.

I found that if we have just the two indexes currently defined in
‘schema.sql’, basically everything involves a table scan:

--8<---------------cut here---------------start------------->8---
sqlite> EXPLAIN QUERY PLAN select * from builds where system = "x";                                                                                                       
0|0|0|SCAN TABLE builds
--8<---------------cut here---------------end--------------->8---

I tentatively defined new indexes that seem to help:

--8<---------------cut here---------------start------------->8---
sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation);
sqlite> CREATE INDEX Builds_index_status ON Builds(status);
sqlite> CREATE INDEX Builds_index_system ON Builds(system, evaluation);
sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12;
0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?)
sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and system ="x";
0|0|0|SEARCH TABLE builds USING INDEX Builds_index_system (system=? AND evaluation=?)
sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status = 0;
0|0|0|SEARCH TABLE builds USING INDEX Builds_index_status (status=?)
--8<---------------cut here---------------end--------------->8---

Now, ‘db-get-builds’ in Cuirass uses a more complex query.  In
particular, it orders things, very roughly along these lines:

--8<---------------cut here---------------start------------->8---
sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ;
0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
--8<---------------cut here---------------end--------------->8---

I’m pretty much a database newbie so please forgive the naive question,
but is there something we can do to avoid this extra B-tree step, which
seems costly in space and time?  <http://www.sqlite.com/matrix/eqp.html>
suggests it’s just a matter of adding yet another index but I couldn’t
get that.

Anything else we should do?

Thanks in advance!  :-)

Ludo’.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès
@ 2018-11-10 20:11 ` Björn Höfling
  2018-11-11 17:06   ` Ludovic Courtès
  2018-11-19  9:47 ` swedebugia
  1 sibling, 1 reply; 16+ messages in thread
From: Björn Höfling @ 2018-11-10 20:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 1595 bytes --]

On Sat, 10 Nov 2018 18:33:23 +0100
ludo@gnu.org (Ludovic Courtès) wrote:

> Now, ‘db-get-builds’ in Cuirass uses a more complex query.  In
> particular, it orders things, very roughly along these lines:
> 
> --8<---------------cut here---------------start------------->8---
> sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12
> sqlite> and status > 0 order by stoptime ;  
> 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation
> (evaluation=?) 0|0|0|USE TEMP B-TREE FOR ORDER BY
> --8<---------------cut here---------------end--------------->8---
> 
> I’m pretty much a database newbie so please forgive the naive
> question, but is there something we can do to avoid this extra B-tree
> step, which seems costly in space and time?
> <http://www.sqlite.com/matrix/eqp.html> suggests it’s just a matter
> of adding yet another index but I couldn’t get that.
> 
> Anything else we should do?

The link you provided explains it: The column over which you are sorting
(stoptime) is not indexed. Add it to the (same) index:

--8<---------------cut here---------------start------------->8---
sqlite> DROP INDEX Builds_index_evaluation;
sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation, stoptime);
sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ;
0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?)

--8<---------------cut here---------------end--------------->8---

If there is more SQL-trouble, I can try to help out.

Björn

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-10 20:11 ` Björn Höfling
@ 2018-11-11 17:06   ` Ludovic Courtès
  2018-11-12 18:50     ` Björn Höfling
  2018-11-12 23:31     ` Danny Milosavljevic
  0 siblings, 2 replies; 16+ messages in thread
From: Ludovic Courtès @ 2018-11-11 17:06 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel, clement

Hi Björn,

Björn Höfling <bjoern.hoefling@bjoernhoefling.de> skribis:

> The link you provided explains it: The column over which you are sorting
> (stoptime) is not indexed. Add it to the (same) index:
>
> sqlite> DROP INDEX Builds_index_evaluation;
> sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation, stoptime);
> sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ;
> 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?)
>
> If there is more SQL-trouble, I can try to help out.

Indeed, that solves the problem for this simple example, thanks!

Now, if I go back to the big query that /api/latestbuilds makes¹, the
result is still pretty bad:

--8<---------------cut here---------------start------------->8---
sqlite> EXPLAIN QUERY PLAN SELECT * FROM (
   ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime,
   ...> Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system,
   ...> Builds.nix_name, Specifications.name
   ...> FROM Builds
   ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id
   ...> INNER JOIN Specifications ON Evaluations.specification = Specifications.name
   ...> WHERE (:id IS NULL OR (:id = Builds.rowid))
   ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation))
   ...> AND (:jobset IS NULL OR (:jobset = Specifications.name))
   ...> AND (:job IS NULL OR (:job = Builds.job_name))
   ...> AND (:system IS NULL OR (:system = Builds.system))
   ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation))
   ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status >= 0)
   ...>                      OR (:status = 'pending' AND Builds.status < 0)
   ...>                      OR (:status = 'succeeded' AND Builds.status = 0)
   ...>                      OR (:status = 'failed' AND Builds.status > 0))
   ...> AND (:borderlowtime IS NULL OR :borderlowid IS NULL
   ...>  OR ((:borderlowtime, :borderlowid) < (Builds.stoptime, Builds.rowid)))
   ...> AND (:borderhightime IS NULL OR :borderhighid IS NULL
   ...>  OR ((:borderhightime, :borderhighid) > (Builds.stoptime, Builds.rowid)))
   ...> ORDER BY
   ...> CASE WHEN :borderlowtime IS NULL
   ...>        OR :borderlowid IS NULL THEN Builds.stoptime
   ...>                                ELSE -Builds.stoptime
   ...> END DESC,
   ...> CASE WHEN :borderlowtime IS NULL
   ...>        OR :borderlowid IS NULL THEN Builds.rowid
   ...>                                ELSE -Builds.rowid
   ...> END DESC
   ...> LIMIT :nr)
   ...> ORDER BY stoptime, rowid ASC;
1|0|0|SCAN TABLE Builds
1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?)
1|2|2|SEARCH TABLE Specifications USING COVERING INDEX sqlite_autoindex_Specifications_1 (name=?)
1|0|0|USE TEMP B-TREE FOR ORDER BY
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY
--8<---------------cut here---------------end--------------->8---

I don’t really know what additional index to create (and I’d rather let
SQLite do it for me, if it were possible).

Thoughts?

Thanks,
Ludo’.

¹ https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/tree/src/cuirass/database.scm#n550

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-11 17:06   ` Ludovic Courtès
@ 2018-11-12 18:50     ` Björn Höfling
  2018-11-12 19:42       ` Amirouche Boubekki
                         ` (2 more replies)
  2018-11-12 23:31     ` Danny Milosavljevic
  1 sibling, 3 replies; 16+ messages in thread
From: Björn Höfling @ 2018-11-12 18:50 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 5869 bytes --]

Hi Ludo,

On Sun, 11 Nov 2018 18:06:00 +0100
ludo@gnu.org (Ludovic Courtès) wrote:

> Indeed, that solves the problem for this simple example, thanks!
> 
> Now, if I go back to the big query that /api/latestbuilds makes¹, the
> result is still pretty bad:
> 
> --8<---------------cut here---------------start------------->8---
> sqlite> EXPLAIN QUERY PLAN SELECT * FROM (
>    ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp,
> Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status,
> Builds.job_name, Builds.system, ...> Builds.nix_name,
> Specifications.name ...> FROM Builds
>    ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id
>    ...> INNER JOIN Specifications ON Evaluations.specification =
> Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid))
>    ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation))
>    ...> AND (:jobset IS NULL OR (:jobset = Specifications.name))
>    ...> AND (:job IS NULL OR (:job = Builds.job_name))
>    ...> AND (:system IS NULL OR (:system = Builds.system))
>    ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation))
>    ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status
> >= 0) ...>                      OR (:status = 'pending' AND
> >Builds.status < 0)
>    ...>                      OR (:status = 'succeeded' AND
> Builds.status = 0) ...>                      OR (:status = 'failed'
> AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL
> OR :borderlowid IS NULL ...>  OR ((:borderlowtime, :borderlowid) <
> (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL
> OR :borderhighid IS NULL ...>  OR ((:borderhightime, :borderhighid) >
> (Builds.stoptime, Builds.rowid))) ...> ORDER BY
>    ...> CASE WHEN :borderlowtime IS NULL
>    ...>        OR :borderlowid IS NULL THEN Builds.stoptime
>    ...>                                ELSE -Builds.stoptime
>    ...> END DESC,
>    ...> CASE WHEN :borderlowtime IS NULL
>    ...>        OR :borderlowid IS NULL THEN Builds.rowid
>    ...>                                ELSE -Builds.rowid
>    ...> END DESC
>    ...> LIMIT :nr)
>    ...> ORDER BY stoptime, rowid ASC;  
> 1|0|0|SCAN TABLE Builds
> 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?)
> 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX
> sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR
> ORDER BY 0|0|0|SCAN SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> --8<---------------cut here---------------end--------------->8---
> 
> I don’t really know what additional index to create (and I’d rather
> let SQLite do it for me, if it were possible).

I don't know if there is any automated process to assist you. I have
the feeling that query optimization is more an art than science.

Hm. This code smells ... It looks too complicated.

I don't know if this brings you further concerning performance, here
are some thoughts:

One problematic part is this construct:

:variable IS NULL OR :variable=my_column)

Here is a very simple example:

sqlite> CREATE TABLE tst (
   ...> id INTEGER PRIMARY KEY AUTOINCREMENT,
   ...> name TEXT NOT NULL,
   ...> age INTEGER NOT NULL);
sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age);

sqlite> EXPLAIN QUERY PLAN
   ...> SELECT * FROM tst WHERE (23=23 OR id=:id);
0|0|0|SCAN TABLE tst

sqlite> EXPLAIN QUERY PLAN
   ...> SELECT * FROM tst WHERE id=:id;
0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)

sqlite> EXPLAIN QUERY PLAN
   ...> SELECT * FROM tst WHERE name=:name AND age < 42;
0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND age<?)

So, even when we have a constant part(23=23) in the OR clause, this
leads to a full table scan. I think the optimizer cannot detect the
fact that it is a constant boolean value. In the other examples, it is
using the index.

Even this OR-clause with two variables looks better:

SELECT * FROM tst WHERE (id=:id1 OR id=:id2);
0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1

I double-checked with Postgresql and it is also performing a full table
scan in the "boolean-constant OR :id=id" case. I could not find any
references on the net about it.

When this would be Java/JPA I would suggest to dynamically create the
query. Can we do something in Scheme-DB too? I.e. pseudo-code

(string-append sql-prefix
  (unless (empty? derivation) "AND :derivation=Builds.derivation")
  (unless (empty? jobset) "AND :jobset=Builds.jobset)
  ...)
;;; Should be more some kind of folding, because of the "AND"
 
;;; Parameter-filling needs to be considered too


Two more things I noticed that are not directly performance oriented:

We are directly relying on the rowid here, there is no explicit
id-column.

This could lead to unpredicted results and reorderings (6th Quirk in
document):

https://www.sqlite.org/rowidtable.html

We should add a column:

id INTEGER PRIMARY KEY AUTOINCREMENT

Problem is that this concept of AUTOINCREMENT does only work for
Primary Keys in Sqlite. So we need to degrade "derivation" to a
secondary key, i.e. make it non-null and unique:

derivation    TEXT NOT NULL UNIQUE,

Is there anything speaking against that?


Lastly, the query has a limit and an order-by. The question is: Will
the result be first ordered and then the limit taken? The answer (I know
only for Postgresql and MySql, but I think it is the same for Sqlite,
I haven't found any reference): The order is always executed first, but
it has to be stable. In this case it is, because we order by
Builds.rowid, which is a key. Did this happen intentionally or just by
chance? Should we better add a note about that to the SQL code?

Björn






[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 18:50     ` Björn Höfling
@ 2018-11-12 19:42       ` Amirouche Boubekki
  2018-11-12 23:27       ` Danny Milosavljevic
  2018-11-13  8:10       ` Clément Lassieur
  2 siblings, 0 replies; 16+ messages in thread
From: Amirouche Boubekki @ 2018-11-12 19:42 UTC (permalink / raw)
  To: bjoern.hoefling; +Cc: guix-devel, Clément Lassieur

[-- Attachment #1: Type: text/plain, Size: 7181 bytes --]

Hello all,

Le lun. 12 nov. 2018 à 19:51, Björn Höfling <
bjoern.hoefling@bjoernhoefling.de> a écrit :

> Hi Ludo,
>
> On Sun, 11 Nov 2018 18:06:00 +0100
> ludo@gnu.org (Ludovic Courtès) wrote:
>
> > Indeed, that solves the problem for this simple example, thanks!
> >
> > Now, if I go back to the big query that /api/latestbuilds makes¹, the
> > result is still pretty bad:
> >
> > --8<---------------cut here---------------start------------->8---
> > sqlite> EXPLAIN QUERY PLAN SELECT * FROM (
> >    ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp,
> > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status,
> > Builds.job_name, Builds.system, ...> Builds.nix_name,
> > Specifications.name ...> FROM Builds
> >    ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id
> >    ...> INNER JOIN Specifications ON Evaluations.specification =
> > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid))
> >    ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation))
> >    ...> AND (:jobset IS NULL OR (:jobset = Specifications.name))
> >    ...> AND (:job IS NULL OR (:job = Builds.job_name))
> >    ...> AND (:system IS NULL OR (:system = Builds.system))
> >    ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation))
> >    ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status
> > >= 0) ...>                      OR (:status = 'pending' AND
> > >Builds.status < 0)
> >    ...>                      OR (:status = 'succeeded' AND
> > Builds.status = 0) ...>                      OR (:status = 'failed'
> > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL
> > OR :borderlowid IS NULL ...>  OR ((:borderlowtime, :borderlowid) <
> > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL
> > OR :borderhighid IS NULL ...>  OR ((:borderhightime, :borderhighid) >
> > (Builds.stoptime, Builds.rowid))) ...> ORDER BY
> >    ...> CASE WHEN :borderlowtime IS NULL
> >    ...>        OR :borderlowid IS NULL THEN Builds.stoptime
> >    ...>                                ELSE -Builds.stoptime
> >    ...> END DESC,
> >    ...> CASE WHEN :borderlowtime IS NULL
> >    ...>        OR :borderlowid IS NULL THEN Builds.rowid
> >    ...>                                ELSE -Builds.rowid
> >    ...> END DESC
> >    ...> LIMIT :nr)
> >    ...> ORDER BY stoptime, rowid ASC;
> > 1|0|0|SCAN TABLE Builds
> > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?)
> > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX
> > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR
> > ORDER BY 0|0|0|SCAN SUBQUERY 1
> > 0|0|0|USE TEMP B-TREE FOR ORDER BY
> > --8<---------------cut here---------------end--------------->8---
> >
> > I don’t really know what additional index to create (and I’d rather
> > let SQLite do it for me, if it were possible).
>
> I don't know if there is any automated process to assist you. I have
> the feeling that query optimization is more an art than science.
>
>
Hm. This code smells ... It looks too complicated.
>

Agreed.


>
> I don't know if this brings you further concerning performance, here
> are some thoughts:
>
> One problematic part is this construct:
>
> :variable IS NULL OR :variable=my_column)
>
> Here is a very simple example:
>
> sqlite> CREATE TABLE tst (
>    ...> id INTEGER PRIMARY KEY AUTOINCREMENT,
>    ...> name TEXT NOT NULL,
>    ...> age INTEGER NOT NULL);
> sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age);
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE (23=23 OR id=:id);
> 0|0|0|SCAN TABLE tst
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE id=:id;
> 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE name=:name AND age < 42;
> 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND
> age<?)
>
> So, even when we have a constant part(23=23) in the OR clause, this
> leads to a full table scan. I think the optimizer cannot detect the
> fact that it is a constant boolean value. In the other examples, it is
> using the index.
>
> Even this OR-clause with two variables looks better:
>
> SELECT * FROM tst WHERE (id=:id1 OR id=:id2);
> 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
>
> I double-checked with Postgresql and it is also performing a full table
> scan in the "boolean-constant OR :id=id" case. I could not find any
> references on the net about it.
>
> When this would be Java/JPA I would suggest to dynamically create the
> query. Can we do something in Scheme-DB too? I.e. pseudo-code
>
> (string-append sql-prefix
>   (unless (empty? derivation) "AND :derivation=Builds.derivation")
>   (unless (empty? jobset) "AND :jobset=Builds.jobset)
>   ...)
> ;;; Should be more some kind of folding, because of the "AND"
>
> ;;; Parameter-filling needs to be considered too
>
>
> Two more things I noticed that are not directly performance oriented:
>
> We are directly relying on the rowid here, there is no explicit
> id-column.
>
> This could lead to unpredicted results and reorderings (6th Quirk in
> document):
>
> https://www.sqlite.org/rowidtable.html
>
> We should add a column:
>
> id INTEGER PRIMARY KEY AUTOINCREMENT
>
> Problem is that this concept of AUTOINCREMENT does only work for
> Primary Keys in Sqlite. So we need to degrade "derivation" to a
> secondary key, i.e. make it non-null and unique:
>
> derivation    TEXT NOT NULL UNIQUE,
>
> Is there anything speaking against that?
>
>
> Lastly, the query has a limit and an order-by. The question is: Will
> the result be first ordered and then the limit taken? The answer (I know
> only for Postgresql and MySql, but I think it is the same for Sqlite,
> I haven't found any reference): The order is always executed first, but
> it has to be stable. In this case it is, because we order by
> Builds.rowid, which is a key. Did this happen intentionally or just by
> chance? Should we better add a note about that to the SQL code?
>
> Björn
>

What Björn said.

I might be wrong but if :derivation :jobset :job :system and :evalutation
are exculsive
ie. if one is set the other are null. They could all inherit from a
"Buildable" class in Object-Oriented terms.
Then you'd rather use a Generic Foreign Key pattern where you have two
columns 'object_type' and 'object_id'
where 'object_type' is one of the "object" type that is buildable and
'object_id' is the identifier of the row in the table
named by 'object_type'.

Otherwise you can try narrow the search to a given 'stoptime' or
'starttime' slice and index those columns.
Something like:

SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime,
Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system,
Builds.nix_name, Specifications.name
FROM Builds
INNER JOIN ...
INNER JOIN ...
WHERE Builds.stoptime > yesterday AND Builds.stoptime < now();


HTH

[-- Attachment #2: Type: text/html, Size: 9309 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 18:50     ` Björn Höfling
  2018-11-12 19:42       ` Amirouche Boubekki
@ 2018-11-12 23:27       ` Danny Milosavljevic
  2018-11-14 11:14         ` Ludovic Courtès
  2018-11-16 22:31         ` Björn Höfling
  2018-11-13  8:10       ` Clément Lassieur
  2 siblings, 2 replies; 16+ messages in thread
From: Danny Milosavljevic @ 2018-11-12 23:27 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 3358 bytes --]

Hi Björn,

On Mon, 12 Nov 2018 19:50:44 +0100
Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote:

> Hm. This code smells ... It looks too complicated.

I was trying to cut down the number of prepared statements in use and prevent a
combinatorial explosion while keeping the kinds of queries we can do open.

Either the value of a parameter is specified, in which case the associated column
is filtered for it; or it isn't, then it's not.

> So, even when we have a constant part(23=23) in the OR clause, this
> leads to a full table scan. I think the optimizer cannot detect the
> fact that it is a constant boolean value. 

Sounds like an easy fix in sqlite.  Could you report this upstream?

> I double-checked with Postgresql and it is also performing a full table
> scan in the "boolean-constant OR :id=id" case. I could not find any
> references on the net about it.

Something easy to try would be to use the row values in sqlite instead.

See also https://www.sqlite.org/rowvalue.html
See also https://lists.gnu.org/archive/html/guix-devel/2018-07/msg00101.html

> When this would be Java/JPA I would suggest to dynamically create the
> query. Can we do something in Scheme-DB too? I.e. pseudo-code

(1) Combinatorial explosion of the number of prepared statements
(2) Slow to parse all those SQL statements

But if we don't use all combinations in practise then it's not so bad
and we could generate those statements after all.  It's still a
workaround if we have to do that.

Then we'd have to make sure that the user can't specify arbitrary
combinations and/or limit the number of prepared statements that
exist at the same time.  Cuirass is a web-facing service so these
are not theoretical problems - someone *will* break it (maybe on
purpose) if it's possible.

All in all, generating these SQL statements is like generating a
program source code and recompiling it every time you want to change
the filter used.  It might be the only good solution in this case
(maybe) - but in general it's an antipattern.

> We should add a column:
> 
> id INTEGER PRIMARY KEY AUTOINCREMENT
> 
> Problem is that this concept of AUTOINCREMENT does only work for
> Primary Keys in Sqlite. So we need to degrade "derivation" to a
> secondary key, i.e. make it non-null and unique:
> 
> derivation    TEXT NOT NULL UNIQUE,
> 
> Is there anything speaking against that?

Sounds good.  Note that when you use autoincrement, you eventually
have to handle the case when the value overflows.  The window of
used IDs slowly creeps up over the months.

> Lastly, the query has a limit and an order-by. The question is: Will
> the result be first ordered and then the limit taken? The answer (I know
> only for Postgresql and MySql, but I think it is the same for Sqlite,

I don't think the documentation at https://www.sqlite.org/lang_select.html
specifies, but it's the only thing that makes sense.  Otherwise LIMIT would
be useless as a pager.  Still would be good to have official confirmation
by the sqlite authors.

> I haven't found any reference): The order is always executed first, but
> it has to be stable. In this case it is, because we order by
> Builds.rowid, which is a key.

>Did this happen intentionally

Of course.  Otherwise LIMIT would be useless as a pager.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-11 17:06   ` Ludovic Courtès
  2018-11-12 18:50     ` Björn Höfling
@ 2018-11-12 23:31     ` Danny Milosavljevic
  2018-11-13  0:04       ` Danny Milosavljevic
  2018-11-14 11:11       ` Ludovic Courtès
  1 sibling, 2 replies; 16+ messages in thread
From: Danny Milosavljevic @ 2018-11-12 23:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

Hi Ludo,

On Sun, 11 Nov 2018 18:06:00 +0100
ludo@gnu.org (Ludovic Courtès) wrote:

> I don’t really know what additional index to create (and I’d rather let
> SQLite do it for me, if it were possible).

Not exactly what you mean but there's this:

https://www.sqlite.org/lang_analyze.html

It does statistical analysis of queries that ran and will optimize for
that case on subsequent connections.

Other than that, I always create indices by hand since it's a one-time
effort anyway - so just searching for a solution to automate it takes
longer than just adding an index for a column I match on.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 23:31     ` Danny Milosavljevic
@ 2018-11-13  0:04       ` Danny Milosavljevic
  2018-11-14 11:11       ` Ludovic Courtès
  1 sibling, 0 replies; 16+ messages in thread
From: Danny Milosavljevic @ 2018-11-13  0:04 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 812 bytes --]

On Tue, 13 Nov 2018 00:31:40 +0100
Danny Milosavljevic <dannym@scratchpost.org> wrote:

> Hi Ludo,
> 
> On Sun, 11 Nov 2018 18:06:00 +0100
> ludo@gnu.org (Ludovic Courtès) wrote:
> 
> > I don’t really know what additional index to create (and I’d rather let
> > SQLite do it for me, if it were possible).  
> 
> Not exactly what you mean but there's this:
> 
> https://www.sqlite.org/lang_analyze.html
> 
> It does statistical analysis of queries that ran and will optimize for
> that case on subsequent connections.
> 
> Other than that, I always create indices by hand since it's a one-time
> effort anyway - so just searching for a solution to automate it takes
> longer than just adding an index for a column I match on.

Also

https://www.sqlite.org/optoverview.html#autoindex

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 18:50     ` Björn Höfling
  2018-11-12 19:42       ` Amirouche Boubekki
  2018-11-12 23:27       ` Danny Milosavljevic
@ 2018-11-13  8:10       ` Clément Lassieur
  2018-11-16 22:42         ` Björn Höfling
  2 siblings, 1 reply; 16+ messages in thread
From: Clément Lassieur @ 2018-11-13  8:10 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel

Hi Björn,

Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:

> We are directly relying on the rowid here, there is no explicit
> id-column.
>
> This could lead to unpredicted results and reorderings (6th Quirk in
> document):
>
> https://www.sqlite.org/rowidtable.html
>
> We should add a column:
>
> id INTEGER PRIMARY KEY AUTOINCREMENT
>
> Problem is that this concept of AUTOINCREMENT does only work for
> Primary Keys in Sqlite. So we need to degrade "derivation" to a
> secondary key, i.e. make it non-null and unique:
>
> derivation    TEXT NOT NULL UNIQUE,
>
> Is there anything speaking against that?

We only use that rowid to display a number at the left of every 'build'
row.  I think it would make more sense to use the derivation name where
we currently use the rowid.  It would also be more understandable for
the users.

We don't even need rowid for sorting because we can sort with the
timestamps.

The only issue is that we get further from hydra, but we are already
pretty far away anyway.

Clément

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 23:31     ` Danny Milosavljevic
  2018-11-13  0:04       ` Danny Milosavljevic
@ 2018-11-14 11:11       ` Ludovic Courtès
  2018-11-19 10:44         ` Danny Milosavljevic
  1 sibling, 1 reply; 16+ messages in thread
From: Ludovic Courtès @ 2018-11-14 11:11 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

Hi,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

> On Sun, 11 Nov 2018 18:06:00 +0100
> ludo@gnu.org (Ludovic Courtès) wrote:
>
>> I don’t really know what additional index to create (and I’d rather let
>> SQLite do it for me, if it were possible).
>
> Not exactly what you mean but there's this:
>
> https://www.sqlite.org/lang_analyze.html
>
> It does statistical analysis of queries that ran and will optimize for
> that case on subsequent connections.

That’s close to what I was hoping for.  We could do “PRAGMA optimize”
before closing the database session as they suggest:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 634 bytes --]

diff --git a/src/cuirass/database.scm b/src/cuirass/database.scm
index 8b83c18..fb037d1 100644
--- a/src/cuirass/database.scm
+++ b/src/cuirass/database.scm
@@ -403,7 +403,9 @@ a critical section that allows database operations to be serialized."
      ;; be costly and may defeat statement caching.
      (parameterize ((%db-channel (make-critical-section db)))
        body ...)
-     (db-close db))))
+     (begin
+       (sqlite-exec db "PRAGMA optimize;")
+       (db-close db)))))
 
 (define* (read-quoted-string #:optional (port (current-input-port)))
   "Read all of the characters out of PORT and return them as a SQL quoted

[-- Attachment #3: Type: text/plain, Size: 1477 bytes --]


… though I think we never really close it properly because Cuirass runs
“forever.”

I tried doing this in a Guile session on berlin:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(cuirass database)
scheme@(guile-user)> (%package-database "/var/lib/cuirass/cuirass.db")
$6 = "/gnu/store/g1q2lv75a2fibii4y52fndz5zpbmyl12-cuirass-0.0.1-21.0b40dca/var/lib/cuirass/cuirass.db"
scheme@(guile-user)>  (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;")))
$7 = ()
scheme@(guile-user)>  (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;")))
$8 = ()
scheme@(guile-user)> ,time (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;")))
$9 = ()
;; 13.215291s real time, 13.229189s run time.  0.016093s spent in GC.
scheme@(guile-user)> ,time (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;")))
$10 = ()
;; 13.204621s real time, 13.230655s run time.  0.029333s spent in GC.
--8<---------------cut here---------------end--------------->8---

It doesn’t seem to help much, perhaps because the query is too complex?

Thoughts?

Ludo’.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 23:27       ` Danny Milosavljevic
@ 2018-11-14 11:14         ` Ludovic Courtès
  2018-11-16 22:31         ` Björn Höfling
  1 sibling, 0 replies; 16+ messages in thread
From: Ludovic Courtès @ 2018-11-14 11:14 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, clement

Hello,

Danny Milosavljevic <dannym@scratchpost.org> skribis:

> On Mon, 12 Nov 2018 19:50:44 +0100
> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote:
>
>> Hm. This code smells ... It looks too complicated.
>
> I was trying to cut down the number of prepared statements in use and prevent a
> combinatorial explosion while keeping the kinds of queries we can do open.

I understand both arguments.  I wonder if there’s a way to conciliate
these two issues.  What do you both think?

Thanks,
Ludo’.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-12 23:27       ` Danny Milosavljevic
  2018-11-14 11:14         ` Ludovic Courtès
@ 2018-11-16 22:31         ` Björn Höfling
  1 sibling, 0 replies; 16+ messages in thread
From: Björn Höfling @ 2018-11-16 22:31 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 2379 bytes --]

On Tue, 13 Nov 2018 00:27:15 +0100
Danny Milosavljevic <dannym@scratchpost.org> wrote:

> Hi Björn,
> 
> On Mon, 12 Nov 2018 19:50:44 +0100
> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote:
> 
> > Hm. This code smells ... It looks too complicated.  
> 
> I was trying to cut down the number of prepared statements in use and
> prevent a combinatorial explosion while keeping the kinds of queries
> we can do open.
> 
> Either the value of a parameter is specified, in which case the
> associated column is filtered for it; or it isn't, then it's not.
> 
> > So, even when we have a constant part(23=23) in the OR clause, this
> > leads to a full table scan. I think the optimizer cannot detect the
> > fact that it is a constant boolean value.   
> 
> Sounds like an easy fix in sqlite.  Could you report this upstream?

I thought about these things again, with the help of stackoverflow and
this was really silly: If the left part of the clause

(:jobset IS NULL OR (:jobset = Specifications.name))

is constant, null, then the whole clause is true. And if that is
the case for all, then everything is true, of cause resulting in a full
table scan. That is it I think what the query optimizer is thinking of
here. I guess it will be better if there are some non-null and some
null clauses mixed.

I have the full DB here now, I will look into it the next days.


> > I double-checked with Postgresql and it is also performing a full
> > table scan in the "boolean-constant OR :id=id" case. I could not
> > find any references on the net about it.  
> 
> Something easy to try would be to use the row values in sqlite
> instead.
> 
> See also https://www.sqlite.org/rowvalue.html
> See also
> https://lists.gnu.org/archive/html/guix-devel/2018-07/msg00101.html

That would be another idea.

 
> > When this would be Java/JPA I would suggest to dynamically create
> > the query. Can we do something in Scheme-DB too? I.e. pseudo-code  
> 
> (1) Combinatorial explosion of the number of prepared statements
> (2) Slow to parse all those SQL statements
> 
> But if we don't use all combinations in practise then it's not so bad
> and we could generate those statements after all.  It's still a
> workaround if we have to do that.

You are right. That sounds reasonable. It just looks ugly to read.

Björn

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-13  8:10       ` Clément Lassieur
@ 2018-11-16 22:42         ` Björn Höfling
  0 siblings, 0 replies; 16+ messages in thread
From: Björn Höfling @ 2018-11-16 22:42 UTC (permalink / raw)
  To: Clément Lassieur; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Tue, 13 Nov 2018 09:10:30 +0100
Clément Lassieur <clement@lassieur.org> wrote:

> Hi Björn,
> 
> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:
> 
> > We are directly relying on the rowid here, there is no explicit
> > id-column.
> >
> > This could lead to unpredicted results and reorderings (6th Quirk in
> > document):
> >
> > https://www.sqlite.org/rowidtable.html
> >
> > We should add a column:
> >
> > id INTEGER PRIMARY KEY AUTOINCREMENT
> >
> > Problem is that this concept of AUTOINCREMENT does only work for
> > Primary Keys in Sqlite. So we need to degrade "derivation" to a
> > secondary key, i.e. make it non-null and unique:
> >
> > derivation    TEXT NOT NULL UNIQUE,
> >
> > Is there anything speaking against that?  
> 
> We only use that rowid to display a number at the left of every
> 'build' row.  I think it would make more sense to use the derivation
> name where we currently use the rowid.  It would also be more
> understandable for the users.
> 
> We don't even need rowid for sorting because we can sort with the
> timestamps.
> 
> The only issue is that we get further from hydra, but we are already
> pretty far away anyway.

I'm undecided about this, I don't understand enough of it yet. In some
sense I would prefer a numeric id to refer to. On the other hand the
derivation is the actual key, and if that is generated a second time,
the build is the same and will not be executed again.

Björn

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès
  2018-11-10 20:11 ` Björn Höfling
@ 2018-11-19  9:47 ` swedebugia
  1 sibling, 0 replies; 16+ messages in thread
From: swedebugia @ 2018-11-19  9:47 UTC (permalink / raw)
  To: guix-devel

Hi

On 2018-11-10 18:33, Ludovic Courtès wrote:

snip

> --8<---------------cut here---------------start------------->8---
> sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation);

snip

> Anything else we should do?

I woke up today with an idea. :)

Now on berlin it seems that all jobset are in one enormous table, correct?

How about creating a database/sqlite file for every specification and a 
table for every jobset? This would also solve the problem with 
overrunning id. (there will never be more ids than jobs in a single 
jobset in a table)

This means cuirass only work on one database per specification (now 
there are 5 specs on berlin)

Every jobset having its own table/file will lower the total number of 
rows, thus lower the computing cost of a table scan.

-- 
Cheers
Swedebugia

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-14 11:11       ` Ludovic Courtès
@ 2018-11-19 10:44         ` Danny Milosavljevic
  2018-12-19 22:45           ` Amirouche Boubekki
  0 siblings, 1 reply; 16+ messages in thread
From: Danny Milosavljevic @ 2018-11-19 10:44 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, clement

[-- Attachment #1: Type: text/plain, Size: 276 bytes --]

Hi Ludo,

>It doesn’t seem to help much, perhaps because the query is too complex?

Yeah, probably.

According to the docs a log message is supposed to appear when it is doing it.

We should just special-case the common queries so the optimizer has a
easier life.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Cuirass] Missing database indexes?
  2018-11-19 10:44         ` Danny Milosavljevic
@ 2018-12-19 22:45           ` Amirouche Boubekki
  0 siblings, 0 replies; 16+ messages in thread
From: Amirouche Boubekki @ 2018-12-19 22:45 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, Clément Lassieur

[-- Attachment #1: Type: text/plain, Size: 427 bytes --]

Did you consider using wiredtiger?

Le lun. 19 nov. 2018 à 11:47, Danny Milosavljevic <dannym@scratchpost.org>
a écrit :

> Hi Ludo,
>
> >It doesn’t seem to help much, perhaps because the query is too complex?
>
> Yeah, probably.
>
> According to the docs a log message is supposed to appear when it is doing
> it.
>
> We should just special-case the common queries so the optimizer has a
> easier life.
>

[-- Attachment #2: Type: text/html, Size: 713 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-12-19 22:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès
2018-11-10 20:11 ` Björn Höfling
2018-11-11 17:06   ` Ludovic Courtès
2018-11-12 18:50     ` Björn Höfling
2018-11-12 19:42       ` Amirouche Boubekki
2018-11-12 23:27       ` Danny Milosavljevic
2018-11-14 11:14         ` Ludovic Courtès
2018-11-16 22:31         ` Björn Höfling
2018-11-13  8:10       ` Clément Lassieur
2018-11-16 22:42         ` Björn Höfling
2018-11-12 23:31     ` Danny Milosavljevic
2018-11-13  0:04       ` Danny Milosavljevic
2018-11-14 11:11       ` Ludovic Courtès
2018-11-19 10:44         ` Danny Milosavljevic
2018-12-19 22:45           ` Amirouche Boubekki
2018-11-19  9:47 ` swedebugia

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).