Re: [Cuirass] Missing database indexes?

all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
To: bjoern.hoefling@bjoernhoefling.de
Cc: guix-devel <guix-devel@gnu.org>,
	"Clément Lassieur" <clement@lassieur.org>
Subject: Re: [Cuirass] Missing database indexes?
Date: Mon, 12 Nov 2018 20:42:52 +0100	[thread overview]
Message-ID: <CAL7_Mo_QpFZJu0p46kEA0J+wHUdOuxi4zgq7pvvkwofEaF0ViQ@mail.gmail.com> (raw)
In-Reply-To: <20181112195044.6d64f51c@alma-ubu>

[-- Attachment #1: Type: text/plain, Size: 7181 bytes --]

Hello all,

Le lun. 12 nov. 2018 à 19:51, Björn Höfling <
bjoern.hoefling@bjoernhoefling.de> a écrit :

> Hi Ludo,
>
> On Sun, 11 Nov 2018 18:06:00 +0100
> ludo@gnu.org (Ludovic Courtès) wrote:
>
> > Indeed, that solves the problem for this simple example, thanks!
> >
> > Now, if I go back to the big query that /api/latestbuilds makes¹, the
> > result is still pretty bad:
> >
> > --8<---------------cut here---------------start------------->8---
> > sqlite> EXPLAIN QUERY PLAN SELECT * FROM (
> >    ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp,
> > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status,
> > Builds.job_name, Builds.system, ...> Builds.nix_name,
> > Specifications.name ...> FROM Builds
> >    ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id
> >    ...> INNER JOIN Specifications ON Evaluations.specification =
> > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid))
> >    ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation))
> >    ...> AND (:jobset IS NULL OR (:jobset = Specifications.name))
> >    ...> AND (:job IS NULL OR (:job = Builds.job_name))
> >    ...> AND (:system IS NULL OR (:system = Builds.system))
> >    ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation))
> >    ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status
> > >= 0) ...>                      OR (:status = 'pending' AND
> > >Builds.status < 0)
> >    ...>                      OR (:status = 'succeeded' AND
> > Builds.status = 0) ...>                      OR (:status = 'failed'
> > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL
> > OR :borderlowid IS NULL ...>  OR ((:borderlowtime, :borderlowid) <
> > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL
> > OR :borderhighid IS NULL ...>  OR ((:borderhightime, :borderhighid) >
> > (Builds.stoptime, Builds.rowid))) ...> ORDER BY
> >    ...> CASE WHEN :borderlowtime IS NULL
> >    ...>        OR :borderlowid IS NULL THEN Builds.stoptime
> >    ...>                                ELSE -Builds.stoptime
> >    ...> END DESC,
> >    ...> CASE WHEN :borderlowtime IS NULL
> >    ...>        OR :borderlowid IS NULL THEN Builds.rowid
> >    ...>                                ELSE -Builds.rowid
> >    ...> END DESC
> >    ...> LIMIT :nr)
> >    ...> ORDER BY stoptime, rowid ASC;
> > 1|0|0|SCAN TABLE Builds
> > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?)
> > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX
> > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR
> > ORDER BY 0|0|0|SCAN SUBQUERY 1
> > 0|0|0|USE TEMP B-TREE FOR ORDER BY
> > --8<---------------cut here---------------end--------------->8---
> >
> > I don’t really know what additional index to create (and I’d rather
> > let SQLite do it for me, if it were possible).
>
> I don't know if there is any automated process to assist you. I have
> the feeling that query optimization is more an art than science.
>
>
Hm. This code smells ... It looks too complicated.
>

Agreed.


>
> I don't know if this brings you further concerning performance, here
> are some thoughts:
>
> One problematic part is this construct:
>
> :variable IS NULL OR :variable=my_column)
>
> Here is a very simple example:
>
> sqlite> CREATE TABLE tst (
>    ...> id INTEGER PRIMARY KEY AUTOINCREMENT,
>    ...> name TEXT NOT NULL,
>    ...> age INTEGER NOT NULL);
> sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age);
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE (23=23 OR id=:id);
> 0|0|0|SCAN TABLE tst
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE id=:id;
> 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
>
> sqlite> EXPLAIN QUERY PLAN
>    ...> SELECT * FROM tst WHERE name=:name AND age < 42;
> 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND
> age<?)
>
> So, even when we have a constant part(23=23) in the OR clause, this
> leads to a full table scan. I think the optimizer cannot detect the
> fact that it is a constant boolean value. In the other examples, it is
> using the index.
>
> Even this OR-clause with two variables looks better:
>
> SELECT * FROM tst WHERE (id=:id1 OR id=:id2);
> 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
> 0|0|0|EXECUTE LIST SUBQUERY 1
>
> I double-checked with Postgresql and it is also performing a full table
> scan in the "boolean-constant OR :id=id" case. I could not find any
> references on the net about it.
>
> When this would be Java/JPA I would suggest to dynamically create the
> query. Can we do something in Scheme-DB too? I.e. pseudo-code
>
> (string-append sql-prefix
>   (unless (empty? derivation) "AND :derivation=Builds.derivation")
>   (unless (empty? jobset) "AND :jobset=Builds.jobset)
>   ...)
> ;;; Should be more some kind of folding, because of the "AND"
>
> ;;; Parameter-filling needs to be considered too
>
>
> Two more things I noticed that are not directly performance oriented:
>
> We are directly relying on the rowid here, there is no explicit
> id-column.
>
> This could lead to unpredicted results and reorderings (6th Quirk in
> document):
>
> https://www.sqlite.org/rowidtable.html
>
> We should add a column:
>
> id INTEGER PRIMARY KEY AUTOINCREMENT
>
> Problem is that this concept of AUTOINCREMENT does only work for
> Primary Keys in Sqlite. So we need to degrade "derivation" to a
> secondary key, i.e. make it non-null and unique:
>
> derivation    TEXT NOT NULL UNIQUE,
>
> Is there anything speaking against that?
>
>
> Lastly, the query has a limit and an order-by. The question is: Will
> the result be first ordered and then the limit taken? The answer (I know
> only for Postgresql and MySql, but I think it is the same for Sqlite,
> I haven't found any reference): The order is always executed first, but
> it has to be stable. In this case it is, because we order by
> Builds.rowid, which is a key. Did this happen intentionally or just by
> chance? Should we better add a note about that to the SQL code?
>
> Björn
>

What Björn said.

I might be wrong but if :derivation :jobset :job :system and :evalutation
are exculsive
ie. if one is set the other are null. They could all inherit from a
"Buildable" class in Object-Oriented terms.
Then you'd rather use a Generic Foreign Key pattern where you have two
columns 'object_type' and 'object_id'
where 'object_type' is one of the "object" type that is buildable and
'object_id' is the identifier of the row in the table
named by 'object_type'.

Otherwise you can try narrow the search to a given 'stoptime' or
'starttime' slice and index those columns.
Something like:

SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime,
Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system,
Builds.nix_name, Specifications.name
FROM Builds
INNER JOIN ...
INNER JOIN ...
WHERE Builds.stoptime > yesterday AND Builds.stoptime < now();


HTH

[-- Attachment #2: Type: text/html, Size: 9309 bytes --]

next prev parent reply	other threads:[~2018-11-12 19:43 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès
2018-11-10 20:11 ` Björn Höfling
2018-11-11 17:06   ` Ludovic Courtès
2018-11-12 18:50     ` Björn Höfling
2018-11-12 19:42       ` Amirouche Boubekki [this message]
2018-11-12 23:27       ` Danny Milosavljevic
2018-11-14 11:14         ` Ludovic Courtès
2018-11-16 22:31         ` Björn Höfling
2018-11-13  8:10       ` Clément Lassieur
2018-11-16 22:42         ` Björn Höfling
2018-11-12 23:31     ` Danny Milosavljevic
2018-11-13  0:04       ` Danny Milosavljevic
2018-11-14 11:11       ` Ludovic Courtès
2018-11-19 10:44         ` Danny Milosavljevic
2018-12-19 22:45           ` Amirouche Boubekki
2018-11-19  9:47 ` swedebugia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAL7_Mo_QpFZJu0p46kEA0J+wHUdOuxi4zgq7pvvkwofEaF0ViQ@mail.gmail.com \
    --to=amirouche.boubekki@gmail.com \
    --cc=bjoern.hoefling@bjoernhoefling.de \
    --cc=clement@lassieur.org \
    --cc=guix-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.