From: "Björn Höfling" <bjoern.hoefling@bjoernhoefling.de>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org, clement@lassieur.org
Subject: Re: [Cuirass] Missing database indexes?
Date: Mon, 12 Nov 2018 19:50:44 +0100 [thread overview]
Message-ID: <20181112195044.6d64f51c@alma-ubu> (raw)
In-Reply-To: <87k1ljr1c7.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 5869 bytes --]
Hi Ludo,
On Sun, 11 Nov 2018 18:06:00 +0100
ludo@gnu.org (Ludovic Courtès) wrote:
> Indeed, that solves the problem for this simple example, thanks!
>
> Now, if I go back to the big query that /api/latestbuilds makes¹, the
> result is still pretty bad:
>
> --8<---------------cut here---------------start------------->8---
> sqlite> EXPLAIN QUERY PLAN SELECT * FROM (
> ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp,
> Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status,
> Builds.job_name, Builds.system, ...> Builds.nix_name,
> Specifications.name ...> FROM Builds
> ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id
> ...> INNER JOIN Specifications ON Evaluations.specification =
> Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid))
> ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation))
> ...> AND (:jobset IS NULL OR (:jobset = Specifications.name))
> ...> AND (:job IS NULL OR (:job = Builds.job_name))
> ...> AND (:system IS NULL OR (:system = Builds.system))
> ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation))
> ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status
> >= 0) ...> OR (:status = 'pending' AND
> >Builds.status < 0)
> ...> OR (:status = 'succeeded' AND
> Builds.status = 0) ...> OR (:status = 'failed'
> AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL
> OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) <
> (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL
> OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) >
> (Builds.stoptime, Builds.rowid))) ...> ORDER BY
> ...> CASE WHEN :borderlowtime IS NULL
> ...> OR :borderlowid IS NULL THEN Builds.stoptime
> ...> ELSE -Builds.stoptime
> ...> END DESC,
> ...> CASE WHEN :borderlowtime IS NULL
> ...> OR :borderlowid IS NULL THEN Builds.rowid
> ...> ELSE -Builds.rowid
> ...> END DESC
> ...> LIMIT :nr)
> ...> ORDER BY stoptime, rowid ASC;
> 1|0|0|SCAN TABLE Builds
> 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?)
> 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX
> sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR
> ORDER BY 0|0|0|SCAN SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR ORDER BY
> --8<---------------cut here---------------end--------------->8---
>
> I don’t really know what additional index to create (and I’d rather
> let SQLite do it for me, if it were possible).
I don't know if there is any automated process to assist you. I have
the feeling that query optimization is more an art than science.
Hm. This code smells ... It looks too complicated.
I don't know if this brings you further concerning performance, here
are some thoughts:
One problematic part is this construct:
:variable IS NULL OR :variable=my_column)
Here is a very simple example:
sqlite> CREATE TABLE tst (
...> id INTEGER PRIMARY KEY AUTOINCREMENT,
...> name TEXT NOT NULL,
...> age INTEGER NOT NULL);
sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age);
sqlite> EXPLAIN QUERY PLAN
...> SELECT * FROM tst WHERE (23=23 OR id=:id);
0|0|0|SCAN TABLE tst
sqlite> EXPLAIN QUERY PLAN
...> SELECT * FROM tst WHERE id=:id;
0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
sqlite> EXPLAIN QUERY PLAN
...> SELECT * FROM tst WHERE name=:name AND age < 42;
0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND age<?)
So, even when we have a constant part(23=23) in the OR clause, this
leads to a full table scan. I think the optimizer cannot detect the
fact that it is a constant boolean value. In the other examples, it is
using the index.
Even this OR-clause with two variables looks better:
SELECT * FROM tst WHERE (id=:id1 OR id=:id2);
0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
I double-checked with Postgresql and it is also performing a full table
scan in the "boolean-constant OR :id=id" case. I could not find any
references on the net about it.
When this would be Java/JPA I would suggest to dynamically create the
query. Can we do something in Scheme-DB too? I.e. pseudo-code
(string-append sql-prefix
(unless (empty? derivation) "AND :derivation=Builds.derivation")
(unless (empty? jobset) "AND :jobset=Builds.jobset)
...)
;;; Should be more some kind of folding, because of the "AND"
;;; Parameter-filling needs to be considered too
Two more things I noticed that are not directly performance oriented:
We are directly relying on the rowid here, there is no explicit
id-column.
This could lead to unpredicted results and reorderings (6th Quirk in
document):
https://www.sqlite.org/rowidtable.html
We should add a column:
id INTEGER PRIMARY KEY AUTOINCREMENT
Problem is that this concept of AUTOINCREMENT does only work for
Primary Keys in Sqlite. So we need to degrade "derivation" to a
secondary key, i.e. make it non-null and unique:
derivation TEXT NOT NULL UNIQUE,
Is there anything speaking against that?
Lastly, the query has a limit and an order-by. The question is: Will
the result be first ordered and then the limit taken? The answer (I know
only for Postgresql and MySql, but I think it is the same for Sqlite,
I haven't found any reference): The order is always executed first, but
it has to be stable. In this case it is, because we order by
Builds.rowid, which is a key. Did this happen intentionally or just by
chance? Should we better add a note about that to the SQL code?
Björn
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
next prev parent reply other threads:[~2018-11-12 18:51 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès
2018-11-10 20:11 ` Björn Höfling
2018-11-11 17:06 ` Ludovic Courtès
2018-11-12 18:50 ` Björn Höfling [this message]
2018-11-12 19:42 ` Amirouche Boubekki
2018-11-12 23:27 ` Danny Milosavljevic
2018-11-14 11:14 ` Ludovic Courtès
2018-11-16 22:31 ` Björn Höfling
2018-11-13 8:10 ` Clément Lassieur
2018-11-16 22:42 ` Björn Höfling
2018-11-12 23:31 ` Danny Milosavljevic
2018-11-13 0:04 ` Danny Milosavljevic
2018-11-14 11:11 ` Ludovic Courtès
2018-11-19 10:44 ` Danny Milosavljevic
2018-12-19 22:45 ` Amirouche Boubekki
2018-11-19 9:47 ` swedebugia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181112195044.6d64f51c@alma-ubu \
--to=bjoern.hoefling@bjoernhoefling.de \
--cc=clement@lassieur.org \
--cc=guix-devel@gnu.org \
--cc=ludo@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).