Hello all, Le lun. 12 nov. 2018 à 19:51, Björn Höfling < bjoern.hoefling@bjoernhoefling.de> a écrit : > Hi Ludo, > > On Sun, 11 Nov 2018 18:06:00 +0100 > ludo@gnu.org (Ludovic Courtès) wrote: > > > Indeed, that solves the problem for this simple example, thanks! > > > > Now, if I go back to the big query that /api/latestbuilds makes¹, the > > result is still pretty bad: > > > > --8<---------------cut here---------------start------------->8--- > > sqlite> EXPLAIN QUERY PLAN SELECT * FROM ( > > ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, > > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status, > > Builds.job_name, Builds.system, ...> Builds.nix_name, > > Specifications.name ...> FROM Builds > > ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id > > ...> INNER JOIN Specifications ON Evaluations.specification = > > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid)) > > ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation)) > > ...> AND (:jobset IS NULL OR (:jobset = Specifications.name)) > > ...> AND (:job IS NULL OR (:job = Builds.job_name)) > > ...> AND (:system IS NULL OR (:system = Builds.system)) > > ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation)) > > ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status > > >= 0) ...> OR (:status = 'pending' AND > > >Builds.status < 0) > > ...> OR (:status = 'succeeded' AND > > Builds.status = 0) ...> OR (:status = 'failed' > > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL > > OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) < > > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL > > OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) > > > (Builds.stoptime, Builds.rowid))) ...> ORDER BY > > ...> CASE WHEN :borderlowtime IS NULL > > ...> OR :borderlowid IS NULL THEN Builds.stoptime > > ...> ELSE -Builds.stoptime > > ...> END DESC, > > ...> CASE WHEN :borderlowtime IS NULL > > ...> OR :borderlowid IS NULL THEN Builds.rowid > > ...> ELSE -Builds.rowid > > ...> END DESC > > ...> LIMIT :nr) > > ...> ORDER BY stoptime, rowid ASC; > > 1|0|0|SCAN TABLE Builds > > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?) > > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX > > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR > > ORDER BY 0|0|0|SCAN SUBQUERY 1 > > 0|0|0|USE TEMP B-TREE FOR ORDER BY > > --8<---------------cut here---------------end--------------->8--- > > > > I don’t really know what additional index to create (and I’d rather > > let SQLite do it for me, if it were possible). > > I don't know if there is any automated process to assist you. I have > the feeling that query optimization is more an art than science. > > Hm. This code smells ... It looks too complicated. > Agreed. > > I don't know if this brings you further concerning performance, here > are some thoughts: > > One problematic part is this construct: > > :variable IS NULL OR :variable=my_column) > > Here is a very simple example: > > sqlite> CREATE TABLE tst ( > ...> id INTEGER PRIMARY KEY AUTOINCREMENT, > ...> name TEXT NOT NULL, > ...> age INTEGER NOT NULL); > sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age); > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE (23=23 OR id=:id); > 0|0|0|SCAN TABLE tst > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE id=:id; > 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE name=:name AND age < 42; > 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND > age > So, even when we have a constant part(23=23) in the OR clause, this > leads to a full table scan. I think the optimizer cannot detect the > fact that it is a constant boolean value. In the other examples, it is > using the index. > > Even this OR-clause with two variables looks better: > > SELECT * FROM tst WHERE (id=:id1 OR id=:id2); > 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) > 0|0|0|EXECUTE LIST SUBQUERY 1 > > I double-checked with Postgresql and it is also performing a full table > scan in the "boolean-constant OR :id=id" case. I could not find any > references on the net about it. > > When this would be Java/JPA I would suggest to dynamically create the > query. Can we do something in Scheme-DB too? I.e. pseudo-code > > (string-append sql-prefix > (unless (empty? derivation) "AND :derivation=Builds.derivation") > (unless (empty? jobset) "AND :jobset=Builds.jobset) > ...) > ;;; Should be more some kind of folding, because of the "AND" > > ;;; Parameter-filling needs to be considered too > > > Two more things I noticed that are not directly performance oriented: > > We are directly relying on the rowid here, there is no explicit > id-column. > > This could lead to unpredicted results and reorderings (6th Quirk in > document): > > https://www.sqlite.org/rowidtable.html > > We should add a column: > > id INTEGER PRIMARY KEY AUTOINCREMENT > > Problem is that this concept of AUTOINCREMENT does only work for > Primary Keys in Sqlite. So we need to degrade "derivation" to a > secondary key, i.e. make it non-null and unique: > > derivation TEXT NOT NULL UNIQUE, > > Is there anything speaking against that? > > > Lastly, the query has a limit and an order-by. The question is: Will > the result be first ordered and then the limit taken? The answer (I know > only for Postgresql and MySql, but I think it is the same for Sqlite, > I haven't found any reference): The order is always executed first, but > it has to be stable. In this case it is, because we order by > Builds.rowid, which is a key. Did this happen intentionally or just by > chance? Should we better add a note about that to the SQL code? > > Björn > What Björn said. I might be wrong but if :derivation :jobset :job :system and :evalutation are exculsive ie. if one is set the other are null. They could all inherit from a "Buildable" class in Object-Oriented terms. Then you'd rather use a Generic Foreign Key pattern where you have two columns 'object_type' and 'object_id' where 'object_type' is one of the "object" type that is buildable and 'object_id' is the identifier of the row in the table named by 'object_type'. Otherwise you can try narrow the search to a given 'stoptime' or 'starttime' slice and index those columns. Something like: SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime, Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system, Builds.nix_name, Specifications.name FROM Builds INNER JOIN ... INNER JOIN ... WHERE Builds.stoptime > yesterday AND Builds.stoptime < now(); HTH