* [Cuirass] Missing database indexes? @ 2018-11-10 17:33 Ludovic Courtès 2018-11-10 20:11 ` Björn Höfling 2018-11-19 9:47 ` swedebugia 0 siblings, 2 replies; 16+ messages in thread From: Ludovic Courtès @ 2018-11-10 17:33 UTC (permalink / raw) To: Danny Milosavljevic, clement; +Cc: guix-devel Hello! I was investigating the slowness of our /api/latestbuilds requests on berlin. I found that if we have just the two indexes currently defined in ‘schema.sql’, basically everything involves a table scan: --8<---------------cut here---------------start------------->8--- sqlite> EXPLAIN QUERY PLAN select * from builds where system = "x"; 0|0|0|SCAN TABLE builds --8<---------------cut here---------------end--------------->8--- I tentatively defined new indexes that seem to help: --8<---------------cut here---------------start------------->8--- sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation); sqlite> CREATE INDEX Builds_index_status ON Builds(status); sqlite> CREATE INDEX Builds_index_system ON Builds(system, evaluation); sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12; 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?) sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and system ="x"; 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_system (system=? AND evaluation=?) sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status = 0; 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_status (status=?) --8<---------------cut here---------------end--------------->8--- Now, ‘db-get-builds’ in Cuirass uses a more complex query. In particular, it orders things, very roughly along these lines: --8<---------------cut here---------------start------------->8--- sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ; 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?) 0|0|0|USE TEMP B-TREE FOR ORDER BY --8<---------------cut here---------------end--------------->8--- I’m pretty much a database newbie so please forgive the naive question, but is there something we can do to avoid this extra B-tree step, which seems costly in space and time? <http://www.sqlite.com/matrix/eqp.html> suggests it’s just a matter of adding yet another index but I couldn’t get that. Anything else we should do? Thanks in advance! :-) Ludo’. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès @ 2018-11-10 20:11 ` Björn Höfling 2018-11-11 17:06 ` Ludovic Courtès 2018-11-19 9:47 ` swedebugia 1 sibling, 1 reply; 16+ messages in thread From: Björn Höfling @ 2018-11-10 20:11 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 1595 bytes --] On Sat, 10 Nov 2018 18:33:23 +0100 ludo@gnu.org (Ludovic Courtès) wrote: > Now, ‘db-get-builds’ in Cuirass uses a more complex query. In > particular, it orders things, very roughly along these lines: > > --8<---------------cut here---------------start------------->8--- > sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 > sqlite> and status > 0 order by stoptime ; > 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation > (evaluation=?) 0|0|0|USE TEMP B-TREE FOR ORDER BY > --8<---------------cut here---------------end--------------->8--- > > I’m pretty much a database newbie so please forgive the naive > question, but is there something we can do to avoid this extra B-tree > step, which seems costly in space and time? > <http://www.sqlite.com/matrix/eqp.html> suggests it’s just a matter > of adding yet another index but I couldn’t get that. > > Anything else we should do? The link you provided explains it: The column over which you are sorting (stoptime) is not indexed. Add it to the (same) index: --8<---------------cut here---------------start------------->8--- sqlite> DROP INDEX Builds_index_evaluation; sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation, stoptime); sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ; 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?) --8<---------------cut here---------------end--------------->8--- If there is more SQL-trouble, I can try to help out. Björn [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-10 20:11 ` Björn Höfling @ 2018-11-11 17:06 ` Ludovic Courtès 2018-11-12 18:50 ` Björn Höfling 2018-11-12 23:31 ` Danny Milosavljevic 0 siblings, 2 replies; 16+ messages in thread From: Ludovic Courtès @ 2018-11-11 17:06 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel, clement Hi Björn, Björn Höfling <bjoern.hoefling@bjoernhoefling.de> skribis: > The link you provided explains it: The column over which you are sorting > (stoptime) is not indexed. Add it to the (same) index: > > sqlite> DROP INDEX Builds_index_evaluation; > sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation, stoptime); > sqlite> EXPLAIN QUERY PLAN select * from builds where evaluation = 12 and status > 0 order by stoptime ; > 0|0|0|SEARCH TABLE builds USING INDEX Builds_index_evaluation (evaluation=?) > > If there is more SQL-trouble, I can try to help out. Indeed, that solves the problem for this simple example, thanks! Now, if I go back to the big query that /api/latestbuilds makes¹, the result is still pretty bad: --8<---------------cut here---------------start------------->8--- sqlite> EXPLAIN QUERY PLAN SELECT * FROM ( ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system, ...> Builds.nix_name, Specifications.name ...> FROM Builds ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id ...> INNER JOIN Specifications ON Evaluations.specification = Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid)) ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation)) ...> AND (:jobset IS NULL OR (:jobset = Specifications.name)) ...> AND (:job IS NULL OR (:job = Builds.job_name)) ...> AND (:system IS NULL OR (:system = Builds.system)) ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation)) ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status >= 0) ...> OR (:status = 'pending' AND Builds.status < 0) ...> OR (:status = 'succeeded' AND Builds.status = 0) ...> OR (:status = 'failed' AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) < (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) > (Builds.stoptime, Builds.rowid))) ...> ORDER BY ...> CASE WHEN :borderlowtime IS NULL ...> OR :borderlowid IS NULL THEN Builds.stoptime ...> ELSE -Builds.stoptime ...> END DESC, ...> CASE WHEN :borderlowtime IS NULL ...> OR :borderlowid IS NULL THEN Builds.rowid ...> ELSE -Builds.rowid ...> END DESC ...> LIMIT :nr) ...> ORDER BY stoptime, rowid ASC; 1|0|0|SCAN TABLE Builds 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?) 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR ORDER BY 0|0|0|SCAN SUBQUERY 1 0|0|0|USE TEMP B-TREE FOR ORDER BY --8<---------------cut here---------------end--------------->8--- I don’t really know what additional index to create (and I’d rather let SQLite do it for me, if it were possible). Thoughts? Thanks, Ludo’. ¹ https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/tree/src/cuirass/database.scm#n550 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-11 17:06 ` Ludovic Courtès @ 2018-11-12 18:50 ` Björn Höfling 2018-11-12 19:42 ` Amirouche Boubekki ` (2 more replies) 2018-11-12 23:31 ` Danny Milosavljevic 1 sibling, 3 replies; 16+ messages in thread From: Björn Höfling @ 2018-11-12 18:50 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 5869 bytes --] Hi Ludo, On Sun, 11 Nov 2018 18:06:00 +0100 ludo@gnu.org (Ludovic Courtès) wrote: > Indeed, that solves the problem for this simple example, thanks! > > Now, if I go back to the big query that /api/latestbuilds makes¹, the > result is still pretty bad: > > --8<---------------cut here---------------start------------->8--- > sqlite> EXPLAIN QUERY PLAN SELECT * FROM ( > ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status, > Builds.job_name, Builds.system, ...> Builds.nix_name, > Specifications.name ...> FROM Builds > ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id > ...> INNER JOIN Specifications ON Evaluations.specification = > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid)) > ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation)) > ...> AND (:jobset IS NULL OR (:jobset = Specifications.name)) > ...> AND (:job IS NULL OR (:job = Builds.job_name)) > ...> AND (:system IS NULL OR (:system = Builds.system)) > ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation)) > ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status > >= 0) ...> OR (:status = 'pending' AND > >Builds.status < 0) > ...> OR (:status = 'succeeded' AND > Builds.status = 0) ...> OR (:status = 'failed' > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL > OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) < > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL > OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) > > (Builds.stoptime, Builds.rowid))) ...> ORDER BY > ...> CASE WHEN :borderlowtime IS NULL > ...> OR :borderlowid IS NULL THEN Builds.stoptime > ...> ELSE -Builds.stoptime > ...> END DESC, > ...> CASE WHEN :borderlowtime IS NULL > ...> OR :borderlowid IS NULL THEN Builds.rowid > ...> ELSE -Builds.rowid > ...> END DESC > ...> LIMIT :nr) > ...> ORDER BY stoptime, rowid ASC; > 1|0|0|SCAN TABLE Builds > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?) > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR > ORDER BY 0|0|0|SCAN SUBQUERY 1 > 0|0|0|USE TEMP B-TREE FOR ORDER BY > --8<---------------cut here---------------end--------------->8--- > > I don’t really know what additional index to create (and I’d rather > let SQLite do it for me, if it were possible). I don't know if there is any automated process to assist you. I have the feeling that query optimization is more an art than science. Hm. This code smells ... It looks too complicated. I don't know if this brings you further concerning performance, here are some thoughts: One problematic part is this construct: :variable IS NULL OR :variable=my_column) Here is a very simple example: sqlite> CREATE TABLE tst ( ...> id INTEGER PRIMARY KEY AUTOINCREMENT, ...> name TEXT NOT NULL, ...> age INTEGER NOT NULL); sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age); sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE (23=23 OR id=:id); 0|0|0|SCAN TABLE tst sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE id=:id; 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) sqlite> EXPLAIN QUERY PLAN ...> SELECT * FROM tst WHERE name=:name AND age < 42; 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND age<?) So, even when we have a constant part(23=23) in the OR clause, this leads to a full table scan. I think the optimizer cannot detect the fact that it is a constant boolean value. In the other examples, it is using the index. Even this OR-clause with two variables looks better: SELECT * FROM tst WHERE (id=:id1 OR id=:id2); 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 I double-checked with Postgresql and it is also performing a full table scan in the "boolean-constant OR :id=id" case. I could not find any references on the net about it. When this would be Java/JPA I would suggest to dynamically create the query. Can we do something in Scheme-DB too? I.e. pseudo-code (string-append sql-prefix (unless (empty? derivation) "AND :derivation=Builds.derivation") (unless (empty? jobset) "AND :jobset=Builds.jobset) ...) ;;; Should be more some kind of folding, because of the "AND" ;;; Parameter-filling needs to be considered too Two more things I noticed that are not directly performance oriented: We are directly relying on the rowid here, there is no explicit id-column. This could lead to unpredicted results and reorderings (6th Quirk in document): https://www.sqlite.org/rowidtable.html We should add a column: id INTEGER PRIMARY KEY AUTOINCREMENT Problem is that this concept of AUTOINCREMENT does only work for Primary Keys in Sqlite. So we need to degrade "derivation" to a secondary key, i.e. make it non-null and unique: derivation TEXT NOT NULL UNIQUE, Is there anything speaking against that? Lastly, the query has a limit and an order-by. The question is: Will the result be first ordered and then the limit taken? The answer (I know only for Postgresql and MySql, but I think it is the same for Sqlite, I haven't found any reference): The order is always executed first, but it has to be stable. In this case it is, because we order by Builds.rowid, which is a key. Did this happen intentionally or just by chance? Should we better add a note about that to the SQL code? Björn [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 18:50 ` Björn Höfling @ 2018-11-12 19:42 ` Amirouche Boubekki 2018-11-12 23:27 ` Danny Milosavljevic 2018-11-13 8:10 ` Clément Lassieur 2 siblings, 0 replies; 16+ messages in thread From: Amirouche Boubekki @ 2018-11-12 19:42 UTC (permalink / raw) To: bjoern.hoefling; +Cc: guix-devel, Clément Lassieur [-- Attachment #1: Type: text/plain, Size: 7181 bytes --] Hello all, Le lun. 12 nov. 2018 à 19:51, Björn Höfling < bjoern.hoefling@bjoernhoefling.de> a écrit : > Hi Ludo, > > On Sun, 11 Nov 2018 18:06:00 +0100 > ludo@gnu.org (Ludovic Courtès) wrote: > > > Indeed, that solves the problem for this simple example, thanks! > > > > Now, if I go back to the big query that /api/latestbuilds makes¹, the > > result is still pretty bad: > > > > --8<---------------cut here---------------start------------->8--- > > sqlite> EXPLAIN QUERY PLAN SELECT * FROM ( > > ...> SELECT Builds.derivation, Builds.rowid, Builds.timestamp, > > Builds.starttime, ...> Builds.stoptime, Builds.log, Builds.status, > > Builds.job_name, Builds.system, ...> Builds.nix_name, > > Specifications.name ...> FROM Builds > > ...> INNER JOIN Evaluations ON Builds.evaluation = Evaluations.id > > ...> INNER JOIN Specifications ON Evaluations.specification = > > Specifications.name ...> WHERE (:id IS NULL OR (:id = Builds.rowid)) > > ...> AND (:derivation IS NULL OR (:derivation = Builds.derivation)) > > ...> AND (:jobset IS NULL OR (:jobset = Specifications.name)) > > ...> AND (:job IS NULL OR (:job = Builds.job_name)) > > ...> AND (:system IS NULL OR (:system = Builds.system)) > > ...> AND (:evaluation IS NULL OR (:evaluation = Builds.evaluation)) > > ...> AND (:status IS NULL OR (:status = 'done' AND Builds.status > > >= 0) ...> OR (:status = 'pending' AND > > >Builds.status < 0) > > ...> OR (:status = 'succeeded' AND > > Builds.status = 0) ...> OR (:status = 'failed' > > AND Builds.status > 0)) ...> AND (:borderlowtime IS NULL > > OR :borderlowid IS NULL ...> OR ((:borderlowtime, :borderlowid) < > > (Builds.stoptime, Builds.rowid))) ...> AND (:borderhightime IS NULL > > OR :borderhighid IS NULL ...> OR ((:borderhightime, :borderhighid) > > > (Builds.stoptime, Builds.rowid))) ...> ORDER BY > > ...> CASE WHEN :borderlowtime IS NULL > > ...> OR :borderlowid IS NULL THEN Builds.stoptime > > ...> ELSE -Builds.stoptime > > ...> END DESC, > > ...> CASE WHEN :borderlowtime IS NULL > > ...> OR :borderlowid IS NULL THEN Builds.rowid > > ...> ELSE -Builds.rowid > > ...> END DESC > > ...> LIMIT :nr) > > ...> ORDER BY stoptime, rowid ASC; > > 1|0|0|SCAN TABLE Builds > > 1|1|1|SEARCH TABLE Evaluations USING INTEGER PRIMARY KEY (rowid=?) > > 1|2|2|SEARCH TABLE Specifications USING COVERING INDEX > > sqlite_autoindex_Specifications_1 (name=?) 1|0|0|USE TEMP B-TREE FOR > > ORDER BY 0|0|0|SCAN SUBQUERY 1 > > 0|0|0|USE TEMP B-TREE FOR ORDER BY > > --8<---------------cut here---------------end--------------->8--- > > > > I don’t really know what additional index to create (and I’d rather > > let SQLite do it for me, if it were possible). > > I don't know if there is any automated process to assist you. I have > the feeling that query optimization is more an art than science. > > Hm. This code smells ... It looks too complicated. > Agreed. > > I don't know if this brings you further concerning performance, here > are some thoughts: > > One problematic part is this construct: > > :variable IS NULL OR :variable=my_column) > > Here is a very simple example: > > sqlite> CREATE TABLE tst ( > ...> id INTEGER PRIMARY KEY AUTOINCREMENT, > ...> name TEXT NOT NULL, > ...> age INTEGER NOT NULL); > sqlite> CREATE INDEX tst_name_age_idx ON tst(name, age); > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE (23=23 OR id=:id); > 0|0|0|SCAN TABLE tst > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE id=:id; > 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) > > sqlite> EXPLAIN QUERY PLAN > ...> SELECT * FROM tst WHERE name=:name AND age < 42; > 0|0|0|SEARCH TABLE tst USING COVERING INDEX tst_name_age_idx (name=? AND > age<?) > > So, even when we have a constant part(23=23) in the OR clause, this > leads to a full table scan. I think the optimizer cannot detect the > fact that it is a constant boolean value. In the other examples, it is > using the index. > > Even this OR-clause with two variables looks better: > > SELECT * FROM tst WHERE (id=:id1 OR id=:id2); > 0|0|0|SEARCH TABLE tst USING INTEGER PRIMARY KEY (rowid=?) > 0|0|0|EXECUTE LIST SUBQUERY 1 > > I double-checked with Postgresql and it is also performing a full table > scan in the "boolean-constant OR :id=id" case. I could not find any > references on the net about it. > > When this would be Java/JPA I would suggest to dynamically create the > query. Can we do something in Scheme-DB too? I.e. pseudo-code > > (string-append sql-prefix > (unless (empty? derivation) "AND :derivation=Builds.derivation") > (unless (empty? jobset) "AND :jobset=Builds.jobset) > ...) > ;;; Should be more some kind of folding, because of the "AND" > > ;;; Parameter-filling needs to be considered too > > > Two more things I noticed that are not directly performance oriented: > > We are directly relying on the rowid here, there is no explicit > id-column. > > This could lead to unpredicted results and reorderings (6th Quirk in > document): > > https://www.sqlite.org/rowidtable.html > > We should add a column: > > id INTEGER PRIMARY KEY AUTOINCREMENT > > Problem is that this concept of AUTOINCREMENT does only work for > Primary Keys in Sqlite. So we need to degrade "derivation" to a > secondary key, i.e. make it non-null and unique: > > derivation TEXT NOT NULL UNIQUE, > > Is there anything speaking against that? > > > Lastly, the query has a limit and an order-by. The question is: Will > the result be first ordered and then the limit taken? The answer (I know > only for Postgresql and MySql, but I think it is the same for Sqlite, > I haven't found any reference): The order is always executed first, but > it has to be stable. In this case it is, because we order by > Builds.rowid, which is a key. Did this happen intentionally or just by > chance? Should we better add a note about that to the SQL code? > > Björn > What Björn said. I might be wrong but if :derivation :jobset :job :system and :evalutation are exculsive ie. if one is set the other are null. They could all inherit from a "Buildable" class in Object-Oriented terms. Then you'd rather use a Generic Foreign Key pattern where you have two columns 'object_type' and 'object_id' where 'object_type' is one of the "object" type that is buildable and 'object_id' is the identifier of the row in the table named by 'object_type'. Otherwise you can try narrow the search to a given 'stoptime' or 'starttime' slice and index those columns. Something like: SELECT Builds.derivation, Builds.rowid, Builds.timestamp, Builds.starttime, Builds.stoptime, Builds.log, Builds.status, Builds.job_name, Builds.system, Builds.nix_name, Specifications.name FROM Builds INNER JOIN ... INNER JOIN ... WHERE Builds.stoptime > yesterday AND Builds.stoptime < now(); HTH [-- Attachment #2: Type: text/html, Size: 9309 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 18:50 ` Björn Höfling 2018-11-12 19:42 ` Amirouche Boubekki @ 2018-11-12 23:27 ` Danny Milosavljevic 2018-11-14 11:14 ` Ludovic Courtès 2018-11-16 22:31 ` Björn Höfling 2018-11-13 8:10 ` Clément Lassieur 2 siblings, 2 replies; 16+ messages in thread From: Danny Milosavljevic @ 2018-11-12 23:27 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 3358 bytes --] Hi Björn, On Mon, 12 Nov 2018 19:50:44 +0100 Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote: > Hm. This code smells ... It looks too complicated. I was trying to cut down the number of prepared statements in use and prevent a combinatorial explosion while keeping the kinds of queries we can do open. Either the value of a parameter is specified, in which case the associated column is filtered for it; or it isn't, then it's not. > So, even when we have a constant part(23=23) in the OR clause, this > leads to a full table scan. I think the optimizer cannot detect the > fact that it is a constant boolean value. Sounds like an easy fix in sqlite. Could you report this upstream? > I double-checked with Postgresql and it is also performing a full table > scan in the "boolean-constant OR :id=id" case. I could not find any > references on the net about it. Something easy to try would be to use the row values in sqlite instead. See also https://www.sqlite.org/rowvalue.html See also https://lists.gnu.org/archive/html/guix-devel/2018-07/msg00101.html > When this would be Java/JPA I would suggest to dynamically create the > query. Can we do something in Scheme-DB too? I.e. pseudo-code (1) Combinatorial explosion of the number of prepared statements (2) Slow to parse all those SQL statements But if we don't use all combinations in practise then it's not so bad and we could generate those statements after all. It's still a workaround if we have to do that. Then we'd have to make sure that the user can't specify arbitrary combinations and/or limit the number of prepared statements that exist at the same time. Cuirass is a web-facing service so these are not theoretical problems - someone *will* break it (maybe on purpose) if it's possible. All in all, generating these SQL statements is like generating a program source code and recompiling it every time you want to change the filter used. It might be the only good solution in this case (maybe) - but in general it's an antipattern. > We should add a column: > > id INTEGER PRIMARY KEY AUTOINCREMENT > > Problem is that this concept of AUTOINCREMENT does only work for > Primary Keys in Sqlite. So we need to degrade "derivation" to a > secondary key, i.e. make it non-null and unique: > > derivation TEXT NOT NULL UNIQUE, > > Is there anything speaking against that? Sounds good. Note that when you use autoincrement, you eventually have to handle the case when the value overflows. The window of used IDs slowly creeps up over the months. > Lastly, the query has a limit and an order-by. The question is: Will > the result be first ordered and then the limit taken? The answer (I know > only for Postgresql and MySql, but I think it is the same for Sqlite, I don't think the documentation at https://www.sqlite.org/lang_select.html specifies, but it's the only thing that makes sense. Otherwise LIMIT would be useless as a pager. Still would be good to have official confirmation by the sqlite authors. > I haven't found any reference): The order is always executed first, but > it has to be stable. In this case it is, because we order by > Builds.rowid, which is a key. >Did this happen intentionally Of course. Otherwise LIMIT would be useless as a pager. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 23:27 ` Danny Milosavljevic @ 2018-11-14 11:14 ` Ludovic Courtès 2018-11-16 22:31 ` Björn Höfling 1 sibling, 0 replies; 16+ messages in thread From: Ludovic Courtès @ 2018-11-14 11:14 UTC (permalink / raw) To: Danny Milosavljevic; +Cc: guix-devel, clement Hello, Danny Milosavljevic <dannym@scratchpost.org> skribis: > On Mon, 12 Nov 2018 19:50:44 +0100 > Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote: > >> Hm. This code smells ... It looks too complicated. > > I was trying to cut down the number of prepared statements in use and prevent a > combinatorial explosion while keeping the kinds of queries we can do open. I understand both arguments. I wonder if there’s a way to conciliate these two issues. What do you both think? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 23:27 ` Danny Milosavljevic 2018-11-14 11:14 ` Ludovic Courtès @ 2018-11-16 22:31 ` Björn Höfling 1 sibling, 0 replies; 16+ messages in thread From: Björn Höfling @ 2018-11-16 22:31 UTC (permalink / raw) To: Danny Milosavljevic; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 2379 bytes --] On Tue, 13 Nov 2018 00:27:15 +0100 Danny Milosavljevic <dannym@scratchpost.org> wrote: > Hi Björn, > > On Mon, 12 Nov 2018 19:50:44 +0100 > Björn Höfling <bjoern.hoefling@bjoernhoefling.de> wrote: > > > Hm. This code smells ... It looks too complicated. > > I was trying to cut down the number of prepared statements in use and > prevent a combinatorial explosion while keeping the kinds of queries > we can do open. > > Either the value of a parameter is specified, in which case the > associated column is filtered for it; or it isn't, then it's not. > > > So, even when we have a constant part(23=23) in the OR clause, this > > leads to a full table scan. I think the optimizer cannot detect the > > fact that it is a constant boolean value. > > Sounds like an easy fix in sqlite. Could you report this upstream? I thought about these things again, with the help of stackoverflow and this was really silly: If the left part of the clause (:jobset IS NULL OR (:jobset = Specifications.name)) is constant, null, then the whole clause is true. And if that is the case for all, then everything is true, of cause resulting in a full table scan. That is it I think what the query optimizer is thinking of here. I guess it will be better if there are some non-null and some null clauses mixed. I have the full DB here now, I will look into it the next days. > > I double-checked with Postgresql and it is also performing a full > > table scan in the "boolean-constant OR :id=id" case. I could not > > find any references on the net about it. > > Something easy to try would be to use the row values in sqlite > instead. > > See also https://www.sqlite.org/rowvalue.html > See also > https://lists.gnu.org/archive/html/guix-devel/2018-07/msg00101.html That would be another idea. > > When this would be Java/JPA I would suggest to dynamically create > > the query. Can we do something in Scheme-DB too? I.e. pseudo-code > > (1) Combinatorial explosion of the number of prepared statements > (2) Slow to parse all those SQL statements > > But if we don't use all combinations in practise then it's not so bad > and we could generate those statements after all. It's still a > workaround if we have to do that. You are right. That sounds reasonable. It just looks ugly to read. Björn [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 18:50 ` Björn Höfling 2018-11-12 19:42 ` Amirouche Boubekki 2018-11-12 23:27 ` Danny Milosavljevic @ 2018-11-13 8:10 ` Clément Lassieur 2018-11-16 22:42 ` Björn Höfling 2 siblings, 1 reply; 16+ messages in thread From: Clément Lassieur @ 2018-11-13 8:10 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel Hi Björn, Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes: > We are directly relying on the rowid here, there is no explicit > id-column. > > This could lead to unpredicted results and reorderings (6th Quirk in > document): > > https://www.sqlite.org/rowidtable.html > > We should add a column: > > id INTEGER PRIMARY KEY AUTOINCREMENT > > Problem is that this concept of AUTOINCREMENT does only work for > Primary Keys in Sqlite. So we need to degrade "derivation" to a > secondary key, i.e. make it non-null and unique: > > derivation TEXT NOT NULL UNIQUE, > > Is there anything speaking against that? We only use that rowid to display a number at the left of every 'build' row. I think it would make more sense to use the derivation name where we currently use the rowid. It would also be more understandable for the users. We don't even need rowid for sorting because we can sort with the timestamps. The only issue is that we get further from hydra, but we are already pretty far away anyway. Clément ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-13 8:10 ` Clément Lassieur @ 2018-11-16 22:42 ` Björn Höfling 0 siblings, 0 replies; 16+ messages in thread From: Björn Höfling @ 2018-11-16 22:42 UTC (permalink / raw) To: Clément Lassieur; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 1491 bytes --] On Tue, 13 Nov 2018 09:10:30 +0100 Clément Lassieur <clement@lassieur.org> wrote: > Hi Björn, > > Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes: > > > We are directly relying on the rowid here, there is no explicit > > id-column. > > > > This could lead to unpredicted results and reorderings (6th Quirk in > > document): > > > > https://www.sqlite.org/rowidtable.html > > > > We should add a column: > > > > id INTEGER PRIMARY KEY AUTOINCREMENT > > > > Problem is that this concept of AUTOINCREMENT does only work for > > Primary Keys in Sqlite. So we need to degrade "derivation" to a > > secondary key, i.e. make it non-null and unique: > > > > derivation TEXT NOT NULL UNIQUE, > > > > Is there anything speaking against that? > > We only use that rowid to display a number at the left of every > 'build' row. I think it would make more sense to use the derivation > name where we currently use the rowid. It would also be more > understandable for the users. > > We don't even need rowid for sorting because we can sort with the > timestamps. > > The only issue is that we get further from hydra, but we are already > pretty far away anyway. I'm undecided about this, I don't understand enough of it yet. In some sense I would prefer a numeric id to refer to. On the other hand the derivation is the actual key, and if that is generated a second time, the build is the same and will not be executed again. Björn [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-11 17:06 ` Ludovic Courtès 2018-11-12 18:50 ` Björn Höfling @ 2018-11-12 23:31 ` Danny Milosavljevic 2018-11-13 0:04 ` Danny Milosavljevic 2018-11-14 11:11 ` Ludovic Courtès 1 sibling, 2 replies; 16+ messages in thread From: Danny Milosavljevic @ 2018-11-12 23:31 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 622 bytes --] Hi Ludo, On Sun, 11 Nov 2018 18:06:00 +0100 ludo@gnu.org (Ludovic Courtès) wrote: > I don’t really know what additional index to create (and I’d rather let > SQLite do it for me, if it were possible). Not exactly what you mean but there's this: https://www.sqlite.org/lang_analyze.html It does statistical analysis of queries that ran and will optimize for that case on subsequent connections. Other than that, I always create indices by hand since it's a one-time effort anyway - so just searching for a solution to automate it takes longer than just adding an index for a column I match on. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 23:31 ` Danny Milosavljevic @ 2018-11-13 0:04 ` Danny Milosavljevic 2018-11-14 11:11 ` Ludovic Courtès 1 sibling, 0 replies; 16+ messages in thread From: Danny Milosavljevic @ 2018-11-13 0:04 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 812 bytes --] On Tue, 13 Nov 2018 00:31:40 +0100 Danny Milosavljevic <dannym@scratchpost.org> wrote: > Hi Ludo, > > On Sun, 11 Nov 2018 18:06:00 +0100 > ludo@gnu.org (Ludovic Courtès) wrote: > > > I don’t really know what additional index to create (and I’d rather let > > SQLite do it for me, if it were possible). > > Not exactly what you mean but there's this: > > https://www.sqlite.org/lang_analyze.html > > It does statistical analysis of queries that ran and will optimize for > that case on subsequent connections. > > Other than that, I always create indices by hand since it's a one-time > effort anyway - so just searching for a solution to automate it takes > longer than just adding an index for a column I match on. Also https://www.sqlite.org/optoverview.html#autoindex [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-12 23:31 ` Danny Milosavljevic 2018-11-13 0:04 ` Danny Milosavljevic @ 2018-11-14 11:11 ` Ludovic Courtès 2018-11-19 10:44 ` Danny Milosavljevic 1 sibling, 1 reply; 16+ messages in thread From: Ludovic Courtès @ 2018-11-14 11:11 UTC (permalink / raw) To: Danny Milosavljevic; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 623 bytes --] Hi, Danny Milosavljevic <dannym@scratchpost.org> skribis: > On Sun, 11 Nov 2018 18:06:00 +0100 > ludo@gnu.org (Ludovic Courtès) wrote: > >> I don’t really know what additional index to create (and I’d rather let >> SQLite do it for me, if it were possible). > > Not exactly what you mean but there's this: > > https://www.sqlite.org/lang_analyze.html > > It does statistical analysis of queries that ran and will optimize for > that case on subsequent connections. That’s close to what I was hoping for. We could do “PRAGMA optimize” before closing the database session as they suggest: [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: Type: text/x-patch, Size: 634 bytes --] diff --git a/src/cuirass/database.scm b/src/cuirass/database.scm index 8b83c18..fb037d1 100644 --- a/src/cuirass/database.scm +++ b/src/cuirass/database.scm @@ -403,7 +403,9 @@ a critical section that allows database operations to be serialized." ;; be costly and may defeat statement caching. (parameterize ((%db-channel (make-critical-section db))) body ...) - (db-close db)))) + (begin + (sqlite-exec db "PRAGMA optimize;") + (db-close db))))) (define* (read-quoted-string #:optional (port (current-input-port))) "Read all of the characters out of PORT and return them as a SQL quoted [-- Attachment #3: Type: text/plain, Size: 1477 bytes --] … though I think we never really close it properly because Cuirass runs “forever.” I tried doing this in a Guile session on berlin: --8<---------------cut here---------------start------------->8--- scheme@(guile-user)> ,use(cuirass database) scheme@(guile-user)> (%package-database "/var/lib/cuirass/cuirass.db") $6 = "/gnu/store/g1q2lv75a2fibii4y52fndz5zpbmyl12-cuirass-0.0.1-21.0b40dca/var/lib/cuirass/cuirass.db" scheme@(guile-user)> (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;"))) $7 = () scheme@(guile-user)> (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;"))) $8 = () scheme@(guile-user)> ,time (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;"))) $9 = () ;; 13.215291s real time, 13.229189s run time. 0.016093s spent in GC. scheme@(guile-user)> ,time (with-database (db-get-builds `((nr . 200)(order . finish-time)(status . done))) (with-db-critical-section db (sqlite-exec db "PRAGMA optimize;"))) $10 = () ;; 13.204621s real time, 13.230655s run time. 0.029333s spent in GC. --8<---------------cut here---------------end--------------->8--- It doesn’t seem to help much, perhaps because the query is too complex? Thoughts? Ludo’. ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-14 11:11 ` Ludovic Courtès @ 2018-11-19 10:44 ` Danny Milosavljevic 2018-12-19 22:45 ` Amirouche Boubekki 0 siblings, 1 reply; 16+ messages in thread From: Danny Milosavljevic @ 2018-11-19 10:44 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guix-devel, clement [-- Attachment #1: Type: text/plain, Size: 276 bytes --] Hi Ludo, >It doesn’t seem to help much, perhaps because the query is too complex? Yeah, probably. According to the docs a log message is supposed to appear when it is doing it. We should just special-case the common queries so the optimizer has a easier life. [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-19 10:44 ` Danny Milosavljevic @ 2018-12-19 22:45 ` Amirouche Boubekki 0 siblings, 0 replies; 16+ messages in thread From: Amirouche Boubekki @ 2018-12-19 22:45 UTC (permalink / raw) To: Danny Milosavljevic; +Cc: guix-devel, Clément Lassieur [-- Attachment #1: Type: text/plain, Size: 427 bytes --] Did you consider using wiredtiger? Le lun. 19 nov. 2018 à 11:47, Danny Milosavljevic <dannym@scratchpost.org> a écrit : > Hi Ludo, > > >It doesn’t seem to help much, perhaps because the query is too complex? > > Yeah, probably. > > According to the docs a log message is supposed to appear when it is doing > it. > > We should just special-case the common queries so the optimizer has a > easier life. > [-- Attachment #2: Type: text/html, Size: 713 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Cuirass] Missing database indexes? 2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès 2018-11-10 20:11 ` Björn Höfling @ 2018-11-19 9:47 ` swedebugia 1 sibling, 0 replies; 16+ messages in thread From: swedebugia @ 2018-11-19 9:47 UTC (permalink / raw) To: guix-devel Hi On 2018-11-10 18:33, Ludovic Courtès wrote: snip > --8<---------------cut here---------------start------------->8--- > sqlite> CREATE INDEX Builds_index_evaluation ON Builds(evaluation); snip > Anything else we should do? I woke up today with an idea. :) Now on berlin it seems that all jobset are in one enormous table, correct? How about creating a database/sqlite file for every specification and a table for every jobset? This would also solve the problem with overrunning id. (there will never be more ids than jobs in a single jobset in a table) This means cuirass only work on one database per specification (now there are 5 specs on berlin) Every jobset having its own table/file will lower the total number of rows, thus lower the computing cost of a table scan. -- Cheers Swedebugia ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-12-19 22:45 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-11-10 17:33 [Cuirass] Missing database indexes? Ludovic Courtès 2018-11-10 20:11 ` Björn Höfling 2018-11-11 17:06 ` Ludovic Courtès 2018-11-12 18:50 ` Björn Höfling 2018-11-12 19:42 ` Amirouche Boubekki 2018-11-12 23:27 ` Danny Milosavljevic 2018-11-14 11:14 ` Ludovic Courtès 2018-11-16 22:31 ` Björn Höfling 2018-11-13 8:10 ` Clément Lassieur 2018-11-16 22:42 ` Björn Höfling 2018-11-12 23:31 ` Danny Milosavljevic 2018-11-13 0:04 ` Danny Milosavljevic 2018-11-14 11:11 ` Ludovic Courtès 2018-11-19 10:44 ` Danny Milosavljevic 2018-12-19 22:45 ` Amirouche Boubekki 2018-11-19 9:47 ` swedebugia
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/guix.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.