From: Arun Isaac
To: Pierre Neidhardt, Ricardo Wurmus
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
In-Reply-To: <87r1sbel4f.fsf@ambrevar.xyz>
References: <87sgcuh8rb.fsf@ambrevar.xyz> <87y2ml429i.fsf@elephly.net> <87364tgja3.fsf@ambrevar.xyz> <87y2mlf4jw.fsf@ambrevar.xyz> <87pn7x3pyw.fsf@elephly.net> <87r1sbel4f.fsf@ambrevar.xyz>
Date: Thu, 13 Aug 2020 05:47:18 +0530
Message-ID: <87eeobh01d.fsf@systemreboot.net>

Hi,

> 1. I tried to fine-tune the SQL a bit:
>    - Open/close the database only once for the whole indexing.
>    - Use "insert" instead of "insert or replace".
>    - Use numeric ID as key instead of path.
>
> Result: Still around 15-20 minutes to build.  Switching to numeric
> indices shrank the database by half.

sqlite insert statements can be very fast. sqlite.org claims 50000 or
more insert statements per second. But in order to achieve that speed,
all insert statements have to be grouped together in a single
transaction. See https://www.sqlite.org/faq.html#q19

> A string-contains filter takes less than 1 second.

Guile's string-contains function uses a naive O(nk) implementation,
where 'n' is the length of string s1 and 'k' is the length of string
s2. If it were implemented using the Knuth-Morris-Pratt algorithm, it
could cost only O(n+k). So, there is some scope for improvement here.
In fact, a comment on line 2007 of libguile/srfi-13.c in the guile
source tree makes this very point.

> I need to measure the time SQL takes for a regexp match.

sqlite, by default, does not come with regexp support. You might have
to load some external library. See
https://www.sqlite.org/lang_expr.html#the_like_glob_regexp_and_match_operators

--8<---------------cut here---------------start------------->8---
The REGEXP operator is a special syntax for the regexp() user
function. No regexp() user function is defined by default and so use
of the REGEXP operator will normally result in an error message. If an
application-defined SQL function named "regexp" is added at run-time,
then the "X REGEXP Y" operator will be implemented as a call to
"regexp(Y,X)".
--8<---------------cut here---------------end--------------->8---

Regards,
Arun
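
To make the two sqlite points above concrete, here is a minimal sketch
using Python's standard sqlite3 module, chosen only because it is easy
to try out; it is not the Guile code under discussion. The table name
"files", its columns, and the sample paths are hypothetical. The sketch
groups all inserts into one transaction and then registers an
application-defined regexp() function so that the REGEXP operator
works.

```python
import re
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT NOT NULL)")

rows = [(i, f"/gnu/store/pkg-{i}/bin/tool{i}") for i in range(50000)]

# "with conn" commits once on success (or rolls back on error), so all
# 50000 inserts below share a single transaction.
with conn:
    conn.executemany("INSERT INTO files (id, path) VALUES (?, ?)", rows)


def regexp(pattern, value):
    """Back the REGEXP operator: "X REGEXP Y" calls regexp(Y, X)."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


# Without this registration, using REGEXP raises an error.
conn.create_function("regexp", 2, regexp)

matches = conn.execute(
    "SELECT path FROM files WHERE path REGEXP ? ORDER BY id",
    (r"/bin/tool4999\d$",),
).fetchall()
```

Note the argument order: the pattern (the right-hand side of REGEXP)
arrives as the first argument of the user function. Whether a regexp
match done this way is fast enough for the file search would still
need to be measured.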