From: zimoun
To: Ludovic Courtès, Pierre Neidhardt
Cc: guix-devel@gnu.org, Mathieu Othacehe
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 12 Oct 2020 13:23:13 +0200
Message-ID: <86h7qzvgbi.fsf@gmail.com>
In-Reply-To: <87eem3u4n8.fsf@gnu.org>

On Mon, 12 Oct 2020 at 12:20, Ludovic Courtès wrote:

>> - Textual database: slow and not lighter than SQLite.  Not worth it I believe.
>>
>> - SQLite without full-text search: fast, supports classic patterns
>>   (e.g. "foo*bar") but does not support word permutations.
>>
>> - SQLite with full-text search: fast, supports word permutations but
>>   does not support suffix-matching (e.g. "bar" won't match "foobar").
>>   Size is about the same as without full-text search.
>>
>> - Include synopsis and descriptions.  Maybe we should include all fields
>>   that are searched by `guix search`.  This incurs a cost on the
>>   database size but it would fix the `guix search` speed issue.  Size
>>   increases by some 10 MiB.
>
> Oh so this is going beyond file search, right?
>
> Perhaps it would make sense to focus on file search only as a first
> step, and see what can be done with synopses/descriptions (like Arun and
> zimoun did before) later, separately?

Well, the first patch set that Arun sent for improving “guix search”
introduced a SQLite database to replace the current ’package.cache’.
And I quote your wise advice:

        I would rather keep the current package cache as-is instead of
        inserting sqlite in here.  I don’t expect it to bring much
        compared performance-wise to the current simple cache
        (especially if we look at load time), and it does increase
        complexity quite a bit.

        However, using sqlite for keyword search as you initially
        proposed on guix-devel does sound like a great idea to me.

        Message-ID: <87sgjhx92g.fsf@gnu.org>

Therefore, if Pierre is going to introduce a SQL database where adding
the synopses/descriptions is cheap, it seems a good idea to use it,
doesn’t it?  All while keeping ’package.cache’ as-is.  And in parallel,
“we” can try to use this WIP branch to improve the speed of “guix
search” (by “we”, I mean that I plan to work on it).

BTW, it would be really easy to remove these 2 extra fields if the
experiment turns out not to be conclusive for search, since only the
function ’add-files’ is involved:

--8<---------------cut here---------------start------------->8---
  (with-statement
      db
      (string-append "insert into Info (name, synopsis, description, package)"
                     " values (:name, :synopsis, :description, :id)")
      stmt
    (sqlite-bind-arguments stmt
                           #:name name
                           #:synopsis synopsis
                           #:description description
                           #:id id)
--8<---------------cut here---------------end--------------->8---

and that function is used only once, by ’persist-package-files’.

> It would be nice to see whether/how this could be integrated with
> third-party channels.
> Of course it’s not a priority, but while designing this feature, we
> should keep in mind that we might want third-party channel authors to
> be able to offer such a database for their packages.

If a third-party channel also provides substitutes, then the database
would be part of the substitutes, or easy to build from the substitute
metadata.

>> - Find a way to garbage-collect the database(s).  My intuition is that
>>   we should have 1 database per Guix checkout and when we `guix gc` a
>>   Guix checkout we collect the corresponding database.
>
> If we download a fresh database every time, we might as well simply
> overwrite the one we have?

But you do not want to download it again if you roll back, for example.
From my point of view, it should be the same mechanism as for
’package.cache’.

Cheers,
simon