From: zimoun
To: Pierre Neidhardt, Ludovic Courtès
Cc: guix-devel@gnu.org, Mathieu Othacehe
Subject: Re: File search progress: database review and question on triggers
Date: Sat, 10 Oct 2020 18:03:40 +0200
Message-ID: <865z7iqd9f.fsf@gmail.com>
In-Reply-To: <875z7oijxu.fsf@ambrevar.xyz>

Hi,

On Mon, 05 Oct 2020 at 20:53, Pierre Neidhardt wrote:

> - Textual database: slow and not lighter than SQLite.  Not worth it I believe.

Maybe I am out-of-scope, but re-reading *all* the discussion about
“filesearch”, is it possible to really do better than “locate”?  As
Ricardo mentioned.

--8<---------------cut here---------------start------------->8---
# Drop the page cache first so the timing reflects a cold cache (needs root).
echo 3 > /proc/sys/vm/drop_caches
time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m19.903s
user    0m1.549s
sys     0m4.500s

du -sh /gnu/store /tmp/store.db
30G     /gnu/store
56M     /tmp/store.db

guix gc -F XXG
echo 3 > /proc/sys/vm/drop_caches
time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m10.105s
user    0m0.865s
sys     0m2.020s

du -sh /gnu/store /tmp/store.db
28G     /gnu/store
52M     /tmp/store.db
--8<---------------cut here---------------end--------------->8---

And “locate” supports regexp and regex, and it is fast enough.

--8<---------------cut here---------------start------------->8---
echo 3 > /proc/sys/vm/drop_caches
time locate -d /tmp/store.db --regex "emacs-ma[a-z0-9\.\-]+\/[^.]+.el$" | tail -n5
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-transient.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-utils.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-wip.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-worktree.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit.el

real    0m3.601s
user    0m3.528s
sys     0m0.061s
--8<---------------cut here---------------end--------------->8---

The only point is that regexps are always cumbersome for me.  Well:
«Some people, when confronted with a problem, think "I know, I'll use
regular expressions."  Now they have two problems.» :-) [1]

[1] https://en.wikiquote.org/wiki/Jamie_Zawinski

> - Include synopsis and descriptions.  Maybe we should include all fields
>   that are searched by `guix search`.  This incurs a cost on the
>   database size but it would fix the `guix search` speed issue.  Size
>   increases by some 10 MiB.

From my point of view, yes.  Somehow “filesearch” is a subpart of
“search”.  So it should be the machinery.

> I say we go with SQLite full-text search for now with all package
> details.  Switching to without full-text search is just a matter of a
> minor adjustment, which we can decide later when merging the final
> patch.  Same if we decide not to include the description, synopsis, etc.

[...]

> - Populate the database on demand, either after a `guix build` or from a
>   `guix filesearch...`.  This is important so that `guix filesearch`
>   works on packages built locally.  If `guix build`, I need help to know
>   where to plug it in.

[...]

> - Sync the databases from the substitute server to the client when
>   running `guix filesearch`.  For this I suggest we send the compressed
>   database corresponding to a guix generation over the network (around
>   10 MiB).  Not sure sending just the delta is worth it.

From my point of view, how to transfer the database from substitutes to
users and how to locally update it (custom channels or custom load path)
are not easy.  Maybe these are the core issues.

For example, I just did “guix pull” and “--list-generations” says from
f6dfe42 (Sept. 15) to 4ec2190 (Oct. 10):

    39.9 MB will be downloaded

plus the tiny bits before “Computing Guix derivation”.  Say 50MB max.

Well, the “locate” database for my “/gnu/store” (~30GB) is already
~50MB, and ~20MB when compressed with gzip.  And Pierre said:

     The database with all package descriptions and synopsis is 46 MiB
     and compresses down to 11 MiB in zstd.

which is better but still something.  Well, it is not affordable to
fetch the database with “guix pull”, IMHO.

Therefore, the database would be fetched at the first “guix search”
(assuming the point above).  But now, how could “search” know what is a
custom build and what is not?
Somehow, “search” would have to scan the whole store to be able to
update the database.

And what happens each time I do a custom build and then “filesearch”?
The database should be updated, right?  Well, it seems almost unusable.

The “updatedb/locate” model seems better.  The user updates “manually”
if required and then locating is fast.

Most of the cases are searching files in packages that are not my custom
packages.  IMHO.

To me, each time I am using “filesearch”:

 - first time: fetch the database corresponding to the Guix commit and
   then update it with my local store
 - otherwise: use this database
 - optionally update the database if the user wants to include new
   custom items.

We could imagine a hook or option to “guix pull” specifying to also
fetch the database and update it at pull time instead of “search” time.
Personally, I prefer a longer “guix pull” (it is already a bit long
anyway) followed by a fast “search”, rather than half/half (not so long
pull and longer search).

WDYT?

> - Find a way to garbage-collect the database(s).  My intuition is that
>   we should have 1 database per Guix checkout and when we `guix gc` a
>   Guix checkout we collect the corresponding database.

Well, the exact same strategy as
~/.config/guix/current/lib/guix/package.cache can be used.

BTW, thanks Pierre for improving the Guix discoverability. :-)

Cheers,
simon
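
PS: to make the three “filesearch” cases above concrete, here is a
minimal sketch of the decision only, assuming one database file per Guix
commit.  The function names and the cache layout are hypothetical;
nothing like this exists in Guix today, and the actual fetching and
updating are left out on purpose.

```shell
#!/bin/sh
# Hypothetical sketch: decide what "filesearch" should do with its database.
# Assumption: one database file per Guix commit, kept in a cache directory.

filesearch_db_path() {
    # $1 = Guix commit; the directory layout below is made up for illustration.
    printf '%s/filesearch/%s.db\n' "${XDG_CACHE_HOME:-$HOME/.cache}/guix" "$1"
}

filesearch_plan() {
    # $1 = database path
    # $2 = "yes" if the user asked to include new custom items (default "no")
    if [ ! -f "$1" ]; then
        # First time: fetch the database for this commit, then add the local store.
        echo "fetch+update"
    elif [ "${2:-no}" = "yes" ]; then
        # Database is there, but the user wants new local items indexed.
        echo "update"
    else
        # Common case: just query the existing database.
        echo "use"
    fi
}
```

For instance, once the file for commit 4ec2190 exists,
`filesearch_plan "$(filesearch_db_path 4ec2190)"` prints “use”, so the
common case costs no scan of the store at all.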