From: Ludovic Courtès
To: Mathieu Othacehe
Cc: Guix Devel
Subject: Re: File search
Date: Sat, 22 Jan 2022 01:35:35 +0100
Message-ID: <87a6fo38vc.fsf@gnu.org>
In-Reply-To: <87czklwf47.fsf@gnu.org> (Mathieu Othacehe's message of "Fri, 21 Jan 2022 11:35:36 +0100")
References: <8735lh5ukw.fsf@inria.fr> <87czklwf47.fsf@gnu.org>

Hi!

Mathieu Othacehe skribis:

>> I think accuracy (making sure you get results that correspond precisely
>> to, say, your current channel revisions and your current system) is not
>> a high priority: some result is better than no result.  Likewise for
>> freshness: results for an older version of a given package may still be
>> valid now.
>
> Agreed.
>
>> In terms of privacy, I think it’s better if we can avoid making one
>> request per file searched for.  Off-line operation would be sweet, and
>> it comes with responsiveness; fast off-line search is necessary for
>> things like ‘command-not-found’ (where the shell tells you what package
>> to install when a command is not found).
>
> Yeah, that's the tricky part. In terms of maintenance, it would probably
> be easier to have Cuirass index the packages it's building, store the
> results in the PostgreSQL database and serve them using the Cuirass web
> server. The pros are that we only rely on one database, which is very
> important in my opinion. It's also relatively easy to set up. The cons
> are that you need to be online to access this API.

Like I wrote, I don’t think we should do on-line only; we need users to
be able to download a database at least for ‘command-not-found’.

> If we instead decide to periodically build an SQLite database indexing
> all the packages in a cron job or so, users would still need to
> download it, which would be an expensive operation as you
> mentioned. It would also be difficult to index custom Guix channels with
> that approach.

True!  Though for this matter I’d be very pragmatic and wouldn’t mind
sacrificing third-party channels until we have a better idea.
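
To make the off-line lookup concrete, here is a minimal sketch of what a
locally stored file database could look like, assuming an SQLite file
index with one row per file shipped by a package.  The schema, the
function names (create_index, add_package, packages_providing) and the
sample package data are all hypothetical illustrations; none of this is
taken from ‘file-database.scm’ or any existing Guix code.

```python
import posixpath
import sqlite3


def create_index(db):
    # Hypothetical schema: one row per file shipped by a package.
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " package TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " basename TEXT NOT NULL)"
    )
    # Index on the base name so that a lookup by command name stays fast.
    db.execute("CREATE INDEX IF NOT EXISTS files_basename ON files (basename)")


def add_package(db, package, version, paths):
    # PATHS are file names relative to the package's output directory.
    rows = [(package, version, p, posixpath.basename(p)) for p in paths]
    db.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", rows)


def packages_providing(db, command):
    # Return (package, version) pairs that ship COMMAND under bin/ or sbin/.
    cur = db.execute(
        "SELECT DISTINCT package, version FROM files"
        " WHERE basename = ? AND (path LIKE 'bin/%' OR path LIKE 'sbin/%')"
        " ORDER BY package, version",
        (command,),
    )
    return cur.fetchall()


# Made-up sample data, for illustration only.
db = sqlite3.connect(":memory:")
create_index(db)
add_package(db, "coreutils", "9.0", ["bin/ls", "bin/cat", "share/man/man1/ls.1"])
add_package(db, "busybox", "1.33.1", ["bin/ls", "bin/sh"])
print(packages_providing(db, "ls"))
```

A ‘command-not-found’ hook would then only need a query like
packages_providing, with no network request per lookup; how the table
gets populated (local store, downloaded database, or file lists from a
server) is a separate question.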

> Another solution could be to have guix publish index the files from the
> NAR in its cache and provide a file searching API. That would still
> require being online, but it would allow searching multiple publish
> servers, hence possibly multiple Guix channels. Packages that do not
> have substitutes couldn't be searched, which is a strong con. I would
> still maybe have a preference for that option.

I also thought about doing it in ‘guix publish’.  One problem is that
it’s not the right level of abstraction: it publishes everything, not
just packages, and it can only guess whether something is a package and
what its name and version are.

Another option would be to have ‘guix publish’ provide digests (file
lists), similar to what I did in:

  https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00080.html
  https://git.savannah.gnu.org/cgit/guix.git/log?h=wip-digests

That way, ‘file-database.scm populate’ could fetch those digests instead
of whole nars (or solely local info).  Users would have to run it
regularly.

The Nix folks have , which apparently creates a database based on
substitutes.  Looking at , it seems Hydra provides “file listings”,
similar to digests.  Then it’s up to users to regularly run the indexer
so they can use ‘nix-locate’, which is typically done via a local cron
job (similar to how one would use ‘updatedb’).  Populating the database
the first time may be rather costly though.

Maybe that’s a more reasonable approach?

All that said, I think we could very much have, in parallel, a fancier
database, be it in the Data Service or in Cuirass, that one could query
on-line.  Its implementation would actually be less constrained.

Ludo’.