all messages for Guix-related lists mirrored at yhetil.org
* Improve package search
@ 2019-03-14 18:31 mikadoZero
  2019-03-14 20:49 ` Leo Famulari
  0 siblings, 1 reply; 47+ messages in thread
From: mikadoZero @ 2019-03-14 18:31 UTC (permalink / raw)
  To: Guix-devel

# Motivation

From Ludovic Courtès's response to bug#34828:
> "I would recommend against turning descriptions into lists of commands
> just for the sake of package search (we should instead have another
> mechanism to determine which package provides a given command) ..."

`guix package -s` often returns no useful results for a program that is
part of a larger multi-program package with a different name.  This is
compounded by the very reasonable desire to prevent descriptions from
turning into lists of commands.

# Examples

Here are two examples of programs that do not have useful package search
results:

`as` in `gcc-toolchain`
`recsel` in `recutils`

There are other programs that also have this issue.

# Proposed idea

* Add a "programs" field to package definitions that lists the programs
  that are included in a package.
* Include this field in search results.
* Have this field factor into the search result relevance scores.

# Implementation

I am not familiar with how package search works and do not know how
much work this would be to implement.

A requirement for a "programs" field could be included in package
linting.  I am not familiar with the inner workings of linting and do
not know how much work this would be to implement.

# Roll out

* New packages could be given the "programs" field when they are
  created.
  
* Existing packages that are being updated could be given the "programs"
  field.

* Existing packages with relevant irc questions or bug reports could
  be given the "programs" field.

* Existing packages without relevant irc questions or bug reports that
  are not being updated could remain unchanged.  This could save
  significant effort as many programs may never require the "programs"
  field to be added.

# Advantage

Allows users to more easily find which package includes the program they
want to install.

# Disadvantage

More effort is required to package multi-program packages.

I know that the coreutils package includes a very large number of
programs.  I do not know if there are many other packages that are
as large.

# Feedback

This is an initial idea that would benefit from the input of others.

Given the uncertainties I mention in the Implementation and
Disadvantage sections, this may not be a good solution to the problem
described in the Motivation section.

* Re: Improve package search
  2019-03-14 18:31 Improve package search mikadoZero
@ 2019-03-14 20:49 ` Leo Famulari
  2019-03-14 22:01   ` Tobias Geerinckx-Rice
  0 siblings, 1 reply; 47+ messages in thread
From: Leo Famulari @ 2019-03-14 20:49 UTC (permalink / raw)
  To: mikadoZero; +Cc: Guix-devel

On Thu, Mar 14, 2019 at 02:31:36PM -0400, mikadoZero wrote:
> # Proposed idea
> 
> * Add a "programs" field to package definitions that lists the programs
>   that are included in a package. 
> * Include this field in search results.
> * Have this field factor into the search result relevance scores.

> # Feedback
> 
> This is an initial idea that would benefit from the input of others.

For me, it would be better to have this "program listing" built
automatically, rather than relying on packagers to get it right and keep
it up to date. It would be a great feature once it is in place.

* Re: Improve package search
  2019-03-14 20:49 ` Leo Famulari
@ 2019-03-14 22:01   ` Tobias Geerinckx-Rice
  2019-03-14 22:09     ` Tobias Geerinckx-Rice
                       ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Tobias Geerinckx-Rice @ 2019-03-14 22:01 UTC (permalink / raw)
  To: Leo Famulari; +Cc: Guix-devel

Leo, mikadoZero,

This has been suggested many times and is a good idea.  Now all we 
need is someone to do the work, but that's the easy part, right?

Leo Famulari wrote:
> On Thu, Mar 14, 2019 at 02:31:36PM -0400, mikadoZero wrote:
>> # Proposed idea
>> 
>> * Add a "programs" field to package definitions that lists the 
>> programs
>>   that are included in a package. 
>> * Include this field in search results.
>> * Have this field factor into the search result relevance 
>> scores.

We should also expose it directly like other package managers do. 
‘guix which’ would be very handy, and allows 
‘command-not-found’-style suggestions for those who like that kind 
of thing.

>> # Feedback
>> 
>> This is an initial idea that would benefit from the input of 
>> others.
>
> For me, it would be better to have this "program listing" built
> automatically, rather than relying on packagers to get it right 
> and keep
> it up to date. It would be a great feature once it is in place.

Absolutely.  Adding this to the package record manually is a 
maintenance nightmare.  It's data that can be trivially 
auto-generated (ls …/{,s}bin, basically), and storing it in-line 
takes up too much screen and mind space for my taste.

People have suggested using the build farm for this, but that adds 
disproportionate complexity and decouples/delays updates from the 
commits that caused them.  It's just not the right place.

A separate simple (text) database (included in the git repository, 
and updated in the same commit) would be faster to search and stay 
out of our way.

On the other hand, we'd be able to map commands only to package 
names, not to specific objects.
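
As a rough illustration of how such a text database could be
auto-generated, here is a minimal Python sketch (Python only for
brevity; nothing like this exists in Guix).  It assumes each package
output is available as a local directory.  The function names, the spec
string "recutils@1.8" and the tab-separated line format are all
hypothetical.

```python
import os
import tempfile


def list_commands(output_dir):
    """Return the sorted names of executable files in bin/ and sbin/."""
    commands = set()
    for sub in ("bin", "sbin"):
        directory = os.path.join(output_dir, sub)
        if not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                commands.add(name)
    return sorted(commands)


def build_database(outputs):
    """Map each command name to the sorted package specs that provide it.

    OUTPUTS maps a package spec (hypothetical format) to an output directory.
    """
    database = {}
    for spec, output_dir in outputs.items():
        for command in list_commands(output_dir):
            database.setdefault(command, []).append(spec)
    return {command: sorted(specs) for command, specs in database.items()}


def format_database(database):
    """Render one line per command: the name, a tab, then the specs."""
    return "\n".join(
        f"{command}\t{' '.join(specs)}"
        for command, specs in sorted(database.items())
    )
```

A plain sorted text file like this could be searched with grep and would
produce readable diffs when updated in the same commit as the package.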

Just thinking out loud,

T G-R

* Re: Improve package search
  2019-03-14 22:01   ` Tobias Geerinckx-Rice
@ 2019-03-14 22:09     ` Tobias Geerinckx-Rice
  2019-03-14 22:46     ` Pierre Neidhardt
  2019-03-16  2:11     ` Improve package search mikadoZero
  2 siblings, 0 replies; 47+ messages in thread
From: Tobias Geerinckx-Rice @ 2019-03-14 22:09 UTC (permalink / raw)
  To: Leo Famulari; +Cc: Guix-devel

Tobias Geerinckx-Rice wrote:
> On the other hand, we'd be able to map commands only to package 
> names,

s/names/specs/

Kind regards,

T G-R

* Re: Improve package search
  2019-03-14 22:01   ` Tobias Geerinckx-Rice
  2019-03-14 22:09     ` Tobias Geerinckx-Rice
@ 2019-03-14 22:46     ` Pierre Neidhardt
  2019-03-14 23:09       ` Tobias Geerinckx-Rice
  2019-03-23 16:27       ` Package file indexing Ludovic Courtès
  2019-03-16  2:11     ` Improve package search mikadoZero
  2 siblings, 2 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2019-03-14 22:46 UTC (permalink / raw)
  To: Tobias Geerinckx-Rice; +Cc: Guix-devel


> Absolutely.  Adding this to the package record manually is a maintenance
> nightmare.  It's data that can be trivially auto-generated (ls …/{,s}bin,
> basically), and storing it in-line takes up too much screen and mind space for
> my taste.

"Program names" might even be too limited, if not too shortsighted.  At the end
of the day, programs are not necessarily stored in /bin/.  More importantly,
users don't necessarily look for programs, they could very well be looking for a
library.

So instead of including a program name listing, I suggest we index the
complete file listing of all packages.  This would allow us to add the only
feature that's missing in Guix that other package managers have (dpkg, portage,
pacman...): find file names in non-installed packages.

Extending the search with file listings might result in too much noise, so
instead we could have a separate command.  "guix which" for instance, as
Tobias suggested, but since we are not just talking about executables, maybe
"guix filesearch" would be more appropriate.

Example:

--8<---------------cut here---------------start------------->8---
$ guix filesearch foo
74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/bin/foo
74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/lib/libfoo.a
109sdfvp1km0gwjdksl982fjidsfji9s-…-foobar-1.2/bin/foo
109sdfvp1km0gwjdksl982fjidsfji9s-…-foobar-1.2/lib/libfoo.a
--8<---------------cut here---------------end--------------->8---

And full paths would be supported, so that we can "filter" by executables for instance:

--8<---------------cut here---------------start------------->8---
$ guix filesearch bin/foo
74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/bin/foo
109sdfvp1km0gwjdksl982fjidsfji9s-…-foobar-1.2/bin/foo
--8<---------------cut here---------------end--------------->8---
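
To make the intended matching concrete, here is a minimal Python sketch
(not an existing Guix command).  It assumes the index is a flat list of
"store-item/relative/path" strings and that the pattern is matched as a
substring of the path inside the store item only, so that hashes and
package names never match.  The function name and the README entry are
hypothetical additions for illustration.

```python
INDEX = [
    "74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/bin/foo",
    "74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/lib/libfoo.a",
    "74i7r7qp1km0gw1i22fnq3szbgc9mpdx-foobar-1.0/share/doc/README",
]


def file_search(index, pattern):
    """Return the entries whose in-package path contains PATTERN."""
    matches = []
    for entry in index:
        # Split "store-item/rest/of/path" at the first slash.
        _store_item, _, relative_path = entry.partition("/")
        if pattern in "/" + relative_path:
            matches.append(entry)
    return matches
```

With this index, file_search(INDEX, "foo") returns the bin/foo and
lib/libfoo.a entries, while file_search(INDEX, "bin/foo") returns only
the first one.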

This has been discussed before, if someone can find the threads... :p

> People have suggested using the build farm for this, but that adds
> disproportionate complexity and decouples/delays updates from the commits that
> caused them.  It's just not the right place.

I haven't thought through the details, but I am under the impression that the
file listing could be retrieved with the same mechanism as "guix size", i.e. from
the substitute index.  I think it would work well on the build farm, without
more complexity than just another entry to the substitute index.

I'd really love to work on this Very Soon™ ;)

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Improve package search
  2019-03-14 22:46     ` Pierre Neidhardt
@ 2019-03-14 23:09       ` Tobias Geerinckx-Rice
  2019-03-23 16:27       ` Package file indexing Ludovic Courtès
  1 sibling, 0 replies; 47+ messages in thread
From: Tobias Geerinckx-Rice @ 2019-03-14 23:09 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Pierre!

Pierre Neidhardt wrote:
>> Absolutely.  Adding this to the package record manually is a 
>> maintenance
>> nightmare.  It's data that can be trivially auto-generated (ls 
>> …/{,s}bin,
>> basically), and storing it in-line takes up too much screen and 
>> mind space for
>> my taste.
>
> "Program names" might even be too limited, if not too 
> shortsighted.  At the end
> of the day, programs are not necessarily stored in /bin/.  More 
> importantly,
> users don't necessarily look for programs, they could very well 
> be looking for a
> library.

Oh, sure, {,s}bin was just an example.  I don't really think 
there's a package manager that indexes only those two directories. 
'T would be silly.

> Extending the search with file listings might result in too much 
> noise, so
> instead we could have a separate command.  "guix which" for 
> instance, as

Or ‘guix where’ or guix whatever.

(16 days left for someone to implement ‘guix whatever’.)

> suggested Tobias, but since we are not just talking about 
> executables, maybe
> "guix filesearch" would be more appropriate.

Well, we can deathmatch about the name and the number of hyphens 
once it does something.  ;-)

> I haven't thought through the details, but I am under the 
> impression that the
> file listing could be retrieved with the same mechanism as "guix 
> size", i.e. from
> the substitute index.  I think it would work well on the build 
> farm, without
> more complexity than just another entry to the substitute index.

What do you mean by substitute index?

By complexity, I meant: if it would depend on a network connection 
to the build farm (or elsewhere), at all.

And would this information be provided by a naked ‘guix publish’? 
I guess that depends on the meaning of ‘substitute index’.

Kind regards,

T G-R

* Re: Improve package search
  2019-03-14 22:01   ` Tobias Geerinckx-Rice
  2019-03-14 22:09     ` Tobias Geerinckx-Rice
  2019-03-14 22:46     ` Pierre Neidhardt
@ 2019-03-16  2:11     ` mikadoZero
  2 siblings, 0 replies; 47+ messages in thread
From: mikadoZero @ 2019-03-16  2:11 UTC (permalink / raw)
  To: Guix-devel


Tobias Geerinckx-Rice writes:

> ...  Now all we
> need is someone to do the work, but that's the easy part, right?
> ...

I do not yet know enough about Guix or Guile to work on implementing
this.

* Package file indexing
  2019-03-14 22:46     ` Pierre Neidhardt
  2019-03-14 23:09       ` Tobias Geerinckx-Rice
@ 2019-03-23 16:27       ` Ludovic Courtès
  2019-03-25  8:46         ` Pierre Neidhardt
  2020-01-15 16:23         ` Pierre Neidhardt
  1 sibling, 2 replies; 47+ messages in thread
From: Ludovic Courtès @ 2019-03-23 16:27 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Hello,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> I haven't thought through the details, but I am under the impression that the
> file listing could be retrieved with the same mechanism as "guix size", i.e. from
> the substitute index.  I think it would work well on the build farm, without
> more complexity than just another entry to the substitute index.

‘guix size’ uses substitute info (“narinfos”) to determine the size of
store items that are unavailable locally.  However, there’s currently no
source of information for file indexes.

My suggestion would be to couple the distribution of file indexes with
the substitute mechanism: if you’ve authorized a given substitute
server, you’d also allow downloads of file lists signed by that server.

An index could look like, say, a list of store item/file pairs.  It
would grow very quickly, which may not be very practical.  ‘guix
publish’ could update that list every time it “bakes” a nar.

The daemon could have a special RPC: you give it a file name and it
returns a store item (or package+version?) or #f.  Internally it’d call
‘guix substitute’ to fetch the file index from the substitute server,
check its signature, cache it locally, and then look up the file.
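
The lookup half of this idea could be sketched as follows, in Python for
brevity.  This is only an illustration under stated assumptions: the
index is a list of (store item, file name) pairs, fetching it from a
substitute server, checking its signature and caching it are all left
out, the store item names are placeholders, and None stands in for #f.

```python
from collections import defaultdict


def build_index(pairs):
    """Build a map from file name to the set of store items containing it."""
    index = defaultdict(set)
    for store_item, file_name in pairs:
        index[file_name].add(store_item)
    return index


def lookup(index, file_name):
    """Return the sorted store items providing FILE_NAME, or None if none do."""
    items = index.get(file_name)
    return sorted(items) if items else None
```

Inverting the pairs once keeps each lookup cheap even when the list of
pairs grows large.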

You should look at how NixOS does it for its ‘command-not-found’ support
(I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
database, but it’s a pretty ad-hoc mechanism without authentication.

Ludo’.

* Re: Package file indexing
  2019-03-23 16:27       ` Package file indexing Ludovic Courtès
@ 2019-03-25  8:46         ` Pierre Neidhardt
  2019-03-26 12:41           ` Ludovic Courtès
  2020-01-15 16:23         ` Pierre Neidhardt
  1 sibling, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2019-03-25  8:46 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix-devel

Hi!

Thanks for the details, Ludo!

Ludovic Courtès <ludo@gnu.org> writes:
> An index could look like, say, a list of store item/file pairs.  It
> would grow very quickly, which may not be very practical.

I think we might need some form of rotation that discards old indexes
so that the index does not grow indefinitely.  Well, we will see once
we've started using it!

> ‘guix publish’ could update that list every time it “bakes” a nar.
>
> The daemon could have a special RPC: you give it a file name and it
> returns a store item (or package+version?) or #f.

I think you meant "store itemS" (plural), no?

> Internally it’d call ‘guix substitute’ to fetch the file index from
> the substitute server, check its signature, cache it locally, and then
> look up the file.
>
> You should look at how NixOS does it for its ‘command-not-found’ support
> (I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
> database, but it’s a pretty ad-hoc mechanism without authentication.

I could work on this, but that seems like a lot of work, especially for
me who knows nothing about the daemon (but hey, it's a great opportunity
to learn!).

Would anyone else like to pick this up?

Otherwise I'll keep this on my todo list.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2019-03-25  8:46         ` Pierre Neidhardt
@ 2019-03-26 12:41           ` Ludovic Courtès
  2020-01-02 17:12             ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Ludovic Courtès @ 2019-03-26 12:41 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> The daemon could have a special RPC: you give it a file name and it
>> returns a store item (or package+version?) or #f.
>
> I think you meant "store itemS" (plural), no?

Yes.

>> Internally it’d call ‘guix substitute’ to fetch the file index from
>> the substitute server, check its signature, cache it locally, and then
>> look up the file.
>>
>> You should look at how NixOS does it for its ‘command-not-found’ support
>> (I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
>> database, but it’s a pretty ad-hoc mechanism without authentication.
>
> I could work on this, but that seems like a lot of work, especially for
> me who knows nothing about the daemon (but hey, it's a great opportunity
> to learn!).

Note that the daemon would act as an intermediary, but in practice the
functionality would be very much peripheral to the daemon.  IOW, you
don’t need to know about the daemon internals.

Ludo’.

* Re: Package file indexing
  2019-03-26 12:41           ` Ludovic Courtès
@ 2020-01-02 17:12             ` Pierre Neidhardt
  2020-01-02 19:15               ` Christopher Baines
  2020-01-02 22:50               ` zimoun
  0 siblings, 2 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-02 17:12 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix-devel

Hello again!

I'm resurrecting this since I've just started working on this as part of
the NGI application! :)

>>> Internally it’d call ‘guix substitute’ to fetch the file index from
>>> the substitute server, check its signature, cache it locally, and then
>>> look up the file.

What about storing the file listing in the narinfo instead?
Is this doable?  If so, then it should be quite simple to implement, it
would basically mimic "guix size."

>>> You should look at how NixOS does it for its ‘command-not-found’ support
>>> (I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
>>> database, but it’s a pretty ad-hoc mechanism without authentication.

I'll see if I can find the code for this.

If we embed the file listing in the narinfo as I suggested above, then
there would be no point in maintaining a separate database.

>> I could work on this, but that seems like a lot of work, especially for
>> me who knows nothing about the daemon (but hey, it's a great opportunity
>> to learn!).
>
> Note that the daemon would act as an intermediary, but in practice the
> functionality would be very much peripheral to the daemon.  IOW, you
> don’t need to know about the daemon internals.

Any files you recommend looking at to get started?
I suppose that "guix/scripts/size.scm" is a good start.



Last but not least: previously we suggested adding a subcommand like
"guix which" or "guix filesearch".  In another thread, Simon suggested
that this would be a bad idea and factoring the file search into "guix
search" is probably better.  For instance, we could do

  guix search bin/foo

and it would report the packages containing the "bin/foo" path.  This
could mean that we need to adapt the output to display the file listing
as well.  If listing all files would be too verbose, we can list only
the matching files:

--8<---------------cut here---------------start------------->8---
name: jami
version: 20191101.3.67671e7
outputs: out
systems: x86_64-linux i686-linux
dependencies: adwaita-icon-theme@3.32.0 clutter-gtk@1.8.4 clutter@1.26.2 doxygen@1.8.15
+ evolution-data-server@3.32.4 gettext@0.20.1 glib@2.60.6 gtk+@3.24.12 libcanberra@0.30
+ libnotify@0.7.7 libring@20191101.3.67671e7 libringclient@20191101.3.67671e7
+ pkg-config@0.29.2 qrencode@4.0.2 sqlite-with-column-metadata@3.28.0 webkitgtk@2.26.2
location: gnu/packages/telephony.scm:890:2
homepage: https://jami.net
license: GPL 3+
synopsis: Distributed, privacy-respecting communication program  
description: Jami (formerly GNU Ring) is a secure and distributed voice, video and chat
+ communication platform that requires no centralized server and leaves the power of privacy
+ in the hands of the user.  It supports the SIP and IAX protocols, as well as decentralized
+ calling using P2P-DHT.
+ 
+ This package provides the Jami client for the GNOME desktop.
filepaths:
+ bin/foo
+ share/bar/bin/foo-blah
relevance: 24
--8<---------------cut here---------------end--------------->8---

That said, some terms may match too frequently.  For instance, "guix
search lib" would match almost all packages that have libraries and
result in a huge, useless output.

I suggest the following:

- Add a "--search-file-paths=[auto|on|off]" option.
- When --search-file-paths is "auto", file paths are automatically
  searched for against terms that contain a slash.  E.g. "lib" won't
  return file paths but "lib/" will.
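
The "auto" heuristic could be sketched like this, in Python for
brevity.  The option does not exist yet, so everything here is an
assumption: in particular, I take "on" to mean that every term is also
searched in file paths and "off" to mean that none is.

```python
def file_path_terms(terms, mode="auto"):
    """Return the search terms that should also be matched against file paths."""
    if mode == "on":
        return list(terms)
    if mode == "off":
        return []
    if mode == "auto":
        # Only terms that look like paths, i.e. that contain a slash.
        return [term for term in terms if "/" in term]
    raise ValueError(f"unknown mode: {mode!r}")
```

With the default mode, file_path_terms(["lib", "lib/"]) keeps only
"lib/", so a plain "guix search lib" would not touch the file index.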

Another feature that could be nice: list the file paths for the given
packages.
I think we need a separate subcommand for this, e.g. "guix list-files".

Thoughts?

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-02 17:12             ` Pierre Neidhardt
@ 2020-01-02 19:15               ` Christopher Baines
  2020-01-03 11:26                 ` Ludovic Courtès
  2020-01-02 22:50               ` zimoun
  1 sibling, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-02 19:15 UTC (permalink / raw)
  To: guix-devel


Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Hello again!
>
> I'm resurrecting this since I've just started working on this as part of
> the NGI application! :)
>
>>>> Internally it’d call ‘guix substitute’ to fetch the file index from
>>>> the substitute server, check its signature, cache it locally, and then
>>>> look up the file.
>
> What about storing the file listing in the narinfo instead?
> Is this doable?  If so, then it should be quite simple to implement, it
> would basically mimic "guix size."

I haven't followed this thread particularly well, but at least from my
recent experience messing with nar and narinfo stuff in the Guix Data
Service, I'd be cautious about trying to adapt narinfo files for this
purpose.

It seems to me that the narinfo file is good at capturing the
information about the hash, size, location and signature of the
nar. They're small, and human readable.

I think making information about the contents of Guix store items more
available is great, but even in the average case, it seems like that's
too much information to pack into a narinfo file. Imagining a manifest
in the abstract, having a list of the files and directories as well as the
hashes and sizes of the files could be really useful, but for most
store items, all that information would be much larger than the narinfo
file itself. A separate file might be more flexible.

Additionally, now that I'm thinking about this, having information about
each store item is great, but if you want to know which store items in a
particular revision of Guix contain files called foo, then it might take
a while to download and search them all. Having something that's focused
around the packages in a channel, and acts as an index for all of the
files in all of the available outputs might be faster to search, by
doing the combining of the data upfront.

Chris

* Re: Package file indexing
  2020-01-02 17:12             ` Pierre Neidhardt
  2020-01-02 19:15               ` Christopher Baines
@ 2020-01-02 22:50               ` zimoun
  2020-01-03 16:00                 ` raingloom
  2020-01-09 12:55                 ` Pierre Neidhardt
  1 sibling, 2 replies; 47+ messages in thread
From: zimoun @ 2020-01-02 22:50 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Hi,

On Thu, 2 Jan 2020 at 18:12, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> Last but not least: previously we suggested adding a subcommand like
> "guix which" or "guix filesearch".  In another thread, Simon suggested
> that this would be a bad idea and factoring the file search into "guix
> search" is probably better.

It appears to me better for 2 reasons:
 1. because obviously "filesearch" is a kind of "search" ;-) so it
adds consistency.
 2. because it allows (in the near future) mixed searches: "guix
search bin/hg python" would apply the "python" filter only to the
packages returned by "bin/hg", and "guix search python bin/hg" would
search for the binary file "hg" only in the packages matching "python".


> For instance, we could do
>
>   guix search bin/foo
>
> and it would report the packages containing the "bin/foo" path.  This
> could mean that we need to adapt the output to display the file listing
> as well.  If listing all files would be too verbose, we can list only
> the matching files:
>
> --8<---------------cut here---------------start------------->8---
> name: jami

[...]

> filepaths:
> + bin/foo
> + share/bar/bin/foo-blah
> relevance: 24
> --8<---------------cut here---------------end--------------->8---

How do you compute the relevance/score?

Currently, when searching with a regexp, the relevance is computed by
counting the number of matches and applying different weights depending on
whether the match is in the name, synopsis, description, etc. It is not
perfect and there is room for improvement, as discussed elsewhere, but
it works (nicely when you know what you are searching for ;-).

For example, let us consider 2 packages:

 a- 'bin/foo'
 b- 'share/baz/bin/foo'

How do you order/score the results? What do you expect first?
Package a-, I guess.
Therefore, weights should be applied, shouldn't they?
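
One possible way to weight this, sketched in Python for brevity.  The
weights below are made up for illustration and are not the ones Guix
uses, and dividing by the depth of the match is only one option among
many.

```python
import re

# Hypothetical weights, chosen only for illustration.
FIELD_WEIGHTS = {"name": 4, "synopsis": 3, "description": 2}
FILE_WEIGHT = 4


def metadata_score(package, regexp):
    """Sum, over the weighted fields, weight times the number of matches."""
    pattern = re.compile(regexp, re.IGNORECASE)
    return sum(
        weight * len(pattern.findall(package.get(field, "")))
        for field, weight in FIELD_WEIGHTS.items()
    )


def file_path_score(paths, term):
    """Score the best file match; matches nearer the output root score higher."""
    best = 0.0
    for path in paths:
        position = path.find(term)
        if position < 0:
            continue
        depth = path[:position].count("/")  # directories above the match
        best = max(best, FILE_WEIGHT / (1 + depth))
    return best
```

With these definitions, package a- ('bin/foo') scores higher than
package b- ('share/baz/bin/foo') for the term "bin/foo", because the
match in b- sits two directories deeper.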



> That said, some terms may match too frequently.  For instance, "guix
> search lib" would match almost all packages that have libraries and
> result in a huge, useless output.
>
> I suggest the following:
>
> - Add a "--search-file-paths=[auto|on|off]" option.

I do not find this option name self-explanatory. Personally, I am
inclined to have the option take a path rather than a boolean.

> - When --search-file-paths is "auto", file paths are automatically
>   searched for against terms that contain a slash.  E.g. "lib" won't
>   return file paths but "lib/" will.

This should be cool.
With regexp too.

From time to time, I look for a C header file or a LaTeX style file, but I do
not know the path. I would like to have something like:

  guix search gmsh.h
or
 guix search ieee*.sty


> Another feature that could be nice: list the file paths for the given
> packages.
> I think we need a separate subcommand for this, e.g. "guix list-files".

Yes, cool!


IMHO, it should be included under "guix package", i.e.,

  guix package gmsh --list-files

should return something like:

--8<---------------cut here---------------start------------->8---
bin/gmsh
share/applications/gmsh.desktop
share/doc/gmsh/README.Debian
share/doc/gmsh/TODO.Debian
share/doc/gmsh/changelog.Debian.gz
share/doc/gmsh/changelog.gz
share/doc/gmsh/copyright
share/info/gmsh.info.gz
share/man/man1/gmsh.1.gz
share/pixmaps/gmsh_16x16.xpm
share/pixmaps/gmsh_32x32.xpm
--8<---------------cut here---------------end--------------->8---

(list from https://packages.debian.org/buster/amd64/gmsh/filelist)


All the best,
simon

* Re: Package file indexing
  2020-01-02 19:15               ` Christopher Baines
@ 2020-01-03 11:26                 ` Ludovic Courtès
  2020-01-09 11:19                   ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Ludovic Courtès @ 2020-01-03 11:26 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hello!

Christopher Baines <mail@cbaines.net> skribis:

> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>> Hello again!
>>
>> I'm resurrecting this since I've just started working on this as part of
>> the NGI application! :)
>>
>>>>> Internally it’d call ‘guix substitute’ to fetch the file index from
>>>>> the substitute server, check its signature, cache it locally, and then
>>>>> look up the file.
>>
>> What about storing the file listing in the narinfo instead?
>> Is this doable?  If so, then it should be quite simple to implement, it
>> would basically mimic "guix size."
>
> I haven't followed this thread particularly well, but at least from my
> recent experience messing with nar and narinfo stuff in the Guix Data
> Service, I'd be cautious about trying to adapt narinfo files for this
> purpose.
>
> It seems to me that the narinfo file is good at capturing the
> information about the hash, size, location and signature of the
> nar. They're small, and human readable.
>
> I think making information about the contents of Guix store items more
> available is great, but even in the average case, it seems like that's
> too much information to pack in to a narinfo file. Imagining a manifest
> in abstract, having a list of the files and directories as well as the
> hashes and sizes of the files could be really useful, but that for most
> store items, all that information is much larger than the narinfo
> files. A separate file might be more flexible.

I concur!  Actually, there’s a separate file already: the nar itself.

  wget -q -O - https://ci.guix.gnu.org/nar/lzip/1gyi4i5lbpr7apm74p08dwy11fhzh4j7-sed-4.7 \
     | lzip -d | guix archive -t

But…

> Additionally, now that I'm thinking about this, having information about
> each store item is great, but if you want to know which store items in a
> particular revision of Guix contain files called foo, then it might take
> a while to download and search them all. Having something that's focused
> around the packages in a channel, and acts as an index for all of the
> files in all of the available outputs might be faster to search, by
> doing the combining of the data upfront.

… I agree.  I think file search has to be a service providing access to
a fast database.

I think the Guix Data Service is a good fit since it knows about
packages, derivations, commits, and how they map to each other.  :-)  It
could download nars and do the equivalent of ‘guix archive -t’ to get
the list of file names.

There’s an argument that it would be nice if file search were
implemented as part of ‘guix publish’ because that would immediately
benefit everyone without going through complex setups.  However, ‘guix
publish’ wouldn’t really know what to index upfront, or maybe it could
index lazily like it does with “baking”.

Food for thought!

Ludo’.

* Re: Package file indexing
  2020-01-02 22:50               ` zimoun
@ 2020-01-03 16:00                 ` raingloom
  2020-01-06 16:56                   ` zimoun
  2020-01-09 12:57                   ` Pierre Neidhardt
  2020-01-09 12:55                 ` Pierre Neidhardt
  1 sibling, 2 replies; 47+ messages in thread
From: raingloom @ 2020-01-03 16:00 UTC (permalink / raw)
  To: zimoun, Pierre Neidhardt; +Cc: Guix-devel

On Thu, 2020-01-02 at 23:50 +0100, zimoun wrote:
> Hi,
> 
> On Thu, 2 Jan 2020 at 18:12, Pierre Neidhardt <mail@ambrevar.xyz>
> wrote:
> 
> > Last but not least: previously we suggested adding a subcommand
> > like
> > "guix which" or "guix filesearch".  In another thread, Simon
> > suggested
> > that this would be a bad idea and factoring the file search into
> > "guix
> > search" is probably better.
> 
> It appears to me better for 2 reasons:
>  1. because obviously "filesearch" is a kind of "search" ;-) so it
> adds consistency.
>  2. because it allows (in the near future) mixed research: "guix
> search bin/hg python" applying the "python" filter only to the
> packages returned by "bin/hg". And "guix search python bin/hg" search
> the binary file "hg" only to the packages matching "python.
> 

What about files in root (so, ones with no slashes in their path, at
least in your syntax) and files you don't know the full path of, only
their basename?

Do you search for every word as a file path, just in case it might be
one?

To avoid confusion, I think this should be an option/subcommand of
search. Something like -path and -name in find(1).

* Re: Package file indexing
  2020-01-03 16:00                 ` raingloom
@ 2020-01-06 16:56                   ` zimoun
  2020-01-09 13:01                     ` Pierre Neidhardt
  2020-01-09 12:57                   ` Pierre Neidhardt
  1 sibling, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-06 16:56 UTC (permalink / raw)
  To: raingloom; +Cc: Guix-devel

Dear,


On Fri, 3 Jan 2020 at 17:01, raingloom <raingloom@riseup.net> wrote:

> On Thu, 2020-01-02 at 23:50 +0100, zimoun wrote:

> >  2. because it allows (in the near future) mixed research: "guix
> > search bin/hg python" applying the "python" filter only to the
> > packages returned by "bin/hg". And "guix search python bin/hg" search
> > the binary file "hg" only to the packages matching "python.

> What about files in root (so, ones with no slashes in their path, at
> least in your syntax) and files you don't know the full path of, only
> their basename?

I agree.
This second bullet was about composing the "regular package" search
and the "file" search; not really about the syntax to switch between
the two kinds of search. :-)
Below the quoting you did, I also described something like "guix
search gmsh.h". ;-)

The syntax '/' should be an option but not the only one, IMHO. We can imagine:

 - guix search file:gmsh.h gimp
 - guix search bin/gmsh gimp
 - guix search file:ieee*.sty bin/gmsh latex
 - guix search file:bin/gmsh
 - guix search package:gimp
etc.


> To avoid confusion, I think this should be an option/subcommand of
> search. Something like -path and -name in find(1).

I agree that explicit keywords, e.g., "file:" and "package:", avoid confusion.


All the best,
simon

* Re: Package file indexing
  2020-01-03 11:26                 ` Ludovic Courtès
@ 2020-01-09 11:19                   ` Pierre Neidhardt
  2020-01-09 12:24                     ` zimoun
  2020-01-09 16:49                     ` Christopher Baines
  0 siblings, 2 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 11:19 UTC (permalink / raw)
  To: Ludovic Courtès, Christopher Baines; +Cc: guix-devel

> … I agree.  I think file search has to be a service providing access to
> a fast database.

Good point.  Let's go in that direction then.

> I think the Guix Data Service is a good fit since it knows about
> packages, derivations, commits, and how they map to each other.  :-)  It
> could download nars and do the equivalent of ‘guix archive -t’ to get
> the list of file names.

Are you suggesting that "guix filesearch" polls a specific instance of
the Guix Data Service (e.g. data.guix.gnu.org) to download the file
index for the current Guix revision?

What if the file index for a specific Guix commit (e.g. a very recent
one) is not yet available?  I suggest we fall back to the first older
index that's available, with a warning.  Thoughts?
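
A minimal sketch of that fallback rule in Python. The function name and the commit identifiers are hypothetical; it assumes the client knows its commit history (newest first) and the set of commits for which the server has an index.

```python
import warnings

def pick_index(history, available):
    """Return the newest commit in `history` (newest first) that has an index.

    Emits a warning when the index is older than the current commit, and
    returns None when no commit in the history has an index at all.
    """
    for position, commit in enumerate(history):
        if commit in available:
            if position > 0:
                warnings.warn(
                    f"no file index for {history[0]}; "
                    f"falling back to older index for {commit}"
                )
            return commit
    return None
```

For example, with a history of `["c3", "c2", "c1"]` and indexes available only for `c1` and `c2`, the sketch picks `c2` and warns that `c3` has no index yet.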

> There’s an argument that it would be nice if file search were
> implemented as part of ‘guix publish’ because that would immediately
> benefit everyone without going through complex setups.  However, ‘guix
> publish’ wouldn’t really know what to index upfront, or maybe it could
> index lazily like it does with “baking”.

I don't understand why `guix publish' wouldn't work here.  Can you elaborate?

Thanks!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 11:19                   ` Pierre Neidhardt
@ 2020-01-09 12:24                     ` zimoun
  2020-01-09 13:01                       ` Pierre Neidhardt
  2020-01-09 16:49                     ` Christopher Baines
  1 sibling, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 12:24 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix Devel

On Thu, 9 Jan 2020 at 12:20, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>
> > … I agree.  I think file search has to be a service providing access to
> > a fast database.
>
> Good point.  Let's go in that direction then.

But it should be possible to build this database locally without using
any network connection.

Something like: "guix search --build-database" and also "guix pull
--build-search-db".

And using an external database fetched from ci.guix.gnu.org or
data.guix.gnu.org should work as the substitutes do.




All the best,
simon

* Re: Package file indexing
  2020-01-02 22:50               ` zimoun
  2020-01-03 16:00                 ` raingloom
@ 2020-01-09 12:55                 ` Pierre Neidhardt
  2020-01-09 14:05                   ` zimoun
  1 sibling, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 12:55 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

> It appears to me better for 2 reasons:
>  1. because obviously "filesearch" is a kind of "search" ;-) so it
> adds consistency.
>  2. because it allows (in the near future) mixed research: "guix
> search bin/hg python" applying the "python" filter only to the
> packages returned by "bin/hg". And "guix search python bin/hg" search
> the binary file "hg" only to the packages matching "python.

Agreed.

>> --8<---------------cut here---------------start------------->8---
>> name: jami
>
> [...]
>
>> filepaths:
>> + bin/foo
>> + share/bar/bin/foo-blah
>> relevance: 24
>> --8<---------------cut here---------------end--------------->8---
>
> How do you compute the relevance/score?

I've copy-pasted the current output for Jami, I did not touch it in a
particular way.

> For example, let consider 2 packages:
>
>  a- 'bin/foo'
>  b- 'share/baz/bin/foo'
>
> How to do you order/score the result? What do you expect first? The
> package a- I guess.
> Therefore, weight should be applied, isn't it?

Agreed.
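
One possible weighting, sketched in Python: a match on the basename scores higher the fewer directories sit above it, so 'bin/foo' outranks 'share/baz/bin/foo'. The formula (1 / number of path components) and the function names are assumptions made for illustration; this is not how the existing relevance score is computed.

```python
def path_score(path, name):
    """Score a basename match on `name`; shallower paths score higher."""
    parts = path.strip("/").split("/")
    if parts[-1] != name:
        return 0.0
    return 1.0 / len(parts)

def rank(packages, name):
    """Order package names by their best-scoring file, best first.

    `packages` maps a package name to its list of file paths.  Packages
    with no matching file are dropped; ties are broken alphabetically.
    """
    scored = [
        (max((path_score(p, name) for p in paths), default=0.0), package)
        for package, paths in packages.items()
    ]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [package for score, package in ordered if score > 0]
```

With the two packages from the quoted example, `rank({"a": ["bin/foo"], "b": ["share/baz/bin/foo"]}, "foo")` puts `a` first.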

>> I suggest the following:
>>
>> - Add a "--search-file-paths=[auto|on|off]" option.
>
> I do not find this option name explaining by itself. Personally, I am
> inclined to provide a path to the option and not a boolean.

If I understand you correctly, you are suggesting this syntax to return
packages matching "python-" and with files matching "foo.*bar".

  guix search --file-path="foo.*bar" python-

What I originally suggested is that we could equivalently do:

  guix search "/foo.*bar" python-

Forget about the --search-file-paths option, it's probably not necessary.

> Time to time, I am looking for header C file or latex style but I do
> not know the path. I would like to have something like:
>
>   guix search gmsh.h
> or
>  guix search ieee*.sty

That's OK, if you know the basename then "/gmsh.h" will match.
If you only know a substring of the basename, then "/.*gmsh.h" will
match too.
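
Note that the quoted "ieee*.sty" is a shell-style glob, while "/.*gmsh.h" is a regexp (where "." matches any character).  A small Python sketch of the glob reading, applied to the basename only; the helper name and the sample paths are hypothetical.

```python
import fnmatch

def basename_matches(pattern, path):
    """Match a shell-style glob against the last component of `path`."""
    basename = path.rsplit("/", 1)[-1]
    # fnmatchcase keeps the match case-sensitive on every platform.
    return fnmatch.fnmatchcase(basename, pattern)
```

Here `basename_matches("ieee*.sty", "share/texmf/ieeeconf.sty")` holds, and `basename_matches("gmsh.h", "include/gmshXh")` does not, whereas the regexp "gmsh.h" would accept the latter.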

> IMHO, it should be included under "guix package", i.e.,
>
>   guix package gmsh --list-files

Why not, but then this does not match the interface we have with "guix
size".

Could we also have "guix package gmsh --size"?  Would we deprecate "guix
size" then?

If not, then for the sake of consistency I'd prefer to have "guix list-files".

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-03 16:00                 ` raingloom
  2020-01-06 16:56                   ` zimoun
@ 2020-01-09 12:57                   ` Pierre Neidhardt
  1 sibling, 0 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 12:57 UTC (permalink / raw)
  To: raingloom, zimoun; +Cc: Guix-devel

raingloom <raingloom@riseup.net> writes:

> What about files in root (so, ones with no slashes in their path, at
> least in your syntax) and files you don't know the full path of, only
> their basename?

For a file at root, e.g. the "bin" folder, you can match with "/bin".

If you only know the basename, same:  "/hg" will match "/bin/hg".

If you only know a substring, then you can use a regexp:

"/.*my-substring"

> Do you search for every word as a file path, just in case it might be
> one?

Yes.

> To avoid confusion, I think this should be an option/subcommand of
> search. Something like -path and -name in find(1).

I believe that there is no point in matching slashes ("/") in the
synopsis / description, so it's safe enough to use as a filter meant to
match only file names.
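
The rule described above can be sketched in Python as follows: any search term containing "/" is treated as a regexp over file paths, every other term as an ordinary package search term.  The function names are hypothetical, and file paths are assumed to be stored relative to the package root.

```python
import re

def is_file_term(term):
    """A term is treated as a file pattern when it contains a slash."""
    return "/" in term

def split_query(terms):
    """Separate file patterns from ordinary package search terms."""
    file_terms = [t for t in terms if is_file_term(t)]
    package_terms = [t for t in terms if not is_file_term(t)]
    return file_terms, package_terms

def file_matches(term, path):
    """Search regexp `term` in `path`, written with a leading slash.

    Prepending "/" is what lets "/bin" match a top-level "bin" entry
    and "/hg" match "bin/hg".
    """
    return re.search(term, "/" + path.lstrip("/")) is not None
```

Because the regexp is not anchored at the end, "/hg" would also match a path such as "bin/hgrc"; a real implementation would have to decide whether that is wanted.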

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-06 16:56                   ` zimoun
@ 2020-01-09 13:01                     ` Pierre Neidhardt
  2020-01-09 13:53                       ` zimoun
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 13:01 UTC (permalink / raw)
  To: zimoun, raingloom; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

> The syntax '/' should be an option but not the only one, IMHO. We can imagine:
>
>  - guix search file:gmsh.h gimp
>  - guix search bin/gmsh gimp
>  - guix search file:ieee*.sty bin/gmsh latex
>  - guix search file:bin/gmsh
>  - guix search package:gimp
> etc.
>
>
>> To avoid confusion, I think this should be an option/subcommand of
>> search. Something like -path and -name in find(1).
>
> I agree that explicit keywords, e.g., "file:" and "package:", avoid confusion.

I disagree.  What about matching a path which contains "file:" or
"package:"?  Then you end up with confusing commands.

Using "/" as a filter makes sense because it's the only character that's
not allowed in filenames (with \0) and it's safe to assume that it's not
useful to match against "/" in description / synopsis.

Simon, regarding your examples:

>  - guix search bin/gmsh gimp
>  - guix search file:ieee*.sty bin/gmsh latex
>  - guix search file:bin/gmsh

why mix both the "file:" prefix and the "/"?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 12:24                     ` zimoun
@ 2020-01-09 13:01                       ` Pierre Neidhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 13:01 UTC (permalink / raw)
  To: zimoun; +Cc: Guix Devel

zimoun <zimon.toutoune@gmail.com> writes:

> On Thu, 9 Jan 2020 at 12:20, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>>
>> > … I agree.  I think file search has to be a service providing access to
>> > a fast database.
>>
>> Good point.  Let's go in that direction then.
>
> But it should be possible to build this database locally without using
> any network connection.

Yes, a bit like "guix size" permits.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 13:01                     ` Pierre Neidhardt
@ 2020-01-09 13:53                       ` zimoun
  2020-01-09 14:14                         ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 13:53 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Hi Pierre,

On Thu, 9 Jan 2020 at 14:01, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>
> zimoun <zimon.toutoune@gmail.com> writes:

> >> To avoid confusion, I think this should be an option/subcommand of
> >> search. Something like -path and -name in find(1).
> >
> > I agree that explicit keywords, e.g., "file:" and "package:", avoid confusion.
>
> I disagree.  What about matching a path which contains "file:" or
> "package:"?  Then you end up with confusing commands.

About "file:", no issue:
    guix search file:file:


However, yes there is an ambiguous behaviour of:

  guix search package:


Currently, the command

  guix search

returns an error.


Does "guix search package:" return an error, as "guix search" does?
That is, is it a search for an 'empty' term?
Or does it return the packages matching the term "package:"? For
example, the package "perl-package-stash-xs" contains "Package:" in
its description, and the package "r-vctrs" contains "package:" in its
description too. Note that these are the only two such packages.


For backward compatibility, the ambiguity needs to be resolved in favour of the latter.
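
A minimal Python sketch of keyword parsing that behaves this way: a "file:" or "package:" prefix only counts when something follows it, so a bare "package:" stays an ordinary search term.  The function name and the "any" label are hypothetical.

```python
def classify(term):
    """Split a search term into (kind, value).

    kind is "file" or "package" when the term carries that prefix and a
    non-empty value; otherwise the whole term is an ordinary "any" term.
    """
    for kind in ("file", "package"):
        prefix = kind + ":"
        if term.startswith(prefix) and len(term) > len(prefix):
            return kind, term[len(prefix):]
    return "any", term
```

Under this rule "file:file:" searches files for "file:", "package:gimp" restricts to the package name, and "package:" alone is matched as plain text, as it is today.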



> Using "/" as a filter makes sense because it's the only character that's
> not allowed in filenames (with \0) and it's safe to assume that it's not
> useful to match against "/" in description / synopsis.
>
> Simon, regarding your examples:
>
> >  - guix search bin/gmsh gimp
> >  - guix search file:ieee*.sty bin/gmsh latex
> >  - guix search file:bin/gmsh
>
> why mixing both the "file:" prefix and the "/"?

Yes, I am suggesting to mix both.

I would like to have all this syntax:

>  - guix search file:gmsh.h gimp
>  - guix search bin/gmsh gimp
>  - guix search file:ieee*.sty bin/gmsh latex
>  - guix search file:bin/gmsh
>  - guix search package:gimp



Now, speaking of the "search" command-line syntax: today the way is to
write a regexp and then to filter with 'recsel'. Composing via pipes is
the UNIX philosophy, but the drawback is that one *has to* first know
that 'recsel' exists (by reading the Guix manual [1]) and second read
the documentation of 'recutils' for complex filtering. So, a long time
ago, I was thinking of adding more syntax [2]. Today, the syntax is:

   guix search "" | recsel -C -e 'name ~ "agda" && !(name ~ "mode")' \
     -p synopsis

and I find more welcoming something avoiding the pipe, e.g.,

  guix search 'name ~ "agda" && !(name ~ "mode") -p synopsis'


Cheers,
simon

[1] http://guix.gnu.org/manual/devel/en/html_node/Invoking-guix-package.html#Invoking-guix-package
[2] https://lists.gnu.org/archive/html/guix-devel/2018-12/msg00480.html

* Re: Package file indexing
  2020-01-09 12:55                 ` Pierre Neidhardt
@ 2020-01-09 14:05                   ` zimoun
  2020-01-09 14:21                     ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 14:05 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

Hi again, :-)

On Thu, 9 Jan 2020 at 13:55, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> What I originally suggested is that we could equivalently do:
>
>   guix search "/foo.*bar" python-

[...]

> > Time to time, I am looking for header C file or latex style but I do
> > not know the path. I would like to have something like:
> >
> >   guix search gmsh.h
> > or
> >  guix search ieee*.sty
>
> That's OK, if you know the basename then "/gmsh.h" will match.
> If you only know a substring of the basename, then "/.*gmsh.h" will
> match too.

My point is just that I do not like '/' as the key to turn on the
file-search and I prefer "file:". As said elsewhere. :-)

Otherwise, I am on the same wavelength about what you are proposing and
it is really cool!


> > IMHO, it should be included under "guix package", i.e.,
> >
> >   guix package gmsh --list-files
>
> Why not, but then this does not match the interface we have with "guix
> size".

Maybe we are not talking about the same thing.

What is the purpose of this "list-files" for you?

To me, it should return all the files of the package gmsh. For example
with Debian, when I install the package gmsh, the package manager adds
all these files [1].

[1] https://packages.debian.org/buster/amd64/gmsh/filelist

Nothing similar exists in Guix, right?
If it does, cool, please tell me. :-)
Since I am not aware of such a thing, I use "ls -l" and "which"
to locate them, which is not friendly.



> Could we also have "guix package gmsh --size"?  Would we deprecate "guix
> size" then?

No. It seems better to keep "guix size", IMHO.


> If not, then for the sake of consistency I'd prefer to have "guix list-files".

I think it is a bad idea to add another subcommand, because it is not
such a common need, I guess.
Well, could you elaborate on what this "list-files" will do?


All the best,
simon

* Re: Package file indexing
  2020-01-09 13:53                       ` zimoun
@ 2020-01-09 14:14                         ` Pierre Neidhardt
  2020-01-09 14:36                           ` zimoun
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 14:14 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

> Hi Pierre,
>
> On Thu, 9 Jan 2020 at 14:01, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>>
>> zimoun <zimon.toutoune@gmail.com> writes:
>
>> >> To avoid confusion, I think this should be an option/subcommand of
>> >> search. Something like -path and -name in find(1).
>> >
>> > I agree that explicit keywords, e.g., "file:" and "package:", avoid confusion.
>>
>> I disagree.  What about matching a path which contains "file:" or
>> "package:"?  Then you end up with confusing commands.
>
> About "file:", no issue:
>     guix search file:file:

It might not be ambiguous for the machine, but it is to the human
reader! :)

--8<---------------cut here---------------start------------->8---
  guix search /file:
--8<---------------cut here---------------end--------------->8---

is more readable in my opinion.

>> Simon, regarding your examples:
>>
>> >  - guix search bin/gmsh gimp
>> >  - guix search file:ieee*.sty bin/gmsh latex
>> >  - guix search file:bin/gmsh
>>
>> why mixing both the "file:" prefix and the "/"?
>
> Yes, I am suggesting to mix both.
>
> I would like to have all this syntax:
>
>>  - guix search file:gmsh.h gimp
>>  - guix search bin/gmsh gimp
>>  - guix search file:ieee*.sty bin/gmsh latex
>>  - guix search file:bin/gmsh
>>  - guix search package:gimp

But for which benefit?  It seems that this single example

>>  - guix search bin/gmsh gimp

covers all your needs.

> Now, if we speak about the "search" command-line syntax, today the way
> is to write a regexp and then to filter with 'recsel'. It is UNIX
> philosophy to compose via pipes but the drawback is: one *has to*
> first (read the Guix manual [1] to) know the existence of 'recsel' and
> second read the documentation of 'recutils' for complex filtering. So,
> long time ago, I was thinking to add more syntax [2]. Today, the
> syntax is:
>
>    guix search "" | recsel -C -e 'name ~ "agda" && !(name ~ "mode")' \
>      -p synopsis
>
> and I find more welcoming something avoiding the pipe, e.g.,
>
>   guix search 'name ~ "agda" && !(name ~ "mode") -p synopsis'

This is still rather arcane (understand: too hard to be useful to the
general user) in my opinion.  Every time I use a program that has some
search semantics, I need to look up the manual because I forgot the
syntax and other intricacies.  So I end up not doing it often.

For advanced search, we could provide "explorable" features with a
graphical user interface (which I plan to work on later) or Emacs (a bit
like `guix-packages-by-name', but more general).  Those interfaces would
allow the user to chain searches by narrowing down lists.  What you
print in the end is irrelevant since you can have an interactive
presentation (unlike the shell).

Example:

- Display list of all packages.
- Run "agda" search against names.
- Narrow down.
- Run "mode" search against names.
- Narrow down to the complement.
- Run a general search against "foo bar".
- Print the result.
- Display synopsis only of the result.
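
The narrowing steps above can be sketched in Python as successive filters over a list of package records.  The `narrow` helper and the sample records are hypothetical; a real interface would read the records from the package database.

```python
import re

def narrow(packages, field, pattern, complement=False):
    """Keep the records whose `field` matches `pattern`.

    With complement=True, keep the records that do NOT match instead.
    """
    regexp = re.compile(pattern)
    return [p for p in packages if bool(regexp.search(p[field])) != complement]

# Hypothetical records, made up for illustration only.
PACKAGES = [
    {"name": "agda", "synopsis": "Example synopsis A"},
    {"name": "emacs-agda-mode", "synopsis": "Example synopsis B"},
    {"name": "gimp", "synopsis": "Example synopsis C"},
]

step = narrow(PACKAGES, "name", "agda")               # names matching "agda"
step = narrow(step, "name", "mode", complement=True)  # minus names matching "mode"
synopses = [p["synopsis"] for p in step]              # display the synopsis only
```

Each step takes a list and returns a list, which is what lets an interactive front end chain them in any order.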

For the general case, a "do what I mean" search field that works like
Internet search engines is a better approach in my opinion.

What do you think?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 14:05                   ` zimoun
@ 2020-01-09 14:21                     ` Pierre Neidhardt
  2020-01-09 14:51                       ` zimoun
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 14:21 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

> Hi again, :-)
>
> On Thu, 9 Jan 2020 at 13:55, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>
>> What I originally suggested is that we could equivalently do:
>>
>>   guix search "/foo.*bar" python-
>
> [...]
>
>> > Time to time, I am looking for header C file or latex style but I do
>> > not know the path. I would like to have something like:
>> >
>> >   guix search gmsh.h
>> > or
>> >  guix search ieee*.sty
>>
>> That's OK, if you know the basename then "/gmsh.h" will match.
>> If you only know a substring of the basename, then "/.*gmsh.h" will
>> match too.
>
> My point is just I do not like the key '/' to turn on the file-search
> and I prefer "file:". As said elsewhere. :-)

Why don't you like it?
I don't like "file:" because:

- It can make for an ambiguous command line to the human reader
  (e.g. "file:file:").
- It's a new arbitrary syntax which the user must learn, which
  means they probably won't.

The benefit of "/" is that it works _incidentally_.  If you are looking
for "bin/hg", then `guix search bin/hg` will do the right thing.

> What is the purpose of this "list-files" for you?

Listing the files of a package like in the example you gave.

What I meant is that we already have a subcommand that outputs a
property of the given packages, i.e. "guix size".  If I'm not mistaken,
there is no "guix package" flag that displays any property for the given
packages.

I am just thinking about keeping consistency across the various
subcommands of Guix.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 14:14                         ` Pierre Neidhardt
@ 2020-01-09 14:36                           ` zimoun
  2020-01-09 15:38                             ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 14:36 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

On Thu, 9 Jan 2020 at 15:14, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> >> > I agree that explicit keywords, e.g., "file:" and "package:", avoid confusion.
> >>
> >> I disagree.  What about matching a path which contains "file:" or
> >> "package:"?  Then you end up with confusing commands.
> >
> > About "file:", no issue:
> >     guix search file:file:
>
> It might not be ambiguous for the machine, but it is to the human
> reader! :)
>
> --8<---------------cut here---------------start------------->8---
>   guix search /file:
> --8<---------------cut here---------------end--------------->8---
>
> is more readable in my opinion.

I disagree. Ah cheese and wine taste... ;-)

As I said, I am suggesting to have both syntaxes.


> >> Simon, regarding your examples:
> >>
> >> >  - guix search bin/gmsh gimp
> >> >  - guix search file:ieee*.sty bin/gmsh latex
> >> >  - guix search file:bin/gmsh
> >>
> >> why mixing both the "file:" prefix and the "/"?
> >
> > Yes, I am suggesting to mix both.
> >
> > I would like to have all this syntax:
> >
> >>  - guix search file:gmsh.h gimp
> >>  - guix search bin/gmsh gimp
> >>  - guix search file:ieee*.sty bin/gmsh latex
> >>  - guix search file:bin/gmsh
> >>  - guix search package:gimp
>
> But for which benefit?  It seems that this single example
>
> >>  - guix search bin/gmsh gimp
>
> covers all your needs.

No.

For example:

> >>  - guix search file:gmsh.h gimp

I am looking for the file gmsh.h and I do not know anything more.

> >>  - guix search bin/gmsh gimp

I need to know the name of the file and the path.

> >>  - guix search file:ieee*.sty bin/gmsh latex

I know nothing about the path of the file ieee*.sty and it does not
belong to the package gmsh.


Whatever!

To summarize, I think:

 - the syntax '/' is cool to turn on the "file-search" feature
 - I find more meaningful the syntax "file:" to turn on "file-search"
 - I find more meaningful to have "file:foo.h package:bar" than "/.foo.h bar"



> > Now, if we speak about the "search" command-line syntax, today the way
> > is to write a regexp and then to filter with 'recsel'. It is UNIX
> > philosophy to compose via pipes but the drawback is: one *has to*
> > first (read the Guix manual [1] to) know the existence of 'recsel' and
> > second read the documentation of 'recutils' for complex filtering. So,
> > long time ago, I was thinking to add more syntax [2]. Today, the
> > syntax is:
> >
> >    guix search "" | recsel -C -e 'name ~ "agda" && !(name ~ "mode")' \
> >      -p synopsis
> >
> > and I find more welcoming something avoiding the pipe, e.g.,
> >
> >   guix search 'name ~ "agda" && !(name ~ "mode") -p synopsis'
>
> This is still rather arcane (understand: too hard to be useful to the
> general user) in my opinion.  Every time I use a program that has some
> search semantic, I need to look up the manual because I forgot the
> syntax and other intricacies.  So I end up not doing it often.

I agree...

> For advanced search, we could provide "explorable" features with a
> graphical user interface (which I plan to work on later) or Emacs (a bit
> like `guix-packages-by-name', but more general).  Those interfaces would
> allow the user to chain searches by narrowing down lists.  What you
> print in the end is irrelevant since you can have an interactive
> presentation (unlike the shell).

...but at some point you need some semantics for filtering, at least for regexp.

Graphical presentation does not change the issue.

> Example:
>
> - Display list of all packages.
> - Run "agda" search against names.
> - Narrow down.
> - Run "mode" search against names.
> - Narrow down to the complement.
> - Run a general search against "foo bar".
> - Print the result.
> - Display synopsis only of the result.

Well, what you did is a kind of semantics. ;-)

(You are right that it is easier to remember how to do it when it is
graphical and step by step. :-))


> For the general case, a "do what I mean" search field that works like
> Internet search engines is a better approach in my opinion.

I agree.

On my side, as I explained elsewhere I would like to try to improve
the 'relevance' function by applying well-known NLP stuff. :-)


Cheers,
simon

* Re: Package file indexing
  2020-01-09 14:21                     ` Pierre Neidhardt
@ 2020-01-09 14:51                       ` zimoun
  2020-01-09 15:41                         ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 14:51 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

On Thu, 9 Jan 2020 at 15:21, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> Why don't you like it?

You are like Haskellers or Perlers asking why ">>=" is not clear. :-)

I do not find "/.*gmsh.h" meaningful for searching the file named "gmsh.h".
I find "file:gmsh.h" clearer.

Taste of cheese and wine... :-)

> I don't like "file:" because:
>
> - It can make for an ambiguous command line to the human reader
>   (e.g. "file:file:").

Bad faith? ;-)
I do not know how many users will search for the term "file:".

> - It's a new arbitrary syntax which the user must learn, which
>   means they probably won't.

Hum? I am not convinced.


> The benefit of "/" is that it works _incidentally_.  If you are looking
> for "bin/hg", then `guix search bin/hg` will do the right thing.

I agree.

To be clear, to search for the binary 'hg', I find "guix search bin/hg"
clearer. However, to search for any file whose path you do not know, I
find "guix search file:foo.h" clearer.


Well, it is enough of bikeshedding, isn't it? :-)


> > What is the purpose of this "list-files" for you?
>
> Listing the files of a package like in the example you gave.

Ok.

> What I meant is that we already have a subcommand that outputs a
> property of the given packages, i.e. "guix size".  If I'm not mistaken,
> there is no "guix package" flag that displays any property for the given
> packages.

You are suggesting "guix size emacs --list-files", right?


> I am just thinking about keeping consistency across the various
> subcommands of Guix.

I do not have a strong opinion. :-)
To me, the right place is "guix package --list-files" but I am not
convinced. :-)


Cheers,
simon

* Re: Package file indexing
  2020-01-09 14:36                           ` zimoun
@ 2020-01-09 15:38                             ` Pierre Neidhardt
  2020-01-09 16:59                               ` zimoun
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 15:38 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

>> But for which benefit?  It seems that this single example
>>
>> >>  - guix search bin/gmsh gimp
>>
>> covers all your needs.
>
> No.
>
> For example:
>
>> >>  - guix search file:gmsh.h gimp
>
> I am looking for the file gmsh.h and I do not know anything more.

--8<---------------cut here---------------start------------->8---
guix search /gmsh.h 
--8<---------------cut here---------------end--------------->8---

would work.  You don't need the full path.

>> >>  - guix search file:ieee*.sty bin/gmsh latex
>
> I know nothing about the path of the file ieee*.sty and it does not
> belong to the package gmsh.

I don't understand what you are trying to search.  Is it the 'bin/gmsh'
file or the files matching 'ieee*.sty'?

>> For advanced search, we could provide "explorable" features with a
>> graphical user interface (which I plan to work on later) or Emacs (a bit
>> like `guix-packages-by-name', but more general).  Those interfaces would
>> allow the user to chain searches by narrowing down lists.  What you
>> print in the end is irrelevant since you can have an interactive
>> presentation (unlike the shell).
>
> ...but at some point you need some semantic for filtering, at least for regexp.

We can have regexp of course, that's not a problem.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 14:51                       ` zimoun
@ 2020-01-09 15:41                         ` Pierre Neidhardt
  2020-01-09 17:04                           ` zimoun
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 15:41 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

>> The benefit of "/" is that it works _incidentally_.  If you are looking
>> for "bin/hg", then `guix search bin/hg` will do the right thing.
>
> I agree.
>
> To be clear, to search for the binary 'hg', I find "guix search bin/hg" clearer.
> However, to search for a file whose path you do not know, I find
> "guix search file:foo.h" clearer.

To be clear, you don't need to know the path.  It's enough to know the
basename, e.g. `guix search /foo.h`.

>> What I meant is that we already have a subcommand that outputs a
>> property of the given packages, i.e. "guix size".  If I'm not mistaken,
>> there is no "guix package" flag that displays any property for the given
>> packages.
>
> You are suggesting "guix size emacs --list-files", right?

No, I'm saying that if we follow the current approach for printing our
package properties, we should have

  guix list-files emacs


-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 11:19                   ` Pierre Neidhardt
  2020-01-09 12:24                     ` zimoun
@ 2020-01-09 16:49                     ` Christopher Baines
  2020-01-10 12:35                       ` Pierre Neidhardt
  1 sibling, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-09 16:49 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

>> I think the Guix Data Service is a good fit since it knows about
>> packages, derivations, commits, and how they map to each other.  :-)  It
>> could download nars and do the equivalent of ‘guix archive -t’ to get
>> the list of file names.
>
> Are you suggesting that guix "filesearch" polls a specific instance of
> the Guix Data Service (e.g. data.guix.gnu.org) to download the file
> index for the current Guix revision?

So, to elaborate a bit more on the architecture I've had in mind for
dealing with the actual nars…

I see the scope of the Guix Data Service extending as far as what nars
are available for outputs, and what outputs are associated with each
revision, but I don't think it should store the actual nar files.

What you could have is another service, which subscribes to the Guix
Data Service to find out about new revisions and nars (from build
servers). When this new service finds out about Guix revisions, it would
ask this Guix Data Service for all the outputs, and store this away in a
database. When it finds out about nars, it would download them, and
maybe extract out the list of files.

I think this setup would allow this new service to construct a file
containing information about all files in all the outputs for a
revision, which it has nars available for. This file could then be
downloaded, and searched through when you want to find which output
contains a file.

> What if the file index for a specific Guix commit (e.g. a very recent
> one) is not yet available?  I suggest we fall back to the first older
> index that's available, with a warning.  Thoughts?

Sounds sensible.
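
A minimal sketch of that fallback, in Python purely for illustration.  It
assumes the client knows the commit history (oldest first) and which
commits already have a published index; `pick_index` and its arguments
are hypothetical names, not an existing Guix interface.

```python
from typing import Optional, Sequence, Set, Tuple


def pick_index(history: Sequence[str],
               available: Set[str],
               current: str) -> Tuple[Optional[str], bool]:
    """Return (commit, is_fallback) for the index to use.

    history   -- commit identifiers ordered from oldest to newest
    available -- commits for which a file index has been published
    current   -- the commit of the Guix revision in use
    """
    if current not in history:
        raise ValueError(f"unknown commit: {current}")
    position = history.index(current)
    # Walk backwards from the current commit to the first indexed one.
    for commit in reversed(history[: position + 1]):
        if commit in available:
            return commit, commit != current
    # No index exists for this commit or any older one.
    return None, False
```

A caller would print the warning whenever the second value is true, and
report that no index is available when the first value is None.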

* Re: Package file indexing
  2020-01-09 15:38                             ` Pierre Neidhardt
@ 2020-01-09 16:59                               ` zimoun
  0 siblings, 0 replies; 47+ messages in thread
From: zimoun @ 2020-01-09 16:59 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

On Thu, 9 Jan 2020 at 16:38, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> > I am looking for the file gmsh.h and I know nothing more.
>
> --8<---------------cut here---------------start------------->8---
> guix search /gmsh.h
> --8<---------------cut here---------------end--------------->8---
>
> would work.  You don't need the full path.

I do not like it because it is not meaningful. It seems more
confusing to me than the rare case of "file:file:". ;-)


> >> >>  - guix search file:ieee*.sty bin/gmsh latex
> >
> > I know nothing about the path of the file ieee*.sty and it does not
> > belong to the package gmsh.
>
> I don't understand what you are trying to search.  Is it the 'bin/gmsh'
> file or the files matching 'ieee*.sty'?

In this (virtual and nonsensical) example, I am looking for packages
that contain a file matching ieee*.sty *and* a file matching
"bin/gmsh" *and* whose name/synopsis/description matches latex.

I do not see why both syntaxes cannot live together. To me, sometimes
"file:" is clearer, sometimes "/" is.
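
To make the two syntaxes concrete, here is a rough sketch in Python of
how both could coexist with the "and" semantics described above.  None of
this is existing `guix search` behaviour; the matching rules are
assumptions: a "file:" term is a glob on the basename, any other term
containing "/" is a substring of the path, and everything else is a
keyword looked up in name/synopsis/description.  `Package`,
`term_matches` and `search` are made-up names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, NamedTuple


class Package(NamedTuple):
    name: str
    synopsis: str
    description: str
    files: List[str]  # paths relative to the output, e.g. "bin/gmsh"


def term_matches(term: str, package: Package) -> bool:
    """Check a single search term against one package."""
    if term.startswith("file:"):
        # "file:ieee*.sty": glob against the basename only.
        pattern = term[len("file:"):]
        return any(fnmatchcase(path.rsplit("/", 1)[-1], pattern)
                   for path in package.files)
    if "/" in term:
        # "bin/hg" or "/foo.h": substring of the path, with a leading
        # "/" added so that "/foo.h" anchors on a whole basename.
        return any(term in "/" + path for path in package.files)
    # Plain keyword: case-insensitive substring of the metadata.
    needle = term.lower()
    return any(needle in field.lower()
               for field in (package.name, package.synopsis,
                             package.description))


def search(terms: Iterable[str], packages: Iterable[Package]) -> List[str]:
    """Return the names of packages matching *all* terms."""
    terms = list(terms)
    return [p.name for p in packages
            if all(term_matches(t, p) for t in terms)]
```

With these rules, "/gmsh.h" and "file:gmsh.h" select the same packages,
so the choice between them is only a matter of taste.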



All the best,
simon

* Re: Package file indexing
  2020-01-09 15:41                         ` Pierre Neidhardt
@ 2020-01-09 17:04                           ` zimoun
  2020-01-09 17:27                             ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: zimoun @ 2020-01-09 17:04 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel

On Thu, 9 Jan 2020 at 16:41, Pierre Neidhardt <mail@ambrevar.xyz> wrote:
>
> zimoun <zimon.toutoune@gmail.com> writes:
>
> >> The benefit of "/" is that it works _incidentally_.  If you are looking
> >> for "bin/hg", then `guix search bin/hg` will do the right thing.
> >
> > I agree.
> >
> > To be clear, to search for the binary 'hg', I find "guix search bin/hg" clearer.
> > However, to search for a file whose path you do not know, I find
> > "guix search file:foo.h" clearer.
>
> To be clear, you don't need to know the path.  It's enough to know the
> basename, e.g. `guix search /foo.h`.

I do not find "/foo.h" clear. I prefer "file:foo.h".

What I naturally do is:

 - guix search bin/hg
 - guix search file:hg

It feels awkward to me to type "guix search /hg". But I can live with it. :-)


> >> What I meant is that we already have a subcommand that outputs a
> >> property of the given packages, i.e. "guix size".  If I'm not mistaken,
> >> there is no "guix package" flag that displays any property for the given
> >> packages.
> >
> > You are suggesting "guix size emacs --list-files", right?
>
> No, I'm saying that if we follow the current approach for printing our
> package properties, we should have
>
>   guix list-files emacs

Sorry to be slow, but I do not understand why a complete subcommand is required.

To me, it seems better to add a "--list-files" option to "guix package" or
"guix show".


Cheers,
simon

* Re: Package file indexing
  2020-01-09 17:04                           ` zimoun
@ 2020-01-09 17:27                             ` Pierre Neidhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-09 17:27 UTC (permalink / raw)
  To: zimoun; +Cc: Guix-devel

zimoun <zimon.toutoune@gmail.com> writes:

> Sorry to be slow, but I do not understand why a complete subcommand is required.
>
> To me, it seems better to add a "--list-files" option to "guix package" or
> "guix show".

Oh, forgot that we had `guix show`, a.k.a. `guix package --show=`.
Fair enough :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-09 16:49                     ` Christopher Baines
@ 2020-01-10 12:35                       ` Pierre Neidhardt
  2020-01-10 13:30                         ` Christopher Baines
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-10 12:35 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> So, to elaborate a bit more on the architecture I've had in mind for
> dealing with the actual nars…
>
> I see the scope of the Guix Data Service extending as far as what nars
> are available for outputs, and what outputs are associated with each
> revision, but I don't think it should store the actual nar files.
>
> What you could have is another service, which subscribes to the Guix
> Data Service to find out about new revisions and nars (from build
> servers). When this new service finds out about Guix revisions, it would
> ask this Guix Data Service for all the outputs, and store this away in a
> database. When it finds out about nars, it would download them, and
> maybe extract out the list of files.
>
> I think this setup would allow this new service to construct a file
> containing information about all files in all the outputs for a
> revision, which it has nars available for. This file could then be
> downloaded, and searched through when you want to find which output
> contains a file.

Tell me if I understood you correctly: in this scenario we would modify
the Guix derivation process to store the file list in the nars.  Is this correct?

Question about the Guix Data Service: I suppose that the information about the outputs
of a given revision is built incrementally, i.e. as they get published
by the build farm.  Is this correct?

If so, then the file index service needs to update the database
incrementally as well.  So we need some entry point to fetch the
information delta between now and the last time we fetched the information.

Please correct me if I got it all wrong! :D

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-10 12:35                       ` Pierre Neidhardt
@ 2020-01-10 13:30                         ` Christopher Baines
  2020-01-11 18:26                           ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-10 13:30 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Christopher Baines <mail@cbaines.net> writes:
>
>> So, to elaborate a bit more on the architecture I've had in mind for
>> dealing with the actual nars…
>>
>> I see the scope of the Guix Data Service extending as far as what nars
>> are available for outputs, and what outputs are associated with each
>> revision, but I don't think it should store the actual nar files.
>>
>> What you could have is another service, which subscribes to the Guix
>> Data Service to find out about new revisions and nars (from build
>> servers). When this new service finds out about Guix revisions, it would
>> ask this Guix Data Service for all the outputs, and store this away in a
>> database. When it finds out about nars, it would download them, and
>> maybe extract out the list of files.
>>
>> I think this setup would allow this new service to construct a file
>> containing information about all files in all the outputs for a
>> revision, which it has nars available for. This file could then be
>> downloaded, and searched through when you want to find which output
>> contains a file.
>
> Tell me if I understood you correctly: in this scenario we would modify
> the Guix derivation process to store the file list in the nars.  Is this correct?

Not quite. As Ludo mentioned, you can trivially extract out the file
list from nar files already (like guix archive -t). So this new service
I'm thinking about which stores the nar files, would be able to read the
list of files from the nar.

> Question about the Guix Data Service: I suppose that the information about the outputs
> of a given revision is built incrementally, i.e. as they get published
> by the build farm.  Is this correct?

So the information about what outputs the derivations produce is
completely available once the revision has been loaded.

However, nar files for those outputs become available as build farms
build them, and the Guix Data Service hopefully finds out about these
soon after. Also, since the build could be non-deterministic, it's
possible for multiple different nar files to be generated for the same
output (maybe even containing different files in the extreme case!).

> If so, then the file index service needs to update the database
> incrementally as well.  So we need some entry point to fetch the
> information delta between now and the last time we fetched the information.

Yeah, you might fetch the file list database, and then another
derivation is built, revealing which files are in the associated
outputs. A new database can then be constructed containing that
additional information.

In the trivial case, the new file could be downloaded to replace the old
one. This could maybe be optimised by just downloading the changes,
maybe by using something like xdelta.
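
To illustrate the "just download the changes" idea, here is a small
Python sketch.  Note that it is not xdelta (a generic binary delta tool):
it is a record-level delta over an assumed in-memory shape of the index,
a mapping from output to its set of files.  `Index`, `Delta`,
`make_delta` and `apply_delta` are hypothetical names.

```python
from typing import Dict, FrozenSet, NamedTuple, Set

# Assumed shape of the index: output name -> files it contains.
Index = Dict[str, FrozenSet[str]]


class Delta(NamedTuple):
    added: Index       # outputs that are new or whose file list changed
    removed: Set[str]  # outputs no longer present in the newer index


def make_delta(old: Index, new: Index) -> Delta:
    """Server side: compute what a client holding `old` is missing."""
    added = {out: files for out, files in new.items()
             if old.get(out) != files}
    removed = set(old) - set(new)
    return Delta(added, removed)


def apply_delta(old: Index, delta: Delta) -> Index:
    """Client side: rebuild the newer index from `old` plus the delta."""
    result = {out: files for out, files in old.items()
              if out not in delta.removed}
    result.update(delta.added)
    return result
```

The delta only grows with the number of outputs built since the last
fetch, which is the property the optimisation is after.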

> Please correct me if I got it all wrong! :D

All great questions, hopefully I've managed to clear things up!

* Re: Package file indexing
  2020-01-10 13:30                         ` Christopher Baines
@ 2020-01-11 18:26                           ` Pierre Neidhardt
  2020-01-12 13:29                             ` Christopher Baines
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-11 18:26 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> Not quite. As Ludo mentioned, you can trivially extract out the file
> list from nar files already (like guix archive -t). So this new service
> I'm thinking about which stores the nar files, would be able to read the
> list of files from the nar.

To clarify, you mean the hypothetical new service for file indexing?
Or did you mean for the Guix Data Service?

>> Please correct me if I got it all wrong! :D
>
> All great questions, hopefully I've managed to clear things up!

You did, thanks!

Another practical question: what would be the preferred format for the
database?

SQLite?  Custom binary?  Plain text?

Any pointer to how Nix does it?

Where would we store this database?  In /var?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-11 18:26                           ` Pierre Neidhardt
@ 2020-01-12 13:29                             ` Christopher Baines
  2020-01-13 14:28                               ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-12 13:29 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Christopher Baines <mail@cbaines.net> writes:
>
>> Not quite. As Ludo mentioned, you can trivially extract out the file
>> list from nar files already (like guix archive -t). So this new service
>> I'm thinking about which stores the nar files, would be able to read the
>> list of files from the nar.
>
> To clarify, you mean the hypothetical new service for file indexing?
> Or did you mean for the Guix Data Service?

This new hypothetical service. The Guix Data Service just knows about
the existence of some nar files, and the hashes, but I'm not thinking of
having it store the contents, hence having another service to do that.

>>> Please correct me if I got it all wrong! :D
>>
>> All great questions, hopefully I've managed to clear things up!
>
> You did, thanks!
>
> Another practical question: what would be the preferred format for the
> database?
>
> SQLite?  Custom binary?  Plain text?

Architecturally, I see this part of the problem as something that's more
flexible as you could export multiple formats.

Ideally, you'd have some data structure optimised for searching, then
maybe some way of looking up attributes of the outputs (like what
package generates them) and files (what's the size).

Maybe sqlite is one to try initially. There's guile-sqlite3 for reading
and writing, and it can contain multiple tables as well as indexes for
fast searching.
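
As a rough idea of what such a database could look like, here is a
sketch using Python's standard sqlite3 module rather than guile-sqlite3,
only to keep the example short.  The table and column names are
assumptions, not an agreed schema, and "<hash>" in a store path is a
placeholder.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS outputs (
    id      INTEGER PRIMARY KEY,
    path    TEXT NOT NULL UNIQUE,  -- store item of the output
    package TEXT NOT NULL          -- package that generates it
);
CREATE TABLE IF NOT EXISTS files (
    output_id INTEGER NOT NULL REFERENCES outputs(id),
    path      TEXT NOT NULL,       -- path relative to the output
    basename  TEXT NOT NULL,
    size      INTEGER NOT NULL
);
-- Index so that lookups by file name do not scan the whole table.
CREATE INDEX IF NOT EXISTS files_basename ON files (basename);
"""


def open_index(location=":memory:"):
    """Open (and initialise if needed) the file index database."""
    db = sqlite3.connect(location)
    db.executescript(SCHEMA)
    return db


def add_output(db, store_path, package, files):
    """Record one output; `files` is a list of (path, size) pairs."""
    cursor = db.execute(
        "INSERT INTO outputs (path, package) VALUES (?, ?)",
        (store_path, package))
    db.executemany(
        "INSERT INTO files (output_id, path, basename, size)"
        " VALUES (?, ?, ?, ?)",
        [(cursor.lastrowid, path, path.rsplit("/", 1)[-1], size)
         for path, size in files])
    db.commit()


def packages_providing(db, basename):
    """Return (package, path) pairs for files with this basename."""
    rows = db.execute(
        "SELECT DISTINCT o.package, f.path"
        " FROM files f JOIN outputs o ON o.id = f.output_id"
        " WHERE f.basename = ?"
        " ORDER BY o.package, f.path",
        (basename,))
    return rows.fetchall()
```

Keeping outputs and files in separate tables leaves room for the extra
attributes mentioned above (package, size) without repeating them per
file.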

> Where would we store this database?  In /var?

Per user is probably most flexible, so in the home directory somewhere.

* Re: Package file indexing
  2020-01-12 13:29                             ` Christopher Baines
@ 2020-01-13 14:28                               ` Pierre Neidhardt
  2020-01-13 17:57                                 ` Christopher Baines
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-13 14:28 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> Maybe sqlite is one to try initially. There's guile-sqlite3 for reading
> and writing, and it can contain multiple tables as well as indexes for
> fast searching.
>
>> Where would we store this database?  In /var?
>
> Per user is probably most flexible, so in the home directory somewhere.

Hmm, but this data relates to items in the store and online, they are
global.  Per-user would mean redundant packages and redundant (remote) queries.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-13 14:28                               ` Pierre Neidhardt
@ 2020-01-13 17:57                                 ` Christopher Baines
  2020-01-13 18:21                                   ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-13 17:57 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Christopher Baines <mail@cbaines.net> writes:
>
>> Maybe sqlite is one to try initially. There's guile-sqlite3 for reading
>> and writing, and it can contain multiple tables as well as indexes for
>> fast searching.
>>
>>> Where would we store this database?  In /var?
>>
>> Per user is probably most flexible, so in the home directory somewhere.
>
> Hmm, but this data relates to items in the store and online, they are
> global.  Per-user would mean redundant packages and redundant (remote) queries.

Yeah, maybe there's some way of optimising things for systems with
multiple users.

* Re: Package file indexing
  2020-01-13 17:57                                 ` Christopher Baines
@ 2020-01-13 18:21                                   ` Pierre Neidhardt
  2020-01-13 19:45                                     ` Christopher Baines
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-13 18:21 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

>> Hmm, but this data relates to items in the store and online, they are
>> global.  Per-user would mean redundant packages and redundant (remote) queries.
>
> Yeah, maybe there's some way of optimising things for systems with
> multiple users.

Note: I meant "redundant database", not "packages".

Can you think of good reasons not to store it globally in /var?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-13 18:21                                   ` Pierre Neidhardt
@ 2020-01-13 19:45                                     ` Christopher Baines
  2020-01-14  9:21                                       ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Christopher Baines @ 2020-01-13 19:45 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Christopher Baines <mail@cbaines.net> writes:
>
>>> Hmm, but this data relates to items in the store and online, they are
>>> global.  Per-user would mean redundant packages and redundant (remote) queries.
>>
>> Yeah, maybe there's some way of optimising things for systems with
>> multiple users.
>
> Note: I meant "redundant database", not "packages".
>
> Can you think of good reasons not to store it globally in /var?

Not specifically, it just becomes more complex as you have to consider
more issues. Like the mundane "what if one user is using a file, and
another deletes it" to the more unusual "what if you can modify the file
to crash or exploit processes run by other users".

* Re: Package file indexing
  2020-01-13 19:45                                     ` Christopher Baines
@ 2020-01-14  9:21                                       ` Pierre Neidhardt
  0 siblings, 0 replies; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-14  9:21 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> Not specifically, it just becomes more complex as you have to consider
> more issues. Like the mundane "what if one user is using a file, and
> another deletes it" to the more unusual "what if you can modify the file
> to crash or exploit processes run by other users".

What I have in mind is that only the daemon could update the database.
Regarding the concurrency issue, a mutex should do it I think.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2019-03-23 16:27       ` Package file indexing Ludovic Courtès
  2019-03-25  8:46         ` Pierre Neidhardt
@ 2020-01-15 16:23         ` Pierre Neidhardt
  2020-01-15 17:27           ` Nicolò Balzarotti
  1 sibling, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-15 16:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> You should look at how NixOS does it for its ‘command-not-found’ support
> (I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
> database, but it’s a pretty ad-hoc mechanism without authentication.

I haven't found this yet, but I found this instead:

https://github.com/bennofs/nix-index/pulls

Tobias, are you sure Nix has such a feature?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-15 16:23         ` Pierre Neidhardt
@ 2020-01-15 17:27           ` Nicolò Balzarotti
  2020-01-15 18:02             ` Pierre Neidhardt
  0 siblings, 1 reply; 47+ messages in thread
From: Nicolò Balzarotti @ 2020-01-15 17:27 UTC (permalink / raw)
  To: Pierre Neidhardt, Ludovic Courtès; +Cc: Guix-devel

Hi Pierre,
on NixOS, if you try to run the name of a program that you don't have
installed (eg: $ endlessh) you get:

The program ‘endlessh’ is currently not installed. You can install it by typing:
nix-env -iA nixos.endlessh

program-not-found is a perl script that uses an sqlite file placed
under:
/nix/var/nix/profiles/per-user/root/channels/nixos/programs.sqlite

I don't know how this database is created. Table structure:

CREATE TABLE Programs (
    name        text not null,
    system      text not null,
    package     text not null,
    primary key (name, system, package)
  );

like kbdinfo|i686-linux|kbd
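
For illustration, a lookup against that table could look like the
following Python sketch.  It recreates the schema above in an in-memory
database with the one example row; the actual script is Perl and reads
the programs.sqlite file, and `packages_for` is a made-up helper name.

```python
import sqlite3


def packages_for(db, program, system):
    """Return the packages providing `program` on `system`."""
    rows = db.execute(
        "SELECT package FROM Programs"
        " WHERE name = ? AND system = ?"
        " ORDER BY package",
        (program, system))
    return [package for (package,) in rows]


# In-memory stand-in for programs.sqlite, using the schema shown above.
db = sqlite3.connect(":memory:")
db.execute("""
CREATE TABLE Programs (
    name        text not null,
    system      text not null,
    package     text not null,
    primary key (name, system, package)
)""")
db.execute("INSERT INTO Programs VALUES (?, ?, ?)",
           ("kbdinfo", "i686-linux", "kbd"))

print(packages_for(db, "kbdinfo", "i686-linux"))  # prints ['kbd']
```

Since the primary key starts with (name, system), this query is served
directly by the key's index.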

(a nice thing I just found reading it: if you set NIX_AUTO_INSTALL=1 it
automatically spawns a nix-shell with the required package and starts
the program)

Nicolò

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> You should look at how NixOS does it for its ‘command-not-found’ support
>> (I think it’s part of NixOS, not Nix).  IIRC they distribute an SQLite
>> database, but it’s a pretty ad-hoc mechanism without authentication.
>
> I haven't found this yet, but I found this instead:
>
> https://github.com/bennofs/nix-index/pulls
>
> Tobias, are you sure Nix has such a feature?
>
> -- 
> Pierre Neidhardt
> https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-15 17:27           ` Nicolò Balzarotti
@ 2020-01-15 18:02             ` Pierre Neidhardt
  2020-01-15 22:14               ` Ludovic Courtès
  0 siblings, 1 reply; 47+ messages in thread
From: Pierre Neidhardt @ 2020-01-15 18:02 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès; +Cc: Guix-devel

Thanks Nicolò, your feedback was very useful!

So it's not program-not-found but command-not-found and it's defined
here:

  https://github.com/NixOS/nixpkgs/tree/master/nixos/modules/programs/command-not-found

Then I found this

  https://github.com/NixOS/nixos-channel-scripts

All this is pretty clear and simple.

The main thing I wonder at this point is when the
"generate-programs-index.cc" file is run.

What would be a good entry point in substitute servers to populate
such a database?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

* Re: Package file indexing
  2020-01-15 18:02             ` Pierre Neidhardt
@ 2020-01-15 22:14               ` Ludovic Courtès
  0 siblings, 0 replies; 47+ messages in thread
From: Ludovic Courtès @ 2020-01-15 22:14 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: Guix-devel, Nicolò Balzarotti

Hello,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Thanks Nicolò, your feedback was very useful!
>
> So it's not program-not-found but command-not-found and it's defined
> here:
>
>   https://github.com/NixOS/nixpkgs/tree/master/nixos/modules/programs/command-not-found
>
> Then I found this
>
>   https://github.com/NixOS/nixos-channel-scripts
>
> All this is pretty clear and simple.
>
> The main thing I wonder at this point is when the
> "generate-programs-index.cc" file is run.
>
> What would be a good entry point in substitute servers to populate
> such a database?

The database could be updated upon reception of a build-completion
notification from build machines or from the Data Service, like Chris
proposed.  (Or, if it were ‘guix publish’, it could do that on demand,
the first time a narinfo is requested.)
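
The on-demand variant amounts to memoising the file list per store item.
A very rough Python sketch, assuming a `list_files` callback that does
the equivalent of `guix archive -t` on the corresponding nar; the class
and method names are hypothetical and this is not how ‘guix publish’ is
structured.

```python
from typing import Callable, Dict, List


class LazyFileIndex:
    """Index the files of a store item the first time it is requested."""

    def __init__(self, list_files: Callable[[str], List[str]]):
        # `list_files` is assumed to read the file list out of the nar.
        self._list_files = list_files
        self._cache: Dict[str, List[str]] = {}

    def files_for(self, store_item: str) -> List[str]:
        """Return the file list, computing it only on first request."""
        if store_item not in self._cache:
            self._cache[store_item] = self._list_files(store_item)
        return self._cache[store_item]
```

The notification-driven variant would call the same `list_files` step
from the build-completion handler instead of from the request path.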

Ludo’.

end of thread, other threads:[~2020-01-15 22:15 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-14 18:31 Improve package search mikadoZero
2019-03-14 20:49 ` Leo Famulari
2019-03-14 22:01   ` Tobias Geerinckx-Rice
2019-03-14 22:09     ` Tobias Geerinckx-Rice
2019-03-14 22:46     ` Pierre Neidhardt
2019-03-14 23:09       ` Tobias Geerinckx-Rice
2019-03-23 16:27       ` Package file indexing Ludovic Courtès
2019-03-25  8:46         ` Pierre Neidhardt
2019-03-26 12:41           ` Ludovic Courtès
2020-01-02 17:12             ` Pierre Neidhardt
2020-01-02 19:15               ` Christopher Baines
2020-01-03 11:26                 ` Ludovic Courtès
2020-01-09 11:19                   ` Pierre Neidhardt
2020-01-09 12:24                     ` zimoun
2020-01-09 13:01                       ` Pierre Neidhardt
2020-01-09 16:49                     ` Christopher Baines
2020-01-10 12:35                       ` Pierre Neidhardt
2020-01-10 13:30                         ` Christopher Baines
2020-01-11 18:26                           ` Pierre Neidhardt
2020-01-12 13:29                             ` Christopher Baines
2020-01-13 14:28                               ` Pierre Neidhardt
2020-01-13 17:57                                 ` Christopher Baines
2020-01-13 18:21                                   ` Pierre Neidhardt
2020-01-13 19:45                                     ` Christopher Baines
2020-01-14  9:21                                       ` Pierre Neidhardt
2020-01-02 22:50               ` zimoun
2020-01-03 16:00                 ` raingloom
2020-01-06 16:56                   ` zimoun
2020-01-09 13:01                     ` Pierre Neidhardt
2020-01-09 13:53                       ` zimoun
2020-01-09 14:14                         ` Pierre Neidhardt
2020-01-09 14:36                           ` zimoun
2020-01-09 15:38                             ` Pierre Neidhardt
2020-01-09 16:59                               ` zimoun
2020-01-09 12:57                   ` Pierre Neidhardt
2020-01-09 12:55                 ` Pierre Neidhardt
2020-01-09 14:05                   ` zimoun
2020-01-09 14:21                     ` Pierre Neidhardt
2020-01-09 14:51                       ` zimoun
2020-01-09 15:41                         ` Pierre Neidhardt
2020-01-09 17:04                           ` zimoun
2020-01-09 17:27                             ` Pierre Neidhardt
2020-01-15 16:23         ` Pierre Neidhardt
2020-01-15 17:27           ` Nicolò Balzarotti
2020-01-15 18:02             ` Pierre Neidhardt
2020-01-15 22:14               ` Ludovic Courtès
2019-03-16  2:11     ` Improve package search mikadoZero
