unofficial mirror of guix-devel@gnu.org 
* Getting rid of the mandb profile hook?
From: Ludovic Courtès @ 2020-12-05 18:31 UTC
  To: Guix Devel

Hi Guix!

I was inspired by Michael Stapelberg’s talk recently shared on IRC¹
(well worth watching!).  One of the takeaways for me is that many
actions should be done lazily, in particular populating caches.

‘guix install’ & co. spend a significant time populating such caches, in
particular the XDG caches² and the manual page database (mandb).

I’m thinking we could get rid of the mandb hook.  However, the
functionality matters IMO (we need good tools so users can browse local
documentation; mandb is not that good but better than no search
mechanism.)  Here are several options that come to mind:

  1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
     the database gets built on the first use of ‘man -k’, unless it’s
     already up-to-date.

  2. Add a phase in gnu-build-system.scm that creates a per-package
     database.  Change the mandb profile hook such that all it needs to
     do is “concatenate” all these GDBM databases (which should be much
     faster than browsing all the man pages as it currently does).
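
Option 1 might be sketched as follows in Guile.  This is only an
illustration under assumptions: `ensure-database!`, the stamp file, and
the `rebuild!` callback are hypothetical names, not part of Guix or
man-db.  Because files in the store have their modification times reset
to a fixed value, the sketch does not compare timestamps; it compares
the resolved name of the man directory, which is expected to change when
the set of installed man pages changes.

```scheme
(use-modules (ice-9 rdelim))

(define (read-stamp stamp-file)
  "Return the directory name recorded in STAMP-FILE, or #f if STAMP-FILE
does not exist."
  (and (file-exists? stamp-file)
       (call-with-input-file stamp-file read-line)))

(define (ensure-database! man-directory stamp-file rebuild!)
  "Call REBUILD! with the resolved name of MAN-DIRECTORY, but only when
that name differs from the one recorded in STAMP-FILE.  Return #t if the
database was rebuilt, #f if it was already up to date."
  (let ((target (canonicalize-path man-directory)))
    (if (equal? (read-stamp stamp-file) target)
        #f
        (begin
          ;; Hypothetical hook: this is where the index would be built.
          (rebuild! target)
          (call-with-output-file stamp-file
            (lambda (port)
              (write-line target port)))
          #t))))
```

A wrapper around ‘man -k’ could call `ensure-database!` before
searching, so the cost is paid on first use instead of at install time.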

There are crazier options that came to mind, but let’s ignore them for
now.

Thoughts?  :-)

Ludo’.

¹ “distri: researching fast Linux package management”
  https://media.ccc.de/v/arch-conf-online-2020-6387-distri-researching-fast-linux-package-management

² https://issues.guix.gnu.org/44053#4



* Re: Getting rid of the mandb profile hook?
From: Pierre Neidhardt @ 2020-12-05 18:51 UTC
  To: Ludovic Courtès, Guix Devel

I think it's a very good idea, I'm all for it!

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/


* Re: Getting rid of the mandb profile hook?
From: Ryan Prior @ 2020-12-05 20:18 UTC
  To: Development of GNU Guix and the GNU System distribution,
	Ludovic Courtès

On December 5, 2020, "Ludovic Courtès" <ludo@gnu.org> wrote:
> many actions should be done lazily, in particular populating caches.

Absolutely.

> I’m thinking we could get rid of the mandb hook.

Please.

>  1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>  the database gets built on the first use of ‘man -k’, unless it’s
>  already up-to-date.

I vote for this one. Any work we can defer to make package
operations near-instantaneous will help me make Guix a seamless part of
my computing workflows. As things stand, adopting Guix comes along with
regular "pause to wait for Guix to think about something you probably
don't care about" breaks.


* Re: Getting rid of the mandb profile hook?
From: Ricardo Wurmus @ 2020-12-06  0:37 UTC
  To: Ludovic Courtès; +Cc: guix-devel


Ludovic Courtès <ludo@gnu.org> writes:

> I’m thinking we could get rid of the mandb hook.

Yes, please!

>   1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>      the database gets built on the first use of ‘man -k’, unless it’s
>      already up-to-date.
>
>   2. Add a phase in gnu-build-system.scm that creates a per-package
>      database.  Change the mandb profile hook such that all it needs to
>      do is “concatenate” all these GDBM databases (which should be much
>      faster than browsing all the man pages as it currently does).

Either of these seems fine to me.  I think option 2 would be nicer as we
don’t need to modify “man” and most of the work is done ahead of time.
I don’t know if these individual mandb databases *can* simply be
concatenated.  If this turns out to be much more complicated I think we
should just go with option 1.

-- 
Ricardo



* Re: Getting rid of the mandb profile hook?
From: Ludovic Courtès @ 2020-12-08 10:47 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> I’m thinking we could get rid of the mandb hook.
>
> Yes, please!

I see consensus against that hook.  :-)

>>   1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>>      the database gets built on the first use of ‘man -k’, unless it’s
>>      already up-to-date.
>>
>>   2. Add a phase in gnu-build-system.scm that creates a per-package
>>      database.  Change the mandb profile hook such that all it needs to
>>      do is “concatenate” all these GDBM databases (which should be much
>>      faster than browsing all the man pages as it currently does).
>
> Either of these seems fine to me.  I think option 2 would be nicer as we
> don’t need to modify “man” and most of the work is done ahead of time.
> I don’t know if these individual mandb databases *can* simply be
> concatenated.  If this turns out to be much more complicated I think we
> should just go with option 1.

It’s “concatenated” in the sense of building the union of all the
key/value entries.  So it’s not for free either but certainly much less
expensive than what we’re doing.
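
To make the “union” concrete, here is a minimal sketch that uses
in-memory association lists as stand-ins for the per-package GDBM
databases.  That substitution is an assumption made to keep the example
self-contained; the code does not read real man-db files.  The name
`merge-databases` and the rule that the first entry wins on duplicate
keys are hypothetical as well.

```scheme
(define (merge-databases databases)
  "Return a hash table holding the union of DATABASES, a list of
association lists mapping keys to values.  When a key appears more than
once, the first occurrence wins."
  (let ((result (make-hash-table)))
    (for-each (lambda (database)
                (for-each (lambda (entry)
                            ;; Only insert keys we have not seen yet.
                            (unless (hash-get-handle result (car entry))
                              (hash-set! result (car entry) (cdr entry))))
                          database))
              databases)
    result))

;; Two hypothetical per-package databases with one overlapping key.
(define coreutils-db
  '(("ls.1" . "list directory contents")
    ("cat.1" . "concatenate files and print on the standard output")))

(define other-db
  '(("ls.1" . "a conflicting entry that is ignored")
    ("grep.1" . "print lines that match patterns")))

(define merged (merge-databases (list coreutils-db other-db)))
```

The work is proportional to the number of entries, not to the size of
the man pages themselves, which is why this should be cheaper than
re-reading every page.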

I’ll probably take a look at some point, but if another person tired of
waiting for the hook would like to give it a try, please do!  :-)

Ludo’.



* Re: Getting rid of the mandb profile hook?
From: Maxim Cournoyer @ 2021-02-27 13:05 UTC
  To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Guix!
>
> I was inspired by Michael Stapelberg’s talk recently shared on IRC¹
> (well worth watching!).  One of the takeaways for me is that many
> actions should be done lazily, in particular populating caches.
>
> ‘guix install’ & co. spend a significant time populating such caches, in
> particular the XDG caches² and the manual page database (mandb).
>
> I’m thinking we could get rid of the mandb hook.  However, the
> functionality matters IMO (we need good tools so users can browse local
> documentation; mandb is not that good but better than no search
> mechanism.)  Here are several options that come to mind:
>
>   1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>      the database gets built on the first use of ‘man -k’, unless it’s
>      already up-to-date.

That would mean the database would live in some user-specific writable
area of the file system (where?), right?  And it could use the common
'update' mechanism of man-db to make it as fast as possible.

This sounds good from a performance perspective, but could introduce
cache issues every now and then (if man-db changes a lot).  I wouldn't
expect many problems given how mature man-db is, but that's one thing to
consider.

>   2. Add a phase in gnu-build-system.scm that creates a per-package
>      database.  Change the mandb profile hook such that all it needs to
>      do is “concatenate” all these GDBM databases (which should be much
>      faster than browsing all the man pages as it currently does).

I like that idea better, but I don't know how feasible it would be.

> There are crazier options that came to mind, but let’s ignore them for
> now.

What is taking so much time anyway?  Why is generating this database so
compute intensive?  I don't grok why it should be so inefficient to scan
a union'd tree for expected prefixes and append a bunch of file names
together.

> Thoughts?  :-)

Lazily doing things seems a good idea in general to make the experience
more snappy.  Thanks for looking into it!

Maxim



* Re: Getting rid of the mandb profile hook?
From: Ludovic Courtès @ 2021-03-03 14:13 UTC
  To: Maxim Cournoyer; +Cc: Guix Devel

Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi Guix!
>>
>> I was inspired by Michael Stapelberg’s talk recently shared on IRC¹
>> (well worth watching!).  One of the takeaways for me is that many
>> actions should be done lazily, in particular populating caches.
>>
>> ‘guix install’ & co. spend a significant time populating such caches, in
>> particular the XDG caches² and the manual page database (mandb).
>>
>> I’m thinking we could get rid of the mandb hook.  However, the
>> functionality matters IMO (we need good tools so users can browse local
>> documentation; mandb is not that good but better than no search
>> mechanism.)  Here are several options that come to mind:
>>
>>   1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>>      the database gets built on the first use of ‘man -k’, unless it’s
>>      already up-to-date.
>
> That would mean the database would live in some user-specific writable
> area of the file system (where?), right?  And it could use the common
> 'update' mechanism of man-db to make it as fast as possible.
>
> This sounds good from a performance perspective, but could introduce
> cache issues every now and then (if man-db changes a lot).  I wouldn't
> expect many problems given how mature man-db is, but that's one thing to
> consider.

I looked a bit at man-db, thinking it must already support that, more
or less.  Indeed, one can run “mandb -uc” to create the database.

The problem is that it insists on writing databases and ‘CACHEDIR.TAG’
files in the same directory as man pages.  In our case, these are all
read-only, so it just prints a warning for each directory and keeps going.

It looks like man-db is not written with a situation like ours in mind.

>>   2. Add a phase in gnu-build-system.scm that creates a per-package
>>      database.  Change the mandb profile hook such that all it needs to
>>      do is “concatenate” all these GDBM databases (which should be much
>>      faster than browsing all the man pages as it currently does).
>
> I like that idea better, but I don't know how feasible it would be.

Yeah, dunno.

> What is taking so much time anyway?  Why is generating this database so
> compute intensive?  I don't grok why it should be so inefficient to scan
> a union'd tree for expected prefixes and append a bunch of file names
> together.

‘mandb-entries’ in (guix man-db) needs to open all the man pages in the
profile, decompress them, and read their header.  When there are many
man pages, that’s a lot of I/O and CPU usage.
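
As an illustration of the per-page work, the sketch below scans a man
page for its NAME section and splits it into a name and a synopsis.  It
is not the code of ‘mandb-entries’: decompression is left out, only the
plain `.SH NAME` form is recognized, and `man-page-name-line` and
`name+synopsis` are hypothetical names.

```scheme
(use-modules (ice-9 rdelim))

(define (man-page-name-line port)
  "Return the first non-empty line that follows '.SH NAME' in PORT, or #f
if the NAME section is missing or empty."
  (let loop ((line (read-line port))
             (in-name? #f))
    (cond ((eof-object? line) #f)
          ;; The next section started before any text: empty NAME section.
          ((and in-name? (string-prefix? ".SH" line)) #f)
          ((and in-name? (not (string-null? (string-trim-both line))))
           line)
          ((string-prefix? ".SH NAME" line)
           (loop (read-line port) #t))
          (else
           (loop (read-line port) in-name?)))))

(define (name+synopsis line)
  "Split LINE, of the form \"NAME \\- SYNOPSIS\", into a (NAME . SYNOPSIS)
pair.  Return #f if LINE has no \"\\-\" separator."
  (let ((pos (string-contains line "\\-")))
    (and pos
         (cons (string-trim-both (substring line 0 pos))
               (string-trim-both (substring line (+ pos 2)))))))

;; A tiny, made-up man page used in place of a real decompressed file.
(define sample
  ".TH LS 1\n.SH NAME\nls \\- list directory contents\n.SH SYNOPSIS\n")

(define sample-entry
  (name+synopsis (call-with-input-string sample man-page-name-line)))
;; sample-entry → ("ls" . "list directory contents")
```

Even this reduced version has to read each page up to its NAME section;
doing that for every compressed page in a profile is where the I/O and
CPU time go.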

One option I contemplated at one point is to simply have fewer man pages
in the first place.  :-)  There were packages that install man pages
when they shouldn’t.  This led to commits like
305eefc0627eb1d047e6fc4320d7e56897719ab8 and
4b797193d7508ddc53bb1ff7a267a0d50c1fe298 (and parent commits).

But even with that, this mandb hook will always get in the way, even
though few people use ‘man -k’.

I think we need a better solution to the whole “search for
documentation” problem.

Thanks,
Ludo’.



* Re: Getting rid of the mandb profile hook?
From: Brice Waegeneire @ 2021-03-03 20:50 UTC
  To: Ludovic Courtès; +Cc: Guix Devel, Maxim Cournoyer

Hello Ludovic,

On 2021-03-03 15:13, Ludovic Courtès wrote:
>>> I’m thinking we could get rid of the mandb hook.  However, the
>>> functionality matters IMO (we need good tools so users can browse local
>>> documentation; mandb is not that good but better than no search
>>> mechanism.)  Here are several options that come to mind:
>>>
>>>   1. Provide a ‘man’ wrapper or modify the ‘man-db’ package such that
>>>      the database gets built on the first use of ‘man -k’, unless it’s
>>>      already up-to-date.
>>
>> That would mean the database would live in some user-specific writable
>> area of the file system (where?), right?  And it could use the common
>> 'update' mechanism of man-db to make it as fast as possible.
>>
>> This sounds good from a performance perspective, but could introduce
>> cache issues every now and then (if man-db changes a lot).  I wouldn't
>> expect many problems given how mature man-db is, but that's one thing to
>> consider.
>
> I looked a bit at man-db, thinking it must already support that, more
> or less.  Indeed, one can run “mandb -uc” to create the database.
>
> The problem is that it insists on writing databases and ‘CACHEDIR.TAG’
> files in the same directory as man pages.  In our case, these are all
> read-only, so it just prints a warning for each directory and keeps going.
>
> It looks like man-db is not written with a situation like ours in mind.

What about using mandoc¹, the manpage compiler from OpenBSD, instead of
man-db?  According to its manual, it supports specifying the database
location:

“makewhatis -d dir [file ...]”²

It isn't packaged in Guix yet, but other Linux distros have packaged it,
and some even use it as their default.

> [...]
>
> One option I contemplated at one point is to simply have fewer man pages
> in the first place.  :-)  There were packages that install man pages
> when they shouldn’t.  This led to commits like
> 305eefc0627eb1d047e6fc4320d7e56897719ab8 and
> 4b797193d7508ddc53bb1ff7a267a0d50c1fe298 (and parent commits).

More outputs would be great, though having a way to force the
installation of specific outputs for every installed package would
improve quality of life.  As a specific example: when installing ncurses
from the CLI, it would install its man output too if you always want man
pages to be installed.

¹ https://mandoc.bsd.lv/
² https://www.mankier.com/8/makewhatis.mandoc

Cheers,
- Brice



* Re: Getting rid of the mandb profile hook?
From: Ludovic Courtès @ 2021-03-10 10:20 UTC
  To: Brice Waegeneire; +Cc: Guix Devel, Maxim Cournoyer

Hi Brice,

Brice Waegeneire <brice@waegenei.re> writes:

> On 2021-03-03 15:13, Ludovic Courtès wrote:

[...]

>> I looked a bit at man-db, thinking it must already support that, more
>> or less.  Indeed, one can run “mandb -uc” to create the database.
>>
>> The problem is that it insists on writing databases and ‘CACHEDIR.TAG’
>> files in the same directory as man pages.  In our case, these are all
>> read-only, so it just prints a warning for each directory and keeps going.
>>
>> It looks like man-db is not written with a situation like ours in mind.
>
> What about using mandoc¹, the manpage compiler from OpenBSD, instead of
> man-db?  According to its manual, it supports specifying the database
> location:
>
> “makewhatis -d dir [file ...]”²
>
> It isn't packaged in Guix yet, but other Linux distros have packaged it,
> and some even use it as their default.

Sounds like a plan!  We’d need to update the “Documentation” node in the
manual accordingly.

Do you want to give it a try?

>> [...]
>> One option I contemplated at one point is to simply have fewer man pages
>> in the first place.  :-)  There were packages that install man pages
>> when they shouldn’t.  This led to commits like
>> 305eefc0627eb1d047e6fc4320d7e56897719ab8 and
>> 4b797193d7508ddc53bb1ff7a267a0d50c1fe298 (and parent commits).
>
> More outputs would be great, though having a way to force the
> installation of specific outputs for every installed package would
> improve quality of life.  As a specific example: when installing ncurses
> from the CLI, it would install its man output too if you always want man
> pages to be installed.

Hmm sounds tricky (and kinda unpredictable, too).

Thanks,
Ludo’.



* Re: Getting rid of the mandb profile hook?
From: Ludovic Courtès @ 2021-04-02 16:56 UTC

Hello!

Brice Waegeneire <brice@waegenei.re> writes:

> On 2021-03-03 15:13, Ludovic Courtès wrote:
>>>> I’m thinking we could get rid of the mandb hook.  However, the

[...]

> What about using mandoc¹, the manpage compiler from OpenBSD, instead of
> man-db?  According to its manual, it supports specifying the database
> location:
>
> “makewhatis -d dir [file ...]”²

I recently packaged it, but I’m not impressed; I’m not even sure how
that’s supposed to work:

  $ guix environment --ad-hoc mandoc -- makewhatis -d /tmp/foo $(find -L ~/.guix-profile/share/man -name \*.[0-9].gz)

exits successfully but does nothing.

At this point my preference would be to build a custom tool (I’m not
aware of any existing tool like that, but if you do, please share) that
would lazily build a database, ideally full-text, and search through it;
attached a super rough example that uses Guile-Xapian and inserts man-db
synopses into a Xapian database.

The tool would index man pages and Info pages.  It would be smart enough
to index only info/man files that have actually changed (it could look
at the inode number to determine that in a way that avoids unnecessary
cache invalidation).  I’m not sure how to implement this part though.  It
sounds like a good hack for our Xapian experts: I’m looking at you, Arun,
Ricardo, zimoun.  :-)
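
One way the inode idea might look, as a rough sketch only: files are
identified by their device and inode numbers, and a file is indexed only
if that identity has not been seen before.  The names `file-identity`,
`files-needing-index`, and `mark-indexed!` are hypothetical, and the
table of known identities is kept in memory here, whereas a real tool
would have to persist it alongside the index.

```scheme
(define (file-identity file)
  "Return a (DEVICE . INODE) pair identifying FILE, following symlinks."
  (let ((st (stat file)))
    (cons (stat:dev st) (stat:ino st))))

(define (files-needing-index files indexed)
  "Return the elements of FILES whose identity is not yet a key of
INDEXED, a hash table keyed by (DEVICE . INODE) pairs."
  (filter (lambda (file)
            (not (hash-ref indexed (file-identity file))))
          files))

(define (mark-indexed! file indexed)
  "Record the identity of FILE in INDEXED."
  (hash-set! indexed (file-identity file) #t))
```

Since ‘stat’ follows symlinks, two profile generations that point to the
same underlying file yield the same identity, so an unchanged page would
not be indexed again.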

Thoughts?

I’d really like to have a rough solution so we can remove the
‘manual-database’ hook in time for the release.

Ludo’.


[-- Attachment #2: Xapian as a man-db replacement --]

(use-modules (xapian wrap)
             (xapian xapian)
             (ice-9 match)
             (guix man-db)
             (srfi srfi-1)
             (srfi srfi-26))

;; eval: (put 'call-with-writable-database 'scheme-indent-function 1)

(define (index-mandb-entry db entry)
  ;; Index ENTRY, a man-db entry as returned by 'mandb-entries', into the
  ;; writable Xapian database DB.
  (define (mandb-entry-id-term entry)
    ;; "Q" is the conventional Xapian prefix for a unique document ID term.
    (string-append "Q" "man:" (mandb-entry-name entry) "."
                   (number->string
                    (mandb-entry-section entry))))

  (when (mandb-entry-name entry)
    (let* ((idterm (mandb-entry-id-term entry))
           (doc (make-document
                 ;; Store the fields needed to display a search result.
                 #:data (object->string
                         `((name . ,(mandb-entry-name entry))
                           (section . ,(number->string
                                        (mandb-entry-section entry)))
                           (file . ,(canonicalize-path
                                     (mandb-entry-file-name entry)))))
                 #:terms `((,idterm . 0))))
           (term-generator
            (make-term-generator #:stem (make-stem "en")
                                 #:document doc)))
      ;; Prefixes "A" and "B" match the "name" and "section" fields that
      ;; 'search' declares below.
      (index-text! term-generator (mandb-entry-name entry) #:prefix "A")
      (index-text! term-generator
                   (number->string (mandb-entry-section entry))
                   #:prefix "B")
      (index-text! term-generator (mandb-entry-synopsis entry))
      ;; Replacing by ID term keeps one document per page on re-indexing.
      (replace-document! db idterm doc))))

(define (index-mandb-entries)
  ;; Index the man pages found in every directory listed in MANPATH.
  (call-with-writable-database "/tmp/db"
    (lambda (db)
      (for-each (cut index-mandb-entry db <>)
                ;; (mandb-entries "/run/current-system/profile/share/man")
                (append-map mandb-entries
                            (string-split (getenv "MANPATH") #\:))))))

(define* (parse-query* querystring #:key stemmer stemming-strategy
                       (prefixes '())
                       (boolean-prefixes '()))
  (let ((queryparser (new-QueryParser)))
    (QueryParser-set-stemmer queryparser stemmer)
    (when stemming-strategy
      (QueryParser-set-stemming-strategy queryparser stemming-strategy))
    (for-each (match-lambda
                ((field . prefix)
                 (QueryParser-add-prefix queryparser field prefix)))
              prefixes)
    (for-each (match-lambda
                ((field . prefix)
                 (QueryParser-add-boolean-prefix queryparser field prefix)))
              boolean-prefixes)
    (let ((query (QueryParser-parse-query queryparser querystring)))
      (delete-QueryParser queryparser)
      query)))

(define* (search querystring #:key (pagesize 100))
  (call-with-database "/tmp/db"
    (lambda (db)
      (let* ((query (parse-query querystring
                                  #:stemmer (make-stem "en")
                                  #:prefixes
                                  '(("name"    . "A")
                                    ("section" . "B"))))
             (enq (enquire db query)))
        ;; (Enquire-set-sort-by-value enq 0 #f)
        (reverse (mset-fold (lambda (item acc)
                              (cons (call-with-input-string
                                        (document-data (mset-item-document item))
                                      read)
                                    acc))
                            '()
                            (enquire-mset enq
                                          #:maximum-items pagesize)))))))


Thread overview: 10+ messages
2020-12-05 18:31 Getting rid of the mandb profile hook? Ludovic Courtès
2020-12-05 18:51 ` Pierre Neidhardt
2020-12-05 20:18 ` Ryan Prior
2020-12-06  0:37 ` Ricardo Wurmus
2020-12-08 10:47   ` Ludovic Courtès
2021-02-27 13:05 ` Maxim Cournoyer
2021-03-03 14:13   ` Ludovic Courtès
2021-03-03 20:50     ` Brice Waegeneire
2021-03-10 10:20       ` Ludovic Courtès
2021-04-02 16:56       ` Ludovic Courtès
