unofficial mirror of guix-devel@gnu.org 
* How many bytes do we add (closure of guix) when adding one new package?
       [not found]         ` <878rddooy4.fsf@gnu.org>
@ 2023-05-25 18:24           ` Simon Tournier
  2023-05-26 16:21             ` Ludovic Courtès
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Tournier @ 2023-05-25 18:24 UTC (permalink / raw)
  To: Guix Devel; +Cc: Ludovic Courtès, Andreas Enge

Hi,

The initial discussion was about the closure of Guix and that “guix pull”
brings graphical libraries.  See #63050 [1].

Here, I would like to open a discussion about how Guix scales,
i.e. about its size.  I am trying to answer the question I am asking as
the subject. ;-)

It’s another angle on the discussion between Andreas and Ludo: :-)

>> Note that I do not care so much about the closure size, but about the
>> number of packages that are needed to just build guix (although of course
>> the two are related). Or otherwise said, the dependencies for "guix pull".
>
> Yes, understood.  Graphviz is not in the closure anyway, it’s a
> build-only dependency.

Somehow, the closure is increasing:

--8<---------------cut here---------------start------------->8---
$ for i in $(seq 1 4); do guix time-machine --commit=v1.$i.0 -- size guix | grep 'total:' ;done
total: 410.9 MiB
total: 496.0 MiB
total: 564.8 MiB
total: 637.2 MiB

$ guix size guix | grep 'total:'
total: 611.2 MiB
--8<---------------cut here---------------end--------------->8---

(Yeah, the package guix is not exactly the same as guix itself, but it
appears to me a good enough approximation.  And my current revision is
14c0380.)
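
As a rough illustration of the trend, here is a small Python sketch that
takes the four release totals printed above and computes the
release-to-release increase.  Only the figures come from the output
above; the variable names are made up for the example.

```python
# Closure totals (MiB) reported by "guix size guix" for v1.1.0 .. v1.4.0,
# copied from the output above.
totals = {"v1.1.0": 410.9, "v1.2.0": 496.0, "v1.3.0": 564.8, "v1.4.0": 637.2}

versions = list(totals)
growth = {}
for prev, cur in zip(versions, versions[1:]):
    delta = totals[cur] - totals[prev]
    # Store the absolute increase (MiB) and the relative increase (%).
    growth[cur] = (delta, 100 * delta / totals[prev])
    print(f"{prev} -> {cur}: +{delta:.1f} MiB ({growth[cur][1]:.1f}%)")

overall = totals["v1.4.0"] / totals["v1.1.0"]
print(f"v1.1.0 -> v1.4.0: x{overall:.2f}")
```

The current revision (611.2 MiB) is left out on purpose, since it is not
a tagged release.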

Compare:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=v1.1.0 -- size guix --sort=self | wc -l
44

$ guix time-machine --commit=v1.4.0 -- size guix --sort=self | wc -l
72

$ guix size guix --sort=self | wc -l
70
--8<---------------cut here---------------end--------------->8---

which is Andreas’s concern for “exotic” architectures.  Moreover, the
inflation (in size) comes from some packages that are simply becoming
bigger.

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=v1.1.0 -- size guix --sort=self | head
store item                                                       total    self
/gnu/store/fp16m5hkzql7jwhvnkm1j1i5qch0arhx-guix-1.1.0rc2-1.9d0d27f   410.9   221.6  53.9%
/gnu/store/1mkkv2caiqbdbbd256c4dirfi4kwsacv-guile-2.2.6            123.9    44.4  10.8%
/gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29              37.4    35.8   8.7%
/gnu/store/2plcy91lypnbbysb18ymnhaw3zwk8pg1-gcc-7.4.0-lib           70.0    32.6   7.9%
/gnu/store/n79cf8bvy3k96gjk1rf18d36w40lkwlr-glibc-utf8-locales-2.29    13.9    13.9   3.4%
/gnu/store/k2m4q2av9hw73hw2jx6qrxqdyh855398-openssl-1.1.1c          76.4     6.4   1.6%
/gnu/store/gzp4ig4rdb1qf4i5dy1d9nl0zmj5q09y-ncurses-6.1-20190609    75.9     5.9   1.4%
/gnu/store/hfvz18igm68p5yz7z4asn6ph363blp1z-gnutls-3.6.9           130.6     5.1   1.2%
/gnu/store/b5vjmib411m74lbpf051fnwz3s9zcw79-guile-git-0.3.0         98.8     4.4   1.1%

$ guix time-machine --commit=v1.4.0 -- size guix --sort=self | head
store item                                                       total    self
/gnu/store/9nvx97hr8kkr26gzwni2fblfn0yq0xjw-guix-1.4.0rc2          637.2   330.1  51.8%
/gnu/store/qlmpcy5zi84m6dikq3fnx5dz38qpczlc-guile-3.0.8            130.0    53.0   8.3%
/gnu/store/cnfsv9ywaacyafkqdqsv2ry8f01yr7a9-guile-3.0.7            129.1    52.0   8.2%
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33              38.3    36.6   5.7%
/gnu/store/094bbaq6glba86h1d4cj16xhdi6fk2jl-gcc-10.3.0-lib          71.7    33.4   5.2%
/gnu/store/96srhmpmxa20wmsck95g3iq4hb3lz4a0-glib-2.70.2             98.1    15.3   2.4%
/gnu/store/mw3py6smb1pk8yx298hd9ivz9lzbksqi-glibc-utf8-locales-2.33    13.9    13.9   2.2%
/gnu/store/5583c2za2jsn9g6az79rnksgvigwnsk7-util-linux-2.37.2-lib    80.7     9.0   1.4%
/gnu/store/9rrnm5hdjw7cy96a2a9rfgh6y08wsbmf-ncurses-6.2.20210619    77.6     5.9   0.9%

$ guix size guix --sort=self | head
store item                                                       total    self
/gnu/store/cgjddvw9zay626z8hyxl0zmn1354c24k-guix-1.4.0-6.dc5430c   611.2   350.2  57.3%
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9            135.0    53.1   8.7%
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35              40.6    38.8   6.3%
/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib          75.3    34.7   5.7%
/gnu/store/nb40pwd37v6i1g4b1fq4l6q4h9px3asr-glib-2.72.3            101.3    14.9   2.4%
/gnu/store/5fmqijrs5f7vx8mc2q2pmq26yvhb74sm-glibc-utf8-locales-2.35    13.9    13.9   2.3%
/gnu/store/gwx2sf5wl9bsl21lwv35g5la63bwyy95-util-linux-2.37.4-lib    84.3     9.0   1.5%
/gnu/store/69wd3pd1hd3j84xr965jj2fk2qmxn0hl-openssl-3.0.8           83.4     8.1   1.3%
/gnu/store/bcc053jvsbspdjr17gnnd9dg85b3a0gy-ncurses-6.2.20210619    81.2     5.9   1.0%
--8<---------------cut here---------------end--------------->8---
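
To compare such listings without eyeballing them, a minimal Python
sketch can parse the rows and aggregate the “self” column.  The sample
rows are copied from the v1.4.0 output above; the regular expression
and the helper name parse_rows are assumptions made for this example,
not part of “guix size”.

```python
import re

# A few rows copied from the v1.4.0 output above.
sample = """\
store item                                                       total    self
/gnu/store/9nvx97hr8kkr26gzwni2fblfn0yq0xjw-guix-1.4.0rc2          637.2   330.1  51.8%
/gnu/store/qlmpcy5zi84m6dikq3fnx5dz38qpczlc-guile-3.0.8            130.0    53.0   8.3%
/gnu/store/cnfsv9ywaacyafkqdqsv2ry8f01yr7a9-guile-3.0.7            129.1    52.0   8.2%
/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33              38.3    36.6   5.7%
"""

# Store path (hash, dash, name), then total, self and percentage columns.
ROW = re.compile(r"^/gnu/store/[0-9a-z]+-(\S+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)%$")

def parse_rows(text):
    """Return (item, total MiB, self MiB, percent) for each store-item row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # the header and any "total:" line do not match and are skipped
            rows.append((m.group(1), float(m.group(2)),
                         float(m.group(3)), float(m.group(4))))
    return rows

rows = parse_rows(sample)
guile_self = sum(self_mib for name, _, self_mib, _ in rows
                 if name.startswith("guile-"))
print(f"{len(rows)} items, guile self size: {guile_self:.1f} MiB")
```

With the v1.4.0 rows, this makes visible that two Guile versions (3.0.8
and 3.0.7) sit in the closure at the same time.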

Considering Guix itself, one explanation for the increase is the number
of packages, assuming that services and the rest are negligible; “git diff
--shortstat” is a good indicator at first sight.  Well, we could be more
precise about the documentation.  Hum, this ugly command,

--8<---------------cut here---------------start------------->8---
$ for doc in $(for ci in $(for t in $(git tag | grep v1 | grep -v rc ); do git --no-pager show $t | grep commit ;done); do for d in $(find ~/.cache/guix/inferiors/ -type l -print); do printf "$d "; $d/bin/guix --version 2>/dev/null ;done | grep $ci ;done | cut -f1 -d' '); do du -sh $(readlink -f $doc)/share/* ;done | grep info
172K	/gnu/store/5pa1706ckwhn6x4mn5kl2b7h15k3in9x-profile/share/info
200K	/gnu/store/z1icpkfbz59dr7k7rnb0jd8j1ii8mdph-profile/share/info
376K	/gnu/store/hm0rwgcvrs85y3hgjsw8616cxy61h6si-profile/share/info
304K	/gnu/store/zbrgzk7l0j7805i82sl3gmx6y2b0iz9q-profile/share/info
--8<---------------cut here---------------end--------------->8---

is probably providing a clue about the assumption.

OK, let us assume that the packages are the main source of the size increase.

The question is: can we evaluate the size for one package?  How many
bytes do we add to the whole Guix when we add one package?  On average
and roughly.

We have the number of packages and the whole size for successive
versions.  Therefore, we can take the difference between consecutive
versions.  For the number of packages, we get [2078, 1848, 4704, 1532],
which means 2078 packages were added between v1.1.0 and v1.2.0, 1848
between v1.2.0 and v1.3.0, etc.

We can do the same for the size (the “self” size of guix, in MiB),
[22.9, 19.4, 66.2, 20.1], and then compute the ratio: the size per
package, in MiB.  Something like:

[0.011020211742059676, 0.010497835497835485, 0.01407312925170069, 0.013120104438642276]

Let us take the average: 0.012177820232559531.
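
The ratios and their average can be reproduced with a few lines of
Python.  This is only a sketch of the arithmetic; the two input lists
are the ones quoted in this message, and the variable names are made up.

```python
# Packages added and MiB added between successive versions, as quoted above.
added_packages = [2078, 1848, 4704, 1532]
added_mib = [22.9, 19.4, 66.2, 20.1]

# Size per package (MiB) for each interval, then the plain average.
ratios = [size / count for size, count in zip(added_mib, added_packages)]
mean = sum(ratios) / len(ratios)

print(ratios)
print(f"mean: {mean:.6f} MiB per package, i.e. about {mean * 1024:.1f} KiB")
```

The conversion to KiB uses 1 MiB = 1024 KiB.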

Now, let us take the number of packages for v1.1.0 and multiply it by
this average.  We get: 159.6 MiB.

OK, it means that the remainder (221.6 MiB of “self” size minus these
159.6 MiB) is more or less the core of Guix, which we assume grows only
slowly.  It reads 62 MiB.

Therefore, we can predict the size for the other versions using this
linear model based on the previous evaluated average.

    size = mean * number_packages + core

It reads:

[246.9, 269.4, 326.7, 345.3] compared to [244.5, 263.9, 330.1, 350.2].

Hum, this quick back-of-the-envelope computation does not seem too bad.
I guess.
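
For anyone who wants to replay the linear model, here is a minimal
Python sketch.  All figures are the ones quoted in this message.  The
package count of v1.1.0 is not stated here, so the sketch backs it out
of the 159.6 MiB figure; that step is an assumption of the example, and
the last digit of a prediction may differ slightly from the list above.

```python
# Figures quoted in the message (MiB unless noted).
mean = 0.012177820232559531      # average size per package
packages_part_v110 = 159.6       # mean * (number of packages in v1.1.0)
self_sizes = [221.6, 244.5, 263.9, 330.1, 350.2]  # "self" of guix per version
added_packages = [2078, 1848, 4704, 1532]

# Assumption: recover the v1.1.0 package count from the quoted product.
n = packages_part_v110 / mean
core = self_sizes[0] - packages_part_v110   # about 62 MiB

# size = mean * number_packages + core, applied to each later version.
predicted = []
for extra in added_packages:
    n += extra
    predicted.append(mean * n + core)

print(f"core: {core:.1f} MiB")
for guess, measured in zip(predicted, self_sizes[1:]):
    error = 100 * (guess - measured) / measured
    print(f"predicted {guess:6.1f}  measured {measured:6.1f}  ({error:+.1f}%)")
```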

Conclusions:

 1. the addition of one package leads to an increase of ~ 12 KiB

 2. the core of Guix is about ~ 62 MiB

 3. doubling the number of packages is doubling the size to download at
    “guix pull” time.


Maybe we should rethink (guix self), especially the *package-modules*
part, and re-discuss whether we could split that part.  At least, that
is my understanding.


Cheers,
simon

1: https://issues.guix.gnu.org/issue/63050



* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-25 18:24           ` How many bytes do we add (closure of guix) when adding one new package? Simon Tournier
@ 2023-05-26 16:21             ` Ludovic Courtès
  2023-05-30 12:10               ` Simon Tournier
  0 siblings, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2023-05-26 16:21 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Guix Devel, Andreas Enge

Hello!

Thanks for the detailed analysis!

Simon Tournier <zimon.toutoune@gmail.com> skribis:

> Conclusions:
>
>  1. the addition of one package leads to an increase of ~ 12 KiB
>
>  2. the core of Guix is about ~ 62 MiB
>
>  3. doubling the number of packages is doubling the size to download at
>     “guix pull” time.

I agree that .go files are quite big (.scm files as well, but we’ve
improved information density somewhat by removing input labels :-)).

The size of .go files went down when we switched to the baseline compiler
(aka. -O1):

  https://lists.gnu.org/archive/html/guix-devel/2020-06/msg00071.html

That thread has ideas of things to do to further reduce .go size.

Download size has to be treated separately though.  For example, ‘git
pull’ doesn’t redownload all of the repo or directory, and it uses
compression heavily.  Thus, a few hundred bytes of additional .scm text
translate into less than that.
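
To get a feeling for how well package-like text compresses, here is a
small Python sketch using zlib, the deflate library Git uses for its
objects.  The package definition is entirely made up, and a real
transfer also involves deltas between objects, so treat the numbers as
an illustration only.

```python
import zlib

# A made-up package definition, only meant to look like typical .scm text.
definition = b"""
(define-public hello-example
  (package
    (name "hello-example")
    (version "1.0")
    (source (origin
              (method url-fetch)
              (uri (string-append "https://example.org/hello-" version ".tar.gz"))
              (sha256 (base32 "0000000000000000000000000000000000000000000000000000"))))
    (build-system gnu-build-system)
    (synopsis "Example package")
    (description "An illustrative package definition.")
    (home-page "https://example.org")
    (license license:gpl3+)))
"""

# Twenty similar definitions stand in for a package module.
module_text = b"".join(definition.replace(b"hello", b"pkg%d" % i)
                       for i in range(20))
compressed = zlib.compress(module_text, 9)

print(len(module_text), "bytes of text ->", len(compressed), "bytes deflated")
```

The sample is far more repetitive than real modules, so the ratio it
prints is an optimistic bound rather than a measurement.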

As for the rest, download size can be reduced for example by choosing a
content-addressed transport, like something based on ERIS.

I think we must look precisely at what we want to optimize—on-disk size,
or bandwidth requirement, in particular—and look at the whole solution
space.

My 2¢!

Ludo’.



* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-26 16:21             ` Ludovic Courtès
@ 2023-05-30 12:10               ` Simon Tournier
  2023-05-30 19:10                 ` Csepp
  2023-05-30 20:55                 ` How many bytes do we add (closure of guix) when adding one new package? Jack Hill
  0 siblings, 2 replies; 10+ messages in thread
From: Simon Tournier @ 2023-05-30 12:10 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel, Andreas Enge

Hi,

On ven., 26 mai 2023 at 18:21, Ludovic Courtès <ludo@gnu.org> wrote:

> I agree that .go files are quite big (.scm files as well, but we’ve
> improved information density somewhat by removing input labels :-)).
>
> The size of .go files went down when we switch to the baseline compiler
> (aka. -O1):
>
>   https://lists.gnu.org/archive/html/guix-devel/2020-06/msg00071.html
>
> That thread has ideas of things to do to further reduce .go size.

Just to put a figure on what “big” means: currently the .go files are 5
times bigger than their associated .scm.
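
One way to check such a ratio locally is sketched below in Python.  The
.go directory is the site-ccache path that appears elsewhere in this
thread; the .scm directory (share/guile/site/3.0) is an assumption about
the layout of a “guix pull” profile and may need adjusting.  The helper
names are made up.

```python
import os

def total_size(root, suffix):
    """Sum the sizes (bytes) of regular files under root ending with suffix."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(suffix) and os.path.isfile(path):
                total += os.path.getsize(path)
    return total

def go_to_scm_ratio(scm_root, go_root):
    """Ratio of compiled (.go) bytes to source (.scm) bytes; None if no source."""
    scm = total_size(scm_root, ".scm")
    return total_size(go_root, ".go") / scm if scm else None

if __name__ == "__main__":
    # Assumed layout of a "guix pull" profile; adjust to your system.
    current = os.path.expanduser("~/.config/guix/current")
    scm_root = os.path.join(current, "share/guile/site/3.0/gnu")
    go_root = os.path.join(current, "lib/guile/3.0/site-ccache/gnu")
    print(go_to_scm_ratio(scm_root, go_root))
```

If the directories are absent, the function returns None instead of
failing.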

Somehow, it’s the trap of a DSL. :-) Packages are declarative and the
information they declare is not dense.  However, because they are
byte-compiled like any program in a general-purpose language, their
specificity is not exploited.  In an ideal world, the compiled binary
representation of the packages should be smaller than their
human-readable text-file counterpart.

The mentioned improvement is nice.  And it’s visible:

--8<---------------cut here---------------start------------->8---
145M /gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
117M /gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu
127M /gnu/store/ndii4bpyzh2rc05ya61s89rig9hdrl4k-guix-a0178d34f-modules/lib/guile/3.0/site-ccache/gnu
164M /gnu/store/ni63a203jf61dwxlv8kr9b8x3vb1pdsp-guix-8e2f32cee-modules/lib/guile/3.0/site-ccache/gnu
--8<---------------cut here---------------end--------------->8---

However, it has almost no impact on the overall size once scaled by the
number of packages.

> Download size has to be treated separately though.  For example, ‘git
> pull’ doesn’t redownload all of the repo or directory, and it uses
> compression heavily.  Thus, a few hundred bytes of additional .scm text
> translate in less than that.
>
> As for the rest, download size can be reduced for example by choosing a
> content-address transport, like something based on ERIS.
>
> I think we must look precisely at what we want to optimize—on-disk size,
> or bandwidth requirement, in particular—and look at the whole solution
> space.

I think one direction is to tackle the way *package-modules* is built.
Because of that, Guix is building too much and the design is not optimal
– whatever technical solutions we implement for improving after that.

On my poor laptop, Guix is becoming unusable because many operations are
becoming so slow, whereas APT on Debian is still acceptable.  For
instance, it takes something like 20 minutes to run “guix pull” without
substitutes.  And when I am traveling without a fast Internet
connection, it’s often too much for the network at hand.

Currently, “guix pull” is either building too much or downloading too
much; by design.


Cheers,
simon



* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-30 12:10               ` Simon Tournier
@ 2023-05-30 19:10                 ` Csepp
  2023-05-31  8:05                   ` Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?) Simon Tournier
  2023-05-30 20:55                 ` How many bytes do we add (closure of guix) when adding one new package? Jack Hill
  1 sibling, 1 reply; 10+ messages in thread
From: Csepp @ 2023-05-30 19:10 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, Andreas Enge, guix-devel


Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On ven., 26 mai 2023 at 18:21, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> I agree that .go files are quite big (.scm files as well, but we’ve
>> improved information density somewhat by removing input labels :-)).
>>
>> The size of .go files went down when we switch to the baseline compiler
>> (aka. -O1):
>>
>>   https://lists.gnu.org/archive/html/guix-devel/2020-06/msg00071.html
>>
>> That thread has ideas of things to do to further reduce .go size.
>
> Just to put a figure on what means “big”: currently the .go files are 5
> times bigger than their associated .scm.
>
> Somehow, it’s the trap of DSL. :-) Packages are declarative and the
> information they declare is not dense.  However, because they are
> bytecompiled to a general programming language, their specificity is not
> exploited.  In an ideal world, the compiled binary representation of the
> packages should be smaller than their human-readable text-file
> counterpart.
>
> The mentioned improvement is nice.  And it’s visible:
>
> --8<---------------cut here---------------start------------->8---
> 145M /gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
> 117M /gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu
> 127M /gnu/store/ndii4bpyzh2rc05ya61s89rig9hdrl4k-guix-a0178d34f-modules/lib/guile/3.0/site-ccache/gnu
> 164M /gnu/store/ni63a203jf61dwxlv8kr9b8x3vb1pdsp-guix-8e2f32cee-modules/lib/guile/3.0/site-ccache/gnu
> --8<---------------cut here---------------end--------------->8---
>
> However, it has almost no impact on the whole size; scaled by the number
> of packages.
>
>> Download size has to be treated separately though.  For example, ‘git
>> pull’ doesn’t redownload all of the repo or directory, and it uses
>> compression heavily.  Thus, a few hundred bytes of additional .scm text
>> translate in less than that.
>>
>> As for the rest, download size can be reduced for example by choosing a
>> content-address transport, like something based on ERIS.
>>
>> I think we must look precisely at what we want to optimize—on-disk size,
>> or bandwidth requirement, in particular—and look at the whole solution
>> space.
>
> I think one direction is to tackle the way *package-modules* is built.
> Because of that, Guix is building too much and the design is not optimal
> – whatever technical solutions we implement for improving after that.
>
> On my poor laptop, Guix is becoming unusable because many operations are
> becoming so slow – when it’s still acceptable with APT of Debian.  For
> instance, it’s something like 20 minutes for running “guix pull” without
> substitutes.  And when I am traveling without a fast Internet
> connection, it’s often too much for the network at hand.
>
> Currently, “guix pull” is either building too much and downloading too
> much; by design.
>
>
> Cheers,
> simon

Something I've been considering is whether Guix could make use of database
optimizations on its packages.  Having access to Scheme for everything
is nice, but using it as a storage solution is kind of silly when we are
mostly just storing structs.  Some kind of struct-of-arrays (SoA)
optimization could definitely reduce their size by a lot, and might even
speed up some operations.  It makes zero sense to load full package
definitions from disk for most queries, such as guix search; with an SoA
representation we could load only the fields that we care about.
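
A minimal Python sketch of the struct-of-arrays idea follows.  The
Package record, its fields and the sample data are all made up for the
illustration; it does not reflect how Guix stores packages.

```python
from dataclasses import dataclass

@dataclass
class Package:              # array-of-structs: one full record per package
    name: str
    version: str
    synopsis: str
    build_system: str

# Made-up records, for illustration only.
packages = [
    Package("hello", "2.12", "Hello, GNU world", "gnu"),
    Package("guile", "3.0.9", "Scheme implementation", "gnu"),
    Package("python-example", "1.0", "Example library", "python"),
]

# Struct-of-arrays: one column per field; index i refers to the same package.
columns = {
    field: [getattr(p, field) for p in packages]
    for field in ("name", "version", "synopsis", "build_system")
}

def names_with_build_system(cols, wanted):
    """Answer the query by reading two columns only."""
    return [n for n, b in zip(cols["name"], cols["build_system"]) if b == wanted]

print(names_with_build_system(columns, "gnu"))
```

In memory the gain is small; the point is that, stored on disk, each
column could be a separate file or section, so a query would read only
the columns it needs.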

ps.: Now I'm even more glad that I'm using a file system with
transparent compression on all my Guix systems.



* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-30 12:10               ` Simon Tournier
  2023-05-30 19:10                 ` Csepp
@ 2023-05-30 20:55                 ` Jack Hill
  2023-05-31  8:27                   ` Simon Tournier
  1 sibling, 1 reply; 10+ messages in thread
From: Jack Hill @ 2023-05-30 20:55 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Ludovic Courtès, Guix Devel, Andreas Enge


On Tue, 30 May 2023, Simon Tournier wrote:

> Just to put a figure on what means “big”: currently the .go files are 5
> times bigger than their associated .scm.
>
> Somehow, it’s the trap of DSL. :-) Packages are declarative and the
> information they declare is not dense.  However, because they are
> bytecompiled to a general programming language, their specificity is not
> exploited.  In an ideal world, the compiled binary representation of the
> packages should be smaller than their human-readable text-file
> counterpart.
>
> The mentioned improvement is nice.  And it’s visible:
>
> --8<---------------cut here---------------start------------->8---
> 145M /gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
> 117M /gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu
> 127M /gnu/store/ndii4bpyzh2rc05ya61s89rig9hdrl4k-guix-a0178d34f-modules/lib/guile/3.0/site-ccache/gnu
> 164M /gnu/store/ni63a203jf61dwxlv8kr9b8x3vb1pdsp-guix-8e2f32cee-modules/lib/guile/3.0/site-ccache/gnu
> --8<---------------cut here---------------end--------------->8---

This is probably a tangent, sorry, but I was curious how well the .go files 
compress. It seems quite well, actually:

jackhill@mimolette ~/.config/guix/current/lib/guile/3.0/site-ccache/gnu [env]$ sudo compsize .
Processed 595 files, 1659 regular extents (1659 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       21%       36M         173M         173M
none       100%       16K          16K          16K
zstd        21%       36M         173M         173M

Best,
Jack


* Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?)
  2023-05-30 19:10                 ` Csepp
@ 2023-05-31  8:05                   ` Simon Tournier
  2023-05-31 11:10                     ` Csepp
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Tournier @ 2023-05-31  8:05 UTC (permalink / raw)
  To: Csepp; +Cc: Ludovic Courtès, Andreas Enge, guix-devel

Hi,

On Tue, 30 May 2023 at 21:10, Csepp <raingloom@riseup.net> wrote:

>              It makes zero sense to load full package definitions from
> disk for most queries, such as guix search, with an SoA representation
> we could load only the fields that we care about.

That’s already the case; see
~/.config/guix/current/lib/guix/package.cache.

For instance, “guix package -A” exploits it and the performance is
acceptable.  Two summers ago (wow, already!), I tried to extend it and
exploit it for “guix search”.  The implementation and benchmark are in
#39258 [1].  Well, the whole thread of #39258 seems worth reading
because it spots various bottlenecks specific to “guix search” and
explains why the improvement is not straightforward.

Well, I started months ago to write a Guix extension using
guile-xapian.  My aim is to tackle two annoyances: 1. the speed and
2. the relevance.

About relevance (point 2), the issue is that the current scoring considers
only the local information of one package without considering the global
information of all the others.  Well, see [2,3,4] for some details. :-)

1: https://issues.guix.gnu.org/39258#119
2: https://yhetil.org/guix/CAJ3okZ3E3bhZ5pROZS68wEKdKOcZ8SpXsvdi-bnB=9Jz3mPahA@mail.gmail.com
3: https://yhetil.org/guix/CAJ3okZ3+hn0nJP98OhnZYLWJvhLGpdTUK+jB0hoM5JArQxO=zw@mail.gmail.com
4: https://yhetil.org/guix/CAJ3okZ0LaJzWDBA7bjqZew_jAmtt1rj9PJhevwrtBiA_COXENg@mail.gmail.com
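
To illustrate what “global information” can mean for relevance, here is
a toy Python sketch of TF-IDF-style scoring.  It is not the scoring of
“guix search” nor of the extension mentioned above; the synopses, the
weighting formula and the function names are assumptions chosen for the
example.

```python
import math

# Made-up synopses standing in for package metadata.
docs = {
    "emacs": "extensible text editor",
    "vim": "text editor based on vi",
    "nano": "small text editor",
    "xapian": "search engine library",
}

def tokens(text):
    return text.lower().split()

def idf(term):
    """Global weight: terms found in few packages count more."""
    df = sum(term in tokens(text) for text in docs.values())
    return math.log((1 + len(docs)) / (1 + df)) + 1

def score(query, name):
    """Local term counts weighted by the global idf."""
    words = tokens(docs[name])
    return sum(words.count(t) * idf(t) for t in tokens(query))

ranking = sorted(docs, key=lambda n: score("search editor", n), reverse=True)
print(ranking)
```

A purely local score would count one matching word for each of the four
packages and tie them; weighting by how rare a word is across all
packages breaks that tie in favour of the rarer term.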


> ps.: Now I'm even more glad that I'm using a file system with
> transparent compression on all my Guix systems.

Did you benchmark the performance of some Guix operations on
compressed vs uncompressed file systems?


Cheers,
simon



* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-30 20:55                 ` How many bytes do we add (closure of guix) when adding one new package? Jack Hill
@ 2023-05-31  8:27                   ` Simon Tournier
  2023-05-31 12:47                     ` Guillaume Le Vaillant
  0 siblings, 1 reply; 10+ messages in thread
From: Simon Tournier @ 2023-05-31  8:27 UTC (permalink / raw)
  To: Jack Hill; +Cc: Ludovic Courtès, Guix Devel, Andreas Enge

Hi Jack,

On Tue, 30 May 2023 at 16:55, Jack Hill <jackhill@jackhill.us> wrote:

> $ ~/.config/guix/current/lib/guile/3.0/site-ccache/gnu [env]$ sudo compsize .
> Processed 595 files, 1659 regular extents (1659 refs), 0 inline.
> Type       Perc     Disk Usage   Uncompressed Referenced
> TOTAL       21%       36M         173M         173M
> none       100%       16K          16K          16K
> zstd        21%       36M         173M         173M

Cool!  Could you do (or anyone else with btrfs),

    guix time-machine --commit=d62c9b2671be55ae0305bebfda17b595f33797f2 \
         -- describe
    guix time-machine --commit=a099685659b4bfa6b3218f84953cbb7ff9e88063 \
         -- describe

then report the size (compsize) of

/gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
/gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu

?

Cheers,
simon



* Re: Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?)
  2023-05-31  8:05                   ` Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?) Simon Tournier
@ 2023-05-31 11:10                     ` Csepp
  2023-05-31 11:55                       ` Attila Lendvai
  0 siblings, 1 reply; 10+ messages in thread
From: Csepp @ 2023-05-31 11:10 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Csepp, Ludovic Courtès, Andreas Enge, guix-devel


Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Tue, 30 May 2023 at 21:10, Csepp <raingloom@riseup.net> wrote:
>
>>              It makes zero sense to load full package definitions from
>> disk for most queries, such as guix search, with an SoA representation
>> we could load only the fields that we care about.
>
> That’s already the case; see
> ~/.config/guix/current/lib/guix/package.cache.
>
> For instance, “guix package -A” exploits it and the performances are
> acceptable.  Two past summers, wow already! I tried to augment it and
> exploit it for “guix search”.  The implementation and benchmark is in
> #39258 [1].  Well, the whole thread of #39258 appears to me worth to
> consider because it spots various bottleneck specific to “guix search”
> and explains why the improvement is not straightforward.

That's a good improvement, but it's in addition to the ELF files, so it
doesn't save any space, and as far as I know it doesn't speed up
non-textual queries, like searching for packages that use a specific
build system.

> Well, I have started months ago to write a Guix extension using
> guile-xapian.  My aim is to tackle two annoyances: 1. the speed and
> 2. the relevance.
>
> About the relevance #2, the issue is that the current scoring considers
> only the local information of one package without considering the global
> information of all the others.  Well, see [2,3,4] for some details. :-)
>
> 1: https://issues.guix.gnu.org/39258#119
> 2: https://yhetil.org/guix/CAJ3okZ3E3bhZ5pROZS68wEKdKOcZ8SpXsvdi-bnB=9Jz3mPahA@mail.gmail.com
> 3: https://yhetil.org/guix/CAJ3okZ3+hn0nJP98OhnZYLWJvhLGpdTUK+jB0hoM5JArQxO=zw@mail.gmail.com
> 4: https://yhetil.org/guix/CAJ3okZ0LaJzWDBA7bjqZew_jAmtt1rj9PJhevwrtBiA_COXENg@mail.gmail.com

Thanks for the links, gonna read them in more detail later.

>> ps.: Now I'm even more glad that I'm using a file system with
>> transparent compression on all my Guix systems.
>
> Did you benchmarked the performances for some Guix operations on these
> compressed vs uncompressed file system?

I haven't, but I have recently tried to move to a larger drive and
accidentally did a btrfs send without compression and the system didn't
fit.



* Re: Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?)
  2023-05-31 11:10                     ` Csepp
@ 2023-05-31 11:55                       ` Attila Lendvai
  0 siblings, 0 replies; 10+ messages in thread
From: Attila Lendvai @ 2023-05-31 11:55 UTC (permalink / raw)
  To: Csepp; +Cc: Simon Tournier, Ludovic Courtès, Andreas Enge, guix-devel

> It makes zero sense to load full package definitions from disk for
> most queries, such as guix search, with an SoA representation we
> could load only the fields that we care about.


i'd like to quickly point out something while we are discussing this:

when i came to guix it was rather confusing that there are two namespaces through which one can get hold of a package object:

1) module-global variables in the scheme module system

2) a reified repository of packages (i.e. a scheme hash table, and some lookup functions).

these two namespaces are quite independent from each other, and there are no formal rules that govern the relationship between the two.

there are scheme variable references in some places, some others issue a call to SPECIFICATION->PACKAGE, manifests do that implicitly (?), etc. my gut feeling is that there is potential here for unification and simplification, without limiting composability.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The greatest gift you can give someone is your own personal development. I used to say, 'If you’ll take care of me, I’ll take care of you'. Now I say, 'I will take care of me for you, if you will take care of you for me'.”
	— Jim Rohn




* Re: How many bytes do we add (closure of guix) when adding one new package?
  2023-05-31  8:27                   ` Simon Tournier
@ 2023-05-31 12:47                     ` Guillaume Le Vaillant
  0 siblings, 0 replies; 10+ messages in thread
From: Guillaume Le Vaillant @ 2023-05-31 12:47 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Jack Hill, Ludovic Courtès, Andreas Enge, guix-devel


Simon Tournier <zimon.toutoune@gmail.com> skribis:

> Hi Jack,
>
> On Tue, 30 May 2023 at 16:55, Jack Hill <jackhill@jackhill.us> wrote:
>
>> $ ~/.config/guix/current/lib/guile/3.0/site-ccache/gnu [env]$ sudo compsize .
>> Processed 595 files, 1659 regular extents (1659 refs), 0 inline.
>> Type       Perc     Disk Usage   Uncompressed Referenced
>> TOTAL       21%       36M         173M         173M
>> none       100%       16K          16K          16K
>> zstd        21%       36M         173M         173M
>
> Cool!  Could you do (or anyone else with btrfs),
>
>     guix time-machine --commit=d62c9b2671be55ae0305bebfda17b595f33797f2 \
>          -- describe
>     guix time-machine --commit=a099685659b4bfa6b3218f84953cbb7ff9e88063 \
>          -- describe
>
> then report the size (compsize) of
>
> /gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
> /gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu
>
> ?
>
> Cheers,
> simon

Hi,

With a BTRFS filesystem compressed with zstd:3, I get:

--8<---------------cut here---------------start------------->8---
# compsize /gnu/store/nqrb3g4l59wd74w8mr9v0b992bj2sd1w-guix-d62c9b267-modules/lib/guile/3.0/site-ccache/gnu
Processed 503 files, 1317 regular extents (1317 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       27%       40M         144M         144M       
none       100%       10M          10M          10M       
zstd        22%       30M         133M         133M

# compsize /gnu/store/s6rqlhqr750k44ynkqqj5mwjj2cs2yln-guix-a09968565-modules/lib/guile/3.0/site-ccache/gnu
Processed 530 files, 1169 regular extents (1169 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       19%       22M         116M         116M       
none       100%       32K          32K          32K       
zstd        19%       22M         116M         116M
--8<---------------cut here---------------end--------------->8---
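
The two reports can be compared with a short Python sketch.  The
figures are copied from the compsize output above, which rounds to whole
MiB, so the percentages are approximate; the variable names are made up.

```python
# Figures (MiB) copied from the two compsize reports above:
# (uncompressed, on disk with zstd:3).
reports = {
    "d62c9b267": (144, 40),
    "a09968565": (116, 22),
}

for commit, (raw, disk) in reports.items():
    print(f"{commit}: {raw} MiB -> {disk} MiB on disk ({100 * disk / raw:.0f}%)")

# Relative saving of a09968565 compared with d62c9b267.
raw_saving = 1 - reports["a09968565"][0] / reports["d62c9b267"][0]
disk_saving = 1 - reports["a09968565"][1] / reports["d62c9b267"][1]
print(f"uncompressed: {100 * raw_saving:.0f}% smaller")
print(f"on disk:      {100 * disk_saving:.0f}% smaller")
```

Part of the on-disk gap comes from the 10M that the first report lists
as stored without compression (the “none” row).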




Thread overview: 10+ messages
     [not found] <ZEZWS/h9xa/ZX3/E@jurong>
     [not found] ` <875y9jzl9m.fsf@gnu.org>
     [not found]   ` <874jot19fd.fsf_-_@gnu.org>
     [not found]     ` <87fs7rvv5s.fsf_-_@gnu.org>
     [not found]       ` <ZGj3hGKGwu3mQklT@jurong>
     [not found]         ` <878rddooy4.fsf@gnu.org>
2023-05-25 18:24           ` How many bytes do we add (closure of guix) when adding one new package? Simon Tournier
2023-05-26 16:21             ` Ludovic Courtès
2023-05-30 12:10               ` Simon Tournier
2023-05-30 19:10                 ` Csepp
2023-05-31  8:05                   ` Faster “guix search” (was Re: How many bytes do we add (closure of guix) when adding one new package?) Simon Tournier
2023-05-31 11:10                     ` Csepp
2023-05-31 11:55                       ` Attila Lendvai
2023-05-30 20:55                 ` How many bytes do we add (closure of guix) when adding one new package? Jack Hill
2023-05-31  8:27                   ` Simon Tournier
2023-05-31 12:47                     ` Guillaume Le Vaillant
