unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Timothy Sample <samplet@ngyro.com>
To: zimoun <zimon.toutoune@gmail.com>
Cc: 42162@debbugs.gnu.org, "Maurice Brémond" <Maurice.Bremond@inria.fr>
Subject: bug#42162: Recovering source tarballs
Date: Wed, 26 Aug 2020 17:11:50 -0400	[thread overview]
Message-ID: <87k0xlaz8p.fsf@ngyro.com> (raw)
In-Reply-To: <86blixyb7c.fsf@gmail.com> (zimoun's message of "Wed, 26 Aug 2020 12:04:55 +0200")

Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

> One question is how this database scales?
>
> For example, a quick back-to-envelop estimation leads to ~1.2GB metadata
> for ~14k packages and then an increase of ~700MB per year, both with the
> Ludo’s code [1].
>
> [1] <http://issues.guix.gnu.org/issue/42162#11>

It’s a good question.  A good part of the size comes from the
representation rather than the data.  Compression helps a lot here.  I
have a database of 3,912 packages.  It’s 295M uncompressed (which is a
little better than your estimation).  If I pass each file through Lzip,
it shrinks down to 60M.  That’s more like 15.5K per package, which is
almost an order of magnitude smaller than the estimation you used
(120K).  I think that makes the numbers rather pleasant, but it comes at
the expense of easy storing in Git.

> As mentioned [2], should this service be part of SWH (download cooking
> task)?  Or project side?
>
> [2] <https://forge.softwareheritage.org/T2430#47486>

It would be interesting to just have SWH absorb the project.  Since
other distros already know how to produce a “sources.json” and how to
query the SWH archive, it would mean that they benefit for free (and so
would Guix, for that matter).  I’m open to that, but right now having
the freedom to experiment is important.


-- Tim




  reply	other threads:[~2020-08-26 21:12 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-02  7:29 bug#42162: gforge.inria.fr to be taken off-line in Dec. 2020 Ludovic Courtès
2020-07-02  8:50 ` zimoun
2020-07-02 10:03   ` Ludovic Courtès
2020-07-11 15:50     ` bug#42162: Recovering source tarballs Ludovic Courtès
2020-07-13 19:20       ` Christopher Baines
2020-07-20 21:27         ` zimoun
2020-07-15 16:55       ` zimoun
2020-07-20  8:39         ` Ludovic Courtès
2020-07-20 15:52           ` zimoun
2020-07-20 17:05             ` Dr. Arne Babenhauserheide
2020-07-20 19:59               ` zimoun
2020-07-21 21:22             ` Ludovic Courtès
2020-07-22  0:27               ` zimoun
2020-07-22 10:28                 ` Ludovic Courtès
2020-08-03 21:10         ` Ricardo Wurmus
2020-07-30 17:36       ` Timothy Sample
2020-07-31 14:41         ` Ludovic Courtès
2020-08-03 16:59           ` Timothy Sample
2020-08-05 17:14             ` Ludovic Courtès
2020-08-05 18:57               ` Timothy Sample
2020-08-23 16:21                 ` Ludovic Courtès
2020-11-03 14:26                 ` Ludovic Courtès
2020-11-03 16:37                   ` zimoun
2020-11-03 19:20                   ` Timothy Sample
2020-11-04 16:49                     ` Ludovic Courtès
2022-09-29  0:32                       ` bug#42162: gforge.inria.fr to be taken off-line in Dec. 2020 Maxim Cournoyer
2022-09-29 10:56                         ` zimoun
2022-09-29 15:00                           ` Ludovic Courtès
2022-09-30  3:10                             ` Maxim Cournoyer
2022-09-30 12:13                               ` zimoun
2022-10-01 22:04                                 ` Ludovic Courtès
2022-10-03 15:20                                 ` Maxim Cournoyer
2022-10-04 21:26                                   ` Ludovic Courtès
2022-09-30 18:17                               ` Maxime Devos
2020-08-26 10:04         ` bug#42162: Recovering source tarballs zimoun
2020-08-26 21:11           ` Timothy Sample [this message]
2020-08-27  9:41             ` zimoun
2020-08-27 12:49               ` Ludovic Courtès
2020-08-27 18:06               ` Bengt Richter
2021-01-10 19:32 ` bug#42162: gforge.inria.fr to be taken off-line in Dec. 2020 Maxim Cournoyer
2021-01-13 10:39   ` Ludovic Courtès
2021-01-13 12:27     ` Andreas Enge
2021-01-13 15:07     ` Andreas Enge
     [not found] ` <handler.42162.D42162.16105343699609.notifdone@debbugs.gnu.org>
2021-01-13 14:28   ` Ludovic Courtès
2021-01-14 14:21     ` Maxim Cournoyer
2021-10-04 15:59     ` bug#42162: gforge.inria.fr is off-line Ludovic Courtès
2021-10-04 17:50       ` bug#42162: gforge.inria.fr to be taken off-line in Dec. 2020 zimoun
2021-10-07 16:07         ` Ludovic Courtès
2021-10-09 17:29           ` raingloom
2021-10-11  8:41           ` zimoun
2021-10-12  9:24             ` Ludovic Courtès
2021-10-12 10:50               ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0xlaz8p.fsf@ngyro.com \
    --to=samplet@ngyro.com \
    --cc=42162@debbugs.gnu.org \
    --cc=Maurice.Bremond@inria.fr \
    --cc=zimon.toutoune@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).