unofficial mirror of guix-science@gnu.org 
From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: Efraim Flashner <efraim@flashner.co.il>, guix-science@gnu.org
Subject: Re: GC strategy on clusters
Date: Thu, 1 Apr 2021 15:29:38 +0200
Message-ID: <CAJ3okZ0LYAbOxb_2ztWM2LrdUQF18_HRQ1k30jNE8jQn145KZw@mail.gmail.com>
In-Reply-To: <87wntm89lp.fsf@inria.fr>

Hi,

On Thu, 1 Apr 2021 at 14:36, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Efraim Flashner <efraim@flashner.co.il> skribis:
>
> > On Thu, Apr 01, 2021 at 12:17:22PM +0200, Ludovic Courtès wrote:
>
> [...]
>
> >>   guix package --delete-generations=4m
> >>
> >> or similar, which was enough to free more space.

For now, I am waiting until we reach full capacity before asking how
to deal with it. ;-)

> > I feel like 4-6 months should be plenty for anything active. Even if it
> > were run automatically for them it wouldn't remove the last generation.
>
> It depends.  A practical use case I have in mind: you run experiments,
> you submit a paper including its results, you get initial reviews months
> later, and even later it’s published and you get to present it.  At that
> point, you want to answer questions and to reproduce it.  4–6 months is
> not a lot in that context.

To add another data point: even if it is hard to get a good overview, I
would say the average is 2-4 years for a typical project.  It is hard
because currently not everything is done with the same tools for the
same task.  For instance, some data is cleaned with some tools, then a
partial analysis is done; months later more data is added, so another
partial analysis is done, probably with different tools; then months
later a full tool suite such as Bioconductor (or whatever) is updated
and some analyses are redone.  The final publication is a mix of
everything from the 2-4 year project, with details at various levels.

In other words, it depends on the level we are looking at.

> (Though of course, ideally you’d save channels.scm + manifest.scm and
> share it with reviewers and readers in the first place…)

That is what I am tirelessly explaining in my lab. ;-)

> >> Longer term, I think Guix should automatically delete old generations
> >> and instead store the channels + manifest to reproduce them, when
> >> possible.
> >>
> >
> > This seems to help a bit less when we run into issues about dates being
> > wrong on SSL tests, or when sources go missing.
>
> Good points.  Hopefully “sources go missing” can soon be considered
> addressed.  Really, failing TLS tests is the most worrisome issue to me
> because we don’t have any idea on how to address it systematically.

By "soon", you mean the bricks are there and what is missing is the
glue to hold them together.  In my opinion, some details need to be
addressed to have a full end-to-end source fallback, since the devil
is in the details. ;-)

About TLS, you proposed setting up a machine with its clock set ahead.
Maybe it is worth a try.

> > I guess I'm not really sure if its a technology problem or a people
> > problem. Figuring out if someone is the only one pulling in a copy of
> > glibc-2.25 is doable but how many copies of diffoscope is too many?
> >
> > On a practical note, 'guix package --list-profiles' as root currently
> > lists everyone's profiles so it can be easier to see who has older
> > profiles hanging around.
>
> Actually, as non-root, I walked /var/guix/profiles/per-user on the
> cluster to see the number of generations per user, which allowed us to
> target those with a lot of generations.  :-)
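
A walk like the one described above could be sketched roughly as
follows.  This is only an illustration, not the script that was actually
run on the cluster: the function name count_generations is made up, and
it assumes that generations show up as entries named like
"guix-profile-42-link" directly under each user's directory.

```shell
#!/bin/sh
# Hypothetical helper: print "COUNT<TAB>USER" for each user directory,
# largest count first.  Assumes generation links are named "*-N-link",
# so the count covers every profile kept in that directory.
count_generations() {
    root="${1:-/var/guix/profiles/per-user}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        user=$(basename "$dir")
        # Count the generation links themselves, without following them.
        count=$(find "$dir" -maxdepth 1 -name '*-[0-9]*-link' | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "$user"
    done | sort -rn
}
```

Calling count_generations with no argument walks
/var/guix/profiles/per-user; passing another directory as the first
argument is handy for trying it out on a copy.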

Well, when we discussed '--list-profiles', it was initially for my
personal purposes.  Then I tried to use it to monitor the few users
that I have.  In biology, they have the concept of the "-80 fridge".
It is a big and very cold fridge where you keep samples, potentially
for a long time.  Everybody puts things in until it is full, and once
it is full, there are endless discussions about what to throw away...
until the fridge breaks down, because of a shutdown or an unexpected
failure, and then it becomes obvious to everybody what needs to be kept
or thrown away.  I use this analogy to explain the hygiene needed on
shared machines.  Hmm, now that it is written, I do not know if it is
relevant. ;-)

From my point of view, having a channel+manifest "backup" for old
profiles seems worth trying.  It costs nothing with
"--export-manifest" and "--export-channels".
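
As a minimal sketch of such a "backup", assuming a Guix recent enough
to provide both options (the function name backup_profile and the
output layout are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: save manifest.scm and channels.scm for PROFILE
# into OUTDIR, using the two export options of 'guix package'.
# Usage: backup_profile PROFILE OUTDIR
backup_profile() {
    profile="$1"
    outdir="$2"
    mkdir -p "$outdir" || return 1
    guix package -p "$profile" --export-manifest > "$outdir/manifest.scm" &&
    guix package -p "$profile" --export-channels > "$outdir/channels.scm"
}
```

For instance, backup_profile "$HOME/.guix-profile" "$HOME/backup" would
write the two files under ~/backup.  In principle they can later be fed
to something like "guix time-machine -C channels.scm -- package -m
manifest.scm -p NEW-PROFILE" to rebuild the profile, as long as the
sources and substitutes are still available.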

Cheers,
simon


