* GC strategy on clusters
From: Ludovic Courtès @ 2021-04-01 10:17 UTC
To: guix-science

Hi there!

Recently the Guix head node of our cluster at Inria was getting short on
disk space, despite running ‘guix gc -F20G’ (or similar) twice a day.

Turns out that some users had accumulated many profile generations and
that was getting in the way. So we kindly asked them to run:

  guix package --delete-generations=4m

or similar, which was enough to free more space.

We’re now considering setting up automatic user notification by email,
as is commonly done for disk quotas, asking them to remove old
generations. That way, users remain in control and choose what GC roots
or generations they want to remove.

How do people on this list deal with that?

Longer term, I think Guix should automatically delete old generations
and instead store the channels + manifest to reproduce them, when
possible.

Ludo’.
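One way to prototype the email-notification idea is a small shell helper that flags users with old generations. This is a minimal sketch, not a Guix tool: it assumes generations live under /var/guix/profiles/per-user/USER as symlinks named PROFILE-N-link, and it approximates a generation's age by the symlink's own modification time. The function names `stale_generations` and `notify_text` and the 120-day default (roughly the "4m" above) are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print "USER COUNT" for each user who has profile
# generations older than DAYS days.
# Assumptions: generation links are named PROFILE-N-link directly under
# ROOT/USER/, and a generation's age is approximated by the symlink's own
# mtime (find does not follow symlinks by default, so -mtime uses lstat).
stale_generations() {
    root=${1:-/var/guix/profiles/per-user}
    days=${2:-120}   # roughly the "4m" used with --delete-generations
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # skip the literal pattern if nothing matched
        user=$(basename "$dir")
        count=$(find "$dir" -maxdepth 1 -name '*-[0-9]*-link' -mtime +"$days" \
                | wc -l | tr -d ' ')
        # Only report users who actually have stale generations.
        if [ "$count" -gt 0 ]; then
            printf '%s %s\n' "$user" "$count"
        fi
    done
}

# Hypothetical helper: build the reminder text for one user. Delivering it
# (mail(1), a site-specific notifier, ...) is left to the administrator.
notify_text() {
    user=$1 count=$2 days=$3
    printf 'Hi %s, you have %s profile generations older than %s days.\n' \
        "$user" "$count" "$days"
    printf 'Consider running: guix package --delete-generations=%sd\n' "$days"
}
```

For instance, `stale_generations | while read -r user count; do notify_text "$user" "$count" 120; done` would print one reminder per flagged user, while leaving the actual deletion to each user.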
* Re: GC strategy on clusters
From: Efraim Flashner @ 2021-04-01 10:45 UTC
To: Ludovic Courtès; +Cc: guix-science

On Thu, Apr 01, 2021 at 12:17:22PM +0200, Ludovic Courtès wrote:
> Hi there!
>
> Recently the Guix head node of our cluster at Inria was getting short on
> disk space, despite running ‘guix gc -F20G’ (or similar) twice a day.
>
> Turns out that some users had accumulated many profile generations and
> that was getting in the way. So we kindly asked them to run:
>
>   guix package --delete-generations=4m
>
> or similar, which was enough to free more space.

I feel like 4-6 months should be plenty for anything active. Even if it
were run automatically for them it wouldn't remove the last generation.

> We’re now considering setting up automatic user notification by email,
> as is commonly done for disk quotas, asking them to remove old
> generations. That way, users remain in control and choose what GC roots
> or generations they want to remove.
>
> How do people on this list deal with that?

I like the idea of asking people to remove old generations. It's not
something that we've come up against yet. It doesn't feel that different
from reminding them that their $HOME is for code and smaller things and
the storage space is for their large data collections.

> Longer term, I think Guix should automatically delete old generations
> and instead store the channels + manifest to reproduce them, when
> possible.
>

This seems to help a bit less when we run into issues about dates being
wrong on SSL tests, or when sources go missing.

How much storage and how many people are you working with? Our initial
multiuser system has 188GB for /gnu and I think 30-40 people, and some
people have profiles going back almost 3 years. Not many people have
multiple profiles, and the experiments we tried with shared profiles in
/usr/local/guix-profiles don't see a lot of use or get updated
frequently.

I guess I'm not really sure if it's a technology problem or a people
problem. Figuring out if someone is the only one pulling in a copy of
glibc-2.25 is doable, but how many copies of diffoscope is too many?

On a practical note, 'guix package --list-profiles' as root currently
lists everyone's profiles, so it can be easier to see who has older
profiles hanging around.

--
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: GC strategy on clusters
From: Ludovic Courtès @ 2021-04-01 12:36 UTC
To: Efraim Flashner; +Cc: guix-science

Hello!

Efraim Flashner <efraim@flashner.co.il> skribis:

> On Thu, Apr 01, 2021 at 12:17:22PM +0200, Ludovic Courtès wrote:

[...]

>>   guix package --delete-generations=4m
>>
>> or similar, which was enough to free more space.
>
> I feel like 4-6 months should be plenty for anything active. Even if it
> were run automatically for them it wouldn't remove the last generation.

It depends. A practical use case I have in mind: you run experiments,
you submit a paper including its results, you get initial reviews months
later, and even later it’s published and you get to present it. At that
point, you want to answer questions and to reproduce it. 4–6 months is
not a lot in that context.

(Though of course, ideally you’d save channels.scm + manifest.scm and
share them with reviewers and readers in the first place…)

Besides, I think the whole point of Guix is that users on the cluster
can remain in control, unlike what happens with “environment modules”.

>> Longer term, I think Guix should automatically delete old generations
>> and instead store the channels + manifest to reproduce them, when
>> possible.
>>
>
> This seems to help a bit less when we run into issues about dates being
> wrong on SSL tests, or when sources go missing.

Good points. Hopefully “sources go missing” can soon be considered
addressed. Really, failing TLS tests is the most worrisome issue to me
because we don’t have any idea how to address it systematically.

> How much storage and how many people are you working with? Our initial
> multiuser system has 188GB for /gnu and I think 30-40 people, and some
> people have profiles going back almost 3 years. Not many people have
> multiple profiles, and the experiments we tried with shared profiles in
> /usr/local/guix-profiles don't see a lot of use or get updated
> frequently.

I’m not sure how much storage the Guix head node has (I’m not an admin),
but the number of users and the duration are in the same ballpark.

> I guess I'm not really sure if it's a technology problem or a people
> problem. Figuring out if someone is the only one pulling in a copy of
> glibc-2.25 is doable, but how many copies of diffoscope is too many?
>
> On a practical note, 'guix package --list-profiles' as root currently
> lists everyone's profiles, so it can be easier to see who has older
> profiles hanging around.

Actually, as non-root, I walked /var/guix/profiles/per-user on the
cluster to see the number of generations per user, which allowed us to
target those with a lot of generations. :-)

It would be nice to provide a documented approach sysadmins could follow!

Ludo’.
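Walking /var/guix/profiles/per-user to count generations per user can be reduced to a few lines of shell. A minimal sketch, assuming the per-user directories are readable by the caller and that every generation is a symlink named PROFILE-N-link (for example guix-profile-42-link or current-guix-7-link); `count_generations` is a hypothetical name, and its optional argument only exists so it can be pointed at a copy of the same layout.

```sh
#!/bin/sh
# Hypothetical helper: print "COUNT USER" for every user directory under
# ROOT, with the users holding the most generations listed first.
# Assumption: generations are symlinks named PROFILE-N-link; only profiles
# stored in the per-user directory itself are counted.
count_generations() {
    root=${1:-/var/guix/profiles/per-user}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # skip the literal pattern if nothing matched
        # Matches guix-profile-42-link, current-guix-7-link, ... but not the
        # guix-profile / current-guix symlinks themselves.
        n=$(find "$dir" -maxdepth 1 -name '*-[0-9]*-link' | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$(basename "$dir")"
    done | sort -rn
}
```

Piping `count_generations` into `head` would then show the handful of users worth contacting first.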
* Re: GC strategy on clusters
From: zimoun @ 2021-04-01 13:29 UTC
To: Ludovic Courtès; +Cc: Efraim Flashner, guix-science

Hi,

On Thu, 1 Apr 2021 at 14:36, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Efraim Flashner <efraim@flashner.co.il> skribis:
>
> > On Thu, Apr 01, 2021 at 12:17:22PM +0200, Ludovic Courtès wrote:
>
> [...]
>
> >>   guix package --delete-generations=4m
> >>
> >> or similar, which was enough to free more space.

For now, I am waiting until we reach full capacity before asking how to
deal with it. ;-)

> > I feel like 4-6 months should be plenty for anything active. Even if it
> > were run automatically for them it wouldn't remove the last generation.
>
> It depends. A practical use case I have in mind: you run experiments,
> you submit a paper including its results, you get initial reviews months
> later, and even later it’s published and you get to present it. At that
> point, you want to answer questions and to reproduce it. 4–6 months is
> not a lot in that context.

To add another data point: even if it is hard to get a good overview, I
would say the average is 2-4 years for a typical project. It is hard
because currently not everything is done with the same tools for the
same task. For instance, some data is cleaned with some tools, then a
partial analysis is done; months later more data is added, so another
partial analysis is done, probably with different tools; then months
later a full toolsuite such as Bioconductor (or whatever) is updated and
some analyses are redone. The final publication is a mix of everything
done over the 2-4 year project, with details at various levels.

In other words, it depends on the level we are looking at.

> (Though of course, ideally you’d save channels.scm + manifest.scm and
> share them with reviewers and readers in the first place…)

That is what I am tirelessly explaining in my lab. ;-)

> >> Longer term, I think Guix should automatically delete old generations
> >> and instead store the channels + manifest to reproduce them, when
> >> possible.
> >>
> >
> > This seems to help a bit less when we run into issues about dates being
> > wrong on SSL tests, or when sources go missing.
>
> Good points. Hopefully “sources go missing” can soon be considered
> addressed. Really, failing TLS tests is the most worrisome issue to me
> because we don’t have any idea how to address it systematically.

By "soon", you mean the bricks are there and they just need to be glued
together. In my opinion, some details still need to be addressed to
have a full end-to-end source fallback, since the devil is in the
details. ;-)

About TLS, you proposed setting up a machine ahead of the clock. Maybe
it is worth trying.

> > I guess I'm not really sure if it's a technology problem or a people
> > problem. Figuring out if someone is the only one pulling in a copy of
> > glibc-2.25 is doable, but how many copies of diffoscope is too many?
> >
> > On a practical note, 'guix package --list-profiles' as root currently
> > lists everyone's profiles, so it can be easier to see who has older
> > profiles hanging around.
>
> Actually, as non-root, I walked /var/guix/profiles/per-user on the
> cluster to see the number of generations per user, which allowed us to
> target those with a lot of generations. :-)

Well, when we discussed '--list-profiles', it was initially for my
personal use. Then I tried to use it to monitor the few users that I
have. In biology, there is the concept of the "-80 fridge". It is a big
and very cold fridge where you keep samples, potentially for a long
time. Everybody puts things in until it is full, and once it is full,
there are endless discussions about what to throw away... until the
fridge breaks because of a shutdown or an unexpected failure, and then
it is obvious to everybody what needs to be saved or thrown away. I use
this analogy to explain the hygiene needed on shared machines. Hmm, now
that it is written, I do not know if it is relevant. ;-)

From my point of view, having a channel+manifest "backup" for old
profiles seems worth trying. It costs nothing with '--export-manifest'
and '--export-channels'.

Cheers,
simon
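A channel+manifest backup along these lines could be taken just before deleting generations. This is a hedged sketch: `--export-manifest` and `--export-channels` are the `guix package` options named above, while the `backup_profile` function, the destination layout, and the idea of passing a generation link as the profile are assumptions. The exported files describe the profile; they are not guaranteed to rebuild it bit for bit.

```sh
#!/bin/sh
# Hypothetical helper: save a manifest and a channel list describing a
# profile before its generations are deleted.
#   $1: profile, e.g. ~/.guix-profile (assumption: a generation link such as
#       /var/guix/profiles/per-user/alice/guix-profile-42-link may also work)
#   $2: destination directory for manifest.scm and channels.scm
backup_profile() {
    profile=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Both options write Scheme code to standard output.
    guix package --profile="$profile" --export-manifest > "$dest/manifest.scm" || return 1
    guix package --profile="$profile" --export-channels > "$dest/channels.scm" || return 1
}
```

Rebuilding would then go through something like `guix time-machine -C DEST/channels.scm -- package -m DEST/manifest.scm -p ./restored-profile`, subject to the missing-source and TLS-test caveats raised in this thread.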
* Re: GC strategy on clusters
From: Ludovic Courtès @ 2021-04-01 13:43 UTC
To: zimoun; +Cc: Efraim Flashner, guix-science

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

>> It depends. A practical use case I have in mind: you run experiments,
>> you submit a paper including its results, you get initial reviews months
>> later, and even later it’s published and you get to present it. At that
>> point, you want to answer questions and to reproduce it. 4–6 months is
>> not a lot in that context.
>
> To add another data point: even if it is hard to get a good overview, I
> would say the average is 2-4 years for a typical project. It is hard
> because currently not everything is done with the same tools for the
> same task. For instance, some data is cleaned with some tools, then a
> partial analysis is done; months later more data is added, so another
> partial analysis is done, probably with different tools; then months
> later a full toolsuite such as Bioconductor (or whatever) is updated and
> some analyses are redone. The final publication is a mix of everything
> done over the 2-4 year project, with details at various levels.

Right.

>> Good points. Hopefully “sources go missing” can soon be considered
>> addressed. Really, failing TLS tests is the most worrisome issue to me
>> because we don’t have any idea how to address it systematically.
>
> By "soon", you mean the bricks are there and they just need to be glued
> together. In my opinion, some details still need to be addressed to
> have a full end-to-end source fallback, since the devil is in the
> details. ;-)

True! I’m really hopeful about the Disarchive/SWH combination, together
with <https://guix.gnu.org/sources.json>. Of course we’ll have to
monitor that, and we can expect bumps in the road :-), but at least we
have a plan.

> About TLS, you proposed setting up a machine ahead of the clock. Maybe
> it is worth trying.

Oh sure, that trick definitely works. But it’s a terrible hack, and it
means that, by default, people will just fail to build the package.

> Well, when we discussed '--list-profiles', it was initially for my
> personal use. Then I tried to use it to monitor the few users that I
> have. In biology, there is the concept of the "-80 fridge". It is a big
> and very cold fridge where you keep samples, potentially for a long
> time. Everybody puts things in until it is full, and once it is full,
> there are endless discussions about what to throw away... until the
> fridge breaks because of a shutdown or an unexpected failure, and then
> it is obvious to everybody what needs to be saved or thrown away. I use
> this analogy to explain the hygiene needed on shared machines. Hmm, now
> that it is written, I do not know if it is relevant. ;-)

It surely is. :-) I mean, I didn’t even think about it until GC
couldn’t make any progress.

> From my point of view, having a channel+manifest "backup" for old
> profiles seems worth trying. It costs nothing with '--export-manifest'
> and '--export-channels'.

Yes, though it’s an approximation; so it should only be used when we
know that it’s 100% faithful, as is the case for ‘guix pull’ profiles.

Ludo’.
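For ‘guix pull’ profiles, where the channel list alone describes a generation, the saving step could look like the following sketch. It relies on `guix describe` with its `--profile` and `--format=channels` options; the `save_pull_channels` function, the output file naming, and the assumption that `guix describe` accepts a generation link such as current-guix-7-link as its profile are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: for every 'guix pull' generation in a per-user
# directory, write the channel specification describing it, so that the
# generation could be deleted and later recreated with 'guix time-machine'.
#   $1: per-user directory, e.g. /var/guix/profiles/per-user/alice
#   $2: output directory
save_pull_channels() {
    dir=$1
    out=$2
    mkdir -p "$out" || return 1
    for link in "$dir"/current-guix-*-link; do
        # If the glob matched nothing, it is left as-is: skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=$(basename "$link")   # e.g. current-guix-7-link
        guix describe --profile="$link" --format=channels > "$out/$name.scm" || return 1
    done
}
```

Each saved file could later be handed to `guix time-machine -C FILE -- ...` to get back that Guix revision.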