* Are 'guix gc' stats exaggerated?
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-26 20:13 UTC
To: guix-devel

Hi,

Today I ran 'guix gc' on equipment with an ext4 root partition. It had
these space characteristics beforehand:

  Filesystem      Size      Used     Avail Use% Mounted on
  /dev/dm-3  309047680 157252980 138126064  54% /

or for human eyes:

  /dev/dm-3       295G      150G      132G  54% /

After the run, the drive showed:

  /dev/dm-3  309047680  88267956 207111088  30% /

or for human eyes:

  /dev/dm-3       295G       85G      198G  30% /

By my math, about 65.8 GiB were recovered.

When 'guix gc' was done, it announced:

  [184389 MiB] deleting '/gnu/store/...'
  deleting `/gnu/store/trash'
  deleting unused links...
  note: currently hard linking saves 59224.03 MiB
  guix gc: freed 110,649.49 MiBs

Seeing the 184389 MiB number, or 180 GiB, already made me suspicious.
It exceeded my drive usage by 30 GiB. Even the more conservative 110649
MiB "freed," however, are off by a mile. That would have been 108 GiB,
or 42 GiB more than the space actually recovered.

Am I looking at those numbers the wrong way? Thanks!

Kind regards
Felix
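The arithmetic in this message can be checked with a short script. This is only a sketch: it assumes the raw df columns are 1 KiB blocks (df's default, and consistent with the 295G/150G/85G figures quoted), and the constant and function names are made up for illustration.

```python
# Figures copied from the message above. Assumption: the raw df columns
# are 1 KiB blocks, which matches the human-readable 295G/150G/85G values.
USED_BEFORE_KIB = 157_252_980
USED_AFTER_KIB = 88_267_956
REPORTED_FREED_MIB = 110_649.49  # from the "guix gc: freed ..." line
PROGRESS_COUNTER_MIB = 184_389   # from the "[184389 MiB] deleting ..." line


def kib_to_gib(kib: float) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


def mib_to_gib(mib: float) -> float:
    """Convert MiB to GiB."""
    return mib / 1024


recovered_gib = kib_to_gib(USED_BEFORE_KIB - USED_AFTER_KIB)
reported_gib = mib_to_gib(REPORTED_FREED_MIB)
progress_gib = mib_to_gib(PROGRESS_COUNTER_MIB)

print(f"recovered according to df: {recovered_gib:6.1f} GiB")
print(f"reported as freed:         {reported_gib:6.1f} GiB")
print(f"progress counter:          {progress_gib:6.1f} GiB")
print(f"reported minus recovered:  {reported_gib - recovered_gib:6.1f} GiB")
```

Under that assumption the script reproduces the figures in the message: roughly 65.8 GiB recovered, 108 GiB reported as freed, and a gap of about 42 GiB.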
* Re: Are 'guix gc' stats exaggerated? 2024-05-26 20:13 Are 'guix gc' stats exaggerated? Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-27 9:10 ` raingloom 2024-05-28 2:47 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-05-28 9:01 ` Efraim Flashner 2024-05-31 16:33 ` Simon Tournier 2 siblings, 1 reply; 12+ messages in thread From: raingloom @ 2024-05-27 9:10 UTC (permalink / raw) To: Felix Lechner; +Cc: guix-devel On 2024-05-26 22:13, Felix Lechner via "Development of GNU Guix and the GNU System distribution." wrote: > Hi, > > Today I ran 'guix gc' on equipment with an ext4 root partition. It had > these space characteristics beforehand: > > Filesystem Size Used Avail Use% Mounted on > /dev/dm-3 309047680 157252980 138126064 54% / > > or for human eyes: > > /dev/dm-3 295G 150G 132G 54% / > > After the run, the drive showed: > > /dev/dm-3 309047680 88267956 207111088 30% / > > or for human eyes: > > /dev/dm-3 295G 85G 198G 30% / > > By my math, about 65.8 GiB were recovered. > > When 'guix gc' was done, it announced: > > [184389 MiB] deleting '/gnu/store/...' > deleting `/gnu/store/trash' > deleting unused links... > note: currently hard linking saves 59224.03 MiB > guix gc: freed 110,649.49 MiBs > > Seeing the 184389 MiB number, or 180 GiB, already made me suspicious. > It exceeded my drive usage by 30 GiB. Even the more conservative 110649 > MiB "freed," however, are off by a mile. That would have been 108 GiB, > or 42 GiB more than the space actually recovered. > > Am I looking at those numbers the wrong way? Thanks! > > Kind regards > Felix Are you using compression? (BTRFS, ZFS, etc) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Are 'guix gc' stats exaggerated? 2024-05-27 9:10 ` raingloom @ 2024-05-28 2:47 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 0 siblings, 0 replies; 12+ messages in thread From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-28 2:47 UTC (permalink / raw) To: raingloom; +Cc: guix-devel Hi raingloom, On Mon, May 27 2024, raingloom@riseup.net wrote: > Are you using compression? (BTRFS, ZFS, etc) No, I thought about that, too, but that volume, like all my root volumes, is straight ext4 on LVM2, on bare metal. Kind regards Felix ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Are 'guix gc' stats exaggerated? 2024-05-26 20:13 Are 'guix gc' stats exaggerated? Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-05-27 9:10 ` raingloom @ 2024-05-28 9:01 ` Efraim Flashner 2024-05-31 22:03 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-05-31 16:33 ` Simon Tournier 2 siblings, 1 reply; 12+ messages in thread From: Efraim Flashner @ 2024-05-28 9:01 UTC (permalink / raw) To: Felix Lechner; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 1901 bytes --] On Sun, May 26, 2024 at 01:13:45PM -0700, Felix Lechner via Development of GNU Guix and the GNU System distribution. wrote: > Hi, > > Today I ran 'guix gc' on equipment with an ext4 root partition. It had > these space characteristics beforehand: > > Filesystem Size Used Avail Use% Mounted on > /dev/dm-3 309047680 157252980 138126064 54% / > > or for human eyes: > > /dev/dm-3 295G 150G 132G 54% / > > After the run, the drive showed: > > /dev/dm-3 309047680 88267956 207111088 30% / > > or for human eyes: > > /dev/dm-3 295G 85G 198G 30% / > > By my math, about 65.8 GiB were recovered. > > When 'guix gc' was done, it announced: > > [184389 MiB] deleting '/gnu/store/...' > deleting `/gnu/store/trash' > deleting unused links... > note: currently hard linking saves 59224.03 MiB > guix gc: freed 110,649.49 MiBs > > Seeing the 184389 MiB number, or 180 GiB, already made me suspicious. > It exceeded my drive usage by 30 GiB. Even the more conservative 110649 > MiB "freed," however, are off by a mile. That would have been 108 GiB, > or 42 GiB more than the space actually recovered. > > Am I looking at those numbers the wrong way? Thanks! As your store grows larger the inherent deduplication from the guix-daemon approaches a 3:1 file deduplication ratio. If two files are the same then they are hardlinked to the same actual block on the drive and you save some space. 
I have found that if you switch to btrfs and add zstd (level 3)
compression then you get about another 2:1 on top of that, for around
5.5:1.

-- 
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Are 'guix gc' stats exaggerated?
From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-05-31 22:03 UTC
To: Efraim Flashner; +Cc: guix-devel

Hi Efraim,

On Tue, May 28 2024, Efraim Flashner wrote:

> As your store grows larger the inherent deduplication from the
> guix-daemon approaches a 3:1 file deduplication ratio.

Thank you for your explanations and your data about btrfs! Btrfs
compression is a well-understood feature, although even its developers
acknowledge that the benefit is hard to quantify.

It probably makes more sense to focus on the Guix daemon here. I hope
you don't mind a few clarifying questions.

Why, please, does the benefit of de-duplication approach a fixed ratio
of 3:1? Does the benefit not depend on the number of copies in the
store, which can vary by any number? (It sounds like the answer may
have something to do with store size.)

Further, why is the removal of hardlinks counted as saving space even
when their inode reference count, which is widely available [1], is
greater than one?

Finally, barring a better solution, should our output numbers be divided
by three to bring them closer to the expected result for users?

Thanks!

Kind regards,
Felix

[1] https://en.wikipedia.org/wiki/Hard_link#Reference_counting
* Daemon deduplication and btrfs compression [was Re: Are 'guix gc' stats exaggerated?] 2024-05-31 22:03 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-02 8:24 ` Efraim Flashner 2024-06-06 14:17 ` Are 'guix gc' stats exaggerated? Ludovic Courtès 1 sibling, 0 replies; 12+ messages in thread From: Efraim Flashner @ 2024-06-02 8:24 UTC (permalink / raw) To: Felix Lechner; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 3517 bytes --] On Fri, May 31, 2024 at 03:03:47PM -0700, Felix Lechner wrote: > Hi Efraim, > > On Tue, May 28 2024, Efraim Flashner wrote: > > > As your store grows larger the inherent deduplication from the > > guix-daemon approaches a 3:1 file deduplication ratio. > > Thank you for your explanations and your data about btrfs! Btrfs > compression is a well-understood feature, although even its developers > acknowledge that the benefit is hard to quantify. > > It probably makes more sense to focus on the Guix daemon here. I hope > you don't mind a few clarifying questions. > > Why, please, does the benefit of de-duplication approach a fixed ratio > of 3:1? Does the benefit not depend on the number of copies in the > store, which can vary by any number? (It sounds like the answer may > have something to do with store size.) It would seem that this is just my experience and I'm not sure of an actual reason why this is the case. I believe that with the hardlinks only files which are identical would share a link, as opposed to a block based deduplication, where there could be more granular deduplication, so it's quite likely that multiple copies of the same package at the same version would share the majority of their files with the other copies of the package. > Further, why is the removal of hardlinks counted as saving space even > when their inode reference count, which is widely available [1] is > greater than one? 
I suspect that this part of the code is in the C++ daemon, which no one
really wants to hack on. AFAIK Nix turned off deduplication by default
years ago to speed up store operations, so I wouldn't be surprised if
they also haven't worked on that part of the code.

> Finally, barring a better solution should our output numbers be divided
> by three to bring them closer to the expected result for users?
>
> [1] https://en.wikipedia.org/wiki/Hard_link#Reference_counting

(ins)efraim@3900XT ~$ sudo compsize -x /gnu
Processed 39994797 files, 12867013 regular extents (28475611 refs), 20558307 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       56%      437G         776G         2.1T
none       100%      275G         275G         723G
zstd        32%      161G         500G         1.4T

It looks like right now my store is physically using 437GB of space.
Looking only at the total, with the Uncompressed -> Referenced ratio
being about 2.77:1 and Disk Usage -> Uncompressed being about 1.78:1,
I'm netting a total of 4.92:1.

Numbers on Berlin are a bit different:

(ins)efraim@berlin ~$ time guix shell compsize -- sudo compsize -x /gnu
Processed 41030472 files, 14521470 regular extents (37470325 refs), 17429255 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       59%      578G         970G         3.2T
none       100%      402G         402G         1.1T
zstd        31%      176G         567G         2.1T

real    45m9.762s
user    1m53.984s
sys     24m37.338s

Uncompressed -> Referenced: 3.4:1
Disk Usage -> Uncompressed: 1.68:1
Total: 5.67:1

Looking at it another way, the bits that are compressible with zstd
together move from 3.79:1 to 12.22:1, with no change (2.8:1) for the
uncompressible bits.

-- 
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
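The ratios quoted in this message can be recomputed from the compsize rows. The sketch below assumes that compsize's G and T suffixes are binary units (1T = 1024G); the function name and the row labels are illustrative only. Because the inputs are themselves rounded (for example "2.1T"), the results are approximate.

```python
G_PER_T = 1024  # assumption: compsize's G and T suffixes are binary units


def ratios(disk_g, uncompressed_g, referenced_g):
    """Return (dedup, compression, total) ratios for one compsize row.

    dedup       = Referenced / Uncompressed
    compression = Uncompressed / Disk Usage
    total       = Referenced / Disk Usage
    """
    dedup = referenced_g / uncompressed_g
    compression = uncompressed_g / disk_g
    total = referenced_g / disk_g
    return dedup, compression, total


# (Disk Usage, Uncompressed, Referenced) in G, copied from the output above.
rows = {
    "desktop TOTAL": (437, 776, 2.1 * G_PER_T),
    "berlin TOTAL": (578, 970, 3.2 * G_PER_T),
    "berlin zstd": (176, 567, 2.1 * G_PER_T),
    "berlin none": (402, 402, 1.1 * G_PER_T),
}

for name, row in rows.items():
    dedup, compression, total = ratios(*row)
    print(f"{name:14s} dedup {dedup:5.2f}:1  "
          f"compression {compression:5.2f}:1  total {total:5.2f}:1")
```

With these inputs the desktop total comes out near 2.77:1, 1.78:1 and 4.92:1, and the Berlin zstd row near 3.79:1 before compression and 12.22:1 after, matching the figures in the message.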
* Re: Are 'guix gc' stats exaggerated? 2024-05-31 22:03 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-06-02 8:24 ` Daemon deduplication and btrfs compression [was Re: Are 'guix gc' stats exaggerated?] Efraim Flashner @ 2024-06-06 14:17 ` Ludovic Courtès 2024-06-06 19:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 1 sibling, 1 reply; 12+ messages in thread From: Ludovic Courtès @ 2024-06-06 14:17 UTC (permalink / raw) To: Felix Lechner via Development of GNU Guix and the GNU System distribution. Cc: Efraim Flashner, Felix Lechner Hi Felix, Felix Lechner via "Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> skribis: > It probably makes more sense to focus on the Guix daemon here. I hope > you don't mind a few clarifying questions. > > Why, please, does the benefit of de-duplication approach a fixed ratio > of 3:1? Does the benefit not depend on the number of copies in the > store, which can vary by any number? (It sounds like the answer may > have something to do with store size.) Where does that 3:1 figure come from? > Further, why is the removal of hardlinks counted as saving space even > when their inode reference count, which is widely available [1] is > greater than one? Where do you see that in the code? After checking ‘removeUnusedLinks’, I think it counts space savings right. (OTOH, something somewhere is counted wrong, as anyone who’s used ‘guix gc -F…’ has seen; not sure where the bug is!) Thanks, Ludo’. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Are 'guix gc' stats exaggerated? 2024-06-06 14:17 ` Are 'guix gc' stats exaggerated? Ludovic Courtès @ 2024-06-06 19:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-06-09 9:19 ` Efraim Flashner 0 siblings, 1 reply; 12+ messages in thread From: Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-06 19:32 UTC (permalink / raw) To: Ludovic Courtès, Felix Lechner via Development of GNU Guix and the GNU System distribution. Cc: Efraim Flashner Hi Ludo' On Thu, Jun 06 2024, Ludovic Courtès wrote: > Where does that 3:1 figure come from? Efraim's experience, I believe. > Where do you see that in the code? After checking > ‘removeUnusedLinks’, I think it counts space savings right. Sorry, I didn't look at the code. I was merely prompted to speculate by the mentioning of hard links and inferred wrongly, it seems, that the discrepancy was related---although in fairness I also doubted that a fixed 3:1 ratio could be credibly explained by deduplication alone. Also, I don't mean to appear critical. Thanks to everyone for your hard work on Guix! Kind regards Felix ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Are 'guix gc' stats exaggerated? 2024-06-06 19:32 ` Felix Lechner via Development of GNU Guix and the GNU System distribution. @ 2024-06-09 9:19 ` Efraim Flashner 2024-06-09 9:30 ` Andreas Enge 0 siblings, 1 reply; 12+ messages in thread From: Efraim Flashner @ 2024-06-09 9:19 UTC (permalink / raw) To: Felix Lechner Cc: Ludovic Courtès, Felix Lechner via Development of GNU Guix and the GNU System distribution. [-- Attachment #1: Type: text/plain, Size: 1580 bytes --] On Thu, Jun 06, 2024 at 12:32:52PM -0700, Felix Lechner wrote: > Hi Ludo' > > On Thu, Jun 06 2024, Ludovic Courtès wrote: > > > Where does that 3:1 figure come from? > > Efraim's experience, I believe. I've found that to be my experience, and posted two compsize outputs to show where I got my numbers from. > > Where do you see that in the code? After checking > > ‘removeUnusedLinks’, I think it counts space savings right. > > Sorry, I didn't look at the code. I was merely prompted to speculate by > the mentioning of hard links and inferred wrongly, it seems, that the > discrepancy was related---although in fairness I also doubted that a > fixed 3:1 ratio could be credibly explained by deduplication alone. > > Also, I don't mean to appear critical. Thanks to everyone for your hard > work on Guix! In my not having looked at the code, I'll point out that running `guix gc -C 10G` will clear 10G of items from the store, but will return between 2-10G of real space for future use on the hard drive. Thinking across my various machines, on my desktop and laptop using btrfs this is the case, but on my other machines using ext4 I think the space cleared and what I'm expecting to have free to use do actually match up, but I don't remember paying that much attention to the numbers previously on those machines. 
-- 
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Are 'guix gc' stats exaggerated?
From: Andreas Enge @ 2024-06-09 9:30 UTC
To: Felix Lechner, Ludovic Courtès, Felix Lechner via Development of GNU Guix and the GNU System distribution.

On Sun, Jun 09, 2024 at 12:19:55PM +0300, Efraim Flashner wrote:
> In my not having looked at the code, I'll point out that running `guix
> gc -C 10G` will clear 10G of items from the store, but will return
> between 2-10G of real space for future use on the hard drive. Thinking
> across my various machines, on my desktop and laptop using btrfs this is
> the case, but on my other machines using ext4 I think the space cleared
> and what I'm expecting to have free to use do actually match up, but I
> don't remember paying that much attention to the numbers previously on
> those machines.

In my experience on ext4 (also not backed by looking at the code), "guix gc"
always deletes substantially less than what I ask for. I always thought it
just counted hard linked files even when the link count does not go to 0
and the file is not actually deleted.

For instance, I have tried it just now:

$ df -h .
/dev/mapper/cryptroot  468G  427G   18G  97% /
$ guix gc -F 20G
guix gc: 2.931,84 MiB werden freigegeben  [i.e. 2,931.84 MiB will be freed]
...
deleted or invalidated more than 3074252800 bytes; stopping
$ df -h .
/dev/mapper/cryptroot  468G  427G   18G  96% /

Andreas
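The hypothesis in this message, that hard-linked files are counted even when their link count does not drop to 0, can be illustrated with a small sketch. This is not the daemon's code: the function names are hypothetical, the paths are assumed to be regular files, and st_blocks is assumed to be in 512-byte units as on Linux. It compares a naive sum over deleted names with a count that credits an inode only once every one of its links is being removed.

```python
import os
import tempfile

BLOCK = 512  # assumption: st_blocks is counted in 512-byte units (Linux)


def naive_freed_bytes(paths):
    """Sum the on-disk size of every name, ignoring link counts."""
    return sum(os.lstat(p).st_blocks * BLOCK for p in paths)


def actual_freed_bytes(paths):
    """Credit an inode only if all of its links are among `paths`."""
    doomed = {}  # (device, inode) -> [links being removed, total links, bytes]
    for p in paths:
        st = os.lstat(p)
        entry = doomed.setdefault(
            (st.st_dev, st.st_ino), [0, st.st_nlink, st.st_blocks * BLOCK]
        )
        entry[0] += 1
    return sum(size for removed, nlink, size in doomed.values()
               if removed == nlink)


def hard_link_demo():
    """Create a 1 MiB file with one extra hard link and compare estimates."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        with open(original, "wb") as f:
            f.write(b"x" * (1 << 20))
            f.flush()
            os.fsync(f.fileno())  # make sure the blocks are allocated
        alias = os.path.join(tmp, "alias")
        os.link(original, alias)
        return (
            naive_freed_bytes([alias]),             # counts the full file
            actual_freed_bytes([alias]),            # 0: another link remains
            actual_freed_bytes([original, alias]),  # full file: no links left
        )


naive_one, actual_one, actual_both = hard_link_demo()
print(f"naive estimate for removing one link: {naive_one} bytes")
print(f"space freed by removing one link:     {actual_one} bytes")
print(f"space freed by removing both links:   {actual_both} bytes")
```

If the reported figure behaved like the naive sum, it would overstate the space returned to the filesystem whenever another link to the same inode survives, which would be consistent with the df output above. Whether 'guix gc' actually computes its figure this way is not established in this thread.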
* Re: Are 'guix gc' stats exaggerated? 2024-06-09 9:30 ` Andreas Enge @ 2024-06-17 11:24 ` Ludovic Courtès 0 siblings, 0 replies; 12+ messages in thread From: Ludovic Courtès @ 2024-06-17 11:24 UTC (permalink / raw) To: Andreas Enge Cc: Felix Lechner, Felix Lechner via Development of GNU Guix and the GNU System distribution. Andreas Enge <andreas@enge.fr> skribis: > In my experience on ext4 (also not backed by looking at the code), "guix gc" > always deletes substantially less than what I ask for. I always thought it > just counted hard linked files even when the link count does not go to 0 > and the file is not actually deleted. Yes, that’s also my experience. I did look at the code several times, I even thought 7033c7692ccbbbad8f7b9952015de071a5588e87 in 2020 would fix that estimate, but it didn’t. I guess I’m bad at maths and logic, we should give another look at that part of the code! (Note that creation of sparse files will be another source of discrepancy, though there will be few of them.) Ludo’. ^ permalink raw reply [flat|nested] 12+ messages in thread
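The remark about sparse files can also be illustrated. The sketch below, with hypothetical function names and the same assumption that st_blocks is in 512-byte units, creates a file whose apparent size is 64 MiB without writing any data and compares that with the space the filesystem reports as allocated. Any estimate based on apparent size would overstate what deleting such a file gives back.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for `path`.

    Assumption: st_blocks is counted in 512-byte units, as on Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def sparse_demo(size=64 * 1024 * 1024):
    """Create a file of `size` apparent bytes without writing any data."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sparse")
        with open(path, "wb") as f:
            f.truncate(size)  # extends the file; no data blocks are written
        return apparent_and_allocated(path)


apparent, allocated = sparse_demo()
print(f"apparent size:   {apparent} bytes")
print(f"allocated space: {allocated} bytes")
```

On filesystems that support sparse files the allocated figure is typically far smaller than the apparent size; the exact value depends on the filesystem.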
* Re: Are 'guix gc' stats exaggerated? 2024-05-26 20:13 Are 'guix gc' stats exaggerated? Felix Lechner via Development of GNU Guix and the GNU System distribution. 2024-05-27 9:10 ` raingloom 2024-05-28 9:01 ` Efraim Flashner @ 2024-05-31 16:33 ` Simon Tournier 2 siblings, 0 replies; 12+ messages in thread From: Simon Tournier @ 2024-05-31 16:33 UTC (permalink / raw) To: Felix Lechner, guix-devel Hi, On Sun, 26 May 2024 at 13:13, Felix Lechner via "Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote: > By my math, about 65.8 GiB were recovered. > > When 'guix gc' was done, it announced: > > [184389 MiB] deleting '/gnu/store/...' > deleting `/gnu/store/trash' > deleting unused links... > note: currently hard linking saves 59224.03 MiB > guix gc: freed 110,649.49 MiBs Well, 180 GiB does not count deduplication, I guess. And as Efraim said, the ratio on average is 3:1 so 65 GiB vs 180 GiB seems consistent, right? However, the question is then: what are these 110 GiB? Cheers, simon ^ permalink raw reply [flat|nested] 12+ messages in thread