Ludovic Courtès writes:

> Experience has shown that keeping too many entries increases disk usage
> and, more importantly, leads to long delays when cleaning up the cache,
> measured in minutes on slow or busy HDDs with hundreds of thousands of
> cache entries, as is common on build machines.  In those cases, the cost
> of the cache outweighs its benefit.
>
> * guix/scripts/substitute.scm (%narinfo-expired-cache-entry-removal-delay):
> Reduce to 5 days.
> (cached-narinfo-expiration-time)[max-ttl]: Reduce to 2 days.
>
> Change-Id: Iab212f572ee9041be61716423a3c014f93fe81ed
> ---
>  guix/scripts/substitute.scm | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> Hello,
>
> Chris mentioned it before and I experienced it the hard way on bayfront:
>
>   https://lists.gnu.org/archive/html/guix-devel/2024-05/msg00177.html
>
> A big narinfo cache is a significant performance hit on spinning HDDs
> when the time comes to remove expired entries.
>
> This change makes the cache more ephemeral (2 to 5 days).  I still think
> some caching is needed: one will often run several Guix commands in a
> day that query the same narinfos and only download/build a small subset
> (keep in mind that ‘substitution-oracle’, used by ‘derivation-build-plan’,
> queries narinfos for the closure of the requested derivations, minus
> those already valid); it would be wasteful and inefficient to download
> them over and over again.  I’d like to have metrics to estimate that,
> but I don’t.
>
> Thoughts?

This sounds good to me.

I think one of the problems on bayfront is that each substitute process
looks at the cache and decides it is time to remove the expired entries.
Every new process that starts and joins in probably slows them all down
further.  This is very similar to a "thundering herd": the processes
trip over each other trying to delete the same files.

This change won't directly address that part of the issue, but keeping
the cache smaller should at least reduce the impact when it happens.
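
For reference, here is my reading of the reduced delays from the commit
message, written out as a Guile sketch.  The actual definitions in
guix/scripts/substitute.scm are likely shaped differently, and the
%narinfo-max-ttl name below is only an illustrative stand-in for the
max-ttl binding inside ‘cached-narinfo-expiration-time’:

  ;; Sketch of the new values only; not the actual patch hunk.

  ;; Expired cache entries are kept around for at most 5 days before a
  ;; cleanup pass deletes them.
  (define %narinfo-expired-cache-entry-removal-delay
    (* 5 24 3600))

  ;; Illustrative name: cap on how long a cached narinfo stays valid,
  ;; corresponding to the max-ttl reduced to 2 days in
  ;; 'cached-narinfo-expiration-time'.
  (define %narinfo-max-ttl
    (* 2 24 3600))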
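
As for the thundering herd itself (out of scope for this patch), one way
to avoid it would be to take a non-blocking advisory lock before cleaning
up, so that at most one process removes expired entries while the others
skip the pass.  A rough Guile sketch, assuming a cleanup lock file in the
cache directory; the procedure name and the identifiers in the usage
comment are hypothetical:

  ;; Sketch only: serialize cache cleanup across substitute processes.
  (define (call-with-cleanup-lock cache-directory thunk)
    "Run THUNK if no other process holds the cleanup lock for
  CACHE-DIRECTORY; otherwise return #f without running it."
    (let* ((lock-file (string-append cache-directory "/cleanup.lock"))
           (port      (open-file lock-file "w")))
      (dynamic-wind
        (const #t)
        (lambda ()
          (catch 'system-error
            (lambda ()
              ;; Non-blocking exclusive lock: if another process already
              ;; holds it, 'flock' fails with EWOULDBLOCK and we skip the
              ;; cleanup pass instead of piling on.
              (flock port (logior LOCK_EX LOCK_NB))
              (thunk)
              #t)
            (lambda args
              #f)))
        (lambda ()
          ;; Closing the port releases the lock.
          (close-port port)))))

  ;; Hypothetical usage, with made-up names for the cache directory and
  ;; the cleanup procedure:
  ;;
  ;;   (call-with-cleanup-lock "/var/guix/substitute/cache"
  ;;     (lambda ()
  ;;       (remove-expired-cache-entries! "/var/guix/substitute/cache")))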