Date: Sun, 2 Jun 2024 11:24:20 +0300
From: Efraim Flashner
To: Felix Lechner
Cc: guix-devel@gnu.org
Subject: Daemon deduplication and btrfs compression [was Re: Are 'guix gc' stats exaggerated?]
References: <87bk4stjpi.fsf@lease-up.com> <87a5k5oczg.fsf@lease-up.com>
In-Reply-To: <87a5k5oczg.fsf@lease-up.com>

On Fri, May 31, 2024 at 03:03:47PM -0700, Felix Lechner wrote:
> Hi Efraim,
>
> On Tue, May 28 2024, Efraim Flashner wrote:
>
> > As your store grows larger the inherent deduplication from the
> > guix-daemon approaches a 3:1 file deduplication ratio.
>
> Thank you for your explanations and your data about btrfs!
> Btrfs compression is a well-understood feature, although even its
> developers acknowledge that the benefit is hard to quantify.
>
> It probably makes more sense to focus on the Guix daemon here.  I hope
> you don't mind a few clarifying questions.
>
> Why, please, does the benefit of de-duplication approach a fixed ratio
> of 3:1?  Does the benefit not depend on the number of copies in the
> store, which can vary by any number?  (It sounds like the answer may
> have something to do with store size.)

It would seem that this is just my experience, and I'm not sure of an
actual reason why this is the case.  I believe that with hardlinks only
files which are identical share a link, as opposed to block-based
deduplication, where the deduplication could be more granular.  So it's
quite likely that multiple copies of the same package at the same
version share the majority of their files with the other copies of the
package.

> Further, why is the removal of hardlinks counted as saving space even
> when their inode reference count, which is widely available [1], is
> greater than one?

I suspect that this part of the code is in the C++ daemon, which no one
really wants to hack on.  AFAIK Nix turned off deduplication by default
years ago to speed up store operations, so I wouldn't be surprised if
they also haven't worked on that part of the code.

> Finally, barring a better solution, should our output numbers be
> divided by three to bring them closer to the expected result for users?
>
> [1] https://en.wikipedia.org/wiki/Hard_link#Reference_counting

(ins)efraim@3900XT ~$ sudo compsize -x /gnu
Processed 39994797 files, 12867013 regular extents (28475611 refs), 20558307 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       56%      437G         776G         2.1T
none       100%      275G         275G         723G
zstd        32%      161G         500G         1.4T

It looks like right now my store is physically using 437GB of space.
Looking only at the TOTAL row, with the Uncompressed -> Referenced ratio
at about 2.77:1 and Disk Usage -> Uncompressed at about 1.78:1, I'm
netting a total of about 4.92:1.

Numbers on Berlin are a bit different:

(ins)efraim@berlin ~$ time guix shell compsize -- sudo compsize -x /gnu
Processed 41030472 files, 14521470 regular extents (37470325 refs), 17429255 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       59%      578G         970G         3.2T
none       100%      402G         402G         1.1T
zstd        31%      176G         567G         2.1T

real    45m9.762s
user    1m53.984s
sys     24m37.338s

Uncompressed -> Referenced: 3.4:1
Disk Usage -> Uncompressed: 1.68:1
Total: 5.67:1

Looking at it another way, on Berlin the data that zstd can compress
goes from 3.79:1 (deduplication alone) to 12.22:1 (deduplication plus
compression), with no change (2.8:1) for the incompressible data.

-- 
Efraim Flashner   רנשלפ םירפא
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
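
For anyone who wants to check the ratios quoted above, they can be
reproduced from the compsize TOTAL rows with a little arithmetic.  This
is only a minimal sketch in Python: it assumes compsize's G and T
suffixes are binary units (1T = 1024G), and the function and variable
names are made up for illustration; they are not part of compsize or
Guix.

```python
# Reproduce the deduplication/compression ratios from compsize TOTAL rows.
# Assumption: compsize reports binary units, so 1T is treated as 1024G.

TIB = 1024  # GiB per TiB (assumed unit convention)


def ratios(disk_gib, uncompressed_gib, referenced_gib):
    """Return (dedup, compression, combined) ratios for one compsize row."""
    dedup = referenced_gib / uncompressed_gib    # Uncompressed -> Referenced
    compression = uncompressed_gib / disk_gib    # Disk Usage -> Uncompressed
    combined = referenced_gib / disk_gib         # Disk Usage -> Referenced
    return dedup, compression, combined


# TOTAL rows as quoted in the message: (disk, uncompressed, referenced) in GiB
rows = {
    "3900XT": (437, 776, 2.1 * TIB),
    "berlin": (578, 970, 3.2 * TIB),
}

for host, row in rows.items():
    dedup, compression, combined = ratios(*row)
    print(f"{host}: dedup {dedup:.2f}:1, "
          f"compression {compression:.2f}:1, combined {combined:.2f}:1")
```

The combined figure is simply the product of the other two, which is why
deduplication and compression stack rather than compete.  Since compsize
rounds its output, the results only match the quoted ratios to about two
significant digits.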