From: Miguel Ángel Arruga Vivas
To: Ludovic Courtès
Subject: Re: Identical files across subsequent package revisions
Date: Wed, 23 Dec 2020 14:10:52 +0100
Message-ID: <878s9oy8f7.fsf@gmail.com>
In-Reply-To: <87wnx9wlea.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 22 Dec 2020 23:01:17 +0100")
References: <87wnx9wlea.fsf@gnu.org>
Cc: Guix Devel

Hi Ludo,

Just one interjection: wow! :-)

Ludovic Courtès writes:

> Hello Guix!
>
> Every time a package changes, we end up downloading complete substitutes
> for itself and for all its dependents, even though we have the intuition
> that a large fraction of the files in those store items are unchanged.

It's great that you're looking into this kind of optimization, as it
also closes the gap between binary-only distributions and the
substitutes system.

> [Awesome data collection omitted for brevity]
>
> Thoughts? :-)

You're probably already aware of it, but I want to mention that
Tridgell's thesis[1] contains a very neat approach to this problem.  A
naive prototype would copy the latest available nar of the package on
the client side and use it as the destination for a copy using rsync.
Either the protocol used by the rsync application, or a protocol based
on those ideas, could be implemented over the HTTP layer; both client
and server would need to implement it and cooperate, though.

Another idea that might fit well into that kind of protocol---with a
heavier impact on the design, and probably a high runtime cost---would
be to "upgrade" the deduplication process towards a content-based file
system, as git does[2].
This way, a description of the nar contents (size, hash) could trigger
the retrieval of only the files that are needed and not already found
in the current store.

Nonetheless, these are only thoughts; I'll ping back if and when I have
something more tangible. ;-)

Happy hacking!
Miguel

[1] https://rsync.samba.org/~tridge/phd_thesis.pdf
[2] https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
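
To make the second idea a bit more concrete, here is a minimal sketch in
Python of how a (size, hash) description of a nar could select only the
files missing from the local store.  Everything in it is an assumption
made for illustration: the names `FileEntry`, `describe` and
`files_to_fetch`, the manifest layout, and the choice of SHA-256 are
hypothetical and not part of Guix or of the nar format.

```python
import hashlib
from typing import NamedTuple


class FileEntry(NamedTuple):
    """One file in a hypothetical nar description (not a real Guix format)."""
    path: str
    size: int
    sha256: str


def describe(path: str, data: bytes) -> FileEntry:
    """Build the (size, hash) description of a single file."""
    return FileEntry(path, len(data), hashlib.sha256(data).hexdigest())


def files_to_fetch(manifest, local_index):
    """Return the manifest entries whose content is not available locally.

    `local_index` is a set of (size, sha256) pairs for files already in
    the store.  Identical content listed twice is requested only once.
    """
    missing = []
    seen = set()
    for entry in manifest:
        key = (entry.size, entry.sha256)
        if key in local_index or key in seen:
            continue
        seen.add(key)
        missing.append(entry)
    return missing


# Previous revision, already present in the (pretend) local store.
old_files = {
    "bin/hello": b"ELF...v1",
    "share/doc/README": b"docs",
    "share/locale/es.mo": b"hola",
}
local_index = {
    (e.size, e.sha256)
    for e in (describe(p, d) for p, d in old_files.items())
}

# New revision: only the binary changed.
new_files = {
    "bin/hello": b"ELF...v2",
    "share/doc/README": b"docs",
    "share/locale/es.mo": b"hola",
}
manifest = [describe(p, d) for p, d in new_files.items()]

needed = files_to_fetch(manifest, local_index)
print([e.path for e in needed])  # prints ['bin/hello']
```

In this toy case only `bin/hello` would be downloaded; the two unchanged
files are found by content in the local index, regardless of which
package revision they came from.  A real design would also have to
cover file metadata such as the executable bit and symlinks, which this
sketch ignores.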