From: zimoun
To: Miguel Ángel Arruga Vivas, Ludovic Courtès
Subject: Re: Identical files across subsequent package revisions
In-Reply-To: <878s9oy8f7.fsf@gmail.com>
References: <87wnx9wlea.fsf@gnu.org> <878s9oy8f7.fsf@gmail.com>
Date: Wed, 23 Dec 2020 15:07:23 +0100
Message-ID: <86wnx8r4ys.fsf@gmail.com>
Cc: Guix Devel

Hi,

On Wed, 23 Dec 2020 at 14:10, Miguel Ángel Arruga Vivas wrote:

> Probably you're already aware of it, but I want to mention that
> Tridgell's thesis[1] contains a very neat approach to this problem.

This thesis is a must-read! :-)

> A naive prototype would be copying of the latest available nar of the
> package on the client side and using it as the destination for a copy
> using rsync. Either the protocol used by the rsync application, or a
> protocol based on those ideas, could be implemented over the HTTP layer;
> client and server implementation and cooperation would be needed
> though.

I may be misunderstanding and missing something, but one part of the
problem is how to detect the “latest” nar; in other words, how to know
that it is different.  From memory (and I have drunk a couple of beers
since I read the thesis :-)), rsync's default approach compares
timestamps and sizes.  And if you switch to checksums instead,
performance is poor because of the I/O.
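To make the difference concrete, here is a minimal sketch of the two ways of deciding whether a file changed: an rsync-like "quick check" on size and modification time, which only reads metadata, and a content checksum, which has to read every byte.  The helper names are hypothetical and SHA-256 is an arbitrary choice; this is not rsync's actual code.  One caveat, as far as I know: files in the store have their modification time normalized to 1 (one second after the epoch), so a size-and-timestamp check alone cannot tell two same-size store files apart.

```python
import hashlib
import os
import tempfile
from pathlib import Path

# Hypothetical helpers for illustration; not rsync's or Guix's actual code.


def quick_check_same(src: Path, dst: Path) -> bool:
    """rsync-like quick check: only metadata (size, mtime) is compared."""
    a, b = src.stat(), dst.stat()
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of the whole file: every byte is read, hence the I/O cost."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_same(src: Path, dst: Path) -> bool:
    """Content comparison: cheap size test first, then full-file hashes."""
    if src.stat().st_size != dst.stat().st_size:
        return False
    return file_digest(src) == file_digest(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, "a"), Path(tmp, "b")
        a.write_bytes(b"hello")
        b.write_bytes(b"world")  # same size, different content
        for p in (a, b):
            os.utime(p, (1, 1))  # same normalized timestamp, as in the store
        print(quick_check_same(a, b))  # True: metadata cannot tell them apart
        print(checksum_same(a, b))     # False: the contents differ
```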
Well, it depends on the number of files and their size, on whether the
checksums are computed ahead of time, etc.

> Another idea that might fit well into that kind of protocol---with
> harder impact on the design, and probably with a high cost on the
> runtime---would be the "upgrade" of the deduplication process towards a
> content-based file system as git does[2]. This way a description of
> the nar contents (size, hash) could trigger the retrieval only of the
> needed files not found in the current store.

Isn't this related to the Content-Addressed Store, i.e., the
«intensional model»?

Chap. 6:
Nix RFC:

Cheers,
simon
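The idea quoted above, where a description of the nar contents as (size, hash) pairs lets the client fetch only the files it lacks, is essentially content addressing: identical content gets the same key whatever its path.  A minimal sketch, assuming a hypothetical manifest format and SHA-256 as the hash; `Entry`, `describe` and `missing_entries` are illustrative names, not an existing Guix or Nix interface.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


# Hypothetical manifest format for illustration; not an existing Guix/Nix interface.
@dataclass(frozen=True)
class Entry:
    """One regular file of a package, described by path, size and content hash."""
    path: str
    size: int
    sha256: str


def describe(files: Dict[str, bytes]) -> List[Entry]:
    """Server side: publish (path, size, hash) for every file, sorted by path."""
    return [
        Entry(path, len(data), hashlib.sha256(data).hexdigest())
        for path, data in sorted(files.items())
    ]


def missing_entries(manifest: Iterable[Entry], local_hashes: Set[str]) -> List[Entry]:
    """Client side: keep only entries whose content is not available locally.

    Identical contents appearing under several paths are fetched once.
    """
    known = set(local_hashes)
    needed = []
    for entry in manifest:
        if entry.sha256 not in known:
            needed.append(entry)
            known.add(entry.sha256)  # later duplicates of this content are skipped
    return needed


if __name__ == "__main__":
    old = {"bin/hello": b"v1 binary", "share/doc/README": b"docs"}
    new = {"bin/hello": b"v2 binary", "bin/hi": b"v2 binary", "share/doc/README": b"docs"}
    local = {e.sha256 for e in describe(old)}
    print([e.path for e in missing_entries(describe(new), local)])  # ['bin/hello']
```

Checking a hash against a local set costs no file I/O, provided the client already keeps an index of the hashes of its store contents (an assumption here); that is what moves the checksum cost away from the moment of the update.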