From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 4 Jan 2021 15:12:45 -0500
From: Konstantin Ryabitsev
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Message-ID: <20210104201245.cbtqno6cyxw5iycu@chatter.i7.local>
References: <20201221212032.syunaxzrvcqcrose@chatter.i7.local>
 <20201221213914.GA9374@dcvr>
 <20201222062808.GA4522@dcvr>
 <20201228162218.zcnqxkgwa2i3nt66@chatter.i7.local>
 <20201228213139.GA17600@dcvr>
In-Reply-To: <20201228213139.GA17600@dcvr>

On Mon, Dec 28, 2020 at 09:31:39PM +0000, Eric Wong wrote:
> AFAIK, V2Writable always does the right thing on -purge/-edit;
> at least for WWW users(*).
>
> V2W does more work in rare cases when history gets rewritten,
> but doesn't track anything beyond the latest indexed commit
> hash.
>
> In the V2Writable::log_range sub, it uses "git merge-base --is-ancestor"
> (via is_ancestor wrapper) to cover the common case of contiguous history.
>
> Otherwise, it attempts "git merge-base" to find a common ancestor:
>
>     if (common_ancestor_found)
>         unindex some history starting at common ancestor
>         reindex from common ancestor
>     else
>         unindex all history in epoch
>         reindex epoch from stratch

I think I understand, but in the case of grok-pi-piper, unindexing is not an
option, since we can't control what the receiving-end app has already done
with the messages we have previously piped to it.
We can't assume that it will do the right thing when it receives duplicate
messages, so we need to somehow make sure that we don't pipe the same message
twice.

> AFAIK, the common_ancestor_found case is always true unless
> somebody was wacky enough to run a full gc+prune immediately
> after fetching.  IOW, I don't think the else case happens
> in practice. :)

It kinda does in the grok-pi-piper case, since one of the config options is to
continuously "reshallow" the repository to basically contain no objects.

https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_piper.py#n58

I know that this is "wacky" as you say, but it helps save dramatic amounts of
space when cloning most of the lore.kernel.org repositories. We can still use
"git fetch --deepen" when necessary, but this does make it impossible to use
the common ancestor strategy when dealing with history rewrites.

-K
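
For illustration only, here is a minimal Python sketch of the decision
discussed above: use "git merge-base --is-ancestor" for the contiguous-history
case, and treat anything else (a rewrite, or a shallow repository where the
previously piped commit is no longer present) as a case that needs separate
duplicate handling. This is not grokmirror's or public-inbox's actual code.
The function names, the action strings, and the idea of storing the last piped
commit are all assumptions made for the example.

```python
import subprocess
from typing import Callable, Optional


def is_ancestor(gitdir: str, old: str, new: str) -> Optional[bool]:
    """Ask git whether `old` is an ancestor of `new`.

    Returns True (exit 0), False (exit 1), or None for any other exit
    status, e.g. when `old` is missing from a reshallowed repository.
    """
    proc = subprocess.run(
        ["git", "--git-dir", gitdir, "merge-base", "--is-ancestor", old, new],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode == 0:
        return True
    if proc.returncode == 1:
        return False
    return None


def plan_piping(
    last_piped: Optional[str],
    tip: str,
    ancestor_check: Callable[[str, str], Optional[bool]],
) -> str:
    """Pick a (hypothetical) action for the next run of the piper."""
    if last_piped is None:
        return "pipe-all"      # nothing piped yet, start from the beginning
    if last_piped == tip:
        return "nothing"       # no new commits since the last run
    if ancestor_check(last_piped, tip) is True:
        return "pipe-range"    # contiguous history: pipe last_piped..tip
    # History was rewritten, or the old commit cannot be found. Unindexing
    # is not possible on the receiving end, so the caller must avoid
    # re-piping messages by some other means.
    return "needs-dedup"
```

A caller could bind the repository path with something like
`lambda old, new: is_ancestor("/path/to/repo.git", old, new)` (the path is a
placeholder). How the "needs-dedup" case is actually resolved is left open
here, since that is the question under discussion.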