Date: Mon, 26 Apr 2021 15:46:59 -0400
From: Konstantin Ryabitsev
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: lei-managed pseudo mailing lists
Message-ID: <20210426194659.d5w2nkeqvtyni4ay@nitro.local>
References: <20210426164454.5zd5kgugfhfwfkpo@nitro.local> <20210426173726.GA22986@dcvr> <20210426182020.olonbxkc6a2gfzl3@nitro.local> <20210426184717.GA29112@dcvr>
In-Reply-To: <20210426184717.GA29112@dcvr>

On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > I'm thinking we need the ability to make it a real clonable repository --
> > perhaps without its own xapian index? Actual git repositories aren't large,
> > especially if they are only used for direct git operations. Disk space is
> > cheap, it's the IO that's expensive. :)
>
> True, though cache overheads hurt a bit. I also wonder if lei
> can increase traffic to public-inbox- to reduce
> the need/use of "git clone".
>
> > If these are real clonable repositories, then it would be easy for people to
> > set up replication for just the curated content people want.
>
> Understood. Using --output v2publicinbox:... w/o --shared is
> totally doable.

I'm just worried that if we overuse the alternates, we may end up in a
situation where repacking the "every blob" shared repository produces a pack
that isn't really optimized for use by any of the member repos. So, when a
clone is performed, git-upload-pack will have to spend a lot of cycles
navigating through the monstrous parent pack just to build and re-compress the
small subset of objects it needs to send.

Git has ways of dealing with this by letting you set things like pack islands,
but it's finicky and requires that each child repo be defined as refs in the
parent repo. We deal with this in grokmirror, but it's messy and requires
properly tracking child repo additions/removals/etc.

I think it may be one of those cases where wasting disk space on duplicate
objects is worth the CPU cycle savings.

> > Not really worried about deduping blobs, but I'm wondering how to make it work
> > well when search parameters change (see above). E.g.:
> >
> > 1. we create the repo with one set of parameters
> > 2. maintainer then broadens it up to include something else
> > 3. maintainer then decides that it's now *way* too much and narrows it down again
> >
> > We don't really want step 2 to lead to a permanent ballooning of the
> > repository, so perhaps all query changes should force-append a dt: with the
> > open-ended datetime of the change? Or do you already have a way to deal with
> > this situation?
>
> The aforementioned maxuid prevents stuff that's too old from
> being seen. Otherwise, there's always "public-inbox-learn rm".

How would it handle the situation where we import a new list into lore with a
10-year-long archive of messages?

-K