unofficial mirror of meta@public-inbox.org
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: lei-managed pseudo mailing lists
Date: Mon, 26 Apr 2021 15:46:59 -0400
Message-ID: <20210426194659.d5w2nkeqvtyni4ay@nitro.local>
In-Reply-To: <20210426184717.GA29112@dcvr>

On Mon, Apr 26, 2021 at 06:47:17PM +0000, Eric Wong wrote:
> > I'm thinking we need the ability to make it a real clonable repository --
> > perhaps without its own xapian index? Actual git repositories aren't large,
> > especially if they are only used for direct git operations. Disk space is
> > cheap, it's the IO that's expensive. :)
> 
> True, though cache overheads hurt a bit.  I also wonder if lei
> can increase traffic to public-inbox-<imapd|nntpd> to reduce
> the need/use of "git clone".
> 
> > If these are real clonable repositories, then it would be easy for people to
> > set up replication for just the curated content people want.
> 
> Understood.  Using --output v2publicinbox:... w/o --shared is
> totally doable.

I'm just worried that if we overuse alternates, then when we repack the shared
"every blob" repository we may end up with a pack that isn't really optimized
for any of the member repos. So when a clone is performed, git-upload-pack
will have to spend a lot of cycles walking the monstrous parent pack just to
rebuild and recompress the small subset of objects it needs to send.
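
To make the concern concrete, here is a minimal sketch (a simplified model, not git's actual code) of why a shared pack can be expensive to serve from: a stored delta can only be copied verbatim into a fresh clone if its base object is also being sent. The names `split_for_reuse`, `pack_bases` and `wanted` are hypothetical, and the model ignores thin packs and delta-chain depth limits.

```python
from typing import Dict, Optional, Set, Tuple


def split_for_reuse(
    pack_bases: Dict[str, Optional[str]], wanted: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical model of a shared pack: pack_bases maps each object id
    to the id of its delta base, or None if the object is stored whole.
    Returns (objects copyable as-is, objects needing new deltas)."""
    reusable: Set[str] = set()
    recompute: Set[str] = set()
    for oid in wanted:
        base = pack_bases[oid]
        # A stored delta is only usable by the receiver if its base is in
        # the same outgoing pack; otherwise the server must inflate the
        # object and search for a new base (the expensive part).
        if base is None or base in wanted:
            reusable.add(oid)
        else:
            recompute.add(oid)
    return reusable, recompute
```

For example, if the shared repack stored list B's blob `b2` as a delta against list A's blob `a1`, then cloning only list B (`{"b1", "b2", "b3"}`) forces `b2` to be recomputed, while `b1` and `b3` (stored whole or against each other) can be reused.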

Git has ways of dealing with this, such as delta islands, but they're finicky
and require each child repo's refs to be defined in the parent repo. We deal
with this in grokmirror, but it's messy and requires properly tracking child
repo additions, removals, etc.
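
Git's delta islands are configured with `pack.island` regexes over ref names (enabled for repacks via `repack.useDeltaIslands`), and git-pack-objects(1) describes the rule as refusing to deltify an object A against a base that is not present in all of A's islands. The sketch below models only that rule; the `refs/virtual/<N>/heads/` layout, the helper names, and the "first matching pattern wins" simplification are assumptions for illustration, not git's implementation.

```python
import re
from typing import Dict, List, Optional, Set


def island_name(ref: str, patterns: List[str]) -> Optional[str]:
    """Map a ref to an island name. Simplification: the first matching
    pattern wins and its capture groups, joined with '-', form the name.
    Refs that match no pattern belong to no island."""
    for pat in patterns:
        m = re.match(pat, ref)
        if m:
            return "-".join(m.groups())
    return None


def object_islands(
    reached_by: Dict[str, Set[str]], patterns: List[str]
) -> Dict[str, Set[str]]:
    """reached_by maps an object id to the set of refs that reach it."""
    result: Dict[str, Set[str]] = {}
    for oid, refs in reached_by.items():
        names = {island_name(r, patterns) for r in refs}
        names.discard(None)
        result[oid] = names
    return result


def may_delta(obj: str, base: str, islands: Dict[str, Set[str]]) -> bool:
    # The base must be present in every island the object belongs to.
    return islands[obj] <= islands[base]
```

With one island per child repo, an object reachable only from child 1 may use a base shared by children 1 and 2, but never a base that only child 2 has. This is also why each child's refs must exist in the parent: without them, the parent cannot tell which island an object belongs to.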

I think it may be one of those cases where wasting disk space on duplicate
objects is worth the CPU cycle savings.

> > Not really worried about deduping blobs, but I'm wondering how to make it work
> > well when search parameters change (see above). E.g.:
> > 
> > 1. we create the repo with one set of parameters
> > 2. maintainer then broadens it up to include something else
> > 3. maintainer then decides that it's now *way* too much and narrows it down again
> > 
> > We don't really want step 2 to lead to a permanent ballooning of the
> > repository, so perhaps all query changes should force-append a dt: with the
> > open-ended datetime of the change? Or do you already have a way to deal with
> > this situation?
> 
> The aforementioned maxuid prevents stuff that's too old from
> being seen.  Otherwise, there's always "public-inbox-learn rm".
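
The "force-append a dt:" idea from step 2 above can be sketched as a helper that pins an edited query to the moment it changed, so broadening it only affects messages from then on. The helper name is hypothetical, and both the 14-digit UTC `dt:YYYYMMDDhhmmss..` open-ended range form and the use of `AND` are assumptions about the search syntax that should be checked against the public-inbox/lei search help.

```python
from datetime import datetime, timezone


def pin_query(query: str, changed_at: datetime) -> str:
    """Hypothetical helper: restrict an edited query to messages dated at
    or after the time of the edit, using an open-ended dt: range."""
    stamp = changed_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"({query}) AND dt:{stamp}.."
```

For example, `pin_query("dfn:drivers/net/", datetime(2021, 4, 26, 19, 46, 59, tzinfo=timezone.utc))` yields `(dfn:drivers/net/) AND dt:20210426194659..`.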

How would it handle the situation where we import a new list into lore with a
10-year-long archive of messages?
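
One reading of the concern, assuming maxuid acts as a high-water mark on UIDs that are assigned in import order: a backfilled archive receives fresh UIDs, so a UID cutoff treats decade-old mail as new, while a cutoff on the message's own date does not. The `Msg` type and both filters below are hypothetical illustrations, not lei code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Msg:
    uid: int    # assumed: assigned at import time, monotonically increasing
    sent: date  # the message's own Date: header


def newer_than_uid(msgs: List[Msg], maxuid: int) -> List[Msg]:
    # Selects by import order, regardless of when the mail was written.
    return [m for m in msgs if m.uid > maxuid]


def newer_than_date(msgs: List[Msg], cutoff: date) -> List[Msg]:
    # Selects by the message's own date, so backfilled mail stays out.
    return [m for m in msgs if m.sent >= cutoff]
```

With UIDs 1 to 3 dated April 2021 and a backfilled archive imported as UIDs 4 and 5 dated 2011 and 2012, `newer_than_uid(msgs, 3)` returns the two old messages, while `newer_than_date(msgs, date(2021, 4, 26))` returns only UID 3.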

-K


Thread overview: 6+ messages
2021-04-26 16:44 lei-managed pseudo mailing lists Konstantin Ryabitsev
2021-04-26 17:37 ` Eric Wong
2021-04-26 18:20   ` Konstantin Ryabitsev
2021-04-26 18:47     ` Eric Wong
2021-04-26 19:46       ` Konstantin Ryabitsev [this message]
2021-04-26 20:34         ` Eric Wong
