From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: Re: public-inbox + mlmmj best practices?
Date: Tue, 5 Jan 2021 01:06:43 +0000
Message-ID: <20210105010643.GA20926@dcvr>
In-Reply-To: <20210104201245.cbtqno6cyxw5iycu@chatter.i7.local>
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Mon, Dec 28, 2020 at 09:31:39PM +0000, Eric Wong wrote:
> > AFAIK, V2Writable always does the right thing on -purge/-edit;
> > at least for WWW users(*).
> >
> > V2W does more work in rare cases when history gets rewritten,
> > but doesn't track anything beyond the latest indexed commit
> > hash.
> >
> > In the V2Writable::log_range sub, it uses "git merge-base --is-ancestor"
> > (via is_ancestor wrapper) to cover the common case of contiguous history.
> >
> > Otherwise, it attempts "git merge-base" to find a common ancestor:
> >
> >	if (common_ancestor_found)
> >		unindex some history starting at common ancestor
> >		reindex from common ancestor
> >	else
> >		unindex all history in epoch
> >		reindex epoch from scratch
>
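The decision quoted above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl in V2Writable::log_range: `plan_reindex`, `make_git_runner` and the returned action names are hypothetical, and only the two `git merge-base` invocations come from the description above. The sketch assumes that any non-zero exit (including errors from missing objects in a shallow clone) means "not an ancestor" or "no common ancestor".

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# A git runner takes an argument list and returns (exit code, stdout).
GitRunner = Callable[[List[str]], Tuple[int, str]]


def make_git_runner(git_dir: str) -> GitRunner:
    """Build a runner which shells out to git(1) for the given repository."""
    def run(args: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(
            ["git", "--git-dir", git_dir, *args],
            capture_output=True,
            text=True,
        )
        return proc.returncode, proc.stdout.strip()
    return run


def plan_reindex(git: GitRunner, last_indexed: str,
                 tip: str) -> Tuple[str, Optional[str]]:
    """Decide what to do given the latest indexed commit and the new tip."""
    # Common case: history is contiguous, so only new commits need indexing.
    rc, _ = git(["merge-base", "--is-ancestor", last_indexed, tip])
    if rc == 0:
        return ("index", f"{last_indexed}..{tip}")

    # History was rewritten: look for a common ancestor to restart from.
    rc, base = git(["merge-base", last_indexed, tip])
    if rc == 0 and base:
        return ("unindex-then-reindex", base)

    # No common ancestor (or objects are gone): redo the whole epoch.
    return ("reindex-epoch", None)
```

Passing the runner in as a parameter keeps the decision logic testable without a repository; `make_git_runner("/path/to/epoch.git")` would supply a real one.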
> I think I understand, but in the case of grok-pi-piper, unindexing is not an
> option, since we can't control what the receiving-end app has already done
> with the messages we have previously piped to it. We can't assume that it will
> do the right thing when it receives duplicate messages, so we need to somehow
> make sure that we don't pipe the same message twice.
Never mind, I just reread my code more carefully :x

Actually, the unindexing code currently stores an {unindexed}
hash which is a { Message-ID => (NNTP )num } mapping.

That allows most unedited messages to keep the same NNTP article
number, so clients don't see them twice.  "Most" meaning non-broken
messages which don't have reused Message-IDs.

I'm thinking {unindexed} should be a
{ OID => [ num, Message-ID ] } mapping.

That would allow the new version of the edited message to be
piped and seen by NNTP/IMAP readers.

You *do* want to pipe the new version of the message you've
edited, right?
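To illustrate the difference, here is a rough Python sketch of how an OID-keyed mapping could behave on reindex. It is not the actual patch (public-inbox is Perl); `reindex`, its arguments and its return shape are hypothetical names for this example only. Blobs whose OID is already in the mapping keep their old article number and are not shown again, while an edited message (same Message-ID, different OID) gets a fresh number and so would be piped and seen.

```python
from typing import Dict, List, Tuple

# {unindexed} as proposed: OID => (num, Message-ID)
Unindexed = Dict[str, Tuple[int, str]]
# (num, OID, Message-ID)
Article = Tuple[int, str, str]


def reindex(unindexed: Unindexed, blobs: List[Tuple[str, str]],
            next_num: int) -> Tuple[List[Article], List[Article], int]:
    """Assign article numbers to (OID, Message-ID) pairs being reindexed.

    Returns (kept, fresh, next_num): "kept" articles reuse their old
    number, "fresh" articles are new to readers and should be piped.
    """
    kept: List[Article] = []
    fresh: List[Article] = []
    for oid, mid in blobs:
        if oid in unindexed:
            # Identical blob: reuse the number so clients don't see it twice.
            num, _old_mid = unindexed.pop(oid)
            kept.append((num, oid, mid))
        else:
            # New or edited content: allocate a new number.
            fresh.append((next_num, oid, mid))
            next_num += 1
    return kept, fresh, next_num


# Usage: message <b@x> was edited, so its OID changed from "oid2" to "oid2e".
unindexed = {"oid1": (1, "<a@x>"), "oid2": (2, "<b@x>")}
kept, fresh, next_num = reindex(
    unindexed, [("oid1", "<a@x>"), ("oid2e", "<b@x>")], next_num=3)
```

Whatever remains in `unindexed` afterwards (here, the pre-edit "oid2") corresponds to content that no longer exists in history.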
> > AFAIK, the common_ancestor_found case is always true unless
> > somebody was wacky enough to run a full gc+prune immediately
> > after fetching. IOW, I don't think the else case happens
> > in practice.
>
> :) It kinda does in grok-pi-piper case, since one of the config options is to
> continuously "reshallow" the repository to basically contain no objects.
>
> https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_piper.py#n58
>
> I know that this is "wacky" as you say, but it helps save dramatic amounts of
> space when cloning most of lore.kernel.org repositories. We can still use "git
> fetch --deepen" when necessary, but this does make it impossible to use the
> common ancestor strategy when dealing with history rewrites.
Understood. So yeah, actually the current {unindexed} hash in
V2Writable mostly does what we want, but I'm preparing a patch
which does the aforementioned { OID => [ num, Message-ID ] }
mapping.