From: Eric Wong <e@80x24.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: meta@public-inbox.org
Subject: msgmap serial number regeneration [was: Q: V2 format]
Date: Fri, 13 Jul 2018 22:22:00 +0000
Message-ID: <20180713222200.GB27845@dcvr>
In-Reply-To: <87h8l2ykb4.fsf@xmission.com>
"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> > Eric Wong <e@80x24.org> writes:
> >> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >>>
> >>> Because of the parallelism in V2 I have noticed messages numbered
> >>> in an order that does not correspond to their commit order. So the
> >>> SQLite database isn't as recoverable as it might be. Especially as the
> >>> parallelism introduces an element of non-determinism.
> >>
> >> *puzzled* were you able to reproduce that? The serial number
> >> generation + threading happens in the main process and the
> >> parallelism is limited to Xapian text indexing. -index
> >> generates serial numbers by walking backwards with v2, and
> >> complains on unexpected results.
>
> Digging into this I have found consistently non-reproducible numbering,
> because of deleted files. Apparently in both V1 and V2 a worst-case
> estimate is made of the total numbers that are going to be needed and
> numbers are assigned backwards from there.
>
> A fresh indexing of the git mailing list archive on v1 gives me numbers
> starting with 360 and on v2 numbers starting with 355. Which
> corresponds with the number of deleted messages.
>
> I am still looking to see if there are any other weird things here.

Ah, yes, you're correct: deletes don't get accounted for when
regenerating. Oh well. I guess it was correct to document msgmap
as something important to back up and not break for instances of
particular servers (emphasis on "particular servers").

So I think you'd need to walk revision history twice to account
for deleted messages...
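
A minimal sketch of the effect discussed above, using a toy model that
is an assumption and not public-inbox's actual code: history is a list
of hypothetical ("add" | "delete", message-id) commits, every "add"
consumes one serial, and regeneration walks newest to oldest handing out
numbers downward from an estimate. With the worst-case estimate (every
commit counted as an add), the regenerated serials come out shifted by
the number of deletes; counting the adds in a first pass removes the
shift.

```python
def incremental_serials(history):
    """Serials as a live server would assign them: 1, 2, 3, ... per add."""
    serials, next_num = {}, 1
    for op, mid in history:
        if op == "add":
            serials[mid] = next_num
            next_num += 1
        else:
            # A delete drops the mapping; its number is never reused.
            serials.pop(mid, None)
    return serials


def regenerate(history, estimate):
    """Walk newest to oldest, assigning numbers downward from `estimate`."""
    serials, num, deleted = {}, estimate, set()
    for op, mid in reversed(history):
        if op == "delete":
            # Remember that the next-older add of this message is gone.
            deleted.add(mid)
            continue
        if mid in deleted:
            deleted.discard(mid)  # this add was later deleted: no mapping
        else:
            serials[mid] = num
        num -= 1  # every add consumes a number, deleted or not
    return serials


# Hypothetical history: five adds, two of them deleted later.
history = [
    ("add", "a"), ("add", "b"), ("delete", "a"),
    ("add", "c"), ("add", "d"), ("delete", "c"), ("add", "e"),
]

# One pass: worst-case estimate treats every commit as an add.
one_pass = regenerate(history, estimate=len(history))

# Two passes: count the adds first, then assign.
add_count = sum(1 for op, _ in history if op == "add")
two_pass = regenerate(history, estimate=add_count)

print(incremental_serials(history))
print(one_pass)
print(two_pass)
```

In this model the two-pass result matches what the original server
assigned incrementally, while the one-pass result is offset by the two
deletes.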

Across different machines, preserving serials should not
matter.

> I definitely do not like not being able to reconstruct message numbers
> from a backup.

For v2, I see serial numbers as an internal optimization which
happens to map to NNTP.

If the git repo is cloned and the cloner sets up a different
server, it'll have a different address and clients won't know to
deduplicate them anyways. I suppose it makes the load-balanced
case a little more complex to sync(*).

And this can't even account for independently started mirrors
with no common git ancestry, as SMTP has zero guarantees on
ordering.

(*) But optimizing for load-balanced instances isn't ideal;
I'd rather see more independently-run servers than giant
load-balanced instances which everybody relies on.