From: Eric Wong <e@80x24.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: meta@public-inbox.org
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch
Date: Sat, 14 Jul 2018 00:46:01 +0000
Message-ID: <20180714004601.x2xlmdxv5ahfqtwz@dcvr>
In-Reply-To: <20180713220259.GA27845@dcvr>
Eric Wong <e@80x24.org> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > Eric Wong <e@80x24.org> writes:
> > > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > >> Then I am going to report a probable bug. In V2 in public-inbox-index
> > >> I can not find a path from finding a 'd' file and a call to unindex. V1
> > >> unindexes deleted files. Rebased heads for purges call unindex. I
> > >> don't see that for ordinary d files though.
> > >
> > > It shouldn't need to call unindex because they never get indexed
> > > on rebuilds. V2 indexing walks history backwards (normal "git log"
> > > behavior) so it remembers 'd' paths in the "$D" hash; and skips blobs
> > > as it encounters them.
> > >
> > > v1 needed to unindex because it used "git log --reverse" to walk
> > > forward in history.
> >
> > This assumes that you see them in the same git pull. I would think
> > ideally anything that is going to be deleted that quickly you can just
> > skip archiving.
> >
> > What is the time window of you expecting 'd' messages to appear?
>
> Ah, this is definitely a bug when using incremental fetch + -index.
> Right now, it only warns on unseen entries in $D but won't reach
> beyond the current "git log" window.
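
To illustrate the failure mode described above, here is a minimal,
self-contained sketch. It is a toy model under stated assumptions and
not the V2Writable code: the name index_window, the event tuples, and
the plain hash standing in for the search index are all hypothetical.
Each call models one fetch+index window walked newest-first, as a
backwards "git log" would do.

```perl
use strict;
use warnings;

# Toy model (hypothetical names): $events is a list of
# [action, key, oid] tuples in newest-first order, where action is
# 'd' (delete) or 'm' (message added) and key is "$mid\0$cid".
sub index_window {
    my ($index, $events) = @_;
    my %D; # "$mid\0$cid" => oid seen in a 'd' entry
    for my $ev (@$events) {
        my ($action, $key, $oid) = @$ev;
        if ($action eq 'd') {
            $D{$key} = $oid;        # remember the delete
        } elsif (exists $D{$key}) {
            delete $D{$key};        # deleted later in this window:
            next;                   # never index it at all
        } else {
            $index->{$key} = $oid;  # ordinary message: index it
        }
    }
    # Anything left in %D refers to a message that was indexed in an
    # earlier window, so it has to be removed from the index now.
    my $leftovers = scalar keys %D;
    delete $index->{$_} for keys %D;
    return $leftovers;
}

my %index;
# window 1: the message is added and indexed
index_window(\%index, [ [ 'm', "1\@example.com\0cid1", 'aaa' ] ]);
# window 2: only the delete is visible; the add is outside this window
my $left = index_window(\%index, [ [ 'd', "1\@example.com\0cid1", 'aaa' ] ]);
print "leftover deletes: $left, still indexed: ",
    scalar(keys %index), "\n";
```

In this model, warning about the leftovers instead of removing them
leaves the deleted message searchable, which is the behavior reported.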
The following should fix it; thanks for the bug report.
-------8<-------
Subject: [PATCH] v2writable: unindex deleted messages after incremental fetch
The normal behavior is to prevent deleted messages from being
indexed in the first place. However, when fetching incrementally
via git, public-inbox-index needs to account for deleted files
which were created outside of the most recent fetch/reindexing
window.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
---
lib/PublicInbox/V2Writable.pm | 20 ++++++++++----------
t/v2mirror.t | 28 +++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 412eb6a..934640e 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -653,7 +653,7 @@ sub mark_deleted {
my $mids = mids($mime->header_obj);
my $cid = content_id($mime);
foreach my $mid (@$mids) {
- $D->{"$mid\0$cid"} = 1;
+ $D->{"$mid\0$cid"} = $oid;
}
}
@@ -671,7 +671,7 @@ sub reindex_oid {
my $num = -1;
my $del = 0;
foreach my $mid (@$mids) {
- $del += (delete $D->{"$mid\0$cid"} || 0);
+ $del += delete($D->{"$mid\0$cid"}) ? 1 : 0;
my $n = $mm_tmp->num_for($mid);
if (defined $n && $n > $num) {
$mid0 = $mid;
@@ -882,7 +882,7 @@ sub index_sync {
my ($min, $max) = $mm_tmp->minmax;
my $regen = $self->index_prepare($opts, $epoch_max, $ranges);
$$regen += $max if $max;
- my $D = {};
+ my $D = {}; # "$mid\0$cid" => $oid
my @cmd = qw(log --raw -r --pretty=tformat:%H
--no-notes --no-color --no-abbrev --no-renames);
@@ -912,13 +912,13 @@ sub index_sync {
delete $self->{reindex_pipe};
$self->update_last_commit($git, $i, $cmt) if defined $cmt;
}
- my @d = sort keys %$D;
- if (@d) {
- warn "BUG: ", scalar(@d)," unseen deleted messages marked\n";
- foreach (@d) {
- my ($mid, undef) = split(/\0/, $_, 2);
- warn "<$mid>\n";
- }
+
+ # unindex is required for leftovers if "deletes" affect messages
+ # in a previous fetch+index window:
+ if (scalar keys %$D) {
+ my $git = $self->{-inbox}->git;
+ $self->unindex_oid($git, $_) for values %$D;
+ $git->cleanup;
}
$self->done;
}
diff --git a/t/v2mirror.t b/t/v2mirror.t
index c0c329c..f95ad0f 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -182,7 +182,33 @@ is($mibx->git->check($to_purge), undef, 'unindex+prune successful in mirror');
is_deeply(\@warn, [], 'no warnings from index_sync after purge');
}
-$v2w->done;
+# deletes happen in a different fetch window
+{
+ $mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+ is(scalar($mset->items), 1, '1@example.com visible in mirror');
+ $mime->header_set('Message-ID', '<1@example.com>');
+ $mime->header_set('Subject', 'subject = 1');
+ ok($v2w->remove($mime), 'removed <1@example.com> from source');
+ $v2w->done;
+ fetch_each_epoch();
+
+ open my $err, '+>', "$tmpdir/index-err" or die "open: $!";
+ my $ipid = fork;
+ if ($ipid == 0) {
+ dup2(fileno($err), 2) or die "dup2 failed: $!";
+ exec("$script-index", "$tmpdir/m");
+ die "exec fail: $!";
+ }
+ ok($ipid, 'running index');
+ is(waitpid($ipid, 0), $ipid, 'index done');
+ is($?, 0, 'no error from index');
+ ok(seek($err, 0, 0), 'rewound stderr');
+ $err = eval { local $/; <$err> };
+ is($err, '', 'no errors reported by index');
+ $mset = $mibx->search->reopen->query('m:1@example.com', {mset => 1});
+ is(scalar($mset->items), 0, '1@example.com no longer visible in mirror');
+}
+
ok(kill('TERM', $pid), 'killed httpd');
$pid = undef;
waitpid(-1, 0);
--
EW