unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 09/27] import: (v2) delete writes the blob into history in subdir
Date: Mon, 19 Mar 2018 08:14:41 +0000	[thread overview]
Message-ID: <20180319081459.10645-10-e@80x24.org> (raw)
In-Reply-To: <20180319081459.10645-1-e@80x24.org>

This makes it easier to audit deletes with "git log -p" and
prevents an unstable specification of "content_id" from being
stored in history.

This should be cost-free if done in the same partition (and even
cheaper than before as it introduces no new blobs).  It does
have a higher cost across partitions, but is probably irrelevant
given the typical ham:spam ratio.
---
 lib/PublicInbox/Import.pm     | 15 ++++++++++-----
 lib/PublicInbox/V2Writable.pm |  4 +++-
 t/v2writable.t                |  9 +++++++++
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index e20c6e0..94a49fe 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -11,7 +11,6 @@ use Fcntl qw(:flock :DEFAULT);
 use PublicInbox::Spawn qw(spawn);
 use PublicInbox::MID qw(mid_mime mid2path);
 use PublicInbox::Address;
-use PublicInbox::ContentId qw(content_id);
 use PublicInbox::MsgTime qw(msg_timestamp);
 
 sub new {
@@ -163,7 +162,6 @@ sub get_mark {
 # ('MISMATCH', Email::MIME) on mismatch
 # (:MARK, Email::MIME) on success
 #
-# For v2 inboxes, the content_id is returned instead of the msg
 # v2 callers should check with Xapian before calling this as
 # it is not idempotent.
 sub remove {
@@ -179,10 +177,17 @@ sub remove {
 		($err, $cur) = check_remove_v1($r, $w, $tip, $path, $mime);
 		return ($err, $cur) if $err;
 	} else {
-		$cur = content_id($mime);
-		my $len = length($cur);
+		my $sref;
+		if (ref($mime) eq 'SCALAR') { # optimization used by V2Writable
+			$sref = $mime;
+		} else { # XXX should not be necessary:
+			my $str = $mime->as_string;
+			$sref = \$str;
+		}
+		my $len = length($$sref);
 		$blob = $self->{mark}++;
-		print $w "blob\nmark :$blob\ndata $len\n$cur\n" or wfail;
+		print $w "blob\nmark :$blob\ndata $len\n",
+			$$sref, "\n" or wfail;
 	}
 
 	my $ref = $self->{ref};
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 656f069..fd9bf61 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -220,6 +220,7 @@ sub remove {
 				warn "broken smsg for $mid\n";
 				return 1; # continue
 			}
+			my $orig = $$msg;
 			my $cur = PublicInbox::MIME->new($msg);
 			if (content_id($cur) eq $cid) {
 				$mm->num_delete($smsg->num);
@@ -227,7 +228,8 @@ sub remove {
 				# no bugs in our deduplication code:
 				$removed = $smsg;
 				$removed->{mime} = $cur;
-				$im->remove($cur, $cmt_msg);
+				$im->remove(\$orig, $cmt_msg);
+				$orig = undef;
 				$removed->num; # memoize this for callers
 
 				my $oid = $smsg->{blob};
diff --git a/t/v2writable.t b/t/v2writable.t
index 6e37b72..a5c982e 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -191,6 +191,7 @@ EOF
 {
 	local $ENV{NPROC} = 2;
 	my @before = $git0->qx(qw(log --pretty=oneline));
+	my $before = $git0->qx(qw(log --pretty=raw --raw -r --no-abbrev));
 	$im = PublicInbox::V2Writable->new($ibx, 1);
 	is($im->{partitions}, 1, 'detected single partition from previous');
 	my $smsg = $im->remove($mime, 'test removal');
@@ -207,6 +208,14 @@ EOF
 	my @found = ();
 	$srch->each_smsg_by_mid($smsg->mid, sub { push @found, @_; 1 });
 	is(scalar(@found), 0, 'no longer found in Xapian skeleton');
+
+	my $after = $git0->qx(qw(log -1 --pretty=raw --raw -r --no-abbrev));
+	if ($after =~ m!( [a-f0-9]+ )A\td$!) {
+		my $oid = $1;
+		ok(index($before, $oid) > 0, 'no new blob introduced');
+	} else {
+		fail('failed to extract blob from log output');
+	}
 }
 
 done_testing();
-- 
EW


  parent reply	other threads:[~2018-03-19  8:15 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-19  8:14 [PATCH 00/27] v2 public-inbox-watch support Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 01/27] content_id: use Sender header if From is not available Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 02/27] v2writable: support "barrier" operation to avoid reforking Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 03/27] use string ref for Email::Simple->new Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 04/27] v2writable: remove unnecessary idx_init call Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 05/27] searchidx: do not delete documents while iterating Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 06/27] search: allow ->reopen to be chainable Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 07/27] v2writable: implement remove correctly Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 08/27] skeleton: barrier init requires a lock Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-19  8:14 ` [PATCH 10/27] import: (v2): write deletes to a separate '_' subdirectory Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 11/27] import: implement barrier operation for v1 repos Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 12/27] mid: mid_mime uses v2-compatible mids function Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 13/27] watchmaildir: use content_digest to generate Message-Id Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 14/27] import: force Message-ID generation for v1 here Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 15/27] import: switch to URL-safe Base64 for Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 16/27] v2writable: test for idempotent removals Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 17/27] import: enable locking under v2 Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 18/27] index: s/GIT_DIR/REPO_DIR/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 19/27] Lock: new base class for writable lockers Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 20/27] t/watch_maildir: note the reason for FIFO creation Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 21/27] v2writable: ensure ->done is idempotent Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 22/27] watchmaildir: support v2 repositories Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 23/27] searchidxpart: s/barrier/remote_barrier/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 24/27] v2writable: allow disabling parallelization Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 25/27] scripts/import_vger_from_mbox: filter out same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 26/27] v2writable: add DEBUG_DIFF env support Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 27/27] v2writable: remove "resent" message for duplicate Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:18   ` SQUASH: " Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180319081459.10645-10-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).