unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
From: "Eric Wong (Contractor, The Linux Foundation)" <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 05/27] searchidx: do not delete documents while iterating
Date: Mon, 19 Mar 2018 08:14:37 +0000	[thread overview]
Message-ID: <20180319081459.10645-6-e@80x24.org> (raw)
In-Reply-To: <20180319081459.10645-1-e@80x24.org>

Followup-to: ebb59815035b42c2
  ("searchidx: do not modify Xapian DB while iterating")
---
 lib/PublicInbox/SearchIdx.pm | 50 +++++++++++++++++++++++---------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 725bbd8..ccec018 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -406,29 +406,38 @@ sub add_message {
 	$doc_id;
 }
 
-# returns deleted doc_id on success, undef on missing
+sub batch_do {
+	my ($self, $termval, $cb) = @_;
+	my $batch_size = 1000; # don't let @ids grow too large to avoid OOM
+	while (1) {
+		my ($head, $tail) = $self->find_doc_ids($termval);
+		return if $head == $tail;
+		my @ids;
+		for (; $head != $tail && @ids < $batch_size; $head->inc) {
+			push @ids, $head->get_docid;
+		}
+		$cb->(\@ids);
+	}
+}
+
 sub remove_message {
 	my ($self, $mid) = @_;
 	my $db = $self->{xdb};
-	my $doc_id;
+	my $called;
 	$mid = mid_clean($mid);
 
 	eval {
-		my ($head, $tail) = $self->find_doc_ids('Q' . $mid);
-		if ($head->equal($tail)) {
-			warn "cannot remove non-existent <$mid>\n";
-		}
-		for (; $head != $tail; $head->inc) {
-			my $docid = $head->get_docid;
-			$db->delete_document($docid);
-		}
+		batch_do($self, 'Q' . $mid, sub {
+			my ($ids) = @_;
+			$db->delete_document($_) for @$ids;
+			$called = 1;
+		});
 	};
-
 	if ($@) {
 		warn "failed to remove message <$mid>: $@\n";
-		return undef;
+	} elsif (!$called) {
+		warn "cannot remove non-existent <$mid>\n";
 	}
-	$doc_id;
 }
 
 sub term_generator { # write-only
@@ -795,22 +804,15 @@ sub merge_threads {
 	my ($self, $winner_tid, $loser_tid) = @_;
 	return if $winner_tid == $loser_tid;
 	my $db = $self->{xdb};
-
-	my $batch_size = 1000; # don't let @ids grow too large to avoid OOM
-	while (1) {
-		my ($head, $tail) = $self->find_doc_ids('G' . $loser_tid);
-		return if $head == $tail;
-		my @ids;
-		for (; $head != $tail && @ids < $batch_size; $head->inc) {
-			push @ids, $head->get_docid;
-		}
-		foreach my $docid (@ids) {
+	batch_do($self, 'G' . $loser_tid, sub {
+		my ($ids) = @_;
+		foreach my $docid (@$ids) {
 			my $doc = $db->get_document($docid);
 			$doc->remove_term('G' . $loser_tid);
 			$doc->add_boolean_term('G' . $winner_tid);
 			$db->replace_document($docid, $doc);
 		}
-	}
+	});
 }
 
 sub _read_git_config_perm {
-- 
EW


  parent reply	other threads:[~2018-03-19  8:15 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-19  8:14 [PATCH 00/27] v2 public-inbox-watch support Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 01/27] content_id: use Sender header if From is not available Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 02/27] v2writable: support "barrier" operation to avoid reforking Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 03/27] use string ref for Email::Simple->new Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 04/27] v2writable: remove unnecessary idx_init call Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` Eric Wong (Contractor, The Linux Foundation) [this message]
2018-03-19  8:14 ` [PATCH 06/27] search: allow ->reopen to be chainable Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 07/27] v2writable: implement remove correctly Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 08/27] skeleton: barrier init requires a lock Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 09/27] import: (v2) delete writes the blob into history in subdir Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 10/27] import: (v2): write deletes to a separate '_' subdirectory Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 11/27] import: implement barrier operation for v1 repos Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 12/27] mid: mid_mime uses v2-compatible mids function Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 13/27] watchmaildir: use content_digest to generate Message-Id Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 14/27] import: force Message-ID generation for v1 here Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 15/27] import: switch to URL-safe Base64 for Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 16/27] v2writable: test for idempotent removals Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 17/27] import: enable locking under v2 Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 18/27] index: s/GIT_DIR/REPO_DIR/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 19/27] Lock: new base class for writable lockers Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 20/27] t/watch_maildir: note the reason for FIFO creation Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 21/27] v2writable: ensure ->done is idempotent Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 22/27] watchmaildir: support v2 repositories Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 23/27] searchidxpart: s/barrier/remote_barrier/ Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 24/27] v2writable: allow disabling parallelization Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 25/27] scripts/import_vger_from_mbox: filter out same headers as MDA Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 26/27] v2writable: add DEBUG_DIFF env support Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:14 ` [PATCH 27/27] v2writable: remove "resent" message for duplicate Message-IDs Eric Wong (Contractor, The Linux Foundation)
2018-03-19  8:18   ` SQUASH: " Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180319081459.10645-6-e@80x24.org \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).