From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 087751FA12
	for ; Sun, 25 Jul 2021 00:11:04 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 3/3] extsearchidx: use more appropriate max for dedupe
Date: Sun, 25 Jul 2021 00:11:03 +0000
Message-Id: <20210725001103.11638-4-e@80x24.org>
In-Reply-To: <20210725001103.11638-1-e@80x24.org>
References: <20210725001103.11638-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

The over.msgid table may contain ghost Message-IDs and also
Message-IDs of deleted spam messages, so over->max isn't a good
approximation of dedupe progress.
---
 lib/PublicInbox/ExtSearchIdx.pm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 1c2a9758..51dbf54f 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -896,7 +896,10 @@ sub eidx_dedupe ($$$) {
 	my ($iter, $cur_mid);
 	my $min_id = 0;
 	my $idx = 0;
-	local $sync->{-regen_fmt} = "dedupe %u/".$self->{oidx}->max."\n";
+	my ($max_id) = $self->{oidx}->dbh->selectrow_array(<{-regen_fmt} = "dedupe %u/$max_id\n";
 	# note: we could write this query more intelligently,
 	# but that causes lock contention with read-only processes