From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 3/4] cindex: adjust estimated memory cost for deletes
Date: Wed, 11 Dec 2024 08:10:46 +0000
Message-ID: <20241211081047.1267062-4-e@80x24.org>
In-Reply-To: <20241211081047.1267062-1-e@80x24.org>
References: <20241211081047.1267062-1-e@80x24.org>

Based on my conversations with the Xapian lead, the cost of
deletes was overestimated by 7x in cindex. Adjust the estimated
cost of a deleted document to a more reasonable number based on
calculations discussed on the xapian-discuss list.

In any case, all of our batch size memory cost estimates are
rough, since Xapian provides no way of letting us know the memory
cost of the current transaction.
---
 lib/PublicInbox/CodeSearchIdx.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 13533a00..8b5f5ad0 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -813,11 +813,18 @@ sub prune_init { # via wq_io_do in IDX_SHARDS
 	$self->begin_txn_lazy;
 }
 
+# <20230504084559.M203335@dcvr> thread on xapian-discuss@lists.xapian.org
+# discusses getting an estimate term length to multiply the get_doclength()
+# result to estimate memory use of uncommitted deletes. We need to estimate
+# length here since the data may no longer be available at all if we get to
+# prune_one().
+our $EST_LEN = 6;
+
 sub prune_one { # via wq_io_do in IDX_SHARDS
 	my ($self, $term) = @_;
 	my @docids = $self->docids_by_postlist($term);
 	for (@docids) {
-		$TXN_BYTES -= $self->{xdb}->get_doclength($_) * 42;
+		$TXN_BYTES -= $self->{xdb}->get_doclength($_) * $EST_LEN;
 		$self->{xdb}->delete_document($_);
 	}
 	++$self->{nr_prune};
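
For anyone wanting to sanity-check the 7x figure, the arithmetic can be sketched in a few lines of standalone Perl. This is only an illustration and not part of the patch: est_delete_cost() and the sample document lengths are hypothetical, standing in for the get_doclength() results that prune_one() multiplies by the per-term estimate. Only the two multipliers (42 before, 6 after) come from the diff above.

```perl
use strict;
use warnings;

my $OLD_EST_LEN = 42; # multiplier used before this patch
my $EST_LEN = 6;      # multiplier introduced by this patch

# Hypothetical helper: sum of (document length * estimated term length)
# for a batch of documents queued for deletion.
sub est_delete_cost {
	my ($doclengths, $est_len) = @_;
	my $bytes = 0;
	$bytes += $_ * $est_len for @$doclengths;
	$bytes;
}

# Made-up document lengths; real values would come from get_doclength().
my @doclengths = (100, 250, 1000);
my $old = est_delete_cost(\@doclengths, $OLD_EST_LEN);
my $new = est_delete_cost(\@doclengths, $EST_LEN);
printf "old=%d new=%d ratio=%d\n", $old, $new, $old / $new;
# prints: old=56700 new=8100 ratio=7
```

Since both estimates scale linearly with document length, the ratio stays at 42 / 6 = 7 regardless of which documents are in the batch.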