From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN:  
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no
	version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 9E3C01F51A
	for <meta@public-inbox.org>; Fri,  8 Dec 2023 03:54:39 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1702007679;
	bh=DSmr8+Cswo0L+Nw+0UforDfTIUFOaskR1M9u/6gGjkc=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=TTvosPFE1nXt+KJHoXwE6mYZZsApy8yFre5mnBLblcGqGPTNR7yvPnvoYNQDve1BP
	 tM7WtJeotPktGWIOcMMCDG8V+c1MOCZ73xk9HyRyPHrPJO3KHDMiMRmTbi3B57gU5C
	 iWhmPRBXTkHsQI4mKq+fV/YcQ0MXisuaNbdmDsdA=
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 6/6] cindex: switch --join to use dfpost7 by default
Date: Fri,  8 Dec 2023 03:54:38 +0000
Message-ID: <20231208035438.3710696-7-e@80x24.org>
In-Reply-To: <20231208035438.3710696-1-e@80x24.org>
References: <20231208035438.3710696-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: <meta.public-inbox.org>

Post-image blob OIDs are what solver already works with, and
abbreviations longer than 7 hex characters may not be available
in historical mail archives.

`patchid' turns out to be unsuitable since:
1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability

Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out of scope for now
and likely to be too expensive.
---
 lib/PublicInbox/CodeSearchIdx.pm | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
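
Illustration only, not part of the patch: a minimal standalone sketch
of how a `--join=prefixes=' value such as `dfpost7' or
`dfblob7+patchid' might expand into term prefixes, mirroring the loop
in the hunk below.  The %PATCH_BOOL_COMMON table here is an assumed
stand-in for the one in PublicInbox::Search (only the XDFID and
XDFPOST names appear in this patch), and expand_join_prefixes() and
dfpost_abbrev() are hypothetical helpers.  dfpost_abbrev() shows one
way a 7-char post-image OID could be taken from a git "index" line;
whether the indexer derives it this way is also an assumption.

```perl
use v5.10.1; # enables `say'
use strict;
use warnings;

# ASSUMPTION: stand-in for %PublicInbox::Search::PATCH_BOOL_COMMON;
# the real table may differ.
my %PATCH_BOOL_COMMON = (
	patchid => 'XDFID',
	dfpre => 'XDFPRE',
	dfpost => 'XDFPOST',
	dfblob => 'XDFPRE XDFPOST',
);

# hypothetical helper: expand "dfblob7+patchid" into term prefixes
sub expand_join_prefixes {
	my ($spec) = @_;
	my (@pfx, @unknown);
	for my $p (split /\+/, $spec) {
		my $n = '';
		# strip an optional trailing length, e.g. "dfpost7" => "dfpost", 7
		$p =~ s/([0-9]+)\z// and $n = $1;
		my $v = $PATCH_BOOL_COMMON{$p};
		if (defined $v) {
			# a name may map to several space-delimited prefixes
			push @pfx, map { $_.$n } split(/ /, $v);
		} else {
			push @unknown, $p;
		}
	}
	die "unsupported prefixes: @unknown\n" if @unknown;
	@pfx;
}

# hypothetical helper: first $n hex chars of the post-image blob OID
sub dfpost_abbrev {
	my ($line, $n) = @_;
	$line =~ /\Aindex [0-9a-f]+\.\.([0-9a-f]+)/ or return;
	substr($1, 0, $n);
}

say join(' ', expand_join_prefixes('dfpost7')); # XDFPOST7
say join(' ', expand_join_prefixes('dfblob7+patchid'));
say dfpost_abbrev('index 967933f2..5d420de2 100644', 7); # 5d420de
```

Under these assumptions, the length suffix is applied to every prefix
a name expands to, so `dfblob7' yields both XDFPRE7 and XDFPOST7,
while a name without a suffix (`patchid') is left unchanged.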

diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 967933f2..5d420de2 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -34,9 +34,9 @@
 # The $IBX_OFF here is ephemeral (per-join_data) and NOT related to
 # the `ibx_off' column of `over.sqlite3' for extindex.
 # @ROOT_COMMIT_OID_OFFS is space-delimited
-# In both cases, $PFX is typically the value of the patchid (XDFID) but it
-# can be configured to use any combination of patchid, dfpre, dfpost or
-# dfblob.
+# In both cases, $PFX is typically the value of the 7-(hex)char dfpost
+# (XDFPOST7), but it can be configured to use any combination of patchid,
+# dfpre, dfpost or dfblob.
 #
 # WARNING: this is vulnerable to arbitrary memory usage attacks if we
 # attempt to index or join against malicious coderepos with
@@ -1199,11 +1199,13 @@ sub init_join_prefork ($) {
 	require PublicInbox::CidxXapHelperAux;
 	require PublicInbox::XapClient;
 	my @unknown;
-	my $pfx = $JOIN{prefixes} // 'patchid';
-	for (split /\+/, $pfx) {
-		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$_} //
-			push(@unknown, $_);
-		push(@JOIN_PFX, split(/ /, $v));
+	my $pfx = $JOIN{prefixes} // 'dfpost7';
+	for my $p (split /\+/, $pfx) {
+		my $n = '';
+		$p =~ s/([0-9]+)\z// and $n = $1;
+		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$p} //
+			push(@unknown, $p);
+		push(@JOIN_PFX, map { $_.$n } split(/ /, $v));
 	}
 	@unknown and die <<EOM;
 E: --join=prefixes= contains unsupported prefixes: @unknown