From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 9E3C01F51A
	for <meta@public-inbox.org>; Fri, 8 Dec 2023 03:54:39 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1702007679;
	bh=DSmr8+Cswo0L+Nw+0UforDfTIUFOaskR1M9u/6gGjkc=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=TTvosPFE1nXt+KJHoXwE6mYZZsApy8yFre5mnBLblcGqGPTNR7yvPnvoYNQDve1BP
	 tM7WtJeotPktGWIOcMMCDG8V+c1MOCZ73xk9HyRyPHrPJO3KHDMiMRmTbi3B57gU5C
	 iWhmPRBXTkHsQI4mKq+fV/YcQ0MXisuaNbdmDsdA=
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 6/6] cindex: switch --join to use dfpost7 by default
Date: Fri, 8 Dec 2023 03:54:38 +0000
Message-ID: <20231208035438.3710696-7-e@80x24.org>
In-Reply-To: <20231208035438.3710696-1-e@80x24.org>
References: <20231208035438.3710696-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: <meta.public-inbox.org>

Post-image blob OIDs are what solver already works with, and
longer OIDs may not be available in historical mail archives.

`patchid' turns out to be unsuitable since:

1) git's default diff algorithm has changed over time
2) users may use different diff options to improve readability

Of course, we could eventually run `lei rediff' during the index
phase to regenerate patchids, but that's out-of-scope for now
and likely to be too expensive.
---
 lib/PublicInbox/CodeSearchIdx.pm | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/CodeSearchIdx.pm b/lib/PublicInbox/CodeSearchIdx.pm
index 967933f2..5d420de2 100644
--- a/lib/PublicInbox/CodeSearchIdx.pm
+++ b/lib/PublicInbox/CodeSearchIdx.pm
@@ -34,9 +34,9 @@
 # The $IBX_OFF here is ephemeral (per-join_data) and NOT related to
 # the `ibx_off' column of `over.sqlite3' for extindex.
 # @ROOT_COMMIT_OID_OFFS is space-delimited
-# In both cases, $PFX is typically the value of the patchid (XDFID) but it
-# can be configured to use any combination of patchid, dfpre, dfpost or
-# dfblob.
+# In both cases, $PFX is typically the value of the 7-(hex)char dfpost
+# XDFPOST but it can be configured to use any combination of patchid,
+# dfpre, dfpost or dfblob.
 #
 # WARNING: this is vulnerable to arbitrary memory usage attacks if we
 # attempt to index or join against malicious coderepos with
@@ -1199,11 +1199,13 @@ sub init_join_prefork ($) {
 	require PublicInbox::CidxXapHelperAux;
 	require PublicInbox::XapClient;
 	my @unknown;
-	my $pfx = $JOIN{prefixes} // 'patchid';
-	for (split /\+/, $pfx) {
-		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$_} //
-				push(@unknown, $_);
-		push(@JOIN_PFX, split(/ /, $v));
+	my $pfx = $JOIN{prefixes} // 'dfpost7';
+	for my $p (split /\+/, $pfx) {
+		my $n = '';
+		$p =~ s/([0-9]+)\z// and $n = $1;
+		my $v = $PublicInbox::Search::PATCH_BOOL_COMMON{$p} //
+				push(@unknown, $p);
+		push(@JOIN_PFX, map { $_.$n } split(/ /, $v));
 	}
 	@unknown and die <<EOM;
 E: --join=prefixes= contains unsupported prefixes: @unknown
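
The prefix handling in init_join_prefork can be tried in isolation. What
follows is only a rough standalone sketch, not the code in
CodeSearchIdx.pm: parse_join_prefixes is a hypothetical helper, and the
%PATCH_BOOL_COMMON table is a stand-in whose dfpre and dfblob entries are
assumptions (only the patchid => XDFID and dfpost => XDFPOST pairings
appear in the comment changed above). It also skips unknown names rather
than letting them fall through to the final push.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %PublicInbox::Search::PATCH_BOOL_COMMON.
# Only patchid/XDFID and dfpost/XDFPOST are named in the patch itself;
# the dfpre and dfblob values are assumptions made for illustration.
my %PATCH_BOOL_COMMON = (
	dfpre => 'XDFPRE',
	dfpost => 'XDFPOST',
	dfblob => 'XDFPRE XDFPOST',
	patchid => 'XDFID',
);

# Split a --join=prefixes= value on `+', strip any trailing decimal
# length (the "7" in "dfpost7") from each name, and append that length
# to every Xapian prefix the name expands to.
sub parse_join_prefixes {
	my ($pfx) = @_;
	my (@join_pfx, @unknown);
	for my $p (split /\+/, $pfx) {
		my $n = '';
		$p =~ s/([0-9]+)\z// and $n = $1;
		my $v = $PATCH_BOOL_COMMON{$p};
		if (!defined $v) {
			push @unknown, $p; # collected for the error message
			next;
		}
		push @join_pfx, map { $_.$n } split(/ /, $v);
	}
	(\@join_pfx, \@unknown);
}

my ($ok, $bad) = parse_join_prefixes('dfpost7+patchid+bogus');
print "prefixes: @$ok\n"; # prefixes: XDFPOST7 XDFID
print "unknown: @$bad\n"; # unknown: bogus
```

With this stand-in table, `dfblob7' would expand to both XDFPRE7 and
XDFPOST7, since the numeric suffix is applied to each space-delimited
prefix in turn.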