From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/14] IT'S ALIVE! www loads cindex join data
Date: Tue, 28 Nov 2023 14:56:13 +0000
Message-ID: <20231128145628.1455176-1-e@80x24.org>

8/14 is the killer one which actually makes the cindex data useful
for WWW and for powering solver. Keep in mind, I've had to cap solver
at 3 coderepos as a temporary measure since there are a lot of "weak"
joins we should be weeding out.

More documentation is coming, but cindex joins are very much a fuzzy
thing which will have to deal with false positives and such. So
figuring out the scoring for sanity would make sense...

Fortunately, --join=aggressive,reset only takes ~1 hour for me, so
probably 1/3 of that on modern hardware. Incremental `-cindex --join'
(no suboptions) usually takes <5 minutes if done frequently.

New performance problem: solver could definitely be smarter about
dealing with common roots/groups. For the longest time, I've only
had 1 coderepo per-inbox; having hundreds is wacky.

Actual searching against the cindex isn't done yet, but that's
kinda straightforward.

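
The temporary cap and the idea of scoring joins could be pictured
roughly as below. This is only a sketch in Python, not code from this
series: the `Join` tuple, `pick_coderepos`, the integer score and the
`min_score` threshold are all made up for illustration. Only the limit
of 3 comes from the text above.

```python
from typing import List, NamedTuple

MAX_CODEREPOS = 3  # the temporary solver cap mentioned above


class Join(NamedTuple):
    coderepo: str  # hypothetical coderepo name
    score: int     # hypothetical join strength; higher means stronger


def pick_coderepos(joins: List[Join], min_score: int = 1,
                   limit: int = MAX_CODEREPOS) -> List[str]:
    """Drop "weak" joins, then keep at most `limit` of the strongest."""
    strong = [j for j in joins if j.score >= min_score]
    # strongest first; ties broken by name so the result is stable
    strong.sort(key=lambda j: (-j.score, j.coderepo))
    return [j.coderepo for j in strong[:limit]]
```

Raising `min_score` would weed out more false positives at the cost of
missing some real joins, which is the scoring trade-off still to be
worked out.
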
Eric Wong (14):
  test_common: create_*: detect changes all parameters
  t/cindex*: require SCM_RIGHTS for these tests
  codesearch: eliminate redundant substitutions
  solver: schedule cleanup after synchronous git->check
  xap_helper.h: move cindex endpoints to separate file
  xap_helper: implement mset endpoint for WWW, IMAP, etc...
  hval: use File::Spec to make relative paths for href
  www: load and use cindex join data
  git: speed up ->git_path for non-worktrees
  cindex: require `-g GIT_DIR' or `-r PROJECT_ROOT'
  git: speed up Git->new by 5% or so
  admin: resolve_git_dir respects symlinks
  cindex: extra quit checks
  www: start working on a repo listing

 Documentation/public-inbox-cindex.pod |   2 +-
 MANIFEST                              |   3 +
 Makefile.PL                           |   8 +-
 lib/PublicInbox/Admin.pm              |  25 +-
 lib/PublicInbox/CodeSearch.pm         | 162 ++++++++++-
 lib/PublicInbox/CodeSearchIdx.pm      |  52 ++--
 lib/PublicInbox/Config.pm             |  39 ++-
 lib/PublicInbox/Git.pm                |  27 +-
 lib/PublicInbox/Hval.pm               |  12 +-
 lib/PublicInbox/RepoList.pm           |  39 +++
 lib/PublicInbox/Search.pm             |  42 +++
 lib/PublicInbox/SearchIdx.pm          |  10 +-
 lib/PublicInbox/SolverGit.pm          |   9 +-
 lib/PublicInbox/TestCommon.pm         |  35 ++-
 lib/PublicInbox/View.pm               |   7 +-
 lib/PublicInbox/WWW.pm                |   1 +
 lib/PublicInbox/WwwCoderepo.pm        |  44 ++-
 lib/PublicInbox/WwwStream.pm          |  11 +-
 lib/PublicInbox/WwwText.pm            |  19 +-
 lib/PublicInbox/XapHelper.pm          |  51 ++--
 lib/PublicInbox/XapHelperCxx.pm       |  14 +-
 lib/PublicInbox/xap_helper.h          | 379 +++++++-------------------
 lib/PublicInbox/xh_cidx.h             | 244 +++++++++++++++++
 lib/PublicInbox/xh_mset.h             |  96 +++++++
 script/public-inbox-cindex            |  38 ++-
 t/admin.t                             |  12 +
 t/cindex-join.t                       |   9 +-
 t/cindex.t                            |  91 ++++++-
 t/xap_helper.t                        |  53 +++-
 xt/solver.t                           |   3 +-
 30 files changed, 1111 insertions(+), 426 deletions(-)
 create mode 100644 lib/PublicInbox/RepoList.pm
 create mode 100644 lib/PublicInbox/xh_cidx.h
 create mode 100644 lib/PublicInbox/xh_mset.h