From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.2 required=3.0 tests=ALL_TRUSTED,BAYES_00,
	DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF shortcircuit=no
	autolearn=ham autolearn_force=no version=3.4.6
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 841301F428
	for ; Tue, 21 Mar 2023 23:07:01 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=80x24.org;
	s=selector1; t=1679440021;
	bh=KTl1ezxliRPB8RJ+4bQ4AMqezDj4CU2vvEG0/DtU0Nc=;
	h=From:To:Subject:Date:From;
	b=BJDoBo3REY7VUHT0pinXffaXlEnSHrjVgdKe9wNxbYY51jjj7TIwNpCvUwRQ/RgIA
	 63X9Qac2LmwR66VQ96THNeO9ulDgjskuPoo8YpvIrwDYmoOYhLybUp9TPrNzf1+aYd
	 tfLvZ7CQ1lY4/nrSl4W6/rGRPIOB7jTF3bDoM98U=
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/28] cindex coderepo commit indexer
Date: Tue, 21 Mar 2023 23:07:01 +0000
Message-Id: <20230321230701.3019936-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

Not wired up to WWW nor lei, yet; but indexing + pruning of commits
works.

I'm not sure if indexing (root) tree OIDs or committer names+emails
is worth it, since I don't think those are very important terms to
search for.

I first wanted to shoehorn this into extindex, but I think it works
better as a separate Xapian schema.  It allows both internal indexes
($GIT_DIR/public-inbox-cindex) for unforked repos, as well as an
extindex-style external index to encompass several projects.

The indexer is structured a bit more nicely than existing indexers
since I'm relying more on OnDestroy and `local'.  I would like to
trickle some of these improvements back to the mail indexers at
some point.

--prune and --reindex currently block incremental updates, which
isn't great since both take a while for giant Xapian DBs.  Pruning is
pretty important since it's much more common for coderepos (e.g.
`seen' branch of git.git)

`lei cq' will probably be a new command which behaves similarly to
`lei q -f text', but takes `git log' options for output...

Eric Wong (28):
  ipc: move nproc_shards from v2writable
  search: relocate all_terms from lei_search
  admin: hoist out resolve_git_dir
  admin: ensure resolved GIT_DIR is absolute
  test_common: create_inbox: use `$!' properly on mkdir failure
  codesearch: initial cut w/ -cindex tool
  cindex: parallelize prep phases
  cindex: use read-only shards during prep phases
  searchidxshard: improve comment wording
  cindex: use DS and workqueues for parallelism
  ds: @post_loop_do replaces SetPostLoopCallback
  cindex: implement --exclude= like -clone
  cindex: show shard number in progress message
  cindex: drop `unchanged' progress message
  cindex: handle graceful shutdown by default
  sigfd: pass signal name rather than number to callback
  cindex: implement --max-size=SIZE
  cindex: check for checkpoint before giant messages
  cindex: truncate or drop body for over-sized commits
  cindex: attempt to give oldest commits lowest docids
  cindex: improve granularity of quit checks
  spawn: show failing directory for chdir failures
  cindex: filter out non-existent git directories
  cindex: add support for --prune
  cindex: implement reindex
  cindex: squelch incompatible options
  cindex: respect existing permissions
  cindex: ignore SIGPIPE

 MANIFEST                          |   4 +
 lib/PublicInbox/Admin.pm          |  18 +-
 lib/PublicInbox/CodeSearch.pm     | 121 +++++
 lib/PublicInbox/CodeSearchIdx.pm  | 835 ++++++++++++++++++++++++++++++
 lib/PublicInbox/Config.pm         |   2 +-
 lib/PublicInbox/DS.pm             |  30 +-
 lib/PublicInbox/Daemon.pm         |   4 +-
 lib/PublicInbox/ExtSearchIdx.pm   |   2 +-
 lib/PublicInbox/IPC.pm            |  33 +-
 lib/PublicInbox/LEI.pm            |   4 +-
 lib/PublicInbox/LeiSearch.pm      |  14 -
 lib/PublicInbox/MiscIdx.pm        |   2 +-
 lib/PublicInbox/Search.pm         |  77 ++-
 lib/PublicInbox/SearchIdx.pm      |  88 ++--
 lib/PublicInbox/SearchIdxShard.pm |   7 +-
 lib/PublicInbox/Sigfd.pm          |  10 +-
 lib/PublicInbox/Spawn.pm          |   6 +-
 lib/PublicInbox/SpawnPP.pm        |   2 +-
 lib/PublicInbox/TestCommon.pm     |  47 +-
 lib/PublicInbox/V2Writable.pm     |  26 +-
 lib/PublicInbox/Watch.pm          |   2 +-
 script/public-inbox-cindex        |  86 +++
 script/public-inbox-convert       |   2 +-
 t/cindex.t                        | 134 +++++
 t/dir_idle.t                      |   6 +-
 t/ds-leak.t                       |   8 +-
 t/imapd.t                         |   6 +-
 t/nntpd.t                         |   2 +-
 t/sigfd.t                         |   7 +-
 t/watch_maildir.t                 |   8 +-
 xt/mem-imapd-tls.t                |   7 +-
 xt/mem-nntpd-tls.t                |   8 +-
 xt/net_writer-imap.t              |   4 +-
 33 files changed, 1424 insertions(+), 188 deletions(-)
 create mode 100644 lib/PublicInbox/CodeSearch.pm
 create mode 100644 lib/PublicInbox/CodeSearchIdx.pm
 create mode 100755 script/public-inbox-cindex
 create mode 100644 t/cindex.t