From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 1088F1F5AE
	for ; Fri, 24 Jul 2020 05:56:07 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/20] indexing changes and new features
Date: Fri, 24 Jul 2020 05:55:46 +0000
Message-Id: <20200724055606.27332-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

--rethread and --no-sync options are now supported in
public-inbox-index.  --no-sync should be nice for users of
filesystems with poor fsync(2) performance.

Now: I also wonder if --no-sync is a bad name, since we also use
"sync" to mean synchronising indices.  Perhaps --no-fsync would
be a better name, though technically SQLite and Xapian use
fdatasync(2), nowadays.

Some of this is prep work for exposing THREADID via IMAP (and
JMAP) to aid in searching.

Since THREADID (`over.tid') will be exposed in a user-visible
way, I'm finally giving up on the default (reverse
chronological) log order for indexing, so that THREADID ascends
for newer threads.

This also simplifies the indexing code significantly.  To avoid
pinning huge amounts of RAM, the working space is held in an
IdxStack temporary file.  This further simplifies our code since
we no longer have to worry about old Xapian versions that did
not use FD_CLOEXEC.

There's still more work on the horizon, here...
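For readers unfamiliar with the approach, here is a minimal sketch of
the idea behind a file-backed stack: records arrive newest-first (as
from a default `git log'), are spooled to a temporary file instead of
RAM, and are then popped so the oldest comes out first.  This is
Python rather than the Perl used by the project, and the class name,
function name and record layout below are assumptions made for
illustration only; it is not the actual PublicInbox::IdxStack format.

```python
import struct
import tempfile

# Hypothetical record layout (NOT the real PublicInbox::IdxStack format):
# author time and commit time as unsigned 64-bit integers, followed by a
# 20-byte binary object ID.  Real object IDs may be longer than this.
RECORD = struct.Struct(">QQ20s")


class FileStack:
    """LIFO stack of fixed-width records kept in an anonymous temp file."""

    def __init__(self):
        self._fh = tempfile.TemporaryFile()  # removed automatically on close
        self._count = 0

    def push(self, autime, cotime, oid):
        # Append one record at the current top of the stack.
        self._fh.seek(self._count * RECORD.size)
        self._fh.write(RECORD.pack(autime, cotime, oid))
        self._count += 1

    def pop(self):
        """Return the most recently pushed record, or None when empty."""
        if self._count == 0:
            return None
        self._count -= 1
        self._fh.seek(self._count * RECORD.size)
        return RECORD.unpack(self._fh.read(RECORD.size))

    def close(self):
        self._fh.close()


def oldest_first(newest_first):
    """Replay records given newest-first in oldest-first order."""
    stack = FileStack()
    try:
        for rec in newest_first:
            stack.push(*rec)
        # Popping reverses the order without holding the list in memory.
        while (rec := stack.pop()) is not None:
            yield rec
    finally:
        stack.close()
```

Fixed-width records keep pop() down to a single seek and read, so
only one record is in memory at a time no matter how long the
history is.
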
Eric Wong (20):
  index: support --rethread switch to fix old indices
  v2: index forwards (via `git log --reverse')
  v2writable: introduce idx_stack
  v2writable: index_sync: reduce fill_alternates calls
  v2writable: move {autime} and {cotime} into $sync state
  v2writable: allow >= 40 byte git object IDs
  v2writable: drop "EPOCH.git indexing $RANGE" progress message
  use consistent {ibx} field for writable code paths
  search: avoid copying {inboxdir}
  v2writable: use read-only PublicInbox::Git for cat_file
  v2writable: get rid of {reindex_pipe} field
  v2writable: clarify "epoch" for {last_commits}
  xapcmd: set {from} properly for v1 inboxes
  searchidx: rename _xdb_{acquire,release} => idx_
  searchidx: make v1 indexing closer to v2
  index+xcpdb: support --no-sync flag
  v2writable: share log2stack code with v1
  searchidx: support async git check
  searchidx: $batch_cb => v1_checkpoint
  v2writable: {unindexed} belongs in $sync state

 Documentation/public-inbox-index.pod |  30 +-
 Documentation/public-inbox-xcpdb.pod |   6 +
 MANIFEST                             |   3 +-
 lib/PublicInbox/Git.pm               |  72 ++++-
 lib/PublicInbox/IdxStack.pm          |  52 ++++
 lib/PublicInbox/Import.pm            |   6 +-
 lib/PublicInbox/Msgmap.pm            |  21 +-
 lib/PublicInbox/MultiMidQueue.pm     |  62 ----
 lib/PublicInbox/Over.pm              |   1 +
 lib/PublicInbox/OverIdx.pm           |  78 ++++-
 lib/PublicInbox/Search.pm            |  25 +-
 lib/PublicInbox/SearchIdx.pm         | 384 ++++++++++++------------
 lib/PublicInbox/SearchIdxShard.pm    |  12 +-
 lib/PublicInbox/Smsg.pm              |   8 +-
 lib/PublicInbox/V2Writable.pm        | 427 +++++++++------------------
 lib/PublicInbox/Xapcmd.pm            |  10 +-
 script/public-inbox-index            |   5 +-
 script/public-inbox-xcpdb            |   4 +-
 t/idx_stack.t                        |  56 ++++
 t/inbox_idle.t                       |   4 +-
 t/search.t                           |   4 +-
 t/v1reindex.t                        |  36 ++-
 t/v2reindex.t                        |  45 +++
 23 files changed, 744 insertions(+), 607 deletions(-)
 create mode 100644 lib/PublicInbox/IdxStack.pm
 delete mode 100644 lib/PublicInbox/MultiMidQueue.pm
 create mode 100644 t/idx_stack.t