From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 16 Jan 2018 22:36:16 +0000
From: Eric Wong
To: meta@public-inbox.org
Cc: Konstantin Ryabitsev
Subject: [PATCH] TODO: notes about v2 format for giant archives
Message-ID: <20180116223616.GA18470@80x24.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Inspired by interest in LKML archival:

  https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org
---
 TODO | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/TODO b/TODO
index 3163b8a..605013e 100644
--- a/TODO
+++ b/TODO
@@ -78,3 +78,34 @@ all need to be considered for everything we introduce)
 * more and better test cases (use git fast-import to speed up creation)
 
 * large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+
+* Read-only WebDAV interface to the git repo so it can be mounted
+  via davfs2 or fusedav to avoid full clones.
+
+* Improve tree layout to help giant archives (v2 format):
+
+  * Must be optional; old ssoma users may continue using v1
+
+  * Xapian becomes a requirement when using v2; they
+    claim good scalability: https://xapian.org/docs/scalability.html
+
+  * Allow git to perform better deltafication for quoted messages
+
+  * Changing tree layout for deltafication means we need to handle
+    deletes for spam differently than we do now.
+
+  * Deal with duplicate Message-IDs (web UI, at least, not sure about NNTP)
+
+  * (Maybe) SQLite alternatives (MySQL/MariaDB/Pg) for NNTP article
+    number mapping: https://www.sqlite.org/whentouse.html
+
+  * Ref rotation (splitting heads by YYYY or YYYY-MM)
+
+  * Support multiple git repos for a single archive?
+    This seems gross, but splitting large packs in git conflicts
+    with bitmaps and we want to use both features.  Perhaps this
+    limitation can be fixed in git instead of merely being documented:
+    https://public-inbox.org/git/20160428072854.GA5252@dcvr.yhbt.net/
+
+  * Optional history squashing to reduce commit and intermediate
+    tree objects
-- 
EW