From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Subject: [PATCH] TODO: notes about v2 format for giant archives
Date: Tue, 16 Jan 2018 22:36:16 +0000
Message-ID: <20180116223616.GA18470@80x24.org>
Inspired by interest in LKML archival:
https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org
---
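Illustration only, not part of the patch: a minimal sketch of how the
"Ref rotation" item in the TODO below might map a message date to a
head name. The function name rotated_ref, the refs/heads/ prefix, and
the choice to bucket by UTC date are all assumptions made for this
example; the TODO item only says heads would be split by YYYY or
YYYY-MM.

```python
from datetime import datetime, timedelta, timezone


def rotated_ref(when: datetime, monthly: bool = False) -> str:
    """Return a hypothetical head name for a message dated `when`.

    Assumption: messages are bucketed by their UTC date, so a message
    sent shortly after local midnight on New Year's Day may still land
    in the previous year's head.
    """
    if when.tzinfo is None:
        # Treat naive datetimes as UTC rather than guessing a local zone.
        when = when.replace(tzinfo=timezone.utc)
    utc = when.astimezone(timezone.utc)
    fmt = "%Y-%m" if monthly else "%Y"
    return "refs/heads/" + utc.strftime(fmt)


if __name__ == "__main__":
    sent = datetime(2018, 1, 16, 22, 36, 16, tzinfo=timezone.utc)
    print(rotated_ref(sent))                # yearly head
    print(rotated_ref(sent, monthly=True))  # monthly head

    # A date with a +02:00 offset that falls in the previous UTC year.
    edge = datetime(2018, 1, 1, 0, 30, tzinfo=timezone(timedelta(hours=2)))
    print(rotated_ref(edge, monthly=True))
```

Bucketing by UTC keeps the mapping independent of the importing
machine's time zone; whether the date should come from the Date: header
or from the time of import is left open here.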
TODO | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/TODO b/TODO
index 3163b8a..605013e 100644
--- a/TODO
+++ b/TODO
@@ -78,3 +78,34 @@ all need to be considered for everything we introduce)
* more and better test cases (use git fast-import to speed up creation)
* large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+
+* Read-only WebDAV interface to the git repo so it can be mounted
+ via davfs2 or fusedav to avoid full clones.
+
+* Improve tree layout to help giant archives (v2 format):
+
+ * Must be optional; old ssoma users may continue using v1
+
+ * Xapian becomes a requirement when using v2; they
+ claim good scalability: https://xapian.org/docs/scalability.html
+
+ * Allow git to perform better deltafication for quoted messages
+
+ * Changing tree layout for deltafication means we need to handle
+ deletes for spam differently than we do now.
+
+ * Deal with duplicate Message-IDs (web UI, at least, not sure about NNTP)
+
+ * (Maybe) SQLite alternatives (MySQL/MariaDB/Pg) for NNTP article
+ number mapping: https://www.sqlite.org/whentouse.html
+
+ * Ref rotation (splitting heads by YYYY or YYYY-MM)
+
+ * Support multiple git repos for a single archive?
+ This seems gross, but splitting large packs in git conflicts
+ with bitmaps and we want to use both features. Perhaps this
+ limitation can be fixed in git instead of merely being documented:
+ https://public-inbox.org/git/20160428072854.GA5252@dcvr.yhbt.net/
+
+ * Optional history squashing to reduce commit and intermediate
+ tree objects
--
EW