* [PATCH] TODO: notes about v2 format for giant archives
From: Eric Wong @ 2018-01-16 22:36 UTC
To: meta; +Cc: Konstantin Ryabitsev
Inspired by interest in LKML archival:
https://public-inbox.org/meta/d5546b24-5840-4ae9-d25b-5e3e737ed73b@linuxfoundation.org
---
TODO | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/TODO b/TODO
index 3163b8a..605013e 100644
--- a/TODO
+++ b/TODO
@@ -78,3 +78,34 @@ all need to be considered for everything we introduce)
* more and better test cases (use git fast-import to speed up creation)
* large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+
+* Read-only WebDAV interface to the git repo so it can be mounted
+ via davfs2 or fusedav to avoid full clones.
+
+* Improve tree layout to help giant archives (v2 format):
+
+ * Must be optional; old ssoma users may continue using v1
+
+ * Xapian becomes a requirement when using v2; they
+ claim good scalability: https://xapian.org/docs/scalability.html
+
+ * Allow git to perform better deltafication for quoted messages
+
+ * Changing tree layout for deltafication means we need to handle
+ deletes for spam differently than we do now.
+
+ * Deal with duplicate Message-IDs (web UI, at least, not sure about NNTP)
+
+ * (Maybe) SQLite alternatives (MySQL/MariaDB/Pg) for NNTP article
+ number mapping: https://www.sqlite.org/whentouse.html
+
+ * Ref rotation (splitting heads by YYYY or YYYY-MM)
+
+ * Support multiple git repos for a single archive?
+ This seems gross, but splitting large packs in git conflicts
+ with bitmaps and we want to use both features. Perhaps this
+ limitation can be fixed in git instead of merely being documented:
+ https://public-inbox.org/git/20160428072854.GA5252@dcvr.yhbt.net/
+
+ * Optional history squashing to reduce commit and intermediate
+ tree objects
--
EW
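The patch lists duplicate Message-IDs and NNTP article-number mapping as separate items, but they interact: if article numbers are keyed on the Message-ID, two different messages that share one ID collide. Below is a minimal sketch in Python, using the stdlib sqlite3 module, that keys each article number on the git blob ID of the stored message instead. The schema, table, and function names are hypothetical (not public-inbox's actual layout), and the sketch assumes a single writer process.

```python
import sqlite3

# Hypothetical schema: one row per stored message copy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    num  INTEGER PRIMARY KEY AUTOINCREMENT, -- NNTP article number, never reused
    mid  TEXT NOT NULL,                     -- Message-ID, duplicates allowed
    blob TEXT NOT NULL UNIQUE               -- git blob ID of the stored message
);
CREATE INDEX IF NOT EXISTS articles_mid ON articles (mid);
"""


def open_map(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def assign_number(db: sqlite3.Connection, mid: str, blob: str) -> int:
    """Return blob's article number, allocating the next one if it is new."""
    with db:  # commit on success, roll back on error
        row = db.execute("SELECT num FROM articles WHERE blob = ?", (blob,)).fetchone()
        if row is not None:
            return row[0]  # same content seen before: keep its number
        cur = db.execute("INSERT INTO articles (mid, blob) VALUES (?, ?)", (mid, blob))
        return cur.lastrowid


def numbers_for_mid(db: sqlite3.Connection, mid: str) -> list:
    """All article numbers sharing one Message-ID, e.g. for a web UI listing."""
    rows = db.execute("SELECT num FROM articles WHERE mid = ? ORDER BY num", (mid,))
    return [num for (num,) in rows]


db = open_map()
print(assign_number(db, "<a@example.com>", "blob-1"))  # 1
print(assign_number(db, "<b@example.com>", "blob-2"))  # 2
print(assign_number(db, "<a@example.com>", "blob-3"))  # 3: same Message-ID, new content
print(numbers_for_mid(db, "<a@example.com>"))          # [1, 3]
```

AUTOINCREMENT is used so a number is never handed out twice, even if the highest-numbered row is later deleted (for example, during spam removal); without it SQLite may reuse max(rowid)+1, which would confuse NNTP clients that track read state by article number. A MySQL/MariaDB or Pg backend would need its own equivalent of this guarantee.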
* Re: [PATCH] TODO: notes about v2 format for giant archives
From: Eric Wong @ 2018-02-08 03:09 UTC
To: Konstantin Ryabitsev; +Cc: meta
Eric Wong <e@80x24.org> wrote:
> + * Ref rotation (splitting heads by YYYY or YYYY-MM)
> +
> + * Support multiple git repos for a single archive?
> + This seems gross, but splitting large packs in git conflicts
> + with bitmaps and we want to use both features. Perhaps this
> + limitation can be fixed in git instead of merely being documented:
> + https://public-inbox.org/git/20160428072854.GA5252@dcvr.yhbt.net/
OK, so I've been strongly considering having git repos based on
YYYY or YYYY-MM because of shardability and ease-of-maintenance.
The other thing I think we can support is subsystem lists
({netdev,fsdevel,stable}@vger) in the SAME git repo(s) with a
different head for each list. That could make subsystem lists
cheaper to archive since there's a lot of overlap with the main
list.
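A minimal Python sketch of the sharding idea above: one repository per YYYY or YYYY-MM period, shared by all lists, with one head per list inside it. The repository naming (`vger-<period>.git`), the head layout, and keying the period on the Date: header normalized to UTC are all assumptions for illustration; Date: is sender-controlled, so a real importer might prefer the time the message was received.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def shard_for(list_name: str, date_header: str, granularity: str = "month"):
    """Return a hypothetical (repository, head) pair for one list's copy of a message."""
    dt = parsedate_to_datetime(date_header)
    if dt.tzinfo is None:  # "-0000" zones parse as naive datetimes; treat them as UTC
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    if granularity == "year":
        period = f"{dt:%Y}"
    elif granularity == "month":
        period = f"{dt:%Y-%m}"
    else:
        raise ValueError(f"unknown granularity: {granularity!r}")
    # One repository per period, shared by all lists; one head per list inside it.
    return f"vger-{period}.git", f"refs/heads/{list_name}"


date = "Thu, 08 Feb 2018 03:09:51 +0000"
print(shard_for("lkml", date))    # ('vger-2018-02.git', 'refs/heads/lkml')
print(shard_for("netdev", date))  # ('vger-2018-02.git', 'refs/heads/netdev')
print(shard_for("netdev", "Mon, 01 Jan 2018 01:00:00 +0200"))  # ('vger-2017-12.git', 'refs/heads/netdev')
```

Normalizing to UTC keeps every copy of a cross-posted message in the same shard regardless of the sender's timezone; the last call shows a message sent just after midnight at +0200 landing in the previous month.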
Headers like List-Id:, Sender:, etc. will still differ for
archival purposes, so each list's copy will have a different
object_id in git; but git should still be able to delta well,
since the rest of the message is the same and the copies are
close in age.
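To illustrate the object_id point with a minimal Python sketch: in a SHA-1 repository, git names a blob by the SHA-1 of a `blob <size>` header, a NUL byte, and the content, so changing a single header line yields a different object even though almost every line is shared. The message, list addresses, and helper names below are hypothetical; whether git actually deltas the copies against each other depends on its pack heuristics (delta window, object size, path), which this sketch does not model.

```python
import difflib
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID git assigns a blob: SHA-1 of 'blob <size>', a NUL byte, then the content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def list_copy(list_id: str, body: str) -> bytes:
    """Hypothetical copy of one message as archived for one list (only List-Id varies)."""
    headers = [
        "From: A U Thor <author@example.com>",
        "Subject: [PATCH] example: fix a thing",
        "Message-ID: <20180208-example@example.com>",
        f"List-Id: <{list_id}>",
    ]
    return ("\n".join(headers) + "\n\n" + body).encode()


def changed_lines(a: bytes, b: bytes) -> int:
    """Number of lines difflib reports as removed when turning a into b."""
    diff = difflib.ndiff(a.decode().splitlines(), b.decode().splitlines())
    return sum(1 for line in diff if line.startswith("- "))


body = "A long patch body that is cross-posted to several lists.\n" * 3
lkml = list_copy("linux-kernel.vger.kernel.org", body)
netdev = list_copy("netdev.vger.kernel.org", body)
print(git_blob_id(lkml) != git_blob_id(netdev))  # True: different objects in git
print(changed_lines(lkml, netdev))               # 1: only the List-Id line differs
```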
* Re: [PATCH] TODO: notes about v2 format for giant archives
From: Konstantin Ryabitsev @ 2018-02-08 04:05 UTC
To: Eric Wong; +Cc: meta
On Thu, Feb 08, 2018 at 03:09:51AM +0000, Eric Wong wrote:
> The other thing I think we can support is subsystem lists
> ({netdev,fsdevel,stable}@vger) in the SAME git repo(s) with a
> different head for each list. That could make subsystem lists
> cheaper to archive since there's a lot of overlap with the main
> list.
Hmm... that's a pretty cool idea, but I wonder if people would find it
annoying to have to pull all the lkml-related objects if all they care
about are messages from, say, netdev. For archiving purposes I think it
would be great, but I'm less sure about UX impacts.
-K
* Re: [PATCH] TODO: notes about v2 format for giant archives
From: Eric Wong @ 2018-02-08 17:08 UTC
To: Konstantin Ryabitsev; +Cc: meta
Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Feb 08, 2018 at 03:09:51AM +0000, Eric Wong wrote:
> > The other thing I think we can support is subsystem lists
> > ({netdev,fsdevel,stable}@vger) in the SAME git repo(s) with a
> > different head for each list. That could make subsystem lists
> > cheaper to archive since there's a lot of overlap with the main
> > list.
>
> Hmm... that's a pretty cool idea, but I wonder if people would find it
> annoying to have to pull all the lkml-related objects if all they care
> about are messages from, say, netdev. For archiving purposes I think it
> would be great, but I'm less sure about UX impacts.
Good point. It would be great from the perspective of
somebody who wants to read and store all the mail, but it
might be bad for people who only want a subset of the lists.
It wouldn't be good for pack reuse on the server side, either.