From: Michal Sojka <sojkam1@fel.cvut.cz>
To: Stewart Smith <stewart@flamingspork.com>, notmuch@notmuchmail.org
Subject: Re: Mail in git
Date: Tue, 16 Feb 2010 10:08:45 +0100
Message-ID: <87wrydim3m.fsf@steelpick.localdomain>
In-Reply-To: <20100215002914.GA22402@flamingspork.com>
Hi Stewart,
On Mon, 15 Feb 2010 11:29:14 +1100, Stewart Smith <stewart@flamingspork.com> wrote:
> Which goes from a 15GB Maildir to a 3.7GB git repo.
That's quite an interesting ratio. I tried a plain git add and git gc on
my mail store, and the resulting repo was approximately 50% of the mail
store size. Do you think this difference might be caused by the way
you created the packs?
>
> The algorithm of evenless.pl is basically:
> 1 get next directory entry
> 2 if is directory, recurse into it
> 3 write item to git (git hash-object -w)
> 4 add item to tree object
> 5 if number of items written = 1000
> 5.1 make pack of last 1000 items
> 6 goto 1
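For reference, the quoted steps could look roughly like the following Python sketch. It is not evenless.pl itself: the blob id is computed the way `git hash-object` does it (sha1 over "blob <size>\0" followed by the content), the tree is only a flat path-to-id mapping, and `make_pack` is a hypothetical callback standing in for writing the objects and packing them (e.g. with `git pack-objects`).

```python
import hashlib
import os

BATCH_SIZE = 1000  # items per pack, as in the quoted algorithm


def git_blob_id(data: bytes) -> str:
    """Return the object id `git hash-object` would compute for a blob."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def walk_and_batch(root, make_pack):
    """Walk `root` recursively, hash every file as a blob and call
    make_pack(batch) after every BATCH_SIZE items (and once for the rest).

    Returns a flat {relative_path: blob_id} mapping that stands in for the
    tree object. `make_pack` is a hypothetical callback."""
    tree = {}
    batch = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # recurse into subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                blob_id = git_blob_id(f.read())
            tree[os.path.relpath(path, root)] = blob_id  # "add item to tree"
            batch.append(blob_id)
            if len(batch) == BATCH_SIZE:
                make_pack(batch)  # "make pack of last 1000 items"
                batch = []
    if batch:
        make_pack(batch)  # leftover items that did not fill a batch
    return tree
```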
So it seems that you have all your mails in a single tree. How long does it
take to calculate the difference of two trees (git diff-tree --name-status)?
This operation will be needed by "notmuch new" to determine which
files/blobs to index. I suppose it would be better if mail blobs were
stored in subtrees. If a subtree has not changed, git doesn't need to
descend into it because it has the same sha1.
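To illustrate why subtrees help, here is a minimal Python sketch of a recursive tree diff that, like git, skips any subtree whose id is unchanged. The `Tree` class and its id (a sha1 over sorted "name id" lines) are simplified stand-ins, not git's real tree object format; the optional `visited` list only records which subtrees were actually read.

```python
import hashlib


class Tree:
    """Simplified stand-in for a git tree: maps names to blob ids (str)
    or to nested Tree objects. The id is a sha1 over the sorted entries,
    which is NOT git's real tree encoding but changes whenever content does."""

    def __init__(self, entries):
        self.entries = entries
        h = hashlib.sha1()
        for name in sorted(entries):
            child = entries[name]
            child_id = child.id if isinstance(child, Tree) else child
            h.update(("%s %s\n" % (name, child_id)).encode())
        self.id = h.hexdigest()


def _files(entry, path):
    """Yield every file path below an entry (the entry itself if it is a blob)."""
    if isinstance(entry, Tree):
        for name in sorted(entry.entries):
            yield from _files(entry.entries[name], path + "/" + name)
    else:
        yield path


def diff_trees(old, new, prefix="", visited=None):
    """Return (status, path) pairs in the spirit of
    `git diff-tree -r --name-status`. A subtree whose id is unchanged is
    skipped without looking at its entries."""
    if old.id == new.id:
        return []
    if visited is not None:
        visited.append(prefix or "/")
    changes = []
    for name in sorted(set(old.entries) | set(new.entries)):
        a, b = old.entries.get(name), new.entries.get(name)
        path = prefix + name
        if isinstance(a, Tree) and isinstance(b, Tree):
            changes += diff_trees(a, b, path + "/", visited)
        elif a == b:
            continue  # same blob id: unchanged mail
        elif isinstance(a, str) and isinstance(b, str):
            changes.append(("M", path))
        else:
            # added, deleted, or switched between blob and tree
            if a is not None:
                changes += [("D", p) for p in _files(a, path)]
            if b is not None:
                changes += [("A", p) for p in _files(b, path)]
    return changes
```

With one flat tree holding every mail, the `old.id == new.id` shortcut never applies below the root, so each run has to compare every entry.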
I think that storing mails in a structure similar to .git/objects
(i.e. 256 subdirectories named after the first byte of the sha1, and file
names made of the remaining 38 hex digits) would be reasonable.
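A sketch of that fan-out mapping, assuming the sha1 is given as 40 hex digits; the helper name `fanout_path` is made up for illustration.

```python
HEX_DIGITS = set("0123456789abcdef")


def fanout_path(sha1_hex):
    """Map a sha1 (40 hex digits) to a .git/objects-style relative path:
    the first byte (2 hex digits) names one of 256 directories and the
    remaining 38 hex digits form the file name."""
    sha1_hex = sha1_hex.lower()
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-digit hex sha1, got %r" % sha1_hex)
    # git tree paths always use "/" regardless of the host OS
    return sha1_hex[:2] + "/" + sha1_hex[2:]
```

With this layout, adding or changing one mail alters only one of the 256 subtrees plus the top-level tree, so a diff between two commits descends into just the affected subdirectories.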
> Next step?
>
> Make notmuch be able to read mail out of it and add it to an index
> (oh, and some kind of verification and error checking about creating
> the git repo).
Besides using git to compact the size of the mail store, another feature that
comes with git for free is synchronization. For this to work, you would only
need to store the tags in the repo as well. What might work is to store tags in
files named <mail-name>.tags, with the tags listed alphabetically, one tag
per line. I guess this would make it easy to merge tags during
synchronization even without writing a custom git merge driver.
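The merge semantics such files would need can be sketched in a few lines of Python. This is an illustration under stated assumptions (a three-way merge over sets of tags, where a tag removed on either side stays removed and a tag added on either side is kept), not an existing notmuch or git feature. Git's stock line-based merge will handle many such cases on its own, but it can still report a conflict when both sides change neighbouring lines.

```python
def format_tags(tags):
    """Serialize tags in the proposed <mail-name>.tags layout:
    sorted alphabetically, one tag per line."""
    return "".join(tag + "\n" for tag in sorted(set(tags)))


def parse_tags(text):
    """Read a .tags file back into a set, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def merge_tags(base, ours, theirs):
    """Three-way merge of three .tags file contents.

    A tag survives if it was in the common ancestor and neither side
    removed it, or if either side added it."""
    b, o, t = parse_tags(base), parse_tags(ours), parse_tags(theirs)
    merged = (b & o & t) | (o - b) | (t - b)
    return format_tags(merged)
```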
Another point that must be solved if we would like to use git with
notmuch is the license problem. As Carl pointed out in another
thread, Git is licensed under GPLv2 only whereas notmuch is under GPLv3, and
these licenses are incompatible. So I think we will need some kind of
hooks in notmuch from which external programs (git) can be called.
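One way to keep git out of the notmuch binary is to run it as a separate process and only parse its output. The Python sketch below shows that for the diff-tree query mentioned earlier; `changed_mails` and `parse_name_status` are hypothetical helpers, not part of notmuch, and the hook mechanism that would call them is assumed.

```python
import subprocess


def parse_name_status(output):
    """Parse `git diff-tree -r --name-status` output into (status, path) pairs.
    Lines without a tab (e.g. a commit-id header) are skipped."""
    changes = []
    for line in output.splitlines():
        if "\t" not in line:
            continue
        status, path = line.split("\t", 1)
        changes.append((status, path))
    return changes


def changed_mails(repo, old_commit, new_commit):
    """Run git as an external program (no git code linked into notmuch)
    and return the files that changed between two commits."""
    result = subprocess.run(
        ["git", "-C", repo, "diff-tree", "-r", "--name-status",
         old_commit, new_commit],
        check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

Without `-z`, git quotes paths containing unusual characters, so a real tool would rather pass `-z` and split the output on NUL bytes.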
Cheers,
Michal