unofficial mirror of notmuch@notmuchmail.org
From: Ben Gamari <bgamari@gmail.com>
To: notmuch <notmuch@notmuchmail.org>
Subject: Re: Mail in git
Date: Wed, 17 Feb 2010 20:01:21 -0500
Message-ID: <1266453575-sup-3653@ben-laptop>
In-Reply-To: <87ljerju0q.fsf@willster.local.flamingspork.com>

Excerpts from Stewart Smith's message of Wed Feb 17 18:56:53 -0500 2010:
> On Wed, 17 Feb 2010 14:21:01 +1300, martin f krafft <madduck@madduck.net> wrote:
> > What I am wondering is if (explicit) tags couldn't be represented as
> > tree-objects with this.
> 
> I think it could get expensive for tags with lots of messages.
> 
> As far as I understand it, the tree object is stored in full and space
> is only reclaimed during repack (due to delta compression).
> 
> So if you, say, had the entire history of a high volume list such as
> linux-kernel, adding messages could get rather expensive if you
> auto-tagged (or autotagged messages with patches or whatever).
> 

Well, it's tough to say, but I don't think it's as bad as you think. I
proposed that we could use a tree structure like the following,

                  ╭─msg1
      ╭tagA.list1╶┼─msg2
      │           ╰─msg3
      │
      │           ╭─msg4
      ├tagA.list2╶┼─msg5
      │           ╰─msg6
tagA ╶┤
      │           ╭─msg7
      ├tagA.list3╶┼─msg8
      │           ╰─msg9
      │
      │           ╭─msg10
      ╰tagA.list4╶┼─msg11
                  ╰─msg12

This way, adding a message to, say, list3 would only require rewriting
list3 and tagA, which seems pretty reasonable to me. Moreover, we could
make the tree structure as deep as necessary, although we would need to
rewrite a node at every level of the tree, so it's tough to say how many
levels is too many. It could simply be adaptive (e.g. bisect any node
with more than N children).

This certainly isn't as simple as the naive approach, but I think it's
the only reasonable approach performance-wise, and I don't believe it
should be too tricky.
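
To make the cost argument concrete, here is a rough Python sketch of the
adaptive idea: a tag is an immutable tree, adding a message copies only
the nodes on the path to the affected leaf, and a leaf that grows past N
entries is bisected. Everything in it (the names Leaf, Branch, add,
messages, the pivot-based routing, and N = 3) is made up for
illustration; it does not touch git or notmuch, and it only models which
tree objects would have to be rewritten.

```python
from dataclasses import dataclass
from typing import Tuple, Union

N = 3  # hypothetical limit on entries per leaf before it is bisected


@dataclass(frozen=True)
class Leaf:
    msgs: Tuple[str, ...]  # sorted message ids stored directly in this node


@dataclass(frozen=True)
class Branch:
    pivot: str             # ids < pivot live under left, all others under right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Branch]


def add(node: Node, msg: str):
    """Return (new_node, rewritten).

    rewritten counts the nodes that had to be created anew, which is the
    toy equivalent of the tree objects git would have to write.
    """
    if isinstance(node, Branch):
        if msg < node.pivot:
            child, n = add(node.left, msg)
            if n == 0:
                return node, 0  # message already present, nothing changes
            return Branch(node.pivot, child, node.right), n + 1
        child, n = add(node.right, msg)
        if n == 0:
            return node, 0
        return Branch(node.pivot, node.left, child), n + 1

    if msg in node.msgs:
        return node, 0
    msgs = tuple(sorted(node.msgs + (msg,)))
    if len(msgs) <= N:
        return Leaf(msgs), 1
    # Leaf is too large: bisect it into two leaves under a new branch.
    mid = len(msgs) // 2
    return Branch(msgs[mid], Leaf(msgs[:mid]), Leaf(msgs[mid:])), 3


def messages(node: Node):
    """List every message id under node, in sorted order."""
    if isinstance(node, Leaf):
        return list(node.msgs)
    return messages(node.left) + messages(node.right)


if __name__ == "__main__":
    tag = Leaf(())
    for i in range(1, 13):
        tag, _ = add(tag, f"msg{i:02d}")
    new_tag, rewritten = add(tag, "msg13")
    print("nodes rewritten for one added message:", rewritten)
    print("untouched subtree shared:", new_tag.left is tag.left)
```

Subtrees off the insertion path are reused as-is, which is the property
that keeps an update cheap. Inserting ids in sorted order, as the demo
does, produces a lopsided tree; whether real message ids would spread
more evenly is an open question this sketch does not answer.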

> > messages would then be deleted whenever using git-gc.
> > 
> > No idea how this would sync if we don't keep ancestry. Otoh, it
> > would probably not be very expensive to do just that.
> 
> If we keep ancestry though, we are reusing existing working code for
> backup (git-pull :)

This is one of the reasons I feel it's important we keep it. And as is
stated below, the storage overhead is minimal.
> 
> Keep in mind that with my tests, the Maildir in git is about a quarter
> to a fifth of the size of it in Maildir... so a bit of extra usage per
> message isn't as dramatic as it may sound.
> 
