unofficial mirror of notmuch@notmuchmail.org
* Distributed Notmuch
@ 2012-01-08 10:23 Ethan Glasser-Camp
  2012-01-10  3:54 ` Jan Pobrislo
  0 siblings, 1 reply; 2+ messages in thread
From: Ethan Glasser-Camp @ 2012-01-08 10:23 UTC (permalink / raw)
  To: notmuch

Hi guys,

It's kind of academic for me right now because I'm mostly just using one 
computer, but one reason I've hesitated to switch over entirely to 
notmuch is that it's hard to distribute across many machines. The last 
time I wrote the list about this, David Bremner pointed me to gitmuch in 
the notmuch-scripts package, which uses git to synchronize tags. He 
wrote, "No one claims this is a great solution, but it works now."

In brainstorming about the One True Mail Setup, my friend suggested to 
me that Maildir/IMAP are not really the best choices for mail storage. 
Among other flaws: to synchronize mail via IMAP you have to check the 
headers of each message, which means a lot of bandwidth; you can't 
compress Maildir, meaning lots of wasted space; mechanisms to 
synchronize arbitrary tags have to be bolted on top. My friend suggested 
that instead it might be better to dump mail into some kind of database, 
for example CouchDB, and synchronize it that way. Of course, doing 
full-text indexing and tagging using an arbitrary DB would be a ton of 
work, so instead it probably makes the most sense to keep a Xapian 
instance on each node and feed all the mail to that. Tagging operations 
still have to be replicated, probably by an oplog that's also kept in 
Couch, so it's still a lot of work, but keeping things in Couch 
automatically gets you a lot of the replication mechanisms, offline 
access, etc., that would have to be bolted on/hacked up using tools like 
nmbug/gitmuch/rsync. I also see in the wiki that someone proposes to use 
git as the mail store, presumably for similar reasons. Xapian itself has 
the idea of master/slave replication but that doesn't really get you 
full offline access.

So my question for the wizards on this list is what their idea of the 
One True Mail Setup would be in a perfect, or slightly better, world, 
and what needs to be done to get there. I know some people use one 
notmuch install that they access remotely. For myself, I'm on a pretty 
limited Internet connection, so low bandwidth/offline access are big for 
me, and despite Nicolas Sebrecht and Sebastian Spaeth's heroic work on 
OfflineIMAP, it still uses a lot of bandwidth to sync. And obviously the 
whole point of this exercise is tag synchronization.

Ethan


* Re: Distributed Notmuch
  2012-01-08 10:23 Distributed Notmuch Ethan Glasser-Camp
@ 2012-01-10  3:54 ` Jan Pobrislo
  0 siblings, 0 replies; 2+ messages in thread
From: Jan Pobrislo @ 2012-01-10  3:54 UTC (permalink / raw)
  To: notmuch

Quoting Ethan Glasser-Camp (2012-01-08 11:23:59)
>Hi guys,
>
> ...
>
>In brainstorming about the One True Mail Setup, my friend suggested to 
>me that Maildir/IMAP are not really the best choices for mail storage. 

In my opinion Maildirs are a very good mail storage format; the issue is
just that IMAP can't transfer them in their entirety and simplicity.

>Among other flaws: to synchronize mail via IMAP you have to check the 
>headers of each message, which means a lot of bandwidth;

IMAP does have UIDs, so a client need not re-check every header; see:
http://tools.ietf.org/html/rfc3501#section-2.3.1.1
But I do agree IMAP is indeed not a very good protocol.
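
To make the UID point concrete, here is a minimal sketch of how a client
could decide what to download from UIDs alone. It assumes the client has
stored the mailbox's UIDVALIDITY and the highest UID it has seen; the
function name and parameters are hypothetical, and the list of server
UIDs is assumed to come from the server (for example through imaplib's
`uid` method). It is not how OfflineIMAP or any other tool does it.

```python
def plan_fetch(stored_validity, stored_last_uid, server_validity, server_uids):
    """Return the UIDs the client still has to download (hypothetical helper).

    RFC 3501 assigns UIDs in strictly ascending order within a mailbox,
    and a changed UIDVALIDITY means previously cached UIDs are no longer valid.
    """
    if stored_validity != server_validity:
        # Cached UIDs are meaningless now: fall back to a full resync.
        return sorted(server_uids)
    # Otherwise only messages with a UID above the last one seen are new.
    return sorted(uid for uid in server_uids if uid > stored_last_uid)


if __name__ == "__main__":
    print(plan_fetch(7, 100, 7, [98, 99, 100, 101, 105]))
```

Only the new messages need to be fetched; nothing here requires reading
the headers of mail that is already stored locally.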

>compress Maildir, meaning lots of wasted space;

There are several approaches to compressing the filesystem that can be
used with maildirs, but this could easily become a bottleneck for most
setups.

>My friend suggested that instead it might be better to dump mail into
>some kind of database, for example CouchDB, and synchronize it that way. 

Some time ago I pondered putting emails into MongoDB so the client does
not have to deal with parsing MIME, but this does not give you any big
advantage for synchronization. Rather, you'll be running into the
consistency/availability/partition-tolerance issue. You'll have to
choose whether you want to support offline write operations and, if so,
how you will handle the conflicts that appear. DVCSes are built to make
this as easy as possible; databases usually are not. I cannot comment on
CouchDB and its MVCC, but I still doubt it would be as practical as
a true DVCS.

By the way, I highly recommend this blog post series:
http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1

> ...
>
>So my question for the wizards on this list is what their idea of the 
>One True Mail Setup would be in a perfect, or slightly better, world, 
>and what needs to be done to get there. I know some people use one 
>notmuch install that they access remotely. For myself, I'm on a pretty 
>limited Internet connection, so low bandwidth/offline access are big for 
>me, and despite Nicolas Sebrecht and Sebastian Spaeth's heroic work on 
>OfflineIMAP, it still uses a lot of bandwidth to sync. And obviously the 
>whole point of this exercise is tag synchronization..

I tend to go offline with my laptop too, so I can see what you are
talking about. For me the Ideal Mail Setup would be:

* access via ssh to a limited/pseudoshell account
  - ssh handles authentication far better than sasl-based apps
  - ssh is designed to allow multiple operations in parallel, including
    large uploads/downloads that can be resumed
* the maildir is accessible via sftp (sshfs) and ssh+rsync
* notmuch can be launched from the restricted shell, and every new mail
  is indexed
* there is a database of messages, tags and filenames, kept under a DVCS.
  With the aid of this database, full three-way merges can be performed.
* once a client is connected, it should have a way to listen for change
  messages that the server pushes

This would allow for convenient operation both in online (storage-free)
and offline (replicated) mode.
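
The three-way merge of tags from the list above can be sketched for a
single message as follows. This is only an illustration under stated
assumptions: tags are modelled as plain Python sets, `base` is the tag
set at the last common sync point, and the function name is
hypothetical. It is not taken from nmbug or gitmuch.

```python
def merge_tags(base, ours, theirs):
    """Three-way merge of one message's tag sets (hypothetical helper)."""
    base, ours, theirs = set(base), set(ours), set(theirs)
    # A tag counts as added if either side has it and the base did not.
    added = (ours - base) | (theirs - base)
    # A tag counts as removed if the base had it and either side dropped it.
    removed = (base - ours) | (base - theirs)
    return (base | added) - removed


if __name__ == "__main__":
    base = {"inbox", "unread"}
    laptop = {"inbox", "work"}            # read the mail, tagged it "work"
    server = {"inbox", "unread", "todo"}  # tagged it "todo" elsewhere
    print(sorted(merge_tags(base, laptop, server)))
```

Because a tag cannot be both added and removed relative to the same
base, merging tag sets this way never produces a conflict; changes to
filenames or to the messages themselves would need more care.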

I think this is actually pretty implementable. I'd use twisted.conch for
the ssh server (launchpad.net uses this), which can easily be tied in with
dovecot's authentication daemon. Change detection can be done via
inotify/lsyncd. The versioning/merging tool could possibly be based on
the current nmbug (I haven't examined it yet). But I'm pretty sure I won't
have time for a project of such scale in the near future.
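
As a rough illustration of what change detection has to report, the
sketch below compares two listings of a maildir's filenames. Note that
it swaps in plain snapshot comparison rather than inotify or lsyncd,
since those need bindings outside the Python standard library. It relies
on the Maildir convention that a filename is a unique part optionally
followed by ":2," and flag letters; the function names are hypothetical.

```python
def split_name(filename):
    """Split a Maildir filename into (unique part, flag letters)."""
    unique, _, flags = filename.partition(":2,")
    return unique, flags


def diff_snapshots(old_names, new_names):
    """Compare two filename listings and report what changed.

    Returns (added, deleted, reflagged), each a sorted list of unique parts.
    """
    old = dict(split_name(name) for name in old_names)
    new = dict(split_name(name) for name in new_names)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # Same message on both sides, but its flag letters differ.
    reflagged = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, reflagged


if __name__ == "__main__":
    before = ["1.a.host:2,", "2.b.host:2,S", "3.c.host:2,"]
    after = ["1.a.host:2,S", "3.c.host:2,", "4.d.host"]
    print(diff_snapshots(before, after))
```

The three lists are the kind of change messages a server could push to
connected clients; an inotify-based watcher would produce the same
information without rescanning the directory.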

</braindump>

