unofficial mirror of notmuch@notmuchmail.org
* RFC: notmuch powered (personal) (end-to-end) e-mail system
@ 2011-03-20 14:07 Ciprian Dorin Craciun
  2011-03-20 15:18 ` Brett Viren
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Ciprian Dorin Craciun @ 2011-03-20 14:07 UTC
  To: notmuch

    Hello all! (Sorry for the long email.)

    I have been "struggling" for some time to get rid of the current
"de-facto" email solutions (i.e. GMail, Zimbra), and I have passively
observed the notmuch project and community for a while.

    Although I have forwarded all my email to a single account, and I
currently mirror my GMail account locally (using `mbsync`), index it
with notmuch, and collect spam mails for later filter training, I am
unfortunately unable to "convert" because the current notmuch-powered
solutions have (some of) the following shortcomings (I don't want to
offend anyone, so please take these as observations):
    * the most full-featured UI is the Emacs one -- thus remote access
is limited (I mean from an arbitrary computer with only a web browser);
(and I'm not a very big fan of Emacs;)
    * most are still dependent on external IMAP systems -- this is not
a problem with notmuch itself, but with the integrating clients;
    * SPAM -- as above -- is not integrated;
    * filtering (tag applying) is not automatic (as in integrated in
notmuch itself or the client), but triggered through external scripts;

    As such I'm thinking of implementing a custom end-to-end email
system and I would like to hear your feedback before embarking on such
a task.

    I'm targeting the following features:
    * (inbound) SMTP integration, thus once an email is received it is
automatically pushed through the system; (I'm primarily targeting
those users that can afford to run their own SMTP server; but the solution
could still be adapted for those that only want the other features;)
    * automatic spam filtering, and tag applying;
    * automatic email triggers based on tags (such as user
notifications, forwarding, etc.)
    * remote RPC-like access to the whole system;
    * remote Web user interface;

    About the overall architecture I'm thinking of adopting the following:
    * in general the whole system is decomposed into independent
components (long-lived OS daemons), each of which does a particular job
(see below);
    * all the components communicate with each other through a
message queue system (for example ZeroMQ or RabbitMQ);
    * all the communication is JSON based;

    The components would be:
    * SMTP inbound gateway -- for example I could take qmail or
Postfix and replace the delivery agent with a custom process that
pushes the email into the system; (any other solution suggestions?);
    * email store -- as the name suggests it is a simple
key-value-like store that should persist raw email-messages; it should
be as robust as possible, and its contents should be the only thing
needed to reconstruct all the other derived data; (I could use here a
simple process that maintains a maildir, I could go also with a
BerkeleyDB wrapper, or even something more sophisticated;)
    * spam filter -- which either classifies the email or trains the
spam filter; (for example I would use bogofilter;)
    * email index -- this is where notmuch would come into play; it
would be fed with emails, to which it would automatically apply tags,
and it would issue trigger notifications based on those tags; it also
maintains the set of filters and tags to apply automatically;
    * (maybe) a coordinator that should delegate and monitor requests
to the above components; but if I'm using RabbitMQ and carefully
designing the above components, they could drive each other;
    * RESTful web service that would mediate access to all the
above components;
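
    For the "email store" component above, a minimal sketch of the
maildir option could look like the following. It assumes Python and its
standard `mailbox` module; the `MaildirStore` class and its method names
are hypothetical and only illustrate the key-value view, they are not
part of notmuch or of the proposal itself.

```python
import mailbox
import tempfile


class MaildirStore:
    """Key-value view of a Maildir: keys are Maildir names, values are raw messages."""

    def __init__(self, path):
        # create=True makes the cur/new/tmp subdirectories if they are missing.
        self._box = mailbox.Maildir(path, create=True)

    def put(self, raw_message: bytes) -> str:
        # Maildir.add() writes the message under tmp/ and only then moves it
        # into new/, so readers should never see a partially written file.
        return self._box.add(raw_message)

    def get(self, key: str) -> bytes:
        return self._box.get_bytes(key)

    def keys(self):
        return list(self._box.keys())


def demo():
    # Store one message in a throwaway Maildir and read it back.
    with tempfile.TemporaryDirectory() as root:
        store = MaildirStore(root + "/mail")
        key = store.put(b"From: a@example.org\nSubject: hi\n\nbody\n")
        return key in store.keys(), store.get(key)
```

    Because the store is a plain Maildir, notmuch could index the same
directory directly, and the index could be rebuilt from it at any time.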

    For now I have the following uncertainties:
    * how should I handle multiple users? I think each user should
have their own store / notmuch / bogofilter instance (at least in terms
of storage, if not also in terms of separate daemons);
    * should I keep the emails in a file system, or in a key-value store?
(the file system is less bug-prone, but I'm confident that a BerkeleyDB
instance would be more efficient);
    * should I use libnotmuch or, for starters, just write a wrapper
around the notmuch command-line tool;
    * and the most pressing one, transactions: I would like a message
never to be half processed or lost at any point; as such I need
notmuch to behave transactionally -- indexing the message and tagging
it should be atomic and durable; (is there a way with libnotmuch to
control the underlying BerkeleyDB database?)

    Suggestions? Considerations?

    Ciprian.


* Re: RFC: notmuch powered (personal) (end-to-end) e-mail system
  2011-03-20 14:07 RFC: notmuch powered (personal) (end-to-end) e-mail system Ciprian Dorin Craciun
@ 2011-03-20 15:18 ` Brett Viren
  2011-03-20 16:37 ` Ben Gamari
  2011-03-20 18:01 ` Austin Clements
  2 siblings, 0 replies; 4+ messages in thread
From: Brett Viren @ 2011-03-20 15:18 UTC
  To: Ciprian Dorin Craciun; +Cc: notmuch

On Sun, Mar 20, 2011 at 10:07 AM, Ciprian Dorin Craciun
<ciprian.craciun@gmail.com> wrote:

>    I'm "struggling" for some time to get rid of the current
> "de-facto" email solutions (i.e. GMail, Zimbra), and I've passively
> observed for some time the notmuch project and community.

It sounds like what you want *is* GMail (I don't know Zimbra) but just
that you want it running on your own box instead of on Google's
servers.

>    Suggestions? Considerations?

Based on what you wrote, I think BerkeleyDB will be too limiting.
I suggest you look into DBMail[1] for the mail store.


-Brett.

[1] http://www.dbmail.org/


* Re: RFC: notmuch powered (personal) (end-to-end) e-mail system
  2011-03-20 14:07 RFC: notmuch powered (personal) (end-to-end) e-mail system Ciprian Dorin Craciun
  2011-03-20 15:18 ` Brett Viren
@ 2011-03-20 16:37 ` Ben Gamari
  2011-03-20 18:01 ` Austin Clements
  2 siblings, 0 replies; 4+ messages in thread
From: Ben Gamari @ 2011-03-20 16:37 UTC
  To: Ciprian Dorin Craciun, notmuch

On Sun, 20 Mar 2011 16:07:50 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
>     Hello all! (Sorry for the long email.)
> 
> [snip]
> 
>     * the most full-featured UI is the Emacs one -- thus limited remote
> access (I mean from an arbitrary computer with only a web-browser);
> (and I'm not a very big fan of Emacs;)
> 
There have been a few attempts to put together an HTML front-end to
notmuch[1]. None have made it very far though. It would be nice to see
this space filled.

>     * most are still dependent on external IMAP systems -- this is not
> a problem with notmuch itself, but for the integrating clients;
> 
Not entirely sure what you mean by this. You could easily use
e.g. notmuch-deliver as the local delivery agent with a SMTP server and
you'd have no need for IMAP.

>     * SPAM -- as above -- is not integrated;
> 
Nor should it be. Mail indexing, viewing, composing, and
filtering are all orthogonal parts of a mail system. It takes all of
ten lines to invoke a spam filter in your filter script.
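
A minimal sketch of such a filter script, assuming Python, bogofilter's
documented exit codes (0 = spam, 1 = non-spam, 2 = unsure), and that
newly indexed mail carries a "new" tag (which depends on your notmuch
configuration). The function names and the "unsure" tag are
hypothetical choices for illustration.

```python
import subprocess

# Tag changes per bogofilter exit code: 0 = spam, 1 = non-spam, 2 = unsure.
# The "+unsure" tag is an assumption made for this sketch.
TAG_CHANGES = {
    0: ["+spam", "-inbox"],
    1: [],
    2: ["+unsure"],
}


def tag_changes_for(exit_code):
    # Any other exit code (for example 3) means bogofilter itself failed.
    if exit_code not in TAG_CHANGES:
        raise RuntimeError(f"bogofilter failed with exit code {exit_code}")
    return TAG_CHANGES[exit_code]


def classify(path):
    # Feed the raw message to bogofilter on stdin; only the exit code is used.
    with open(path, "rb") as message:
        return subprocess.run(["bogofilter"], stdin=message).returncode


def run_notmuch(*args):
    result = subprocess.run(["notmuch", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout.splitlines()


def filter_messages(query="tag:new"):
    # --output=messages prints one "id:..." search term per matching message.
    for term in run_notmuch("search", "--output=messages", query):
        files = run_notmuch("search", "--output=files", term)
        if not files:
            continue
        changes = tag_changes_for(classify(files[0]))
        if changes:
            run_notmuch("tag", *changes, "--", term)
```

Calling filter_messages() after each "notmuch new" run would be one way
to hook this into an existing sorting script.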

>     * filtering (tag applying) is not automatic (as in integrated in
> notmuch itself or the client), but triggered through external scripts;
> 
Again, there is no reason why this should be incorporated into your
mail indexer.

>     As such I'm thinking of implementing a custom end-to-end email
> system and I would like to hear your feedback before embarking on such
> a task.
> 
Notmuch works so well for its audience because it adheres to the UNIX
philosophy of "do one thing and do it well." The goal of an "integrated
end-to-end" mail system might sound nice, but IMHO it's a recipe for a
kludgey, unmaintainable nightmare which is mediocre at performing its
task, on a good day. Perhaps I'm misunderstanding your proposal but it
seems to me like you are taking an easy, already solved problem and
turning it into a difficult one.

>     I'm targeting the following features:
>     * (inbound) SMTP integration, thus once an email is received it is
> automatically pushed through the system; (I'm primarily targeting
> those users that can afford to run their own SMTP server; but the solution
> could still be adapted for those that only want the other features;)
> 
Is there something wrong with Postfix with notmuch-deliver as an LDA?

>     * automatic spam filtering, and tag applying;
>
A traditional sorting script with bogofilter/spamassassin?

>     * automatic email triggers based on tags (such as user
> notifications, forwarding, etc.)
>
Again, a sorting script?

>     * remote RPC-like access to the whole system;
> 
What's wrong with SSH?

>     * remote Web user interface;
> 
Nothing fills this need currently. Feel free to write up something but
please don't couple it to some all-inclusive behemoth of a project.

Personally, I would think more carefully about this project before
proceeding. It sounds like you intend on reinventing various portions of
the wheel several times. Nothing you have listed is difficult to do with
a few scripts, notmuch, and an SMTP server.

>     About the overall architecture I'm thinking of adopting the following:
>     * in general the whole system is decomposed into independent
> components (long-lived OS daemons), each of which does a particular job
> (see below);
>     * all the components communicate with each other through a
> message queue system (for example ZeroMQ or RabbitMQ);
>     * all the communication is JSON based;
> 
>     The components would be:
>     * SMTP inbound gateway -- for example I could take qmail or
> Postfix and replace the delivery agent with a custom process that
> pushes the email into the system; (any other solution suggestions?);
>     * email store -- as the name suggests it is a simple
> key-value-like store that should persist raw email-messages; it should
> be as robust as possible, and its contents should be the only thing
> needed to reconstruct all the other derived data; (I could use here a
> simple process that maintains a maildir, I could go also with a
> BerkeleyDB wrapper, or even something more sophisticated;)
>     * spam filter -- which either classifies the email or trains the
> spam filter; (for example I would use bogofilter;)
>     * email index -- this is where notmuch would come into play; it
> would be fed with emails, to which it would automatically apply tags,
> and it would issue trigger notifications based on those tags; it also
> maintains the set of filters and tags to apply automatically;
>     * (maybe) a coordinator that should delegate and monitor requests
> to the above components; but if I'm using RabbitMQ and carefully
> designing the above components, they could drive each other;
>     * RESTful web service that would mediate access to all the
> above components;
> 
>     For now I have the following uncertainties:
>     * how should I handle multiple users? I think each user should
> have their own store / notmuch / bogofilter instance (at least in terms
> of storage, if not also in terms of separate daemons);
>     * should I keep the emails in a file system, or in a key-value store?
> (the file system is less bug-prone, but I'm confident that a BerkeleyDB
> instance would be more efficient);
>     * should I use libnotmuch or for starters just make a notmuch tool
>     wrapper;
>
>     * and the most pressing one, transactions: I would like that at no
> point does a message get half processed or lost; as such I need
> notmuch to behave transactionally -- indexing the message and tagging
> it should be atomic and durable; (is there a way with libnotmuch to
> control the underlying BerkeleyDB database?)
>
It tries to be as reliable as possible. I believe before a patch a few
months ago it was possible to kill notmuch during "notmuch new" and lose
messages (they would never be indexed). It's possible there are more
bugs such as this. If they do exist they should be dealt with.

Also, notmuch uses Xapian, not BerkeleyDB.

Anyways, that is my two cents. Good luck.

Cheers,

- Ben


[1] https://github.com/dme/noneatall


* Re: RFC: notmuch powered (personal) (end-to-end) e-mail system
  2011-03-20 14:07 RFC: notmuch powered (personal) (end-to-end) e-mail system Ciprian Dorin Craciun
  2011-03-20 15:18 ` Brett Viren
  2011-03-20 16:37 ` Ben Gamari
@ 2011-03-20 18:01 ` Austin Clements
  2 siblings, 0 replies; 4+ messages in thread
From: Austin Clements @ 2011-03-20 18:01 UTC
  To: Ciprian Dorin Craciun; +Cc: notmuch

Much of the beauty of notmuch is how few assumptions it makes about
your mail system.  It plays well with others.  For example, one deep
insight of notmuch is that it *doesn't* require a custom mail store,
even though a more obvious design might; in fact, it doesn't even
require Maildir.

That said, I think I can see where you're coming from and I also think
you're targeting some of the deficiencies of notmuch, but I also think
you're overengineering the solution.  As a result of notmuch's
simplicity, a fully working mail setup requires a lot of moving parts
besides notmuch and it can take a while for a new user to set all that
up, especially if they're migrating wholesale from some external mail
setup.

On Sun, Mar 20, 2011 at 10:07 AM, Ciprian Dorin Craciun
<ciprian.craciun@gmail.com> wrote:
>    As such I'm thinking of implementing a custom end-to-end email
> system and I would like to hear your feedback before embarking on such
> a task.
>
>    I'm targeting the following features:
>    * (inbound) SMTP integration, thus once an email is received it is
> automatically pushed through the system; (I'm primarily targeting
> those users that can afford to run their own SMTP server; but the solution
> could still be adapted for those that only want the other features;)

As others have mentioned, see notmuch-deliver.  I and others have also
suggested inotify support for notmuch before, which would make the
inbound mail mechanism (be it SMTP, IMAP fetching, or whatever)
completely unaware of notmuch, offer some other benefits (for example,
if mail is manipulated outside notmuch via IMAP), and is highly
discoverable for new users (just have notmuch setup ask if they want
notmuch to monitor for new email and then fire up an inotify daemon
the first time notmuch is called).

>    * automatic spam filtering, and tag applying;
>    * automatic email triggers based on tags (such as user
> notifications, forwarding, etc.)

Obviously the above two can be scripted, but I agree that it's
unsatisfying that every user needs to roll their own delivery script.
While tagging and triggering are highly personal, they're not *so*
personal that everyone needs a completely custom solution.  This
should be more approachable.  I'm not sure what the best answer here
is, but I don't think it requires integration with a
monolithic system to do right.

>    * remote RPC-like access to the whole system;

This is another deep insight of notmuch.  It already has an awesome
RPC interface: the CLI.  Perhaps your actual problem is that the only
supported remote transport protocol is SSH.  This comes with a lot of
benefits (authentication, RPC pipelining), but also a lot of baggage
(a full SSH client on the client side).  I've thought about this in
the context of both an HTTP client and an Android client and in both
cases I concluded that a simple HTTPS transport wrapped around the
notmuch CLI would be the way to go.  Just put the CLI arguments in a
POST and send the JSON on stdout back.  This is trivial to prototype
as a Python CGI script, easy to build as a standalone Python server,
and not especially hard to build as a robust C server.
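
A minimal sketch of that idea as a standalone Python server, using only
the standard library. Everything named here (`ALLOWED`, `build_command`,
`NotmuchHandler`, `serve`, port 8080) is a hypothetical choice for
illustration. The sketch speaks plain HTTP on localhost and does no
authentication, so it assumes a TLS-terminating, authenticating proxy in
front of it; it also only allows two read-only subcommands.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical whitelist: read-only subcommands that accept --format=json.
ALLOWED = {"search", "show"}


def build_command(args):
    """Turn a JSON-decoded argument list into a notmuch command line."""
    if (not isinstance(args, list) or not args
            or not all(isinstance(a, str) for a in args)):
        raise ValueError("expected a non-empty list of strings")
    if args[0] not in ALLOWED:
        raise ValueError("subcommand not allowed")
    return ["notmuch", args[0], "--format=json", *args[1:]]


class NotmuchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The request body is the CLI argument list, encoded as a JSON array.
        length = int(self.headers.get("Content-Length", 0))
        try:
            command = build_command(json.loads(self.rfile.read(length)))
        except ValueError as err:  # also covers malformed JSON
            self.send_error(400, explain=str(err))
            return
        # Run the CLI and send whatever JSON it printed on stdout back.
        result = subprocess.run(command, capture_output=True)
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result.stdout)


def serve(port=8080):
    # Bind to localhost only; remote access is left to the proxy in front.
    HTTPServer(("127.0.0.1", port), NotmuchHandler).serve_forever()
```

With serve() running, a client would POST a body such as
["search", "tag:inbox"] and receive the JSON that the notmuch CLI
printed for that search.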

>    * remote Web user interface;

A good web UI would be fantastic.  Based on the rest of your email, I
get the impression this was a requirements driver for much of the
above, especially the integrated tagging/triggering and RPC access.
I've already suggested a simple solution to the RPC problem.  For
tagging/triggering, it's probably worth developing a solution that
allows for machine-editable rules (ideally retaining
user-editableness), which would make it possible to integrate filter
management into a web UI.  This could be as simple as a standard
delivery script that operates from some simple rule database.
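
A minimal sketch of such a rule-driven script, assuming Python and a
hypothetical JSON rule format in which each rule pairs a notmuch search
query with tag changes. The rule format, the function names, and the
"tag:new" scope (which depends on how notmuch is configured to tag
incoming mail) are all assumptions for illustration.

```python
import json
import subprocess

# Hypothetical rule database: a JSON array that a web UI could edit and
# that a user could still read and change by hand.
EXAMPLE_RULES = """
[
  {"query": "to:notmuch@notmuchmail.org", "tags": ["+notmuch", "-inbox"]},
  {"query": "from:boss@example.org",      "tags": ["+work"]}
]
"""


def rules_to_commands(rules, scope="tag:new"):
    """Translate each rule into one `notmuch tag` invocation."""
    commands = []
    for rule in rules:
        # Restrict every rule to freshly delivered mail.
        query = f"{scope} and ({rule['query']})"
        commands.append(["notmuch", "tag", *rule["tags"], "--", query])
    return commands


def apply_rules(rules_json):
    # Run the generated commands in order; stop on the first failure.
    for command in rules_to_commands(json.loads(rules_json)):
        subprocess.run(command, check=True)
```

Keeping the translation step separate from execution means a web UI
could preview exactly which commands a rule set would run before
applying it.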

