unofficial mirror of notmuch@notmuchmail.org
* My mail configuration
@ 2011-02-23 14:22 Ben Gamari
  2011-03-06 21:34 ` Michal Sojka
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ben Gamari @ 2011-02-23 14:22 UTC
  To: notmuch

Here is my mail sorting script that has been slowly evolving for almost
a year now. It uses the Python bindings, along with Bogofilter for spam
filtering. There is also an update-junk script which brings the
Bogofilter database into synchronization with the notmuch tags. On this
note, if someone wants to implement the ability to hide certain tags
(say, those matching /\..+/) in the emacs interface it would be greatly
appreciated. I have notmuch configured such that all new mail starts
with just the "new" tag. The sorting script then takes it from
there. Hope this will give folks some ideas.
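As a concrete reading of the tag-hiding request: the pattern is aimed at internal bookkeeping tags such as .bf_spam and .bf_ham, which the update-junk script below maintains. The Emacs interface is written in Emacs Lisp, so this Python sketch (a hypothetical visible_tags helper) only illustrates which tags such a filter would hide. Anchoring the pattern at the start of the tag name is an assumption, since an unanchored /\..+/ would also hide a tag like linux-2.6.

```python
import re

# Assumption: "hidden" tags are those whose name starts with a dot,
# e.g. .bf_spam / .bf_ham. re.match anchors the pattern at the start.
HIDDEN_TAG = re.compile(r'\..+')

def visible_tags(tags):
    """Return the tags a display filter would still show (hypothetical helper)."""
    return [t for t in tags if not HIDDEN_TAG.match(t)]

print(visible_tags(['inbox', 'unread', '.bf_ham', 'list', 'notmuch']))
```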

Cheers,

- Ben


===File ~/.env/mail/sort_mail.py============================
#!/usr/bin/python

# Warning:
# Be careful about using Query.count_messages(): it's technically an estimate
# and is not guaranteed to be correct

import os
import logging
import time

logging.basicConfig(level=logging.DEBUG)

_tags = []
start_time = time.time()

def sf_list(name, tag):
        #_tags.append( ('to:%s@lists.sourceforge.net' % name, ['list', tag]) )
        #_tags.append( ('to:%s@lists.sf.net' % name, ['list', tag]) )
        _tags.append( ('to:%s' % name, ['list', tag]) )

def kernel_list(name, tag):
        #_tags.append( ('to:%s@vger.kernel.org' % name, ['list', tag]) )
        _tags.append( ('to:%s' % name, ['list', tag]) )

def fdo_list(name, tag):
        #_tags.append( ('to:%s@lists.freedesktop.org' % name, ['list', tag]) )
        _tags.append( ('to:%s' % name, ['list', tag]) )

def _list(name, tag):
        _tags.append( ('to:%s' % name, ['list', tag]) )

def tag(filter, *tags):
        _tags.append( (filter, tags) )

kernel_list('linux-kernel', 'lkml')
kernel_list('mm-commits', 'mm-commits')
kernel_list('linux-omap', 'linux-omap')
kernel_list('linux-next', 'linux-next')
kernel_list('linux-wireless', 'linux-wireless')
kernel_list('linux-btrfs', 'btrfs')
_list('linux-pm', 'linux-pm')
_list('linux-arm-kernel', 'linux-arm')
sf_list('oprofile-list', 'oprofile')
sf_list('spi-devel-general', 'spi-devel')
sf_list('linux1394-devel', 'ieee1394')

sf_list('ipw3945-devel', 'ipw')
_list('hostap@lists.shmoo.com', 'hostap')
_list('ath9k-devel@', 'ath9k')
_list('vim-dev@vim.org', 'vim')
_list('vim_dev', 'vim')

fdo_list('intel-gfx', 'intel-gfx')
fdo_list('xorg', 'xorg')
fdo_list('hal', 'hal')
fdo_list('compiz', 'compiz')
sf_list('dri-devel', 'dri')
sf_list('dri-users', 'dri')
sf_list('mesa3d-dev', 'mesa')
fdo_list('mesa-dev', 'mesa')

fdo_list('devkit-devel', 'devkit')
sf_list('matplotlib-users', 'matplotlib')
sf_list('matplotlib-devel', 'matplotlib')
_list('notmuch@notmuchmail.org', 'notmuch')
_list('eigen@lists.tuxfamily.org', 'eigen')
_list('launchpad-users@lists.launchpad.net', 'launchpad')
_list('boost@lists.boost.org', 'boost')
_list('debian-python@lists.debian.org', 'debian-python')

_list('geda-user@', 'geda')

_list('openembedded-devel@lists.openembedded.org', 'openembedded')
_list('beagleboard@googlegroups.com', 'beagleboard')
_list('angstrom-distro-devel@linuxtogo.org', 'angstrom')
_list('angstrom-distro-users@linuxtogo.org', 'angstrom')

_list('mono-devel-list@lists.ximian.com', 'mono')
_list('mono-list@', 'mono')
_list('ubuntu-devel-discuss@lists.ubuntu.com', 'ubuntu-devel')
_list('git@vger.kernel.org', 'git')
_list('sup-talk@rubyforge.org', 'sup')
_list('thrust-users@googlegroups.com', 'thrust')
_list('golang-nuts@googlegroups.com', 'go')
_list('numpy-discussion@scipy.org', 'numpy')
_list('scipy-user@scipy.org', 'scipy')

_list('rsync@lists.samba.org', 'rsync')
tag('from:samba-bugs', 'bugs', 'rsync', 'list')

_list('containers@', 'containers')

tag('from:bugzilla', 'bugs', 'list')

# Tags that aren't for lists
tag('from:Facebook', 'facebook')
tag('to:gdh@gdhour.com', 'gdh')

tag('to:bgamari@gmail.com', 'gmail')
tag('to:bgamari.foss@gmail.com', 'foss')
tag('from:Ben Gamari', 'sent')
tag('from:bgamari.foss', 'sent')

from sort_junk import sort_junk
from notmuch_utils import *
import notmuch
db = notmuch.Database(mode=notmuch.Database.MODE.READ_WRITE)

# Freeze new messages
q_new = notmuch.Query(db, 'tag:new')
n_msgs = 0
for msg in q_new.search_messages():
        msg.freeze()
        n_msgs += 1

# Take care of basics
tag_search(db, 'tag:new', '+unread', '+unseen')

# Take care of feeds
tag_search(db, 'folder:feeds', '+feeds', '-new')

# Run through Bogofilter
sort_junk(q_new)

# Tag things
for filter, tags in _tags:
        tag_search(db, '%s and tag:new' % filter, *tags)

# Ignore things I sent
tag_search(db, 'tag:new and tag:sent', '-unseen', '-new', '-unread', '+watch')

# Update watch tag
for msg in q_new.search_messages():
        q = notmuch.Query(db, 'tag:watch and thread:%s' % msg.get_thread_id())
        if len(q.search_messages()) > 0:
                logging.debug('watching %s' % msg.get_message_id())
                msg.add_tag('watch')

# Watched items should go to inbox
tag_search(db, 'tag:new and tag:watch', '+inbox', '-new')

# Ignore threads that I've already seen
q = notmuch.Query(db, 'tag:new and tag:list')
for msg in q.search_messages():
        q2 = notmuch.Query(db, 'thread:%s and not tag:unseen' % msg.get_thread_id())
        if len(q2.search_messages()) > 0:
                msg.remove_tag('unseen')
                msg.remove_tag('new')

# Remove new from sorted list items
tag_search(db, 'tag:new and tag:list', '-new')

# Tag remaining new items for inbox
tag_search(db, 'tag:new', '+inbox', '-new')

# Thaw new messages
for msg in q_new.search_messages():
        msg.thaw()

end_time = time.time()
logging.info('Sorted %d messages in %1.2f seconds' % (n_msgs, end_time - start_time))

============================================================
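The first half of sort_mail.py is a rule table: each helper appends a (query, tags) pair to _tags, and the "Tag things" loop later restricts every query to tag:new before calling tag_search(). The sketch below reproduces that expansion without a database so the generated queries can be inspected. The names list_rule, rule and pending_queries are hypothetical stand-ins for the script's _list() and tag() helpers and its loop; the query format is copied from the script unchanged.

```python
# Hypothetical stand-ins for the script's _list() and tag() helpers.
rules = []

def list_rule(name, tag):
    # Match on the To: header; add both the generic and the per-list tag.
    rules.append(('to:%s' % name, ['list', tag]))

def rule(query, *tags):
    rules.append((query, list(tags)))

list_rule('notmuch@notmuchmail.org', 'notmuch')
rule('from:samba-bugs', 'bugs', 'rsync', 'list')

def pending_queries(rules):
    # Mirrors the "Tag things" loop: every rule applies only to new mail.
    return [('%s and tag:new' % query, tags) for query, tags in rules]

for query, tags in pending_queries(rules):
    print(query, tags)
```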


===File ~/.env/mail/sort_junk.py============================
#!/usr/bin/python

import logging
import subprocess
from subprocess import PIPE
import notmuch
import re

def sort_junk(query):
        spam_re = re.compile(r'X-Bogosity:\s*Spam')
        spamicity_re = re.compile(r'spamicity=(\d\.\d+)')
        # -b: batch mode, message file names are read from stdin;
        # -v: report a verdict line per message
        bf = subprocess.Popen(['bogofilter', '-bv'], stdin=PIPE, stdout=PIPE)
        for msg in query.search_messages():
                bf.stdin.write(msg.get_filename() + '\n')
                # Push the name through before blocking on bogofilter's reply
                bf.stdin.flush()
                l = bf.stdout.readline()
                if spam_re.search(l):
                        logging.debug('Message %s marked as junk' % msg.get_message_id())
                        msg.add_tag('junk')
        bf.stdin.close()
        bf.wait()

if __name__ == '__main__':
        import sys
        db = notmuch.Database(mode=notmuch.Database.MODE.READ_WRITE)
        query = notmuch.Query(db, ' '.join(sys.argv[1:]))
        sort_junk(query)

============================================================
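sort_junk.py compiles spamicity_re but never uses it; only the Spam/Ham verdict decides whether a message is tagged junk. The sketch below (a hypothetical parse_bogosity helper) shows how both patterns could be applied to one line of bogofilter's verbose output. The sample lines are illustrative and only assumed to resemble what bogofilter prints.

```python
import re

# Same patterns as sort_junk.py, written as raw strings.
SPAM_RE = re.compile(r'X-Bogosity:\s*Spam')
SPAMICITY_RE = re.compile(r'spamicity=(\d\.\d+)')

def parse_bogosity(line):
    """Return (is_spam, spamicity) for one line of bogofilter -bv output.

    Hypothetical helper; spamicity is None when the line carries no score.
    """
    is_spam = SPAM_RE.search(line) is not None
    m = SPAMICITY_RE.search(line)
    return is_spam, (float(m.group(1)) if m else None)

# Illustrative lines, assumed to resemble bogofilter's verbose output.
print(parse_bogosity('X-Bogosity: Spam, tests=bogofilter, spamicity=0.999876, version=1.2.2'))
print(parse_bogosity('X-Bogosity: Ham, tests=bogofilter, spamicity=0.000012, version=1.2.2'))
```

Note that an "Unsure" verdict counts as not spam here, exactly as with the original regex; having the score available would allow logging it or applying a different threshold.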


===File ~/.env/mail/update-junk=============================
#!/usr/bin/python

import logging
import notmuch
from notmuch_utils import *
import subprocess
from time import time
import sys

logging.basicConfig(level=logging.INFO)

db = notmuch.Database(mode=notmuch.Database.MODE.READ_WRITE)
if '--clean' in sys.argv:
        import shutil, os.path
        shutil.rmtree(os.path.expanduser('~/.bogofilter'))
        tag_search(db, 'tag:.bf_spam', '-.bf_spam')
        tag_search(db, 'tag:.bf_ham', '-.bf_ham')

def do_update(search, tag_func, bf_args):
        start_time = time()
        p = subprocess.Popen(['bogofilter', bf_args], stdin=subprocess.PIPE)
        q = notmuch.Query(db, search)
        n = 0
        for msg in q.search_messages():
                p.stdin.write('%s\n' % msg.get_filename())
                tag_func(msg)
                n += 1
        p.stdin.close()
        p.wait()
        return (n, time()-start_time)

logging.info('Registering spam')
n,t = do_update('tag:junk and not tag:.bf_spam', lambda msg: msg.add_tag('.bf_spam'), '-sb')
logging.info('Registered %d spam in %1.2f seconds' % (n,t))

logging.info('Unregistering spam')
n,t = do_update('not tag:junk and tag:.bf_spam', lambda msg: msg.remove_tag('.bf_spam'), '-Sb')
logging.info('Unregistered %d spam in %1.2f seconds' % (n,t))

# Only consider messages that have been read as ham
logging.info('Registering ham')
n,t = do_update('not tag:junk and not tag:unread and not tag:.bf_ham', lambda msg: msg.add_tag('.bf_ham'), '-nb')
logging.info('Registered %d ham in %1.2f seconds' % (n,t))

logging.info('Unregistering ham')
n,t = do_update('tag:junk and tag:.bf_ham', lambda msg: msg.remove_tag('.bf_ham'), '-Nb')
logging.info('Unregistered %d ham in %1.2f seconds' % (n,t))

============================================================
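update-junk makes four passes, one per bogofilter training mode (-s/-S register and unregister spam, -n/-N register and unregister ham, each combined with -b), and records what bogofilter has been taught in the hidden .bf_spam and .bf_ham tags. Per message, that amounts to the decision function sketched below (hypothetical plan_updates), which works on a plain set of tag names instead of the notmuch bindings and applies the passes in the script's order. Running it a second time on its own result is a cheap check of the property the script relies on: once the hidden tags are in place, nothing is left to do.

```python
def plan_updates(tags):
    """Return (bogofilter flags, resulting tags) for one message,
    following the order of update-junk's four passes."""
    tags = set(tags)
    flags = []
    junk = 'junk' in tags
    if junk and '.bf_spam' not in tags:          # register spam
        flags.append('-sb')
        tags.add('.bf_spam')
    if not junk and '.bf_spam' in tags:          # unregister spam
        flags.append('-Sb')
        tags.discard('.bf_spam')
    if not junk and 'unread' not in tags and '.bf_ham' not in tags:
        flags.append('-nb')                      # register ham (read mail only)
        tags.add('.bf_ham')
    if junk and '.bf_ham' in tags:               # unregister ham
        flags.append('-Nb')
        tags.discard('.bf_ham')
    return flags, tags

# A read message that had been trained as ham and is now marked junk.
print(plan_updates({'junk', '.bf_ham'}))
```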


* Re: My mail configuration
  2011-02-23 14:22 My mail configuration Ben Gamari
@ 2011-03-06 21:34 ` Michal Sojka
  2011-03-11 14:55   ` Ben Gamari
  2011-03-18  7:31 ` Jesse Rosenthal
  2011-05-20 22:39 ` eric casteleijn
  2 siblings, 1 reply; 6+ messages in thread
From: Michal Sojka @ 2011-03-06 21:34 UTC
  To: Ben Gamari, notmuch

On Wed, 23 Feb 2011, Ben Gamari wrote:
> Here is my mail sorting script that has been slowly evolving for almost
> a year now. It uses the Python bindings, along with Bogofilter for spam
> filtering. There is also an update-spam script which brings the
> Bogofilter database in to synchronization with the notmuch tags. On this
> note, if someone wants to implement the ability to hide certain tags
> (say, those matching /\..+/) in the emacs interface it would be greatly
> appreciated. I have notmuch configured such that all new mail starts
> with just the "new" tag. The sorting script then takes it from
> there. Hope this will give folks some ideas.

Hi Ben,

thanks for sharing your setup. It seems really interesting and probably
useful for many people. I'd propose putting it on the wiki at
http://notmuchmail.org/initial_tagging/.

Compared to shell scripts for initial tagging, this seems to be much
faster because it freezes all new messages and thaws them only after all
tagging is done. The handling of the watch and unseen tags is also
interesting.

The only thing I missed in your email is the definition of tag_search().
You probably define it in notmuch_utils similarly to this:

import logging
import notmuch

def tag_search(db, query, *tags):
    logging.debug('tagging %s %s' % (query, tags))
    q = notmuch.Query(db, query)
    for msg in q.search_messages():
        for tagop in tags:
            if tagop[0] == '+':
                msg.add_tag(tagop[1:])
            elif tagop[0] == '-':
                msg.remove_tag(tagop[1:])
            else:
                msg.add_tag(tagop)

> # Tag things
> for filter, tags in _tags:
>         tag_search(db, '%s and tag:new' % filter, *tags)

Here I would suggest adding parentheses around %s, like:

     tag_search(db, '( %s ) and tag:new' % filter, *tags)

I use the 'or' operator in a few of my filters, and without the
parentheses the query would be interpreted incorrectly.
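The problem is operator precedence: the filter is pasted into a larger query, so an 'or' inside it can capture the trailing 'and tag:new'. Assuming that notmuch's Xapian-based query parser gives 'and' higher precedence than 'or', as Python does for its boolean operators, the effect can be illustrated with Python booleans standing in for query terms. The helper name restrict_to_new and the combined vim filter are hypothetical.

```python
def restrict_to_new(filter_query):
    # Group the filter so its own operators cannot leak out.
    return '( %s ) and tag:new' % filter_query

print(restrict_to_new('to:vim-dev@vim.org or to:vim_dev'))

# An already-processed (not new) message matching the first alternative.
matches_a, matches_b, is_new = True, False, False
without_parens = matches_a or matches_b and is_new   # a or (b and new)
with_parens = (matches_a or matches_b) and is_new
print(without_parens, with_parens)  # True False
```

Without the grouping, old mail matching the first alternative would be retagged on every run.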

-Michal


* Re: My mail configuration
  2011-03-06 21:34 ` Michal Sojka
@ 2011-03-11 14:55   ` Ben Gamari
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Gamari @ 2011-03-11 14:55 UTC
  To: Michal Sojka, notmuch

On Sun, 06 Mar 2011 22:34:13 +0100, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> Hi Ben,
> 
> thanks for sharing your setup.
> 
No worries.

> It seems really interesting and probably
> useful for many people. I'd propose to put it to wiki at
> http://notmuchmail.org/initial_tagging/.
> 
Done.

> When compared to shell scripts for initial tagging, this seems to be
> much faster because of freezing all new messages and thawing them after
> all tagging is done.
> 
Indeed, IMHO the Python bindings are perfect for this sort of
application.

> Also the handling of watch and unseen tag is interesting.
> 
The watch tag is one of the defining points of my workflow. I deal with
a lot of lists although generally I'm interested in very few threads
from each. The watch tag makes it very easy to find and follow relevant
discussions.

Just to make sure everyone understands what is happening here, the
workflow works as follows:

 1) When a message starting a new thread is encountered it is marked as
    "unseen"

 2) I look at the "unseen" tag, marking threads of interest with "watch"

 3) When a message comes in associated with an existing thread marked
    with "watch" it is marked with "inbox", otherwise it is ignored

In other words, the "unseen" tag is an indicator of whether I have
evaluated the message for interest. I use the standard "unread" tag to
mark whether I have read the contents of the message.
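The three steps can be traced on an in-memory model. The sketch below is a simplified restatement of the relevant passes of sort_mail.py over plain dicts, not the script itself: the function name sort_new and the data layout are hypothetical, and spam, feed and sent-mail handling are left out. New mail gets unread and unseen; anything in a thread containing a watched message goes to the inbox; list mail in a thread that has already been evaluated loses unseen; list mail in a fresh thread stays unseen; everything else goes to the inbox.

```python
def sort_new(messages):
    """Simplified model of sort_mail.py's watch/unseen passes.

    messages: list of dicts {'thread': str, 'tags': set}, updated in place.
    """
    def thread_has(thread, pred):
        return any(pred(o['tags']) for o in messages if o['thread'] == thread)

    new = [m for m in messages if 'new' in m['tags']]   # snapshot of tag:new
    for m in new:                                       # basics
        m['tags'] |= {'unread', 'unseen'}
    for m in new:                                       # propagate watch
        if thread_has(m['thread'], lambda t: 'watch' in t):
            m['tags'].add('watch')
    for m in new:
        tags = m['tags']
        tags.discard('new')
        if 'watch' in tags:                             # watched: inbox
            tags.add('inbox')
        elif 'list' in tags:
            # Already evaluated if some message in the thread is not unseen.
            if thread_has(m['thread'], lambda t: 'unseen' not in t):
                tags.discard('unseen')
        else:                                           # everything else: inbox
            tags.add('inbox')
    return messages

msgs = [
    {'thread': 'a', 'tags': {'list', 'watch'}},   # old, watched thread
    {'thread': 'a', 'tags': {'new', 'list'}},
    {'thread': 'b', 'tags': {'list'}},            # old, evaluated, not watched
    {'thread': 'b', 'tags': {'new', 'list'}},
    {'thread': 'c', 'tags': {'new', 'list'}},     # brand-new list thread
    {'thread': 'd', 'tags': {'new'}},             # personal mail
]
for m in sort_new(msgs):
    print(m['thread'], sorted(m['tags']))
```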

> The only thing I missed in your email is the definition of tag_search().
> You probably define it in notmuch_utils similarly to this:
> 
Doh, thanks for pointing that out. Your tag_search implementation
matches mine almost verbatim. For the record I have included
notmuch_utils.py below.

> Here I would suggest to add parentheses around %s like:
> 
>      tag_search(db, '( %s ) and tag:new' % filter, *tags)
> 
> I use the 'or' operator in a few of my filters and without the
> parentheses the query would be interpreted incorrectly.
> 
Good point. Thanks!

- Ben


--8<---------------cut here---------------start------------->8---
import notmuch
import logging

def tag_message(msg, *tags):
        msg.freeze()
        for tag in tags:
                if tag[0] == '+':
                        msg.add_tag(tag[1:])
                elif tag[0] == '-':
                        msg.remove_tag(tag[1:])
                else:
                        msg.add_tag(tag)
        msg.thaw()

def tag_search(db, search, *tags):
        q = notmuch.Query(db, search)
        count = 0
        for msg in q.search_messages():
                count += 1
                tag_message(msg, *tags)

        if count > 0:
                logging.debug('Tagging %d messages with (%s)' % (count, ' '.join(tags)))

--8<---------------cut here---------------end--------------->8---
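Because tag_message() is the only place where the +/- convention is interpreted, it is easy to check in isolation. The sketch below (a hypothetical apply_tag_ops) applies the same rules to a plain Python set instead of a notmuch Message. It uses str.startswith and skips empty strings, whereas tag[0] in the original would raise IndexError on an empty operation.

```python
def apply_tag_ops(tags, *ops):
    """Apply notmuch_utils-style tag operations to a set of tag names.

    '+foo' adds foo, '-foo' removes it, and a bare 'foo' also adds it,
    as in tag_message(). Returns a new set; the input is left unchanged.
    """
    result = set(tags)
    for op in ops:
        if not op:
            continue                  # ignore empty operations
        if op.startswith('+'):
            result.add(op[1:])
        elif op.startswith('-'):
            result.discard(op[1:])    # removing an absent tag is a no-op
        else:
            result.add(op)
    return result

print(sorted(apply_tag_ops({'new'}, '+inbox', '-new', 'unread')))
```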


* Re: My mail configuration
  2011-02-23 14:22 My mail configuration Ben Gamari
  2011-03-06 21:34 ` Michal Sojka
@ 2011-03-18  7:31 ` Jesse Rosenthal
  2011-03-18 12:32   ` Ben Gamari
  2011-05-20 22:39 ` eric casteleijn
  2 siblings, 1 reply; 6+ messages in thread
From: Jesse Rosenthal @ 2011-03-18  7:31 UTC
  To: Ben Gamari, notmuch

Hi Ben,

On Wed, 23 Feb 2011 09:22:57 -0500, Ben Gamari <bgamari.foss@gmail.com> wrote:
> Here is my mail sorting script that has been slowly evolving for almost
> a year now. 

Thanks for posting this, and sorry for digging this up so much later. I
was thinking of setting up something similar, and had one question:

> # Freeze new messages
> q_new = notmuch.Query(db, 'tag:new')
> n_msgs = 0
> for msg in q_new.search_messages():
>         msg.freeze()
>         n_msgs += 1

It seems like every time you iterate over `q_new.search_messages()', you
run a new search on tag:new. So at the end, when you thaw the messages,
you're running that search again, from scratch:

> # Tag remaining new items for inbox
> tag_search(db, 'tag:new', '+inbox', '-new')
> 
> # Thaw new messages
> for msg in q_new.search_messages():
>         msg.thaw()

But there are no longer any "tag:new"s, so there shouldn't be any
results for `q_new.search_messages()', should there? It seems like
it's thawing 0 messages. Playing around with it, it doesn't seem to make
a difference, so perhaps thawing is unnecessary if you're exiting after
tagging. Or am I misunderstanding something?

By the way, my understanding of the bindings is that you can avoid
running the new searches by dumping a Messages object into a list. So,
you can do something like:

    new_msg_obj = q_new.search_messages()
    new_msg_list = [m for m in new_msg_obj]

and then deal with the list from there on out. Not sure if that would
buy you much performance over running the query repeatedly, but it
couldn't hurt, and it would seem closer to the effect that you're aiming
at (since the members of the list would be set from the first query, and
therefore you'd be thawing the same elements you froze in the first
place).
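The difference between re-running the search and keeping a snapshot can be reproduced without a mail store. In the sketch below, a hypothetical FakeMessage counts freeze()/thaw() calls and search() re-evaluates its condition on every call, standing in for the behaviour described above, where each iteration over q_new.search_messages() runs the query afresh. Thawing through a second search finds nothing once 'new' has been removed, while thawing the snapshot balances every freeze. (list(q_new.search_messages()) is an equivalent spelling of the list comprehension.)

```python
class FakeMessage:
    """Stand-in for a notmuch Message: tracks tags and freeze depth."""
    def __init__(self, *tags):
        self.tags = set(tags)
        self.frozen = 0

    def freeze(self):
        self.frozen += 1

    def thaw(self):
        self.frozen -= 1

def search(store, tag):
    # Re-evaluated on every call, like running the query again.
    return [m for m in store if tag in m.tags]

def run(store, snapshot):
    frozen = search(store, 'new')
    for m in frozen:
        m.freeze()
    for m in store:                  # stands in for the tagging passes
        m.tags.discard('new')
    to_thaw = frozen if snapshot else search(store, 'new')
    for m in to_thaw:
        m.thaw()
    return [m.frozen for m in store]

print(run([FakeMessage('new'), FakeMessage('new')], snapshot=False))  # [1, 1]
print(run([FakeMessage('new'), FakeMessage('new')], snapshot=True))   # [0, 0]
```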

Thanks again for posting this.

Best,
Jesse


* Re: My mail configuration
  2011-03-18  7:31 ` Jesse Rosenthal
@ 2011-03-18 12:32   ` Ben Gamari
  0 siblings, 0 replies; 6+ messages in thread
From: Ben Gamari @ 2011-03-18 12:32 UTC
  To: Jesse Rosenthal, notmuch

On Fri, 18 Mar 2011 03:31:59 -0400, Jesse Rosenthal <jrosenthal@jhu.edu> wrote:
> > # Freeze new messages
> > q_new = notmuch.Query(db, 'tag:new')
> > n_msgs = 0
> > for msg in q_new.search_messages():
> >         msg.freeze()
> >         n_msgs += 1
> 
> It seems like every time you iterate over `q_new.search_messages()', you
> run a new search on tag:new. So at the end, when you thaw the messages,
> you're running that search again, from scratch:
> 
Ouch, yes, you are absolutely right. Thankfully, as you noted, notmuch
seems to be cleaning up after me when I exit.

> > # Tag remaining new items for inbox
> > tag_search(db, 'tag:new', '+inbox', '-new')
> > 
> > # Thaw new messages
> > for msg in q_new.search_messages():
> >         msg.thaw()
> 
> But there are no longer and "tag:new"s, so there shouldn't be any
> results for `q_new.search_messages()', should there? It seems like
> it's thawing 0 messages. Playing around with it, it doesn't seem to make
> a difference, so perhaps thawing is unneccessary if you're exiting after
> tagging. Or am I misunderstanding something?
> 
> By the way, my understanding of the bindings is that you can avoid
> running the new searches by dumping a Messages object into a list. So,
> you can do something like:
> 
>     new_msg_obj = q_new.search_messages()
>     new_msg_list = [m for m in new_msg_obj]
> 
> and then deal with the list from there on out. Not sure if that would
> buy you much performance over running the query repeatedly, but it
> couldn't hurt, and it would seem closer to the effect that you're aiming
> at (since the members of the list would be set from the first query, and
> therefore you'd be thawing the same elements you froze in the first
> place).
> 
This is true, although I'd be worried about memory usage when there are
many new messages. That said, it probably won't be a problem; even in
the worst case it would likely only be a few hundred MB. I'll consider
this.

Thanks for your note!

Cheers,

- Ben


* Re: My mail configuration
  2011-02-23 14:22 My mail configuration Ben Gamari
  2011-03-06 21:34 ` Michal Sojka
  2011-03-18  7:31 ` Jesse Rosenthal
@ 2011-05-20 22:39 ` eric casteleijn
  2 siblings, 0 replies; 6+ messages in thread
From: eric casteleijn @ 2011-05-20 22:39 UTC
  To: Ben Gamari, notmuch

On Wed, 23 Feb 2011 09:22:57 -0500, Ben Gamari <bgamari.foss@gmail.com> wrote:
> Here is my mail sorting script that has been slowly evolving for almost
> a year now. It uses the Python bindings, along with Bogofilter for spam
> filtering. There is also an update-spam script which brings the
> Bogofilter database in to synchronization with the notmuch tags. On this
> note, if someone wants to implement the ability to hide certain tags
> (say, those matching /\..+/) in the emacs interface it would be greatly
> appreciated. I have notmuch configured such that all new mail starts
> with just the "new" tag. The sorting script then takes it from
> there. Hope this will give folks some ideas.

Hi Ben,

After a few previous tries, I finally came up with a notmuch setup that
works for me, and it's largely thanks to your example, which I've
modified slightly (of course ;) to suit my own purposes. The thing that
I really liked was the 'watch' tag for mailing list messages, but after
initial experimentation with it, I still felt paranoid about missing
stuff that I cared about, since anything that does not go through my
inbox is unlikely to ever show up on my radar.

The solution I chose was to have an 'unwatch' tag (which could do with a
better name, maybe 'ignore', or 'skip-inbox') that I apply to threads
that I have no interest in. I also decided that (for me) this was not
really tied to mailing list messages. My change made the script quite a
bit simpler (it gets rid of the unseen tag, for one thing), though at
the cost of some of the original functionality, I realize. So, while I
doubt this will be massively useful to you, or other people with a lot
of list traffic with a very high noise-to-signal ratio, it may be of
interest to others who have use cases more like my own.
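The inverted approach reduces to a single routing rule: a new message goes to the inbox unless its thread carries the opt-out tag. The sketch below states that rule over plain dicts. The tag name 'unwatch' comes from the description above, while route_new and the data layout are hypothetical and not taken from the linked scripts.

```python
def route_new(messages):
    """Send new mail to the inbox unless its thread is marked 'unwatch'.

    messages: list of dicts {'thread': str, 'tags': set}, updated in place.
    """
    ignored_threads = {m['thread'] for m in messages if 'unwatch' in m['tags']}
    for m in messages:
        tags = m['tags']
        if 'new' not in tags:
            continue
        tags.discard('new')
        if m['thread'] not in ignored_threads:
            tags.add('inbox')
    return messages

msgs = [
    {'thread': 'x', 'tags': {'unwatch'}},   # thread explicitly dismissed
    {'thread': 'x', 'tags': {'new'}},
    {'thread': 'y', 'tags': {'new'}},
]
for m in route_new(msgs):
    print(m['thread'], sorted(m['tags']))
```

Compared with the watch workflow, the default flips: nothing is dropped from the inbox unless its thread was explicitly dismissed.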

I put my copies under version control here:

http://bazaar.launchpad.net/~thisfred/+junk/scripts/view/head:/sort_mail.py
http://bazaar.launchpad.net/~thisfred/+junk/scripts/view/head:/update_junk.py

(Though I'm not sure I changed anything substantial in the second one.)

Also I made some simple keybindings to mark stuff as junk/unwatch in my
.emacs:

http://bazaar.launchpad.net/~thisfred/+junk/scripts/view/head:/.emacs#L552

Hope someone finds this of use,

--
eric casteleijn
https://launchpad.net/~thisfred
