unofficial mirror of notmuch@notmuchmail.org
* maildir synchronisation
@ 2010-07-30 15:06 Amit Kucheria
  2010-08-16 15:38 ` Integration with training-based bayesian filters Samium Gromoff
From: Amit Kucheria @ 2010-07-30 15:06 UTC
  To: notmuch

Hi,

Is maildir synchronisation being looked into for the next release?

I've seen a patchset from Michal[1], but the discussion did not seem
to reach any conclusion.

That feature, along with support for tagging based on the List-Id header
and other custom headers, is what I miss most.  Perhaps I can help
if someone is already working on these?

Regards,
Amit

[1] http://www.mail-archive.com/notmuch@notmuchmail.org/msg02279.html


* Integration with training-based bayesian filters 
  2010-07-30 15:06 maildir synchronisation Amit Kucheria
@ 2010-08-16 15:38 ` Samium Gromoff
  2010-08-17  7:15   ` Sebastian Spaeth
From: Samium Gromoff @ 2010-08-16 15:38 UTC
  To: notmuch

Good day folks,

My "+notmuch AND train" query on the local notmuch list archive didn't
yield anything relevant, so I have at least one excuse if the
question I'm about to pose has already been answered to death here.

So, how is a notmuch user supposed to integrate a training-based message
classifier like crm114[1], which operates as follows:

  - the filter->you information flow is established by prepending
    either "ADV: " or "UNS: " to the message subject, denoting,
    respectively, the "spam" and "please tell me if this is spam"
    categories.  Non-spam messages, naturally, have their subject
    lines unmodified.

  - the you->filter information flow is established by taking the
    message file whose status you want to pin down (mostly those marked
    as UNS, because after a while crm114 gets really, really good),
    and piping it to the classifier executable.
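
The filter->you direction amounts to reading crm114's verdict back out of
the Subject header.  A minimal Python sketch of that check, assuming the
prefixes are exactly "ADV: " and "UNS: " at the very start of the subject;
`crm114_verdict` and the category names are made up for illustration:

```python
from email import message_from_bytes
from email.policy import default

# Subject prefixes crm114 prepends in this setup (trailing space included).
PREFIXES = {"ADV: ": "spam", "UNS: ": "unsure"}


def crm114_verdict(raw_message: bytes) -> str:
    """Return "spam", "unsure" or "good" from the Subject prefix."""
    msg = message_from_bytes(raw_message, policy=default)
    subject = str(msg.get("Subject", ""))
    for prefix, verdict in PREFIXES.items():
        if subject.startswith(prefix):
            return verdict
    # crm114 leaves the subject of good messages untouched.
    return "good"
```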

One thing is certain -- we're talking elisp territory here.

Another thing is certain, too -- such questions come up sooner or later
in the life of every mail user agent.  Again, sorry if I failed the due
diligence part of prior art discovery.

Now to some answers (the unexpected part):

The first part is easily handled by combining two things: procmailing
the "ADV: "-prefixed messages out of one's sight, which becomes a
plausible strategy once the classifier is clueful enough, and adding
a simple Xapian "subject:" rule for the "UNS: "-prefixed ones.
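
The "subject:" rule can be a `notmuch tag` invocation run after each
`notmuch new`.  A sketch that only builds the argument list; the `train`
tag matches the one in `spam-tagdrop-list` further down, while the
`tag:inbox` restriction and the function name are assumptions.  Keep in
mind that Xapian matches the word "UNS" anywhere in the subject, not only
as a prefix:

```python
def uns_tag_command(tag: str = "train", restrict: str = "tag:inbox") -> list[str]:
    """Build `notmuch tag +TAG -- subject:UNS AND RESTRICT` as an argv list."""
    query = ["subject:UNS"]
    if restrict:
        query += ["AND", restrict]
    # "--" marks where tag operations end and search terms begin.
    return ["notmuch", "tag", f"+{tag}", "--", *query]
```

It could then be run with `subprocess.run(uns_tag_command(), check=True)`.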

The second part can be solved either in a way pleasant to the user,
or easily.

The easy way is to expect the user to enter the spam thread, which contains
exactly one message (I've never seen longer spam threads, still wondering
why...), then press some key and confirm the destination, station
purple hell.  Then you exit the thread.  To enter another one...

So, after a couple of minutes of processing the backlog, it becomes
painfully clear that you don't want to spend more effort on these
one-message spam threads than pressing 's' and confirming it with
'y', avoiding the painful, distracting, screen-redrawing thread
enter/exit sequence.

Note that this conveniently sidesteps the question of non-spam messages,
which often do land within threads; I'd like to keep that aside for
now, sorry for the incomplete solution.

So the crux is that to pipe the file to the classifier you need its
filename, and the filename appears to be easily available only in 'show' mode.
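
Outside Emacs the same filename is reachable through
`notmuch show --format=json`, which is what `notmuch-query-get-threads`
parses.  As far as I understand that output, it is a list of threads, each
thread a list of [message, replies] pairs, so the first message sits at
index [0][0][0].  A Python sketch under that assumption (the function
names are hypothetical):

```python
from __future__ import annotations

import json


def first_message_property(show_json: str, prop: str):
    """Same walk as the elisp (caaar result): first thread, first node, message."""
    threads = json.loads(show_json)
    message, _replies = threads[0][0]
    return message.get(prop)


def first_message_filename(show_json: str) -> str | None:
    filename = first_message_property(show_json, "filename")
    # Some output format versions give a list of filenames; take the first.
    if isinstance(filename, list):
        return filename[0] if filename else None
    return filename
```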

I've had to introduce some code to operate on single-message threads,
or rather, on threads where every message but the first one is ignored.

So here goes the solution, modulo the conveniently avoided question
of non-spam messages:


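(require 'cl)       ; intersection, lexical-let
(require 'notmuch)

(defun notmuch-pipe-file (filename command)
  "Run shell COMMAND asynchronously with FILENAME as its standard input."
  ;; Pass a single command string: extra arguments are obsolete in newer Emacsen.
  (start-process-shell-command "notmuch-pipe-command" "*notmuch-pipe*"
                               (concat command " < " (shell-quote-argument filename))))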

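(defun notmuch-query (query)
  "Return the threads matching QUERY, a list of search terms."
  ;; The terms go to notmuch as separate arguments without a shell,
  ;; so wrapping them in literal "'" strings is unnecessary.
  (notmuch-query-get-threads query))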

(defun notmuch-result-firstmsg-property (result property)
  (plist-get (caaar result) property))

(defun notmuch-result-backend-remove-tags (result tags)
  (apply 'notmuch-call-notmuch-process
         (append (cons "tag" (mapcar (lambda (s) (concat "-" s)) tags))
                 (cons (concat "id:" (notmuch-result-firstmsg-property result :id)) nil))))

(defun notmuch-search-result-remove-tags (result tags)
  "Remove TAGS from the first message of RESULT.  RESULT is not updated."
  (let ((current-tags (notmuch-result-firstmsg-property result :tags)))
    ;; Only call notmuch if at least one of TAGS is present.  The new tags would be
    ;; (sort (set-difference current-tags tags :test 'string=) 'string<),
    ;; but it's unlikely we'll need them, so RESULT is left as is.
    (when (intersection current-tags tags :test 'string=)
      (notmuch-result-backend-remove-tags result tags))))

(defun notmuch-search-query-current-thread ()
  (notmuch-query (list (notmuch-search-find-thread-id))))

(defun notmuch-show-pipe-current-message (command)
  "Pipe the message currently pointed at within the show mode,
through COMMAND."
  (interactive "sPipe message to command: ")
  (notmuch-pipe-file (notmuch-show-get-filename) command))

(defun notmuch-search-pipe-current-message (command)
  "Pipe the first message of the thread currently pointed at within
the search mode, through COMMAND."
  (interactive "sPipe message to command: ")
  (let* ((result (notmuch-search-query-current-thread))
         (filename (notmuch-result-firstmsg-property result :filename)))
    (notmuch-pipe-file filename command)
    result))

(setq mark-as-good-command "~/bin/stdin-is-good"
      mark-as-spam-command "~/bin/stdin-is-spam"
      spam-tagdrop-list '("inbox" "unread" "sent" "train"))

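(defun make-mark-as-good (piper)
  "Return a command that, after confirmation, pipes the current message
to `mark-as-good-command' using PIPER and moves to the next line."
  (lexical-let ((piper piper))
    (lambda ()
      (interactive)
      (when (y-or-n-p "Mark as good? ")
        (funcall piper mark-as-good-command)
        (forward-line 1)))))

(defun make-mark-as-spam (piper searchp)
  "Return a command that, after confirmation, pipes the current message
to `mark-as-spam-command' using PIPER.  In search mode (SEARCHP non-nil)
the tags in `spam-tagdrop-list' are removed and point moves to the next
line; in show mode the message is marked as read."
  (lexical-let ((piper piper)
                (searchp searchp))
    (lambda ()
      (interactive)
      (when (y-or-n-p "Mark as spam? ")
        ;; In search mode PIPER returns the query result needed for retagging.
        (let ((maybe-result (funcall piper mark-as-spam-command)))
          (if searchp
              (progn
                (notmuch-search-result-remove-tags maybe-result spam-tagdrop-list)
                (forward-line 1))
            (notmuch-show-mark-read)))))))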

(define-key notmuch-show-mode-map "g"    (make-mark-as-good 'notmuch-show-pipe-current-message))
(define-key notmuch-show-mode-map "s"    (make-mark-as-spam 'notmuch-show-pipe-current-message nil))
(define-key notmuch-search-mode-map "g"  (make-mark-as-good 'notmuch-search-pipe-current-message))
(define-key notmuch-search-mode-map "s"  (make-mark-as-spam 'notmuch-search-pipe-current-message t))
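
For anyone wanting the same retagging from a script rather than from
Emacs, here is a rough Python counterpart of
`notmuch-search-result-remove-tags` plus
`notmuch-result-backend-remove-tags`.  It only builds the command; unlike
the elisp, which passes every tag in the list, it emits `-tag` only for
tags actually present, which should make no difference to notmuch.
`remove_tags_command` is a hypothetical name:

```python
from __future__ import annotations

SPAM_TAGDROP = ("inbox", "unread", "sent", "train")


def remove_tags_command(message_id: str, current_tags: list[str],
                        drop: tuple[str, ...] = SPAM_TAGDROP) -> list[str] | None:
    """Return `notmuch tag -T... -- id:ID`, or None when no tag needs dropping."""
    present = [tag for tag in drop if tag in current_tags]
    if not present:
        return None  # same shortcut as the elisp intersection check
    return ["notmuch", "tag", *(f"-{tag}" for tag in present), "--", f"id:{message_id}"]
```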


I'll leave it to the more qualified people to decide which part (and in
which form) is supposed to go into notmuch, and which is destined to
live in the end-user's init file.


-- 
regards,
  Samium Gromoff
--
1. http://crm114.sourceforge.net/

--
"Actually I made up the term 'object-oriented', and I can tell you I
did not have C++ in mind." - Alan Kay (OOPSLA 1997 Keynote)


* Re: Integration with training-based bayesian filters
  2010-08-16 15:38 ` Integration with training-based bayesian filters Samium Gromoff
@ 2010-08-17  7:15   ` Sebastian Spaeth
From: Sebastian Spaeth @ 2010-08-17  7:15 UTC
  To: notmuch

> The easy way is to expect the user enter the spam thread, which contains
> exactly one message (never seen longer spam threads, still wondering
> why...), and then press some key and confirm the destination, station
> purple hell.  Then you exit the thread.  To enter another one...

I just have a saved view where all possible junk mails are shown. Most
of the time they really all are spam as I can see from the subjects and
I simply press "*" and "+spam" to tag all those mails as spam.

A cron script occasionally passes all spam-tagged mails to a Bayesian filter.
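
Such a cron job boils down to listing the spam-tagged files (for instance
with `notmuch search --output=files tag:spam`, in notmuch versions that
support it) and piping each one into the trainer.  A Python sketch of the
piping half; `pipe_file` and `train_all` are hypothetical, and the trainer
could be something like Samium's `~/bin/stdin-is-spam`:

```python
from __future__ import annotations

import subprocess
from typing import Callable, Iterable


def pipe_file(path: str, command: list[str]) -> int:
    """Run COMMAND with the file at PATH as stdin, like `command < path`."""
    with open(path, "rb") as stdin:
        return subprocess.run(command, stdin=stdin, check=False).returncode


def train_all(paths: Iterable[str], command: list[str],
              pipe: Callable[[str, list[str]], int] = pipe_file) -> list[str]:
    """Feed each file to the trainer; return the paths whose run failed."""
    return [path for path in paths if pipe(path, command) != 0]
```

A real job would also retag trained messages (say, with a hypothetical
`spam-trained` tag) so they are not fed to the filter twice.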

Something like this does not require me to fiddle much with elisp (which
I am not good with).

> So, the crux is, to pipe the file to the classifier you need the filename,
> and the filename appears to be easily available only in the 'show' mode.

An elisp-only solution would certainly be elegant.  I've found that
tagging messages and then processing them later outside of Emacs works
nicely for me.  Here is the 5-line Python script that shows the
filenames of all messages matching a given query:
http://notmuchmail.org/howto/
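
The script on that page uses notmuch's Python bindings.  A
dependency-free alternative, assuming a notmuch release whose `search`
command supports `--output=files`, shells out to the CLI instead; the
helper names are hypothetical:

```python
from __future__ import annotations

import subprocess


def parse_files_output(stdout: str) -> list[str]:
    """One filename per line; drop blank lines."""
    return [line for line in stdout.splitlines() if line]


def filenames_for_query(query: str, notmuch: tuple[str, ...] = ("notmuch",)) -> list[str]:
    """Filenames of all messages matching QUERY, via the notmuch CLI."""
    result = subprocess.run([*notmuch, "search", "--output=files", query],
                            capture_output=True, text=True, check=True)
    return parse_files_output(result.stdout)
```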

Sebastian
