On 08/03/2011 06:01 AM, moabi2000 wrote:
> 1) How does notmuch detect the presence of attachments? I have some
> messages that have attachments (which I can see and open when reading
> the message), but for which the 'attachment' flag is not set (and
> therefore don't show up in a search like "from:myfriend AND
> attachment:pdf"). How can I try to work out what is going on?

According to lib/index.cc (around line 366 in the current version), the
tag "attachment" is added to an e-mail only if one of the MIME parts of
the message has an explicit "Content-Disposition: attachment" MIME
subheader.

So some mail clients may be attaching files with "Content-Disposition:
inline" (i do this sometimes when attaching text/* files) or without a
Content-Disposition: header on the MIME part at all.

Perhaps notmuch could keep a (configurable?) list of Content-Types that
should be tagged with "attachment" no matter what Content-Disposition is
used? I could imagine an initial list like:

    application/pdf
    application/vnd.oasis.opendocument.text
    application/vnd.oasis.opendocument.spreadsheet

Or maybe just any MIME part with "application" as the major Content
type? That would be a relatively easy (though non-general) heuristic to
implement. Want to take a crack at it?

> 2) Is there an option for notmuch to also index the text of
> attachments (like recoll does, which also uses xapian)? People tend to
> save attachments with really useless filenames (report2.pdf...), what
> I'd like to be able to do is a search like "from:mycolleague AND
> attachment:pdf AND attachmentcontains:ourproject"

This is another great suggestion for improvement, i think. There is even
a comment in the code (around the same part referenced above) that says:

    /* XXX: Would be nice to call out to something here to parse
     * the attachment into text and then index that. */

A generic shim here, with a configurable index that associates
Content-Types with safe convert-to-text functions, would be quite nice.
This would probably be a new section in ~/.notmuch-config,
[textconverters], where each key would be a specific Content-Type and
each value a command that takes the file on stdin and produces plain
text to index on stdout, like so:

    [textconverters]
    application/pdf=pdf2txt /dev/stdin

Starting with an initially empty set of textconverters seems reasonable
and safe to me, and people could set up their own if they're interested.

You'd need to re-index your message store after modifying the config,
though, if you wanted to have pre-existing messages get indexed this
way. Is there a way to tell notmuch to re-index a particular message?

The above proposal isn't implemented at all, i'm just throwing it out
for consideration.

--dkg
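
For anyone who wants to play with the idea, here is a rough sketch of how the [textconverters] lookup proposed above might behave. It is written in Python purely for illustration (notmuch itself is C/C++), and nothing like it exists in notmuch. The function names load_converters and attachment_text, the 30-second timeout, and the use of Python's configparser to read the section are all assumptions made for this example.

```python
import configparser
import shlex
import subprocess


def load_converters(config_text):
    """Return {content_type: argv} from a hypothetical [textconverters] section."""
    # Only "=" separates key and value; interpolation is off so that a "%"
    # in a command line is left alone.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("textconverters"):
        return {}  # empty by default: no attachment gets converted
    return {ctype.lower(): shlex.split(command)
            for ctype, command in parser.items("textconverters")}


def attachment_text(converters, content_type, payload, timeout=30):
    """Pipe decoded attachment bytes through the matching converter, if any.

    Returns the converter's stdout as text, or None when no converter is
    configured for this Content-Type or the converter fails.
    """
    argv = converters.get(content_type.lower())
    if not argv:
        return None
    try:
        result = subprocess.run(argv, input=payload,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout, check=True)
    except (OSError, subprocess.SubprocessError):
        return None  # a missing, failing or hung converter just yields no text
    return result.stdout.decode("utf-8", errors="replace")
```

The indexer would call attachment_text() for each MIME part and feed any returned text to the index alongside the message body. Treating a failing converter as "no text" keeps one broken tool from aborting the indexing of a whole message.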