unofficial mirror of notmuch@notmuchmail.org
* How does notmuch detect the presence of attachments?
From: moabi2000 @ 2011-08-03 10:07 UTC
  To: notmuch

Hi,

Two questions:

1) How does notmuch detect the presence of attachments? I have some
messages that have attachments (which I can see and open when reading
the message), but for which the 'attachment' flag is not set (and
therefore don't show up in a search like "from:myfriend AND
attachment:pdf"). How can I try to work out what is going on?

2) Is there an option for notmuch to also index the text of
attachments (like recoll does, which also uses xapian)? People tend to
save attachments with really useless filenames (report2.pdf...). What
I'd like to be able to do is a search like "from:mycolleague AND
attachment:pdf AND attachmentcontains:ourproject"

Thanks for your help. Notmuch is already very useful!

M.


* Re: How does notmuch detect the presence of attachments?
From: Daniel Kahn Gillmor @ 2011-08-25 14:21 UTC
  To: moabi2000, notmuch

On 08/03/2011 06:01 AM, moabi2000 wrote:
> 1) How does notmuch detect the presence of attachments? I have some
> messages that have attachments (which I can see and open when reading
> the message), but for which the 'attachment' flag is not set (and
> therefore don't show up in a search like "from:myfriend AND
> attachment:pdf"). How can I try to work out what is going on?

According to lib/index.cc (around line 366 in the current version), the
tag "attachment" is added to an e-mail only if one of the MIME parts of
the message has an explicit "Content-Disposition: attachment" MIME
subheader.

So some mail clients may be attaching files with "Content-Disposition:
inline" (i do this sometimes when attaching text/* files) or without a
Content-Disposition: header on the MIME part at all.
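
To make that rule concrete, here is a minimal sketch in Python using the
standard library's email package. It is not notmuch's actual code (which
is C++ in lib/index.cc); the function names has_explicit_attachment and
make_message and the sample message are made up for illustration. It
only mirrors the check described above: a message counts as having an
attachment if some MIME part carries an explicit
"Content-Disposition: attachment".

```python
from email import message_from_string


def has_explicit_attachment(raw_message: str) -> bool:
    """Return True if any MIME part declares Content-Disposition: attachment."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # get_content_disposition() gives 'attachment', 'inline', or None
        # when the part has no Content-Disposition header at all.
        if part.get_content_disposition() == "attachment":
            return True
    return False


def make_message(disposition_header: str) -> str:
    """Build a tiny two-part test message (hypothetical sample data).

    disposition_header is either a full header line ending in a newline
    or the empty string to omit the header.
    """
    return (
        "From: a@example.org\n"
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XX"\n'
        "\n"
        "--XX\n"
        "Content-Type: text/plain\n"
        "\n"
        "see the report\n"
        "--XX\n"
        'Content-Type: application/pdf; name="report2.pdf"\n'
        + disposition_header
        + "\n"
        "fake pdf body\n"
        "--XX--\n"
    )
```

Under this check, the same PDF part is found when it is sent with
"Content-Disposition: attachment" but missed when it is sent with
"Content-Disposition: inline" or with no disposition header, which
would explain messages whose attachments can be opened but are not
tagged.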

Perhaps notmuch could keep a (configurable?) list of Content-Types that
should be tagged with "attachment" no matter what Content-Disposition is
used?  I could imagine an initial list like:

 application/pdf
 application/vnd.oasis.opendocument.text
 application/vnd.oasis.opendocument.spreadsheet

Or maybe just any mime part with "application" as the major Content
type?  That would be a relatively easy (though non-general) heuristic to
implement.  Want to take a crack at it?
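
As a rough sketch of what such a heuristic might look like (again in
Python rather than notmuch's C++, and with every name below invented for
illustration), the two variants could share one function: a fixed list
of Content-Types that always count, plus an optional switch for "any
application/* part".

```python
from email import message_from_string

# Hypothetical default list, taken from the Content-Types suggested above.
ALWAYS_ATTACHMENT_TYPES = {
    "application/pdf",
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.spreadsheet",
}


def looks_like_attachment(part, any_application: bool = False) -> bool:
    """Decide whether a single MIME part should trigger the attachment tag."""
    # An explicit disposition always wins.
    if part.get_content_disposition() == "attachment":
        return True
    # Broad variant: any part whose major type is "application".
    if any_application and part.get_content_maintype() == "application":
        return True
    # Narrow variant: only the listed Content-Types.
    return part.get_content_type() in ALWAYS_ATTACHMENT_TYPES


def message_has_attachment(raw_message: str, any_application: bool = False) -> bool:
    """Return True if any part of the message looks like an attachment."""
    msg = message_from_string(raw_message)
    return any(looks_like_attachment(p, any_application) for p in msg.walk())
```

One thing to keep in mind with the broader variant: it would also match
parts such as application/pgp-signature, so every signed message would
end up tagged unless such types were excluded.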

> 2) Is there an option for notmuch to also index the text of
> attachments (like recoll does, which also uses xapian)? People tend to
> save attachments with really useless filenames (report2.pdf...). What
> I'd like to be able to do is a search like "from:mycolleague AND
> attachment:pdf AND attachmentcontains:ourproject"

This is another great suggestion for improvement, i think.  There is
even a comment in the code (around the same part referenced above) that says:

	/* XXX: Would be nice to call out to something here to parse
	 * the attachment into text and then index that. */

A generic shim here, with a configurable index that associates
Content-Types with safe convert-to-text functions, would be quite nice.

This would probably be a new section in ~/.notmuch-config,
[textconverters], where the keys would be specific Content-Types and
the values would be commands that take the file on stdin and produce
plain text to index on stdout, like so:

 [textconverters]
 application/pdf=pdf2txt /dev/stdin

Starting with an initially empty set of textconverters seems reasonable
and safe to me, and people could set up their own if they're interested.
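
To illustrate how such a shim might behave, here is a small Python
sketch. Nothing in it exists in notmuch: the function names
load_converters and convert_to_text are hypothetical, and Python's
configparser merely stands in for notmuch's own config parsing. It reads
a [textconverters] section, looks up the command for a part's
Content-Type, feeds the part on stdin, and returns whatever the command
writes to stdout.

```python
import configparser
import shlex
import subprocess
from typing import Dict, Optional


def load_converters(config_text: str) -> Dict[str, str]:
    """Parse a [textconverters] section into {content-type: command}."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("textconverters"):
        return {}  # initially empty set of converters
    # configparser lowercases keys, which suits case-insensitive MIME types.
    return dict(parser.items("textconverters"))


def convert_to_text(content_type: str, data: bytes,
                    converters: Dict[str, str],
                    timeout: float = 30.0) -> Optional[str]:
    """Run the configured converter for content_type, or return None."""
    command = converters.get(content_type.lower())
    if command is None:
        return None  # no converter configured, so nothing extra to index
    result = subprocess.run(
        shlex.split(command),      # run the command directly, not via a shell
        input=data,                # attachment bytes arrive on stdin
        stdout=subprocess.PIPE,    # plain text to index is read from stdout
        stderr=subprocess.DEVNULL,
        timeout=timeout,           # assumption: give up on a hung converter
        check=True,                # raise if the converter exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")
```

With the example entry above loaded, convert_to_text("application/pdf",
pdf_bytes, converters) would run "pdf2txt /dev/stdin" on the part, while
any Content-Type without an entry is simply skipped. The timeout is an
addition of this sketch, not part of the proposal.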

You'd need to re-index your message store after modifying the config,
though, if you wanted to have pre-existing messages get indexed this
way.  Is there a way to tell notmuch to re-index a particular message?

The above proposal isn't implemented at all, i'm just throwing it out
for consideration.

	--dkg



* Re: How does notmuch detect the presence of attachments?
From: Jason Woofenden @ 2011-08-30  6:22 UTC
  To: notmuch

> [...]
> 
>  [textconverters]
>  application/pdf=pdf2txt /dev/stdin

Sounds awesome. I'd love the feature, and this sounds like a good
way to do it. Or maybe we should use a mailcap file like mutt
does... it has some useful features like nametemplate and maybe
test.


> Starting with an initially empty set of textconverters seems reasonable
> and safe to me, and people could set up their own if they're interested.

I also think starting empty is the right thing to do. We don't want
to be the bad guys if there's a security exploit in one of the
converter programs.


I as a user can decide that I'd like to run `abiword -t txt` on
application/msword and application/rtf mime parts. If there's a
security issue with abiword that someone can exploit by sending me
an e-mail, then FML, but at least I won't be mad at the notmuch
developers.


     - Jason


* Re: How does notmuch detect the presence of attachments?
From: Daniel Kahn Gillmor @ 2011-08-30 21:51 UTC
  To: notmuch

On 08/30/2011 02:22 AM, Jason Woofenden wrote:
>> [...]
>>
>>  [textconverters]
>>  application/pdf=pdf2txt /dev/stdin
> 
> Sounds awesome. I'd love the feature, and this sounds like a good
> way to do it. Or maybe we should use a mailcap file like mutt
> does... it has some useful features like nametemplate and maybe
> test.

hm, interesting suggestion.  I don't know enough about mailcap to know
whether it makes more sense to adopt it directly or to use a
notmuch-specific configuration.

One difference: mailcap seems to be about displaying/editing data to the
user (including, for example, opening a graphical window to display a
JPEG), whereas we need to set up a mechanism to convert whatever kind of
document we get into plain text to feed it into xapian.  So we couldn't
fully piggy-back on the mailcap infrastructure, if i'm reading the
mailcap documentation correctly.  notmuch would need to use its own
mime-types file.

Anyone with more experience with this stuff (or stronger opinions) have
any insight on what approach makes more sense?

> I as a user can decide that I'd like to run `abiword -t txt` on
> application/msword and application/rtf mime parts. If there's a
> security issue with abiword that someone can exploit by sending me
> an e-mail, then FML, but at least I won't be mad at the notmuch
> developers.

exactly :)

	--dkg



* Re: How does notmuch detect the presence of attachments?
From: Jameson Graef Rollins @ 2011-08-30 22:03 UTC
  To: Daniel Kahn Gillmor, notmuch

On Tue, 30 Aug 2011 17:51:57 -0400, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> One difference: mailcap seems to be about displaying/editing data to the
> user (including, for example, opening a graphical window to display a
> JPEG), whereas we need to set up a mechanism to convert whatever kind of
> document we get into plain text to feed it into xapian.  So we couldn't
> fully piggy-back on the mailcap infrastructure, if i'm reading the
> mailcap documentation correctly.  notmuch would need to use its own
> mime-types file.

I think you're right that mailcap is for instructing mail programs
how to display non-text parts to the user.  See the mailcap(5) man page (at
least in Debian).

And we need something like:

jpg2thousandwords

jamie.


* Re: How does notmuch detect the presence of attachments?
From: Daniel Kahn Gillmor @ 2011-08-30 22:24 UTC
  To: notmuch

On 08/30/2011 06:03 PM, Jameson Graef Rollins wrote:
> And we need something like:
> 
> jpg2thousandwords

heh.

Yes, i suspect there will be some mime types we will be unable to index
in detail with xapian in the foreseeable future.  Even without that,
though, there's the possibility to store textual comments in a jpeg; a
user of notmuch could choose to feed those comments to the xapian indexer.

For example:

0 dkg@pip:~/src/fence$ exiv2 -p c in-play.jpg
a 2-player game of FENCE!
0 dkg@pip:~/src/fence$

I don't know if this would be useful in the real world, though, since
most people probably don't put comment data in their jpegs.  You can
easily stuff a thousand words into a jpeg though:

 exiv2 modify -c "$(head -n 1000 < /usr/share/dict/words)" test.jpg

proverbially yours,

	--dkg

