unofficial mirror of notmuch@notmuchmail.org
* RFC: Multiple filenames for email messages
From: Jan Janak @ 2009-11-21 22:37 UTC (permalink / raw)
  To: Not Much Mail

The comment of _notmuch_message_set_filename says:

   XXX: We should still figure out if we think it's important to store
   multiple filenames for email messages with identical message IDs.

I have lots of such messages in my email collection, both in my local
copy of my Gmail account and also in the local copy of my company's
IMAP account.

My dream mail indexing tool should be able to apply tags automatically
based on, among other things, the name of the directory the message is
stored in. If there are multiple copies of the same message scattered
across multiple directories, I would like to apply more tags.

I assume that most tags will be applied (either manually or
automatically) after 'notmuch-new'; I currently do some of it with a
simple shell script. The script does not apply tags based on directory
names yet, but it would make notmuch really flexible if we could do
that *and* if we could get access to all filenames of a particular
message.
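A minimal C++ sketch of the directory-based tagging described above. It assumes the indexer can hand over every filename stored for a message, which is what this RFC asks for. The function `tags_from_filenames` and its rule are hypothetical illustrations, not part of notmuch: each copy contributes the name of the folder holding it, looking past Maildir's `cur`/`new`/`tmp` leaf directories.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: derive one tag per distinct folder a message lives in.
// For Maildir paths such as "Mail/work/cur/123:2,S" the leaf directory is
// "cur" (or "new"/"tmp"), so the folder name is taken from one level up.
std::set<std::string> tags_from_filenames(const std::vector<std::string> &filenames)
{
    std::set<std::string> tags;
    for (const std::string &path : filenames) {
        // Collect the '/'-separated directory components, skipping the
        // final component (the file name itself).
        std::vector<std::string> dirs;
        std::string::size_type start = 0, slash;
        while ((slash = path.find('/', start)) != std::string::npos) {
            if (slash > start)
                dirs.push_back(path.substr(start, slash - start));
            start = slash + 1;
        }
        if (dirs.empty())
            continue;  // bare file name, no folder to derive a tag from
        std::string folder = dirs.back();
        if ((folder == "cur" || folder == "new" || folder == "tmp") && dirs.size() >= 2)
            folder = dirs[dirs.size() - 2];
        tags.insert(folder);
    }
    return tags;
}
```

With copies at `Mail/gmail/cur/1:2,S` and `Mail/work/inbox/cur/1:2,S`, the message would get both `gmail` and `inbox`; with only one stored filename it could only ever get one of them.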

I'd like to propose that we store all filenames for email messages in
the database, not just one per message. I'd be happy to work on it and
submit a patch if others think that this would be good to have.

  -- Jan


* Re: RFC: Multiple filenames for email messages
From: Carl Worth @ 2009-11-22  4:21 UTC (permalink / raw)
  To: Jan Janak, Not Much Mail

On Sat, 21 Nov 2009 23:37:24 +0100, Jan Janak <jan@ryngle.com> wrote:
> The comment of _notmuch_message_set_filename says:
> 
>    XXX: We should still figure out if we think it's important to store
>    multiple filenames for email messages with identical message IDs.
...
> I'd like to propose that we store all filenames for email messages in
> the database, not just one per message. I'd be happy to work on it and
> submit a patch if others think that this would be good to have.

Oh, sure. As soon as we start using filenames for searches, then that
makes a lot of sense.

Currently, notmuch isn't storing any filename that way, but it should
be. (We just need to add a prefix to the table at the top of
lib/database.cc, document it, and then make the indexing stage generate
terms from the filename with that prefix.)
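To make that first step concrete, here is a C++ sketch of a name-to-prefix table and of generating prefixed terms from a filename. The table layout, `find_prefix`, `filename_terms`, and the prefix string `XFILENAME` are all made up for illustration; the real table in lib/database.cc may look quite different.

```cpp
#include <string>
#include <vector>

// Hypothetical prefix table: maps the name used in searches ("filename:")
// to the string prepended to every term indexed for that field.
struct prefix_entry {
    const char *name;
    const char *prefix;
};

static const prefix_entry prefix_table[] = {
    { "filename", "XFILENAME" },  // invented prefix, for illustration only
};

// Return the term prefix registered for `name`, or nullptr if there is none.
const char *find_prefix(const std::string &name)
{
    for (const prefix_entry &entry : prefix_table)
        if (name == entry.name)
            return entry.prefix;
    return nullptr;
}

// Indexing step: one prefixed term per non-empty '/'-separated segment.
std::vector<std::string> filename_terms(const std::string &filename)
{
    std::vector<std::string> terms;
    const char *prefix = find_prefix("filename");
    if (prefix == nullptr)
        return terms;
    std::string::size_type start = 0;
    while (start <= filename.size()) {
        std::string::size_type slash = filename.find('/', start);
        if (slash == std::string::npos)
            slash = filename.size();
        if (slash > start)
            terms.push_back(prefix + filename.substr(start, slash - start));
        start = slash + 1;
    }
    return terms;
}
```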

The term generator and query parser should do the right thing, which is
to split the filename into individual terms at each '/', store position
data with each, and then turn a search like:

	filename:some/filename/segment

into a phrase search that looks for the terms "some", "filename", and
"segment", each with the filename prefix you choose and each in
sequential position. Note that if you compile notmuch with CFLAGS
including -DDEBUG then you'll see a nice report of the post-parsed query
that's useful for debugging stuff like this.
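The phrase behaviour described above can be modelled without Xapian. This C++ sketch is a toy stand-in, not how Xapian's term generator or query parser is implemented: it records each '/'-separated segment at consecutive positions and treats `some/filename/segment` as a phrase over those positions. `positional_index`, `split_path`, `index_filename`, `phrase_match`, and the `XFILENAME` prefix are hypothetical. The one-position gap between filenames is likewise an assumption about how several filenames per message might be indexed.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy positional index for one document: term -> positions where it occurs.
using positional_index = std::map<std::string, std::vector<std::size_t>>;

// Split a path at each '/' and prepend `prefix` to every non-empty segment.
std::vector<std::string> split_path(const std::string &prefix, const std::string &path)
{
    std::vector<std::string> terms;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type slash = path.find('/', start);
        if (slash == std::string::npos)
            slash = path.size();
        if (slash > start)
            terms.push_back(prefix + path.substr(start, slash - start));
        start = slash + 1;
    }
    return terms;
}

// Index one filename: its segments get consecutive positions, followed by a
// one-position gap so that a phrase can never span two different filenames.
void index_filename(positional_index &index, const std::string &prefix,
                    const std::string &filename, std::size_t &next_pos)
{
    for (const std::string &term : split_path(prefix, filename))
        index[term].push_back(next_pos++);
    ++next_pos;
}

// Phrase search: true if phrase[0] occurs at some position p and every
// phrase[i] occurs at position p + i.
bool phrase_match(const positional_index &index, const std::vector<std::string> &phrase)
{
    if (phrase.empty())
        return false;
    auto first = index.find(phrase[0]);
    if (first == index.end())
        return false;
    for (std::size_t p : first->second) {
        bool matched = true;
        for (std::size_t i = 1; i < phrase.size() && matched; ++i) {
            auto it = index.find(phrase[i]);
            matched = it != index.end() &&
                      std::find(it->second.begin(), it->second.end(), p + i) != it->second.end();
        }
        if (matched)
            return true;
    }
    return false;
}
```

For example, after indexing `mail/some/filename/segment/1234` and `archive/other/1234` into one index, `some/filename/segment` matches, `segment/filename` does not (wrong order), and `1234/archive` does not either, because the gap keeps the end of one filename from running into the start of the next.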

The reason for my comment was related to the other use of the filename,
(that is, the only one we're currently using). This is with regard to
querying the database for the actual filename, rather than searching on
it. For this, we don't use terms, but instead use the "data" field of
the document. I was wondering if in the presentation of an email message
it would ever be important to have access to the multiple files.
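One way to keep several filenames in the single data field is to join them with a separator. The C++ sketch below assumes a newline separator and assumes no filename contains a newline; `encode_filenames` and `decode_filenames` are hypothetical helpers, not notmuch code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join all filenames with '\n' to form the document's data string.
std::string encode_filenames(const std::vector<std::string> &filenames)
{
    std::string data;
    for (std::size_t i = 0; i < filenames.size(); ++i) {
        if (i > 0)
            data += '\n';
        data += filenames[i];
    }
    return data;
}

// Split a data string back into the list of filenames.
std::vector<std::string> decode_filenames(const std::string &data)
{
    std::vector<std::string> filenames;
    std::string::size_type start = 0;
    while (start < data.size()) {
        std::string::size_type nl = data.find('\n', start);
        if (nl == std::string::npos)
            nl = data.size();
        filenames.push_back(data.substr(start, nl - start));
        start = nl + 1;
    }
    return filenames;
}
```

A side effect of this encoding is that a data field holding a single filename decodes to a one-element list.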

Can anyone think of a case where they would need that? That is, a case
where you care about the distinct content of two messages that have the
same message ID?

I suppose that in the case of getting a message by two paths, (say
through a mailing list and also via CC), one might want to inspect the
different headers in the two versions. So maybe we'll need to break down
and provide this information to the interfaces.

Also, if we're going to support file deletion well, then I suppose we
really will need to store all the filenames (so that if one disappears
we can still point to the others). We'll also need to be able to
accurately update the filename terms when a message disappears, which
means having all of the complete filenames around.
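The deletion argument can be sketched in C++ with a hypothetical in-memory record (`message_record`, `terms_for`, and `remove_filename` are illustrative names, not notmuch's). When one copy vanishes, its path is dropped and the filename terms are recomputed from the remaining complete filenames, so a term still contributed by another copy survives.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one database document.
struct message_record {
    std::vector<std::string> filenames;    // every known path for this message ID
    std::set<std::string> filename_terms;  // search terms derived from those paths
};

// Derive the (unprefixed, for brevity) filename terms from a list of paths by
// splitting each one at '/'.
std::set<std::string> terms_for(const std::vector<std::string> &filenames)
{
    std::set<std::string> terms;
    for (const std::string &path : filenames) {
        std::string::size_type start = 0;
        while (start <= path.size()) {
            std::string::size_type slash = path.find('/', start);
            if (slash == std::string::npos)
                slash = path.size();
            if (slash > start)
                terms.insert(path.substr(start, slash - start));
            start = slash + 1;
        }
    }
    return terms;
}

// Handle the disappearance of `path` from disk. Returns true while other
// copies remain; false means no file is left and the caller would remove
// the whole document.
bool remove_filename(message_record &msg, const std::string &path)
{
    auto it = std::find(msg.filenames.begin(), msg.filenames.end(), path);
    if (it != msg.filenames.end()) {
        msg.filenames.erase(it);
        // Rebuild from the surviving complete filenames: a term such as "cur"
        // that another copy also contributes must stay searchable.
        msg.filename_terms = terms_for(msg.filenames);
    }
    return !msg.filenames.empty();
}
```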

So I guess I'm convincing myself that we really should store all the
filenames, and also provide an interface to get a list of filenames for
a message, (but also expect that many users of the API will only want to
look at the first filename in the list).
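A sketch of the interface shape suggested here: the full list for callers that want every copy, plus a convenience accessor for the first filename. The `message` class and its methods are hypothetical, written in C++ for illustration, and are not notmuch's public API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical message handle. A message always has at least one file, so
// the constructor takes the first filename and the list is never empty.
class message {
public:
    explicit message(const std::string &first_filename) : filenames_{first_filename} {}

    // Record another path for the same message ID, ignoring duplicates.
    void add_filename(const std::string &path)
    {
        if (std::find(filenames_.begin(), filenames_.end(), path) == filenames_.end())
            filenames_.push_back(path);
    }

    // Full list, for callers that care about every copy.
    const std::vector<std::string> &filenames() const { return filenames_; }

    // Convenience accessor for the common case: just the first filename.
    const std::string &filename() const { return filenames_.front(); }

private:
    std::vector<std::string> filenames_;
};
```

Here "first" means first recorded, which is one possible choice; the sketch does not try to pick a preferred copy.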

-Carl

