From: Carl Worth <cworth@cworth.org>
To: Jan Janak <jan@ryngle.com>, Not Much Mail <notmuch@notmuchmail.org>
Subject: Re: RFC: Multiple filenames for email messages
Date: Sun, 22 Nov 2009 05:21:52 +0100
Message-ID: <878wdz2nq7.fsf@yoom.home.cworth.org>
In-Reply-To: <f35dbb950911211437q34923ee8w14b1ef65a204b09f@mail.gmail.com>

On Sat, 21 Nov 2009 23:37:24 +0100, Jan Janak <jan@ryngle.com> wrote:
> The comment of _notmuch_message_set_filename says:
> 
>    XXX: We should still figure out if we think it's important to store
>    multiple filenames for email messages with identical message IDs.
...
> I'd like to propose that we store all filenames for email messages in
> the database, not just one per message. I'd be happy to work on it and
> submit a patch if others think that this would be good to have.

Oh, sure. As soon as we start using filenames for searches, then that
makes a lot of sense.

Currently, notmuch isn't storing any filename terms that way, but it
should be. (We just need to add a prefix to the table at the top of
lib/database.cc, document it, and then make the indexing stage generate
terms from the filename with that prefix.)

The term generator and query parser should do the right thing, which is
to split the filename into individual terms at each '/', store position
data with each, and then turn a search like:

	filename:some/filename/segment

into a phrase search that looks for the terms "some", "filename", and
"segment", each with the filename prefix you choose and each in
sequential position. Note that if you compile notmuch with CFLAGS
including -DDEBUG then you'll see a nice report of the post-parsed query
that's useful for debugging stuff like this.
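
To make the intended behavior concrete, here is a minimal, self-contained
sketch in Python. It is not notmuch or Xapian code. It only models the idea
described above: split the filename at each '/', record a position for each
term, and treat a filename: search as a phrase match over sequential
positions. The prefix string "XFILENAME" and both function names are
assumptions made up for illustration.

```python
# Hypothetical prefix; the real one would be whatever gets added to the
# prefix table in lib/database.cc.
FILENAME_PREFIX = "XFILENAME"


def filename_terms(path):
    """Split a path at each '/' and return (prefixed_term, position) pairs."""
    segments = [s for s in path.split("/") if s]
    return [(FILENAME_PREFIX + seg, pos)
            for pos, seg in enumerate(segments, start=1)]


def phrase_match(indexed, query):
    """Return True if the query's terms occur in `indexed` at sequential positions."""
    wanted = [term for term, _ in filename_terms(query)]
    if not wanted:
        return False
    # Map each indexed term to the set of positions where it occurs.
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    # Try every occurrence of the first term as the start of the phrase.
    for start in positions.get(wanted[0], ()):
        if all(start + i in positions.get(t, ()) for i, t in enumerate(wanted)):
            return True
    return False


indexed = filename_terms("/home/user/mail/some/filename/segment/msg.1")
print(phrase_match(indexed, "some/filename/segment"))  # True
print(phrase_match(indexed, "filename/some"))          # False
```

A real term generator would likely tokenize more aggressively than a plain
split on '/', so treat the sketch as a model of the phrase-search idea only.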

The reason for my comment was related to the other use of the filename,
(that is, the only one we're currently using). This is with regard to
querying the database for the actual filename, rather than searching on
it. For this, we don't use terms, but instead use the "data" field of
the document. I was wondering if in the presentation of an email message
it would ever be important to have access to the multiple files.

Can anyone think of a case where they would need that? That is, a case
where you care about the distinct content of two messages that have the
same message ID?

I suppose that in the case of getting a message by two paths, (say
through a mailing list and also via CC), one might want to inspect the
different headers in the two versions. So maybe we'll need to break down
and provide this information to the interfaces.

Also, if we're going to support file deletion well, then I suppose we
really will need to store all the filenames, (so if one disappears we
can still point to the others). Also, we'll need to be able to
accurately update the filename terms when a message disappears, so that
means having all of the complete filenames around.

So I guess I'm convincing myself that we really should store all the
filenames, and also provide an interface to get a list of filenames for
a message, (but also expect that many users of the API will only want to
look at the first filename in the list).
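
As a rough illustration of the interface sketched above, here is a small
Python model. It is not the notmuch API: the class and method names are
hypothetical, and a dictionary stands in for the database. It shows one
message ID mapping to a list of filenames, a convenience accessor for
callers that only want the first one, and removal of a single file while
the remaining ones stay reachable.

```python
class MessageFilenames:
    """Toy store mapping a message ID to every filename seen for it."""

    def __init__(self):
        # message_id -> list of filenames, in the order they were added
        self._files = {}

    def add(self, message_id, filename):
        """Record a filename for a message, ignoring exact duplicates."""
        names = self._files.setdefault(message_id, [])
        if filename not in names:
            names.append(filename)

    def remove(self, message_id, filename):
        """Drop one filename; return True if the message still has a file."""
        names = self._files.get(message_id, [])
        if filename in names:
            names.remove(filename)
        if not names:
            # No files left, so the message itself goes away.
            self._files.pop(message_id, None)
            return False
        return True

    def filenames(self, message_id):
        """Return a copy of all filenames for a message (possibly empty)."""
        return list(self._files.get(message_id, []))

    def filename(self, message_id):
        """Return the first filename, or None if the message is unknown."""
        names = self._files.get(message_id)
        return names[0] if names else None


store = MessageFilenames()
store.add("id@example", "list/cur/1")  # copy received via the mailing list
store.add("id@example", "inbox/cur/2")  # copy received via CC
print(store.filename("id@example"))     # list/cur/1
store.remove("id@example", "list/cur/1")
print(store.filenames("id@example"))    # ['inbox/cur/2']
```

Keeping the complete filenames around is also what would let the filename
terms be recomputed accurately after one copy disappears.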

-Carl
