unofficial mirror of notmuch@notmuchmail.org
From: Carl Worth <cworth@cworth.org>
To: Mikhail Gusarov <dottedmag@dottedmag.net>, notmuch@notmuchmail.org
Subject: Re: [PATCH (rebased)] Handle message renames in mail spool
Date: Thu, 03 Dec 2009 16:45:22 -0800
Message-ID: <87zl5zfty5.fsf@yoom.home.cworth.org>
In-Reply-To: <1259788526-14205-1-git-send-email-dottedmag@dottedmag.net>

On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov <dottedmag@dottedmag.net> wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled. As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as moving
> files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 10000 files every time sounds pretty rough. Fortunately...

> Note that after applying this patch notmuch still does not handle copying files
> (which is harmless, database will point to the last copy of message found during
> 'notmuch new') and deleting files (which is more serious, as dangling entries
> will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

	notmuch_directory_t
	notmuch_database_read_directory (notmuch_database_t *database,
	                                 const char *path);

	notmuch_status_t
	notmuch_message_remove_filename (notmuch_message_t *message,
	                                 const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.
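
The comparison step described above could look roughly like the sketch
below. It is only an illustration, not the planned implementation:
name_diff_t and diff_sorted_names are hypothetical names, and the sketch
assumes both lists (the names found on disk and the names the database
knows for that path) are already sorted in strcmp order. Note that
scandir with alphasort sorts with strcoll, so a strcmp-based comparator
would be needed to get that order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result holder, not part of the notmuch library.
 * The caller provides both arrays: "added" needs room for n_disk
 * entries and "removed" needs room for n_db entries. */
typedef struct {
    const char **added;
    size_t n_added;
    const char **removed;
    size_t n_removed;
} name_diff_t;

/* Merge-style walk over two name lists that are both sorted in
 * strcmp order. Names found only on disk are recorded as added,
 * names known only to the database are recorded as removed, and
 * names present in both lists are skipped. */
static void
diff_sorted_names (const char **on_disk, size_t n_disk,
                   const char **in_db, size_t n_db,
                   name_diff_t *diff)
{
    size_t i = 0, j = 0;

    assert (diff != NULL);
    diff->n_added = 0;
    diff->n_removed = 0;

    while (i < n_disk && j < n_db) {
        int cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp == 0) {
            /* Known file that is still present: nothing to do. */
            i++;
            j++;
        } else if (cmp < 0) {
            diff->added[diff->n_added++] = on_disk[i++];
        } else {
            diff->removed[diff->n_removed++] = in_db[j++];
        }
    }

    /* Whatever is left over on either side is unmatched. */
    while (i < n_disk)
        diff->added[diff->n_added++] = on_disk[i++];
    while (j < n_db)
        diff->removed[diff->n_removed++] = in_db[j++];
}
```

With a result like this, each added name would go to
notmuch_database_add_message and each removed name to
notmuch_message_remove_filename. The walk is linear in the number of
entries and never has to open a file that is already known.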

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).
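
As a rough sketch of the second option (postponing removals), one could
queue each removal during the scan and only act on the queue once every
directory has been visited. Everything here is an assumption made for
illustration: pending_list_t, pending_defer, pending_note_added and
pending_flush are hypothetical names, the fixed capacity is arbitrary,
and matching renames by message ID is just one way to do it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 64

/* One postponed removal: the file that vanished and the message
 * it belonged to. All names here are hypothetical. */
typedef struct {
    const char *message_id;
    const char *filename;
    int cancelled;
} pending_removal_t;

typedef struct {
    pending_removal_t items[MAX_PENDING];
    size_t count;
} pending_list_t;

/* Queue a removal instead of acting on it right away.
 * Returns 0 on success, -1 if the queue is full. */
static int
pending_defer (pending_list_t *list, const char *message_id,
               const char *filename)
{
    assert (list != NULL);
    if (list->count >= MAX_PENDING)
        return -1;
    list->items[list->count].message_id = message_id;
    list->items[list->count].filename = filename;
    list->items[list->count].cancelled = 0;
    list->count++;
    return 0;
}

/* A file carrying this message ID was added later in the scan, so
 * the earlier disappearance was a rename and the message must not
 * be treated as dangling. Returns how many entries were cancelled. */
static size_t
pending_note_added (pending_list_t *list, const char *message_id)
{
    size_t i, n = 0;

    for (i = 0; i < list->count; i++) {
        pending_removal_t *p = &list->items[i];

        if (! p->cancelled && strcmp (p->message_id, message_id) == 0) {
            p->cancelled = 1;
            n++;
        }
    }
    return n;
}

/* End of scan: collect the message IDs whose removals were never
 * cancelled, writing at most "capacity" of them. Returns the number
 * of IDs written to "dangling". */
static size_t
pending_flush (const pending_list_t *list, const char **dangling,
               size_t capacity)
{
    size_t i, n = 0;

    for (i = 0; i < list->count && n < capacity; i++) {
        if (! list->items[i].cancelled)
            dangling[n++] = list->items[i].message_id;
    }
    return n;
}
```

The point of the queue is ordering only: a rename shows up as a removal
in one directory and an addition in another, and the scan may see them
in either order, so nothing is declared dangling until the scan ends.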

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people who really need renames might want
to use it for now (and live with any performance implications it
causes).

-Carl
