From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id AEB17431FBD;
	Thu, 3 Dec 2009 16:45:27 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id bc83K61DuFi3; Thu, 3 Dec 2009 16:45:27 -0800 (PST)
Received: from yoom.home.cworth.org (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 42DB5431FAE;
	Thu, 3 Dec 2009 16:45:24 -0800 (PST)
Received: by yoom.home.cworth.org (Postfix, from userid 1000)
	id CCBA72542FB; Thu, 3 Dec 2009 16:45:22 -0800 (PST)
From: Carl Worth
To: Mikhail Gusarov, notmuch@notmuchmail.org
In-Reply-To: <1259788526-14205-1-git-send-email-dottedmag@dottedmag.net>
References: <1259267025-28733-1-git-send-email-dottedmag@dottedmag.net>
	<1259788526-14205-1-git-send-email-dottedmag@dottedmag.net>
Date: Thu, 03 Dec 2009 16:45:22 -0800
Message-ID: <87zl5zfty5.fsf@yoom.home.cworth.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
	micalg=pgp-sha1; protocol="application/pgp-signature"
Subject: Re: [PATCH (rebased)] Handle message renames in mail spool
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "Use and development of the notmuch mail system."
X-List-Received-Date: Fri, 04 Dec 2009 00:45:27 -0000

--=-=-=

On Thu, 3 Dec 2009 03:15:26 +0600, Mikhail Gusarov wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled.
> As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as moving
> files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls
already when new mail arrives. Having to also parse the message ID out
of (roughly, for me) 10000 files every time sounds pretty rough.

Fortunately...

> Note that after applying this patch notmuch still does not handle copying files
> (which is harmless, database will point to the last copy of message found during
> 'notmuch new') and deleting files (which is more serious, as dangling entries
> will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics. The complete design is on
Keith's laptop right now, and hopefully he'll appear soon with an
implementation.

Basically, there are only two new functions needed in the library (if
we got the design right):

    notmuch_directory_t
    notmuch_database_read_directory (notmuch_database_t *database,
                                     const char *path);

    notmuch_status_t
    notmuch_message_remove_filename (notmuch_message_t *message,
                                     const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to
the mtime that we currently read from the database, it will provide a
list of all directories and files (along with message IDs) known to the
database for a particular path.

So notmuch-new can then quickly compare the results of scandir with
this notmuch_directory_t object and then call
notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan.
Obviously the implementation will need to deal with that, (either with
an additional library call for "I'm done adding files, go ahead and
delete dangling messages", or by postponing all calls to
remove_filename until later).

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now, (and live with any performance implications it
causes).

-Carl

--=-=-=
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iD8DBQFLGFui6JDdNq8qSWgRArJgAJ9/r/qgcmTOXv9DZAu1y0uTAJhoigCgpmnN
/uZ6RHSHFN2Ou8YPb4XSZ/I=
=tSvx
-----END PGP SIGNATURE-----
--=-=-=--
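
For illustration only, here is a minimal sketch of the comparison step the message describes: matching what scandir reports against what the database already knows for one directory. It does not use the proposed notmuch_directory_t interface, which had no implementation when this was written. The struct dir_diff type and the diff_directory function are hypothetical names invented for this sketch, and both input lists are assumed to be plain arrays of filenames already sorted in strcmp() order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, not part of notmuch. The caller supplies
 * both buffers: to_add needs room for n_disk entries and to_remove
 * needs room for n_db entries. */
struct dir_diff {
    const char **to_add;     /* on disk, unknown to the database */
    size_t n_add;
    const char **to_remove;  /* known to the database, gone from disk */
    size_t n_remove;
};

/* Merge-walk two filename lists that are both sorted by strcmp().
 * on_disk stands in for a scandir() result, in_db for the list a
 * directory object from the database might provide (an assumption). */
void
diff_directory (const char *const *on_disk, size_t n_disk,
                const char *const *in_db, size_t n_db,
                struct dir_diff *out)
{
    size_t i = 0, j = 0;

    assert (out != NULL && out->to_add != NULL && out->to_remove != NULL);
    out->n_add = 0;
    out->n_remove = 0;

    while (i < n_disk && j < n_db) {
        int cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp == 0) {          /* present in both: nothing to do */
            i++;
            j++;
        } else if (cmp < 0) {    /* only on disk: candidate for adding */
            out->to_add[out->n_add++] = on_disk[i++];
        } else {                 /* only in database: candidate for removal */
            out->to_remove[out->n_remove++] = in_db[j++];
        }
    }
    while (i < n_disk)
        out->to_add[out->n_add++] = on_disk[i++];
    while (j < n_db)
        out->to_remove[out->n_remove++] = in_db[j++];
}
```

In this sketch a caller would add the to_add entries right away but hold the to_remove entries until the whole mail tree has been scanned. That corresponds to the second option mentioned above (postponing the remove_filename calls), so a rename that shows up as a removal in one directory and an addition in another would not drop the message in between.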