From: David Mazieres
To: Gaute Hope, notmuch@notmuchmail.org
Subject: Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
In-Reply-To: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com>
References: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com>
Date: Sun, 06 Apr 2014 22:19:19 +0200
Message-ID: <87wqf2gqig.fsf@ta.scs.stanford.edu>
List-Id: "Use and development of the notmuch mail system."

Gaute Hope writes:

> When one of the source files for a message is changed on disk, renamed,
> deleted or a new source file is added. A configurable changed tag is
> is added. The tag can be configured under the option 'changed_tags' in
> the [new] section, the default is none. Tests have been updated to
> accept the new config option.
>
> notmuch-setup now asks for a changed tag after the new tags question.
>
> This could be useful for for example 'afew' to detect remote changes in
> IMAP folders and update the FolderNameFilter to also add tags or remove
> tags when a _existing_ message has been added to or removed from a
> maildir.

I think this is the wrong way to achieve such functionality, because
then the change tag A) is expensive to remove, B) is easy to misuse
(remember to call fsync everywhere before deleting the change tag), and
C) can be used by only one application.

A better approach would be to add a new "modtime" Xapian value that is
updated whenever the tags or any other terms (such as XFDIRENTRY) are
added to or deleted from a docid. If it's a Xapian value, rather than a
term, then modtime will be queryable just like date, allowing multiple
applications to query all docids modified since the last time they ran.

I currently have multiple applications that could significantly benefit
from such a modtime. An obvious one is proper incremental backups with
notmuch-dump. Another example is a tool I have that synchronizes
maildirs and notmuch tags across machines. With the current interface,
there is no way to do this without scanning the entire database, because
any message, even a very old one, may have changed tags or links.
Moreover, something like notmuch-dump is way, way too slow to run every
time you want to check for new mail. notmuch-dump costs 5-10 seconds on
my 110,000-message maildir!
In fact, any approach that gathers the tags associated with each
individual docid is a complete non-starter, forcing me to violate
abstraction and examine the postlists associated with each tag and
XFDIRENTRY term. Even my highly optimized implementation takes about
250 msec (1400 msec on a 32-bit machine), which adds perceptible latency
to synchronizing my clients' notmuch maildirs with my server's when I
poll for new mail.

Yet another application is something like nottoomuch-addresses, which
currently uses an occasionally incorrect heuristic to detect new
messages based on the Date header.

Let me make a stronger statement, which is that not only are
modification times an incredibly useful and general primitive, but lack
of modification times is the single thing that kept me away from notmuch
despite years of wanting to switch. In the end, I invested months
developing a highly optimized change detector that efficiently diffs
Xapian's B-trees against a MySQL database with a snapshot of the same
information. My solution works, and I now enjoy a replicated notmuch
setup synchronized across three machines, including offline access on my
laptop. But my 4,000-line C++ program might have been a 400-line shell
script if only notmuch supported docid mod times.

Also, to put this in perspective, how long does it take to remove the
changed tags from a bunch of messages? If it's longer than 300 msec on a
64-bit machine, then even with a single application you'd be better off
using my crazy on-the-side MySQL version vector scheme.

David
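
To make the modtime proposal above concrete, here is a minimal sketch in
Python. It is a toy in-memory model, not notmuch or Xapian code: the
class ToyIndex, its methods, and the use of a logical counter in place
of a wall-clock modtime are all assumptions made for illustration. What
it tries to show is the property argued for above: each consumer keeps
only its own "last seen" stamp, so several applications can ask for
changes independently and nothing has to be cleared afterwards.

```python
# Toy in-memory model of a per-document modification stamp.
# Nothing here is notmuch or Xapian API; all names are hypothetical.


class ToyIndex:
    def __init__(self):
        self.clock = 0      # logical clock standing in for a modtime
        self.tags = {}      # docid -> set of tags
        self.modtime = {}   # docid -> clock value at the last change

    def _touch(self, docid):
        # Record that this docid changed "now".
        self.clock += 1
        self.modtime[docid] = self.clock

    def add_tag(self, docid, tag):
        tags = self.tags.setdefault(docid, set())
        if tag not in tags:
            tags.add(tag)
            self._touch(docid)

    def remove_tag(self, docid, tag):
        tags = self.tags.get(docid, set())
        if tag in tags:
            tags.remove(tag)
            self._touch(docid)

    def changed_since(self, stamp):
        # Linear scan for simplicity; a real value slot would be
        # answered by a range query instead.
        return sorted(d for d, m in self.modtime.items() if m > stamp)


index = ToyIndex()
index.add_tag(1, "inbox")
index.add_tag(2, "inbox")

# Two independent consumers, each keeping only its own last-seen stamp.
backup_seen = index.clock
sync_seen = index.clock

index.remove_tag(1, "inbox")
backup_changes = index.changed_since(backup_seen)    # backup runs now
backup_seen = index.clock

index.add_tag(2, "flagged")
sync_changes = index.changed_since(sync_seen)        # sync runs later
backup_changes_2 = index.changed_since(backup_seen)  # backup runs again
```

In this model the backup consumer sees only what changed since its own
previous run, while the sync consumer, which ran less often, sees both
changes. Neither has to write anything back to the index, which is the
contrast with a shared "changed" tag that one application must remove.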