Date: Wed, 23 Apr 2014 16:59:20 -0400
From: Austin Clements
To: David Mazieres expires 2014-07-22 PDT
Cc: notmuch
Subject: Re: [PATCH] Add configurable changed tag to messages that have been changed on disk
Message-ID: <20140423205920.GM25817@mit.edu>
References: <1396800683-9164-1-git-send-email-eg@gaute.vetsj.com> <87wqf2gqig.fsf@ta.scs.stanford.edu> <1397140962-sup-6514@qwerzila> <87wqexnqvb.fsf@ta.scs.stanford.edu> <1397163239-sup-5101@qwerzila> <87d2g9ja0h.fsf@maritornes.cs.unb.ca> <1398237865-sup-624@qwerzila> <87ioq0l8th.fsf@ta.scs.stanford.edu>
In-Reply-To: <87ioq0l8th.fsf@ta.scs.stanford.edu>

Hi Dave!
Quoth David Mazieres on Apr 23 at 2:00 am:
> Gaute Hope writes:
>
> > A db-tick or a _good_ ctime solution can as far as I can see solve both
> > David M's (correct me if I am wrong) and my purposes, as well as
> > probably have more use cases in the future. It would even be an
> > interesting direct search: show me everything that changed lately,
> > sorted.
>
> I could live with a db-tick scheme. I would prefer a ctime scheme,
> since then I can answer questions such as "what has changed in the last
> five minutes"? I mean all kinds of other stuff starts to break if your
> clock goes backwards on a mail server machine, not the least of which is
> that incremental backups will fail silently, so you risk losing your
> mail.
>
> A middle ground might be to use the maximum of two values: 1) the
> time-of-day at which notmuch started executing, and 2) the highest ctime
> in the database plus 100 microseconds (leaving plenty of slop to store
> timestamps as IEEE doubles with 52 significant bits). Since the values
> will be Btree-indexed, computing the max plus one will be cheap.

This makes me curious whether you've considered how to fit this into
Xapian. The Xapian query syntax supports range queries over document
"values", but within the Xapian B-tree, values are stored in docid
order, not value order, so Xapian's range query operator is actually
implemented as a full scan. I assume it does this so it doesn't have to
store both forward and inverse indexes of values. (I spent some time
figuring out the layout of the Xapian database and have fairly detailed
notes if anyone's curious.) This is still reasonably fast in practice,
because it's a sequential scan and only requires a few bytes per
message, but it's probably not what you'd expect.

That said, Xapian does track per-value statistics that would suffice
for the particular problem of monotonic timestamps (e.g.,
Database::get_value_upper_bound).
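For concreteness, here is a toy C sketch of the quoted "middle ground" rule, plus a model of why a value range query over docid-ordered values has to touch every document. The names next_ctime, count_in_range, and CTIME_SLOP are hypothetical; the idea that db_max_ctime could come from Database::get_value_upper_bound on a ctime value slot is an assumption, not existing notmuch code.

```c
#include <assert.h>
#include <stddef.h>

/* Slop added above the previous maximum: 100 microseconds, as proposed.
 * At present-day Unix times (~1.4e9 s) a double resolves about 0.24 us,
 * so this increment survives being stored as an IEEE double. */
#define CTIME_SLOP 100e-6

/* Hypothetical helper: choose the ctime for a newly changed message as the
 * maximum of (1) the wall-clock time at which this notmuch run started and
 * (2) the highest ctime already in the database plus CTIME_SLOP.
 * Assumption: in Xapian, db_max_ctime might be read from
 * Database::get_value_upper_bound() on the ctime value slot. */
static double next_ctime(double start_time, double db_max_ctime)
{
    double bumped = db_max_ctime + CTIME_SLOP;
    return start_time > bumped ? start_time : bumped;
}

/* Toy model of a value range query when values are laid out in docid
 * order rather than value order: every document's value must be
 * examined, so the cost is O(n) no matter how narrow the range is. */
static size_t count_in_range(const double *values_by_docid, size_t n,
                             double lo, double hi)
{
    assert(values_by_docid != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values_by_docid[i] >= lo && values_by_docid[i] <= hi)
            count++;
    }
    return count;
}
```

The max() is what keeps stamps strictly increasing even if the clock steps backwards between runs; the scan function only illustrates the cost of a ctime range query under the docid-order layout described above.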
In principle it would be possible to use user metadata or even document
terms to support true B-tree range scans in ctime order, but I don't
think it's possible to express queries over these using Xapian's query
parser. I've written about 90% of a (new) custom query parser for
Notmuch that would enable this, but little things like my looming
thesis deadline have kept me from finishing it.

> Incidentally, if you are really this paranoid about time stamps, it
> should bother you that notmuch's directory timestamps only have one
> second granularity. It's not that hard to get a new message delivered
> in the same second that notmuch new finished running. In my
> synchronizer, I convert st_mtim (a struct timespec) into a double and
> keep that plus size in the database to decide if I need to re-hash
> files. But for directories, I'm stuck with NOTMUCH_VALUE_TIMESTAMP,
> which are quantized to the second. (Ironically, I think
> Xapian::sortable_serialize converts time_ts to doubles anyway, so
> avoiding st_mtim is not really helping performance.)

This is historical (and, I agree, unfortunate), but nobody's complained,
so it hasn't been worth changing the libnotmuch interface to support
sub-second directory mtimes. However, notmuch new does correctly handle
deliveries in the same second that it runs. If the wall-clock time when
it starts is the same as the on-disk directory mtime, it skips updating
the in-database directory mtime at the end. Hence, on the next run, it
will still consider the directory out-of-date and rescan it. It's a bit
of a hack, but it's a hack that would be necessary for supporting older
file systems even if we did support sub-second timestamps.
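A toy C model of the same-second rule, as described in this message, may make the sequence of runs easier to follow. It is a sketch only: struct dir_state and scan_directory are hypothetical names, the "scan" itself is elided, and this is not notmuch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Toy model of what the database remembers about one directory. */
struct dir_state {
    time_t db_mtime; /* directory mtime recorded at the last scan (0 = never) */
};

/* Hypothetical model of one `notmuch new` pass over a single directory.
 *   fs_mtime:   the directory's on-disk mtime, in whole seconds
 *   start_time: wall-clock time, in whole seconds, when the run started
 * Returns true if the directory was (re)scanned. */
static bool scan_directory(struct dir_state *dir, time_t fs_mtime,
                           time_t start_time)
{
    assert(dir != NULL);

    /* On-disk mtime matches what we recorded: the directory is up to date. */
    if (fs_mtime == dir->db_mtime)
        return false;

    /* ... look for added and removed message files here ... */

    /* If the on-disk mtime falls in the same second the run started, a
     * message delivered later in that second would leave the
     * (second-granularity) mtime unchanged, so don't record it. The stale
     * database value makes the next run rescan the directory. */
    if (fs_mtime != start_time)
        dir->db_mtime = fs_mtime;
    return true;
}
```

Walking through it: a run that starts in second 100 and sees mtime 100 scans but records nothing; a run in second 101 rescans and records 100; a run in second 102 then skips the directory.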