* [WIP PATCH 1/4] lib: Only sync modified message documents
2014-10-13 6:19 [WIP PATCH 0/4] Add message revision tracking Austin Clements
@ 2014-10-13 6:20 ` Austin Clements
2014-10-13 6:20 ` [WIP PATCH 2/4] lib: Add per-message last modification tracking Austin Clements
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Austin Clements @ 2014-10-13 6:20 UTC
To: notmuch
From: Austin Clements <amdragon@mit.edu>
Previously, we updated the database copy of a message on every call to
_notmuch_message_sync, even if nothing had changed. In particular,
this always happens on a thaw, so a freeze/thaw pair with no
modifications between still caused a database update.
We only modify message documents in a handful of places, so keep track
of whether the document has been modified and only sync it when
necessary. This will be particularly important when we add message
revision tracking.
---
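The dirty-flag idea described in the commit message can be sketched outside of notmuch. This is a minimal illustration only: `FakeStore`, `Message`, and `writes_after_two_syncs` are hypothetical names invented for this note, and the store merely counts writes instead of talking to Xapian.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Xapian-backed database; it only counts writes.
struct FakeStore {
    int writes = 0;
    void replace_document (const std::string &) { ++writes; }
};

// Hypothetical, simplified message: a document body plus a dirty flag.
struct Message {
    std::string doc;
    bool modified = false;

    void add_term (const std::string &term)
    {
        doc += term + ' ';
        modified = true;	/* every mutation marks the document dirty */
    }

    // Same shape as the patched _notmuch_message_sync: skip clean documents.
    void sync (FakeStore &store)
    {
        if (! modified)
            return;
        store.replace_document (doc);
        modified = false;
    }
};

// Two syncs with a single change between them should cost one write.
inline int
writes_after_two_syncs ()
{
    FakeStore store;
    Message m;

    m.add_term ("inbox");
    m.sync (store);	/* dirty: written */
    m.sync (store);	/* clean (like an empty freeze/thaw): skipped */
    return store.writes;
}
```

In this model a second sync with no intervening change is a no-op, which is the behaviour the patch is after for an empty freeze/thaw pair.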
lib/message.cc | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/lib/message.cc b/lib/message.cc
index a7a13cc..cf2fd7c 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -43,6 +43,9 @@ struct visible _notmuch_message {
* if each flag has been initialized. */
unsigned long lazy_flags;
+ /* Message document modified since last sync */
+ notmuch_bool_t modified;
+
Xapian::Document doc;
Xapian::termcount termpos;
};
@@ -538,6 +541,7 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
try {
message->doc.remove_term ((*i));
+ message->modified = TRUE;
} catch (const Xapian::InvalidArgumentError) {
/* Ignore failure to remove non-existent term. */
}
@@ -791,6 +795,7 @@ void
_notmuch_message_clear_data (notmuch_message_t *message)
{
message->doc.set_data ("");
+ message->modified = TRUE;
}
static void
@@ -988,6 +993,7 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
Xapian::sortable_serialise (time_value));
message->doc.add_value (NOTMUCH_VALUE_FROM, from);
message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
+ message->modified = TRUE;
}
/* Synchronize changes made to message->doc out into the database. */
@@ -999,8 +1005,12 @@ _notmuch_message_sync (notmuch_message_t *message)
if (message->notmuch->mode == NOTMUCH_DATABASE_MODE_READ_ONLY)
return;
+ if (! message->modified)
+ return;
+
db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
db->replace_document (message->doc_id, message->doc);
+ message->modified = FALSE;
}
/* Delete a message document from the database. */
@@ -1075,6 +1085,7 @@ _notmuch_message_add_term (notmuch_message_t *message,
return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
message->doc.add_term (term, 0);
+ message->modified = TRUE;
talloc_free (term);
@@ -1143,6 +1154,7 @@ _notmuch_message_remove_term (notmuch_message_t *message,
try {
message->doc.remove_term (term);
+ message->modified = TRUE;
} catch (const Xapian::InvalidArgumentError) {
/* We'll let the philosopher's try to wrestle with the
* question of whether failing to remove that which was not
--
2.1.0
* [WIP PATCH 2/4] lib: Add per-message last modification tracking
2014-10-13 6:19 [WIP PATCH 0/4] Add message revision tracking Austin Clements
2014-10-13 6:20 ` [WIP PATCH 1/4] lib: Only sync modified message documents Austin Clements
@ 2014-10-13 6:20 ` Austin Clements
2014-10-13 6:20 ` [WIP PATCH 3/4] lib: API to retrieve database revision and UUID Austin Clements
2014-10-13 6:20 ` [WIP PATCH 4/4] lib: Add "lastmod:" queries for filtering by last modification Austin Clements
3 siblings, 0 replies; 8+ messages in thread
From: Austin Clements @ 2014-10-13 6:20 UTC
To: notmuch
From: Austin Clements <amdragon@mit.edu>
This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.
An alternative would be to store the wall-clock time of the last
modification of each message. In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch. However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw. Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two read
transactions:
  write  |-X-A--------------|
  read 1       |---B---|
  read 2                      |---|
The write transaction modifies message X and records the wall-clock
time of the modification at A. The writer hangs around for a while
and later commits its change. Read 1 is concurrent with the write, so
it doesn't see the change to X. It does some query and records the
wall-clock time of its results at B. Transaction read 2 later starts
after the write commits and queries for changes since wall-clock time
B (say the reads are performing an incremental backup). Even though
read 1 could not see the change to X, read 2 is told (correctly, going
by the recorded timestamp) that X has not changed since B, the time of
the last read. In fact, X
changed before wall-clock time A, but the change was not visible until
*after* wall-clock time B, so read 2 misses the change to X.
This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number. Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
---
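The time line in the commit message can be replayed with a toy model. This is only a sketch under stated assumptions: the tick values and the names `Scenario`, `visible`, `wallclock_read2_sees_change`, and `revision_read2_sees_change` are made up for illustration, and snapshot isolation is reduced to "a reader sees a write only if the write committed before the reader started".

```cpp
#include <cassert>

// Arbitrary ticks chosen to match the time line in the commit message.
struct Scenario {
    long write_stamp_time = 2;	/* A: writer stamps X with the wall clock */
    long read1_start = 5;	/* read 1 takes its snapshot */
    long read1_record = 9;	/* B: read 1 records "now" */
    long write_commit = 18;	/* the change to X becomes visible */
    long read2_start = 20;	/* read 2 takes its snapshot */
};

// Under snapshot isolation a reader sees only writes committed before
// its snapshot was taken.
inline bool
visible (const Scenario &s, long snapshot_time)
{
    return s.write_commit <= snapshot_time;
}

// Wall-clock scheme: read 2 asks for messages stamped later than B.
inline bool
wallclock_read2_sees_change (const Scenario &s)
{
    return visible (s, s.read2_start) && s.write_stamp_time > s.read1_record;
}

// Revision scheme: read 1 bookmarks the highest revision committed in
// its snapshot, and read 2 asks for anything with a higher revision.
inline bool
revision_read2_sees_change (const Scenario &s)
{
    unsigned long committed_before_write = 0;
    unsigned long x_revision = committed_before_write + 1;
    unsigned long read1_bookmark =
	visible (s, s.read1_start) ? x_revision : committed_before_write;

    return visible (s, s.read2_start) && x_revision > read1_bookmark;
}
```

With these ticks the wall-clock query misses X (2 is not later than 9), while the revision query finds it (1 is greater than the bookmark 0).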
lib/database-private.h | 15 ++++++++++++++-
lib/database.cc | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
lib/message.cc | 22 ++++++++++++++++++++++
lib/notmuch-private.h | 10 +++++++++-
4 files changed, 92 insertions(+), 4 deletions(-)
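As a reading aid for the diff below, here is a stand-alone model of how revisions are allocated inside and outside atomic sections, plus a check of the "few centuries" estimate for 53 effective bits. It is a simplified sketch, not the library code: `RevisionCounter` and `years_of_headroom` are hypothetical names, and only the outermost `end_atomic` is assumed to commit.

```cpp
#include <cassert>

// Hypothetical model of the revision bookkeeping in this patch.
struct RevisionCounter {
    unsigned long revision = 0;	/* highest committed revision */
    int atomic_nesting = 0;
    bool atomic_dirty = false;

    // Allocate the revision number to record for the next change.
    unsigned long new_revision ()
    {
	unsigned long next = revision + 1;

	if (atomic_nesting)
	    atomic_dirty = true;	/* defer the bump until commit */
	else
	    revision = next;
	return next;
    }

    void begin_atomic () { ++atomic_nesting; }

    // Assumption: only the outermost end_atomic commits.
    void end_atomic ()
    {
	assert (atomic_nesting > 0);
	if (--atomic_nesting == 0 && atomic_dirty) {
	    ++revision;
	    atomic_dirty = false;
	}
    }
};

// Years until 2^53 revisions are used up at a given steady rate.
inline double
years_of_headroom (double revisions_per_second)
{
    const double max_revisions = 9007199254740992.0;	/* 2^53 */
    const double seconds_per_year = 365.25 * 24 * 60 * 60;

    return max_revisions / revisions_per_second / seconds_per_year;
}
```

In this model every change made inside one atomic section shares a single revision number, and an atomic section that changes nothing leaves the committed revision alone. At one million revisions per second the headroom works out to roughly 285 years.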
diff --git a/lib/database-private.h b/lib/database-private.h
index 15e03cc..465065d 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -92,6 +92,12 @@ enum _notmuch_features {
*
* Introduced: version 3. */
NOTMUCH_FEATURE_GHOSTS = 1 << 4,
+
+ /* If set, messages store the revision number of the last
+ * modification in NOTMUCH_VALUE_LAST_MOD.
+ *
+ * Introduced: version 3. */
+ NOTMUCH_FEATURE_LAST_MOD = 1 << 5,
};
/* In C++, a named enum is its own type, so define bitwise operators
@@ -137,6 +143,8 @@ struct _notmuch_database {
notmuch_database_mode_t mode;
int atomic_nesting;
+ /* TRUE if changes have been made in this atomic section */
+ notmuch_bool_t atomic_dirty;
Xapian::Database *xapian_db;
/* Bit mask of features used by this database. This is a
@@ -145,6 +153,10 @@ struct _notmuch_database {
unsigned int last_doc_id;
uint64_t last_thread_id;
+ /* Highest committed revision number. Modifications are recorded
+ * under a higher revision number, which can be generated with
+ * _notmuch_database_new_revision. */
+ unsigned long revision;
Xapian::QueryParser *query_parser;
Xapian::TermGenerator *term_gen;
@@ -166,7 +178,8 @@ struct _notmuch_database {
* databases will have it). */
#define NOTMUCH_FEATURES_CURRENT \
(NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
- NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
+ NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS | \
+ NOTMUCH_FEATURE_LAST_MOD)
/* Return the list of terms from the given iterator matching a prefix.
* The prefix will be stripped from the strings in the returned list.
diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..45d32ab 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -101,6 +101,9 @@ typedef struct {
*
* SUBJECT: The value of the "Subject" header
*
+ * LAST_MOD: The revision number as of the last tag or
+ * filename change.
+ *
* In addition, terms from the content of the message are added with
* "from", "to", "attachment", and "subject" prefixes for use by the
* user in searching. Similarly, terms from the path of the mail
@@ -304,6 +307,8 @@ static const struct {
"exact folder:/path: search", "rw" },
{ NOTMUCH_FEATURE_GHOSTS,
"mail documents for missing messages", "w"},
+ { NOTMUCH_FEATURE_LAST_MOD,
+ "modification tracking", "w"},
};
const char *
@@ -678,6 +683,23 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch)
return NOTMUCH_STATUS_SUCCESS;
}
+/* Allocate a revision number for the next change. */
+unsigned long
+_notmuch_database_new_revision (notmuch_database_t *notmuch)
+{
+ unsigned long new_revision = notmuch->revision + 1;
+
+ /* If we're in an atomic section, hold off on updating the
+ * committed revision number until we commit the atomic section.
+ */
+ if (notmuch->atomic_nesting)
+ notmuch->atomic_dirty = TRUE;
+ else
+ notmuch->revision = new_revision;
+
+ return new_revision;
+}
+
/* Parse a database features string from the given database version.
* Returns the feature bit set.
*
@@ -817,6 +839,7 @@ notmuch_database_open (const char *path,
notmuch->atomic_nesting = 0;
try {
string last_thread_id;
+ string last_mod;
if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE) {
notmuch->xapian_db = new Xapian::WritableDatabase (xapian_path,
@@ -875,6 +898,14 @@ notmuch_database_open (const char *path,
INTERNAL_ERROR ("Malformed database last_thread_id: %s", str);
}
+ /* Get current highest revision number. */
+ last_mod = notmuch->xapian_db->get_value_upper_bound (
+ NOTMUCH_VALUE_LAST_MOD);
+ if (last_mod.empty ())
+ notmuch->revision = 0;
+ else
+ notmuch->revision = Xapian::sortable_unserialise (last_mod);
+
notmuch->query_parser = new Xapian::QueryParser;
notmuch->term_gen = new Xapian::TermGenerator;
notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
@@ -1266,7 +1297,8 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
/* Figure out how much total work we need to do. */
if (new_features &
- (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_BOOL_FOLDER)) {
+ (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_BOOL_FOLDER |
+ NOTMUCH_FEATURE_LAST_MOD)) {
notmuch_query_t *query = notmuch_query_create (notmuch, "");
total += notmuch_query_count_messages (query);
notmuch_query_destroy (query);
@@ -1293,7 +1325,8 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
/* Perform per-message upgrades. */
if (new_features &
- (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_BOOL_FOLDER)) {
+ (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_BOOL_FOLDER |
+ NOTMUCH_FEATURE_LAST_MOD)) {
notmuch_query_t *query = notmuch_query_create (notmuch, "");
notmuch_messages_t *messages;
notmuch_message_t *message;
@@ -1330,6 +1363,13 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
if (new_features & NOTMUCH_FEATURE_BOOL_FOLDER)
_notmuch_message_upgrade_folder (message);
+ /* Prior to NOTMUCH_FEATURE_LAST_MOD, messages did not
+ * track modification revisions. Give all messages a
+ * revision of 1.
+ */
+ if (new_features & NOTMUCH_FEATURE_LAST_MOD)
+ _notmuch_message_upgrade_last_mod (message);
+
_notmuch_message_sync (message);
notmuch_message_destroy (message);
@@ -1512,6 +1552,11 @@ notmuch_database_end_atomic (notmuch_database_t *notmuch)
return NOTMUCH_STATUS_XAPIAN_EXCEPTION;
}
+ if (notmuch->atomic_dirty) {
+ ++notmuch->revision;
+ notmuch->atomic_dirty = FALSE;
+ }
+
DONE:
notmuch->atomic_nesting--;
return NOTMUCH_STATUS_SUCCESS;
diff --git a/lib/message.cc b/lib/message.cc
index cf2fd7c..767f0ab 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -996,6 +996,16 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
message->modified = TRUE;
}
+/* Upgrade a message to support NOTMUCH_FEATURE_LAST_MOD. The caller
+ * must call _notmuch_message_sync. */
+void
+_notmuch_message_upgrade_last_mod (notmuch_message_t *message)
+{
+ /* _notmuch_message_sync will update the last modification
+ * revision; we just have to ask it to. */
+ message->modified = TRUE;
+}
+
/* Synchronize changes made to message->doc out into the database. */
void
_notmuch_message_sync (notmuch_message_t *message)
@@ -1008,6 +1018,18 @@ _notmuch_message_sync (notmuch_message_t *message)
if (! message->modified)
return;
+ /* Update the last modification of this message. */
+ if (message->notmuch->features & NOTMUCH_FEATURE_LAST_MOD)
+ /* sortable_serialise gives a reasonably compact encoding,
+ * which directly translates to reduced IO when scanning the
+ * value stream. Since it's built for doubles, we only get 53
+ * effective bits, but that's still enough for the database to
+ * last a few centuries at 1 million revisions per second. */
+ message->doc.add_value (NOTMUCH_VALUE_LAST_MOD,
+ Xapian::sortable_serialise (
+ _notmuch_database_new_revision (
+ message->notmuch)));
+
db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
db->replace_document (message->doc_id, message->doc);
message->modified = FALSE;
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 2f43c1d..cb85738 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -108,7 +108,8 @@ typedef enum {
NOTMUCH_VALUE_TIMESTAMP = 0,
NOTMUCH_VALUE_MESSAGE_ID,
NOTMUCH_VALUE_FROM,
- NOTMUCH_VALUE_SUBJECT
+ NOTMUCH_VALUE_SUBJECT,
+ NOTMUCH_VALUE_LAST_MOD,
} notmuch_value_t;
/* Xapian (with flint backend) complains if we provide a term longer
@@ -191,6 +192,9 @@ _notmuch_message_id_compressed (void *ctx, const char *message_id);
notmuch_status_t
_notmuch_database_ensure_writable (notmuch_database_t *notmuch);
+unsigned long
+_notmuch_database_new_revision (notmuch_database_t *notmuch);
+
const char *
_notmuch_database_relative_path (notmuch_database_t *notmuch,
const char *path);
@@ -302,6 +306,10 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
const char *date,
const char *from,
const char *subject);
+
+void
+_notmuch_message_upgrade_last_mod (notmuch_message_t *message);
+
void
_notmuch_message_sync (notmuch_message_t *message);
--
2.1.0
* [WIP PATCH 4/4] lib: Add "lastmod:" queries for filtering by last modification
2014-10-13 6:19 [WIP PATCH 0/4] Add message revision tracking Austin Clements
` (2 preceding siblings ...)
2014-10-13 6:20 ` [WIP PATCH 3/4] lib: API to retrieve database revision and UUID Austin Clements
@ 2014-10-13 6:20 ` Austin Clements
2015-01-15 21:08 ` David Bremner
3 siblings, 1 reply; 8+ messages in thread
From: Austin Clements @ 2014-10-13 6:20 UTC
To: notmuch
From: Austin Clements <amdragon@mit.edu>
XXX Includes reference to notmuch search --db-revision, which doesn't
exist.
---
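To make the intended semantics of a `lastmod:<since>..<until>` query concrete, here is a small in-memory sketch. It does not use Xapian or libnotmuch: `Msg` and `changed_in_range` are hypothetical names, and the range is assumed to be inclusive at both ends, as Xapian value ranges are.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message record: an id plus its last-modification revision.
struct Msg {
    std::string id;
    unsigned long last_mod;
};

// Return the ids of messages whose last_mod lies in [since, until].
inline std::vector<std::string>
changed_in_range (const std::vector<Msg> &msgs,
		  unsigned long since, unsigned long until)
{
    std::vector<std::string> ids;

    for (const Msg &m : msgs)
	if (m.last_mod >= since && m.last_mod <= until)
	    ids.push_back (m.id);
    return ids;
}
```

A client that remembers the database revision N from its previous run could then ask for everything from N + 1 up to the current revision to pick up only what changed in between.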
doc/man7/notmuch-search-terms.rst | 8 ++++++++
lib/database-private.h | 1 +
lib/database.cc | 4 ++++
3 files changed, 13 insertions(+)
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 1acdaa0..df76e39 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -52,6 +52,8 @@ indicate user-supplied values):
- date:<since>..<until>
+- lastmod:<since>..<until>
+
The **from:** prefix is used to match the name or address of the sender
of an email message.
@@ -118,6 +120,12 @@ The time range can also be specified using timestamps with a syntax of:
Each timestamp is a number representing the number of seconds since
1970-01-01 00:00:00 UTC.
+The **lastmod:** prefix restricts results to messages whose last
+modification (tags added or removed, or filenames changed) falls within
+the given range of database revision numbers. This is usually used in
+conjunction with the **--db-revision** argument to **notmuch search**
+to find messages that have changed since an earlier query.
+
In addition to individual terms, multiple terms can be combined with
Boolean operators ( **and**, **or**, **not** , etc.). Each term in the
query will be implicitly connected by a logical AND if no explicit
diff --git a/lib/database-private.h b/lib/database-private.h
index 0977229..cbca1de 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -163,6 +163,7 @@ struct _notmuch_database {
Xapian::TermGenerator *term_gen;
Xapian::ValueRangeProcessor *value_range_processor;
Xapian::ValueRangeProcessor *date_range_processor;
+ Xapian::ValueRangeProcessor *last_mod_range_processor;
};
/* Prior to database version 3, features were implied by the database
diff --git a/lib/database.cc b/lib/database.cc
index 9bec170..f9aa45d 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -913,6 +913,7 @@ notmuch_database_open (const char *path,
notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
notmuch->date_range_processor = new ParseTimeValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
+ notmuch->last_mod_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_LAST_MOD, "lastmod:");
notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
notmuch->query_parser->set_database (*notmuch->xapian_db);
@@ -920,6 +921,7 @@ notmuch_database_open (const char *path,
notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
notmuch->query_parser->add_valuerangeprocessor (notmuch->date_range_processor);
+ notmuch->query_parser->add_valuerangeprocessor (notmuch->last_mod_range_processor);
for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
@@ -991,6 +993,8 @@ notmuch_database_close (notmuch_database_t *notmuch)
notmuch->value_range_processor = NULL;
delete notmuch->date_range_processor;
notmuch->date_range_processor = NULL;
+ delete notmuch->last_mod_range_processor;
+ notmuch->last_mod_range_processor = NULL;
return status;
}
--
2.1.0