unofficial mirror of notmuch@notmuchmail.org
* [PATCH 0/5] Store message modification times in the DB
@ 2011-12-13 17:11 Thomas Jost
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

Hello world,

This is a patch series I've been working on for some time in order to be
able to sync my tags across several computers. I'm posting it now, but
please consider it an RFC rather than something that is ready to be
pushed.

The basic idea is to store the last time each message was modified, i.e.
when "the message was added to the DB", "a tag was added" or "a tag was
removed".

This mtime is accessible through a library function and in the JSON
output of "notmuch show". It is also searchable with the "mtime:" prefix
and with timestamp ranges, just like when searching messages by date:

    notmuch search mtime:$(date +%s -d 2011-12-01)..$(date +%s)
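A script or helper program may need to build the same query in code rather than in the shell. Below is a minimal C sketch of that. The helper name `build_mtime_query` is hypothetical, and the `mtime:` prefix exists only with this series applied.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write "mtime:<since>..<until>" into buf.
 * Returns 0 on success, -1 if the buffer was too small. */
static int
build_mtime_query (char *buf, size_t size, time_t since, time_t until)
{
    /* time_t has no dedicated printf conversion; widen it explicitly. */
    int n = snprintf (buf, size, "mtime:%lld..%lld",
		      (long long) since, (long long) until);

    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Passing the resulting string as the search terms of `notmuch search` corresponds to the shell command above.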

This can then be used in scripts or helper programs to do incremental
dumps or tag synchronization. (I already have a script to do incremental
backups, but it needs some cleaning up, and I'm still working on something
for syncing tags, which is starting to work really well; I'll post them
later.)
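Here is a rough sketch of the selection step behind an incremental dump. It assumes the dump script remembers the time of its previous run. The `msg_record` struct and `select_modified_since()` are hypothetical stand-ins for iterating over search results. With this series, the same filtering would normally be done by an `mtime:` query.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory stand-in for a message: an ID plus the
 * MTIME value that this series stores in the database. */
struct msg_record {
    const char *id;
    time_t mtime;
};

/* Store pointers to the records modified at or after 'since' in 'out'
 * (which must have room for 'n' entries) and return how many were
 * selected.  An incremental dump would write out only these. */
static size_t
select_modified_since (const struct msg_record *msgs, size_t n,
		       time_t since, const struct msg_record **out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
	if (msgs[i].mtime >= since)
	    out[count++] = &msgs[i];
    return count;
}
```

Using `>=` rather than `>` means a message modified in the same second as the previous dump gets dumped again rather than missed. This matters because MTIME only has one-second resolution.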

This can be seen as an alternative to David Bremner's jlog branch, but
with several differences:

+ no external dependencies
+ everything is stored in the notmuch DB: atomicity for free!
- when a message is removed we lose everything about it, which makes the
  sync process more complicated
- for a human, it's harder to manipulate timestamps than log messages
- this can store much less data than a proper log system

On IRC, amdragon suggested using a simple sequence number instead of a
timestamp. This would indeed eliminate the need for proper time
synchronization between the computers one wants to keep in sync, and it
would reduce the risk of time-going-backward problems, but IMHO it would
cause more problems: with no global clock, there is no simple way to tell
whether DB #A is more recent than DB #B.
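To illustrate the global-clock argument: with wall-clock mtimes, a sync tool could resolve a message that changed on both sides with a simple last-writer-wins rule. The sketch below assumes the machines' clocks are reasonably synchronized. The struct and function names are hypothetical and not part of the series.

```c
#include <time.h>

/* Hypothetical per-message tag state as seen in one database. */
struct tag_state {
    const char *tags;  /* e.g. a space-separated tag list */
    time_t mtime;      /* last modification time in that database */
};

/* Last-writer-wins: keep whichever copy was modified more recently.
 * This relies on mtimes from different machines sharing one
 * wall-clock scale.  Ties favor the local copy. */
static const struct tag_state *
pick_newer (const struct tag_state *local, const struct tag_state *remote)
{
    return remote->mtime > local->mtime ? remote : local;
}
```

With per-database sequence numbers, the same comparison would pit counters from two unrelated sequences against each other. That is the problem described above.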

So, here are the patches:
- first a little fix to the comments describing the DB schema (not
  specific to this patch series at all, I just noticed it when rebasing
  this series)
- the second commit adds the MTIME value to the database schema, and
  creates the functions used to update and access this value.
- the third commit makes the MTIME value searchable with a range syntax.
- the fourth commit adds the MTIME to the JSON output of "notmuch show".
- the fifth and last commit adds Message.get_mtime() to the Python
  bindings.

Please tell me what you think of this.

Best regards,
Thomas

Thomas Jost (5):
  Fix comments about what is stored in the database
  lib: Add a MTIME value to every mail document
  lib: Make MTIME values searchable
  show: include mtime in JSON output
  python: add get_mtime() to the Message class

 bindings/python/notmuch/message.py |   20 ++++++++++++++++++++
 lib/database-private.h             |    1 +
 lib/database.cc                    |   14 +++++++++++++-
 lib/message.cc                     |   32 ++++++++++++++++++++++++++++++++
 lib/notmuch-private.h              |    6 +++++-
 lib/notmuch.h                      |    4 ++++
 notmuch-show.c                     |    7 ++++---
 notmuch.1                          |   14 ++++++++++++--
 notmuch.c                          |   13 ++++++++++---
 9 files changed, 101 insertions(+), 10 deletions(-)

-- 
1.7.8

* [PATCH 1/5] Fix comments about what is stored in the database
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-23 19:10   ` David Bremner
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

Commit 567bcbc2 introduced two new values for each message (content of the
"From" and "Subject" headers), but the comments about the database schema had
not been updated accordingly.
---
 lib/database.cc |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 98f101e..2025189 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -81,13 +81,17 @@ typedef struct {
  *		        STRING is the name of a file within that
  *		        directory for this mail message.
  *
- *    A mail document also has two values:
+ *    A mail document also has four values:
  *
  *	TIMESTAMP:	The time_t value corresponding to the message's
  *			Date header.
  *
  *	MESSAGE_ID:	The unique ID of the mail mess (see "id" above)
  *
+ *	FROM:		The value of the "From" header
+ *
+ *	SUBJECT:	The value of the "Subject" header
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
-- 
1.7.8

* [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-14 21:54   ` Mark Anderson
  2011-12-15  0:45   ` Austin Clements
  2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

This is a time_t value, similar to the message date (TIMESTAMP). It is first set
when the message is added to the database, and is then updated every time a tag
is added or removed. It can thus be used for doing incremental dumps of the
database or for synchronizing it between several computers.

This value can be read freely (with notmuch_message_get_mtime()) but for now it
can't be set to an arbitrary value: it can only be set to "now" when updated.
There's no specific reason for this except that I don't really see a real use
case for setting it to an arbitrary value.
---
 lib/database.cc       |    7 ++++++-
 lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
 lib/notmuch-private.h |    6 +++++-
 lib/notmuch.h         |    4 ++++
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 2025189..6dc6f73 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -81,7 +81,7 @@ typedef struct {
  *		        STRING is the name of a file within that
  *		        directory for this mail message.
  *
- *    A mail document also has four values:
+ *    A mail document also has five values:
  *
  *	TIMESTAMP:	The time_t value corresponding to the message's
  *			Date header.
@@ -92,6 +92,9 @@ typedef struct {
  *
  *	SUBJECT:	The value of the "Subject" header
  *
+ *	MTIME:		The time_t value corresponding to the last time
+ *			a tag was added or removed on the message.
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
@@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 	    date = notmuch_message_file_get_header (message_file, "date");
 	    _notmuch_message_set_header_values (message, date, from, subject);
 
+            _notmuch_message_update_mtime (message);
+
 	    _notmuch_message_index_file (message, filename);
 	} else {
 	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
diff --git a/lib/message.cc b/lib/message.cc
index 0075425..0c98589 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
     message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
 }
 
+/* Get the message mtime, i.e. when it was added or the last time a tag was
+ * added/removed. */
+time_t
+notmuch_message_get_mtime (notmuch_message_t *message)
+{
+    std::string value;
+
+    try {
+	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
+    } catch (Xapian::Error &error) {
+	INTERNAL_ERROR ("Failed to read mtime value from document.");
+	return 0;
+    }
+
+    return Xapian::sortable_unserialise (value);
+}
+
+/* Set the message mtime to "now". */
+void
+_notmuch_message_update_mtime (notmuch_message_t *message)
+{
+    time_t time_value;
+
+    time_value = time (NULL);
+    message->doc.add_value (NOTMUCH_VALUE_MTIME,
+                            Xapian::sortable_serialise (time_value));
+}
+
 /* Synchronize changes made to message->doc out into the database. */
 void
 _notmuch_message_sync (notmuch_message_t *message)
@@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 			private_status);
     }
 
+    _notmuch_message_update_mtime (message);
+
     if (! message->frozen)
 	_notmuch_message_sync (message);
 
@@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
 			private_status);
     }
 
+    _notmuch_message_update_mtime (message);
+
     if (! message->frozen)
 	_notmuch_message_sync (message);
 
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 60a932f..9859872 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -95,7 +95,8 @@ typedef enum {
     NOTMUCH_VALUE_TIMESTAMP = 0,
     NOTMUCH_VALUE_MESSAGE_ID,
     NOTMUCH_VALUE_FROM,
-    NOTMUCH_VALUE_SUBJECT
+    NOTMUCH_VALUE_SUBJECT,
+    NOTMUCH_VALUE_MTIME
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
 				    const char *from,
 				    const char *subject);
 void
+_notmuch_message_update_mtime (notmuch_message_t *message);
+
+void
 _notmuch_message_sync (notmuch_message_t *message);
 
 notmuch_status_t
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 9f23a10..643ebce 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
 time_t
 notmuch_message_get_date  (notmuch_message_t *message);
 
+/* Get the mtime of 'message' as a time_t value. */
+time_t
+notmuch_message_get_mtime (notmuch_message_t *message);
+
 /* Get the value of the specified header from 'message'.
  *
  * The value will be read from the actual message file, not from the
-- 
1.7.8

* [PATCH 3/5] lib: Make MTIME values searchable
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

Tag modification times are now searchable as ranges (just like regular message
dates) with the "mtime:" prefix.
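For illustration only, here is a minimal C sketch of what the `mtime:<begin>..<end>` syntax amounts to: strip the prefix, then read two integers around `..`. In the series, the actual parsing is done by Xapian's query parser through the prefixed `NumberValueRangeProcessor`. `parse_mtime_range()` is a hypothetical helper and does not reproduce Xapian's exact rules; for instance, it rejects open-ended ranges.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for "mtime:<begin>..<end>".
 * Returns 0 and fills *begin / *end on success, -1 otherwise. */
static int
parse_mtime_range (const char *term, long long *begin, long long *end)
{
    static const char prefix[] = "mtime:";
    const size_t plen = sizeof prefix - 1;
    const char *p, *dots;
    char *stop;

    /* Only terms carrying the mtime: prefix are handled here. */
    if (strncmp (term, prefix, plen) != 0)
	return -1;
    p = term + plen;

    /* Both ends of the range are required in this sketch. */
    dots = strstr (p, "..");
    if (dots == NULL || dots == p || dots[2] == '\0')
	return -1;

    /* The first number must end exactly at the "..". */
    errno = 0;
    *begin = strtoll (p, &stop, 10);
    if (stop != dots || errno != 0)
	return -1;

    /* The second number must run to the end of the term. */
    *end = strtoll (dots + 2, &stop, 10);
    if (*stop != '\0' || errno != 0)
	return -1;

    return 0;
}
```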
---
 lib/database-private.h |    1 +
 lib/database.cc        |    3 +++
 notmuch.1              |   14 ++++++++++++--
 notmuch.c              |   13 ++++++++++---
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 88532d5..e71c8e4 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -52,6 +52,7 @@ struct _notmuch_database {
     Xapian::QueryParser *query_parser;
     Xapian::TermGenerator *term_gen;
     Xapian::ValueRangeProcessor *value_range_processor;
+    Xapian::ValueRangeProcessor *mtime_value_range_processor;
 };
 
 /* Return the list of terms from the given iterator matching a prefix.
diff --git a/lib/database.cc b/lib/database.cc
index 6dc6f73..cc970c1 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -677,12 +677,14 @@ notmuch_database_open (const char *path,
 	notmuch->term_gen = new Xapian::TermGenerator;
 	notmuch->term_gen->set_stemmer (Xapian::Stem ("english"));
 	notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP);
+	notmuch->mtime_value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_MTIME, "mtime:");
 
 	notmuch->query_parser->set_default_op (Xapian::Query::OP_AND);
 	notmuch->query_parser->set_database (*notmuch->xapian_db);
 	notmuch->query_parser->set_stemmer (Xapian::Stem ("english"));
 	notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME);
 	notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor);
+	notmuch->query_parser->add_valuerangeprocessor (notmuch->mtime_value_range_processor);
 
 	for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) {
 	    prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i];
@@ -726,6 +728,7 @@ notmuch_database_close (notmuch_database_t *notmuch)
     delete notmuch->query_parser;
     delete notmuch->xapian_db;
     delete notmuch->value_range_processor;
+    delete notmuch->mtime_value_range_processor;
     talloc_free (notmuch);
 }
 
diff --git a/notmuch.1 b/notmuch.1
index 3dbd67e..2235096 100644
--- a/notmuch.1
+++ b/notmuch.1
@@ -644,6 +644,8 @@ terms to match against specific portions of an email, (where
 
 	folder:<directory-path>
 
+	mtime:<timestamp-range>
+
 The
 .B from:
 prefix is used to match the name or address of the sender of an email
@@ -707,8 +709,8 @@ operators, but will have to be protected from interpretation by the
 shell, (such as by putting quotation marks around any parenthesized
 expression).
 
-Finally, results can be restricted to only messages within a
-particular time range, (based on the Date: header) with a syntax of:
+Results can be restricted to only messages within a particular time range,
+(based on the Date: header) with a syntax of:
 
 	<initial-timestamp>..<final-timestamp>
 
@@ -721,6 +723,14 @@ specify a date range to return messages from 2009\-10\-01 until the
 current time:
 
 	$(date +%s \-d 2009\-10\-01)..$(date +%s)
+
+Finally, the
+.B mtime:
+prefix can be used to search for messages which were modified (e.g. tags were
+added or removed) within a particular time range, with the same syntax as
+before:
+
+	mtime:<initial-timestamp>..<final-timestamp>
 .SH HOOKS
 Hooks are scripts (or arbitrary executables or symlinks to such) that notmuch
 invokes before and after certain actions. These scripts reside in
diff --git a/notmuch.c b/notmuch.c
index c0ce026..443cf59 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -71,6 +71,7 @@ static const char search_terms_help[] =
     "\t\tid:<message-id>\n"
     "\t\tthread:<thread-id>\n"
     "\t\tfolder:<directory-path>\n"
+    "\t\tmtime:<timestamp-range>\n"
     "\n"
     "\tThe from: prefix is used to match the name or address of\n"
     "\tthe sender of an email message.\n"
@@ -112,8 +113,8 @@ static const char search_terms_help[] =
     "\tinterpretation by the shell, (such as by putting quotation\n"
     "\tmarks around any parenthesized expression).\n"
     "\n"
-    "\tFinally, results can be restricted to only messages within a\n"
-    "\tparticular time range, (based on the Date: header) with:\n"
+    "\tResults can be restricted to only messages within a particular\n"
+    "\ttime range, (based on the Date: header) with:\n"
     "\n"
     "\t\t<intial-timestamp>..<final-timestamp>\n"
     "\n"
@@ -125,7 +126,13 @@ static const char search_terms_help[] =
     "\tfollowing syntax would specify a date range to return messages\n"
     "\tfrom 2009-10-01 until the current time:\n"
     "\n"
-    "\t\t$(date +%%s -d 2009-10-01)..$(date +%%s)\n\n";
+    "\t\t$(date +%%s -d 2009-10-01)..$(date +%%s)\n"
+    "\n"
+    "\tFinally, the mtime: prefix can be used to search for messages\n"
+    "\twhich were modified (e.g. tags were added or removed) within a\n"
+    "\tparticular time range, with the same syntax as before:\n"
+    "\n"
+    "\t\tmtime:<initial-timestamp>..<final-timestamp>\n";
 
 static const char hooks_help[] =
     "\tHooks are scripts (or arbitrary executables or symlinks to such) that\n"
-- 
1.7.8

* [PATCH 4/5] show: include mtime in JSON output
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
                   ` (2 preceding siblings ...)
  2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
  2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
  5 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

This could be used by a UI implementation somehow.
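One portability detail in this diff: `mtime`, like the existing `date`, is a `time_t` passed to a `%ld` conversion. That is only correct where `time_t` is `long`. Below is a hedged sketch of a portable alternative that widens the value to `long long` first. It formats into a buffer so it can be checked in isolation. `format_mtime_field()` is a hypothetical helper, not code from the series; the same cast could be applied directly in the `printf` call.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format the JSON member for a message's mtime.
 * Casting to long long avoids assuming that time_t is a long.
 * Returns the snprintf result (the length of the full output). */
static int
format_mtime_field (char *buf, size_t size, time_t mtime)
{
    return snprintf (buf, size, "\"mtime\": %lld", (long long) mtime);
}
```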
---
 notmuch-show.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/notmuch-show.c b/notmuch-show.c
index 873a7c4..7279601 100644
--- a/notmuch-show.c
+++ b/notmuch-show.c
@@ -202,17 +202,18 @@ format_message_json (const void *ctx, notmuch_message_t *message, unused (int in
     notmuch_tags_t *tags;
     int first = 1;
     void *ctx_quote = talloc_new (ctx);
-    time_t date;
+    time_t date, mtime;
     const char *relative_date;
 
     date = notmuch_message_get_date (message);
     relative_date = notmuch_time_relative_date (ctx, date);
+    mtime = notmuch_message_get_mtime (message);
 
-    printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"tags\": [",
+    printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"mtime\": %ld, \"tags\": [",
 	    json_quote_str (ctx_quote, notmuch_message_get_message_id (message)),
 	    notmuch_message_get_flag (message, NOTMUCH_MESSAGE_FLAG_MATCH) ? "true" : "false",
 	    json_quote_str (ctx_quote, notmuch_message_get_filename (message)),
-	    date, relative_date);
+	    date, relative_date, mtime);
 
     for (tags = notmuch_message_get_tags (message);
 	 notmuch_tags_valid (tags);
-- 
1.7.8

* [PATCH 5/5] python: add get_mtime() to the Message class
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
                   ` (3 preceding siblings ...)
  2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2012-01-02 14:56   ` Sebastian Spaeth
  2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
  5 siblings, 1 reply; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC
  To: notmuch

---
 bindings/python/notmuch/message.py |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index ce8e718..56f56c2 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -293,6 +293,10 @@ class Message(object):
     _get_date.argtypes = [NotmuchMessageP]
     _get_date.restype = c_long
 
+    _get_mtime = nmlib.notmuch_message_get_mtime
+    _get_mtime.argtypes = [NotmuchMessageP]
+    _get_mtime.restype = c_long
+
     _get_header = nmlib.notmuch_message_get_header
     _get_header.argtypes = [NotmuchMessageP, c_char_p]
     _get_header.restype = c_char_p
@@ -401,6 +405,22 @@ class Message(object):
             raise NotmuchError(STATUS.NOT_INITIALIZED)
         return Message._get_date(self._msg)
 
+    def get_mtime(self):
+        """Returns time_t of the message mtime
+
+        The mtime is the timestamp of the last time the message was modified,
+        e.g. the time it was added to the database or the last time a tag was
+        added or removed.
+
+        :returns: A time_t timestamp.
+        :rtype: int
+        :exception: :exc:`NotmuchError` STATUS.NOT_INITIALIZED if the message
+                    is not initialized.
+        """
+        if self._msg is None:
+            raise NotmuchError(STATUS.NOT_INITIALIZED)
+        return Message._get_mtime(self._msg)
+
     def get_header(self, header):
         """Get the value of the specified header.
 
-- 
1.7.8

* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
@ 2011-12-14 21:54   ` Mark Anderson
  2011-12-21  0:34     ` Thomas Jost
  2011-12-15  0:45   ` Austin Clements
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Anderson @ 2011-12-14 21:54 UTC
  To: Thomas Jost, notmuch@notmuchmail.org

On Tue, 13 Dec 2011 11:11:42 -0600, Thomas Jost <schnouki@schnouki.net> wrote:
> This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> when the message is added to the database, and is then updated every time a tag
> is added or removed. It can thus be used for doing incremental dumps of the
> database or for synchronizing it between several computers.
> 
> This value can be read freely (with notmuch_message_get_mtime()) but for now it
> can't be set to an arbitrary value: it can only be set to "now" when updated.
> There's no specific reason for this except that I don't really see a real use
> case for setting it to an arbitrary value.

I think it would be easier to write some testcases if the last modified
time could be touched directly.  Perhaps they aren't in the set of "must
have", but it's what comes to mind.
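One alternative to a public setter is to read the time through a replaceable clock. Tests can then get predictable MTIME values without the API allowing arbitrary mtimes. Below is a minimal sketch, assuming a single global hook. `mtime_clock`, `fake_clock` and `struct fake_message` are hypothetical and do not exist in notmuch.

```c
#include <time.h>

/* Hypothetical clock hook: production code leaves it pointing at the
 * real clock; a test suite can swap in a fake one. */
typedef time_t (*mtime_clock_fn) (void);

static time_t
real_clock (void)
{
    return time (NULL);
}

static mtime_clock_fn mtime_clock = real_clock;

/* Simplified stand-in for a message document (not the real struct). */
struct fake_message {
    time_t mtime;
};

/* Same rule as _notmuch_message_update_mtime() in the diff, except
 * that "now" comes from the hook instead of a direct time() call. */
static void
update_mtime (struct fake_message *msg)
{
    msg->mtime = mtime_clock ();
}

/* Example fake clock for tests: returns whatever fake_now holds. */
static time_t fake_now = 1000;

static time_t
fake_clock (void)
{
    return fake_now;
}
```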

-Mark

> ---
>  lib/database.cc       |    7 ++++++-
>  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
>  lib/notmuch-private.h |    6 +++++-
>  lib/notmuch.h         |    4 ++++
>  4 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/database.cc b/lib/database.cc
> index 2025189..6dc6f73 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -81,7 +81,7 @@ typedef struct {
>   *		        STRING is the name of a file within that
>   *		        directory for this mail message.
>   *
> - *    A mail document also has four values:
> + *    A mail document also has five values:
>   *
>   *	TIMESTAMP:	The time_t value corresponding to the message's
>   *			Date header.
> @@ -92,6 +92,9 @@ typedef struct {
>   *
>   *	SUBJECT:	The value of the "Subject" header
>   *
> + *	MTIME:		The time_t value corresponding to the last time
> + *			a tag was added or removed on the message.
> + *
>   * In addition, terms from the content of the message are added with
>   * "from", "to", "attachment", and "subject" prefixes for use by the
>   * user in searching. Similarly, terms from the path of the mail
> @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>  	    date = notmuch_message_file_get_header (message_file, "date");
>  	    _notmuch_message_set_header_values (message, date, from, subject);
>  
> +            _notmuch_message_update_mtime (message);
> +
>  	    _notmuch_message_index_file (message, filename);
>  	} else {
>  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> diff --git a/lib/message.cc b/lib/message.cc
> index 0075425..0c98589 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  }
>  
> +/* Get the message mtime, i.e. when it was added or the last time a tag was
> + * added/removed. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> +    } catch (Xapian::Error &error) {
> +	INTERNAL_ERROR ("Failed to read mtime value from document.");
> +	return 0;
> +    }
> +
> +    return Xapian::sortable_unserialise (value);
> +}
> +
> +/* Set the message mtime to "now". */
> +void
> +_notmuch_message_update_mtime (notmuch_message_t *message)
> +{
> +    time_t time_value;
> +
> +    time_value = time (NULL);
> +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> +                            Xapian::sortable_serialise (time_value));
> +}
> +
>  /* Synchronize changes made to message->doc out into the database. */
>  void
>  _notmuch_message_sync (notmuch_message_t *message)
> @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
>  			private_status);
>      }
>  
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>  	_notmuch_message_sync (message);
>  
> @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
>  			private_status);
>      }
>  
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>  	_notmuch_message_sync (message);
>  
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 60a932f..9859872 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -95,7 +95,8 @@ typedef enum {
>      NOTMUCH_VALUE_TIMESTAMP = 0,
>      NOTMUCH_VALUE_MESSAGE_ID,
>      NOTMUCH_VALUE_FROM,
> -    NOTMUCH_VALUE_SUBJECT
> +    NOTMUCH_VALUE_SUBJECT,
> +    NOTMUCH_VALUE_MTIME
>  } notmuch_value_t;
>  
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>  				    const char *from,
>  				    const char *subject);
>  void
> +_notmuch_message_update_mtime (notmuch_message_t *message);
> +
> +void
>  _notmuch_message_sync (notmuch_message_t *message);
>  
>  notmuch_status_t
> diff --git a/lib/notmuch.h b/lib/notmuch.h
> index 9f23a10..643ebce 100644
> --- a/lib/notmuch.h
> +++ b/lib/notmuch.h
> @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
>  time_t
>  notmuch_message_get_date  (notmuch_message_t *message);
>  
> +/* Get the mtime of 'message' as a time_t value. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message);
> +
>  /* Get the value of the specified header from 'message'.
>   *
>   * The value will be read from the actual message file, not from the
> -- 
> 1.7.8
> 
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
> 

* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
  2011-12-14 21:54   ` Mark Anderson
@ 2011-12-15  0:45   ` Austin Clements
  2011-12-21  1:00     ` Thomas Jost
  1 sibling, 1 reply; 17+ messages in thread
From: Austin Clements @ 2011-12-15  0:45 UTC
  To: Thomas Jost; +Cc: notmuch

A few minor comments below.

At a higher level, I'm curious what the tag synchronization protocol
you're building on top of this is.  I can't think of one that doesn't
have race conditions, but maybe I'm not thinking about it right.

Quoth Thomas Jost on Dec 13 at  6:11 pm:
> This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> when the message is added to the database, and is then updated every time a tag
> is added or removed. It can thus be used for doing incremental dumps of the
> database or for synchronizing it between several computers.
> 
> This value can be read freely (with notmuch_message_get_mtime()) but for now it
> can't be set to an arbitrary value: it can only be set to "now" when updated.
> There's no specific reason for this except that I don't really see a real use
> case for setting it to an arbitrary value.
> ---
>  lib/database.cc       |    7 ++++++-
>  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
>  lib/notmuch-private.h |    6 +++++-
>  lib/notmuch.h         |    4 ++++
>  4 files changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/database.cc b/lib/database.cc
> index 2025189..6dc6f73 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -81,7 +81,7 @@ typedef struct {
>   *		        STRING is the name of a file within that
>   *		        directory for this mail message.
>   *
> - *    A mail document also has four values:
> + *    A mail document also has five values:
>   *
>   *	TIMESTAMP:	The time_t value corresponding to the message's
>   *			Date header.
> @@ -92,6 +92,9 @@ typedef struct {
>   *
>   *	SUBJECT:	The value of the "Subject" header
>   *
> + *	MTIME:		The time_t value corresponding to the last time
> + *			a tag was added or removed on the message.
> + *
>   * In addition, terms from the content of the message are added with
>   * "from", "to", "attachment", and "subject" prefixes for use by the
>   * user in searching. Similarly, terms from the path of the mail
> @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>  	    date = notmuch_message_file_get_header (message_file, "date");
>  	    _notmuch_message_set_header_values (message, date, from, subject);
>  
> +            _notmuch_message_update_mtime (message);

Indentation.

> +
>  	    _notmuch_message_index_file (message, filename);
>  	} else {
>  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> diff --git a/lib/message.cc b/lib/message.cc
> index 0075425..0c98589 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  }
>  
> +/* Get the message mtime, i.e. when it was added or the last time a tag was
> + * added/removed. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> +    } catch (Xapian::Error &error) {
> +	INTERNAL_ERROR ("Failed to read mtime value from document.");
> +	return 0;
> +    }

For compatibility, this should handle the case when
NOTMUCH_VALUE_MTIME is missing, probably by just returning 0.  As it
is, value will be an empty string and sortable_unserialise is
undefined on strings that weren't produced by sortable_serialise.
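Below is a minimal C sketch of the suggested fallback. It assumes the stored value is handed over as a string that is empty when the slot was never written. `decode_value()` is a hypothetical stand-in for `Xapian::sortable_unserialise()` and uses plain decimal text, not Xapian's encoding. In the C++ code itself, the equivalent would be an `if (value.empty ()) return 0;` check before unserialising.

```c
#include <stdlib.h>
#include <time.h>

/* Stand-in for Xapian::sortable_unserialise(): plain decimal text,
 * NOT Xapian's encoding; it only keeps the sketch self-contained. */
static double
decode_value (const char *value)
{
    return strtod (value, NULL);
}

/* Return the stored MTIME, or 0 when the slot is empty (e.g. a message
 * indexed before MTIME existed), instead of decoding an empty string. */
static time_t
message_mtime_or_zero (const char *value)
{
    if (value == NULL || *value == '\0')
	return 0;
    return (time_t) decode_value (value);
}
```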

> +
> +    return Xapian::sortable_unserialise (value);
> +}
> +
> +/* Set the message mtime to "now". */
> +void
> +_notmuch_message_update_mtime (notmuch_message_t *message)
> +{
> +    time_t time_value;
> +
> +    time_value = time (NULL);
> +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> +                            Xapian::sortable_serialise (time_value));

Indentation.

> +}
> +
>  /* Synchronize changes made to message->doc out into the database. */
>  void
>  _notmuch_message_sync (notmuch_message_t *message)
> @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
>  			private_status);
>      }
>  
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>  	_notmuch_message_sync (message);
>  
> @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
>  			private_status);
>      }
>  
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>  	_notmuch_message_sync (message);
>  
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 60a932f..9859872 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -95,7 +95,8 @@ typedef enum {
>      NOTMUCH_VALUE_TIMESTAMP = 0,
>      NOTMUCH_VALUE_MESSAGE_ID,
>      NOTMUCH_VALUE_FROM,
> -    NOTMUCH_VALUE_SUBJECT
> +    NOTMUCH_VALUE_SUBJECT,
> +    NOTMUCH_VALUE_MTIME
>  } notmuch_value_t;
>  
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>  				    const char *from,
>  				    const char *subject);
>  void
> +_notmuch_message_update_mtime (notmuch_message_t *message);
> +
> +void
>  _notmuch_message_sync (notmuch_message_t *message);
>  
>  notmuch_status_t
> diff --git a/lib/notmuch.h b/lib/notmuch.h
> index 9f23a10..643ebce 100644
> --- a/lib/notmuch.h
> +++ b/lib/notmuch.h
> @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
>  time_t
>  notmuch_message_get_date  (notmuch_message_t *message);
>  
> +/* Get the mtime of 'message' as a time_t value. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message);
> +
>  /* Get the value of the specified header from 'message'.
>   *
>   * The value will be read from the actual message file, not from the

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] Store message modification times in the DB
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
                   ` (4 preceding siblings ...)
  2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
@ 2011-12-19 16:34 ` David Edmondson
  2011-12-19 19:48   ` Austin Clements
  5 siblings, 1 reply; 17+ messages in thread
From: David Edmondson @ 2011-12-19 16:34 UTC (permalink / raw)
  To: Thomas Jost, notmuch

On Tue, 13 Dec 2011 18:11:40 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> This is a patch series I've been working on for some time in order to be
> able to sync my tags on several computers. I'm posting it now, but
> please consider it as a RFC rather than something that is ready to be
> pushed.
> 
> The basic idea is to store the last time each message was modified, i.e. "the
> message was added to the DB", "a tag was added" or "a tag was removed".

Thomas, this is interesting. Do you have a (back of the envelope?)
design for how you will use this information to implement tag sync?

My gut feeling is that we need a log of when a change occurred rather
than the last modification time, but I haven't really thought that all
through properly.

dme.
-- 
David Edmondson, http://dme.org

* Re: [PATCH 0/5] Store message modification times in the DB
  2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
@ 2011-12-19 19:48   ` Austin Clements
  2011-12-19 22:56     ` Tom Prince
  2011-12-20  8:32     ` David Edmondson
  0 siblings, 2 replies; 17+ messages in thread
From: Austin Clements @ 2011-12-19 19:48 UTC (permalink / raw)
  To: David Edmondson; +Cc: notmuch

Quoth David Edmondson on Dec 19 at  4:34 pm:
> On Tue, 13 Dec 2011 18:11:40 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> > This is a patch series I've been working on for some time in order to be
> > able to sync my tags on several computers. I'm posting it now, but
> > please consider it as a RFC rather than something that is ready to be
> > pushed.
> > 
> > The basic idea is to store the last time each message was modified, i.e. "the
> > message was added to the DB", "a tag was added" or "a tag was removed".
> 
> Thomas, this is interesting. Do you have a (back of the envelope?)
> design for how you will use this information to implement tag sync?
> 
> My gut feeling is that we need a log of when a change occurred rather
> than the last modification time, but I haven't really thought that all
> through properly.

Here are sketches for two sync algorithms with different properties.
I haven't proven these to be correct, but I believe they are.  In
both, R is the remote host and L is the local host.  They're both
one-way (they only update tags on L), but should be symmetrically
stable.


== Two-way "merge" from host R to host L ==

Per-host state:
- last_mtime: Map from remote hosts to last sync mtime

new_mtime = last_mtime[R]
For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
  If mtime > local mtime of msgid:
    Set local tags of msgid to tags
  new_mtime = max(new_mtime, mtime)
last_mtime[R] = new_mtime

This has the advantage of keeping very little state, but the
synchronization is also quite primitive.  If two hosts change a
message's tags in different ways between synchronizations, the more
recent of the two will override the full set of tags on that message.
This does not strictly require tombstones, though if you make a tag
change and then delete the message before a sync, the tag change will
be lost without some record of that state.  Also, this obviously
depends heavily on synchronized clocks.
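The first algorithm can be sketched in Python as follows. This is a minimal, hypothetical in-memory model, not code from the thread: each host's database is assumed to be a dict mapping message-id to (mtime, frozenset of tags), and the names merge_two_way, TagDB, local and remote are made up for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical in-memory model of one host's tag database:
# message-id -> (mtime, frozenset of tags).
TagDB = Dict[str, Tuple[int, FrozenSet[str]]]


def merge_two_way(local: TagDB, remote: TagDB,
                  last_mtime: Dict[str, int], r: str) -> None:
    """Pull tags from host `r` into `local`; the newer mtime wins wholesale."""
    since = last_mtime.get(r, 0)
    new_mtime = since
    for msgid, (mtime, tags) in remote.items():
        if mtime < since:
            continue  # not modified on R since the last sync
        # A message missing locally cannot be tagged, so it is skipped.
        if msgid in local and mtime > local[msgid][0]:
            # Assumption: the remote mtime is copied along with the tags;
            # the library as posted would stamp the local message with "now".
            local[msgid] = (mtime, tags)
        new_mtime = max(new_mtime, mtime)
    last_mtime[r] = new_mtime


local = {"a@x": (100, frozenset({"inbox"})),
         "b@x": (300, frozenset({"todo"}))}
remote = {"a@x": (200, frozenset({"inbox", "work"})),
          "b@x": (250, frozenset())}
state: Dict[str, int] = {}
merge_two_way(local, remote, state, "R")
# a@x takes R's tags (R is newer); b@x keeps its local tags (L is newer).
```

The whole tag set is replaced, which is the "more recent of the two overrides the full set of tags" behaviour described above.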


== Three-way merge from host R to host L ==

Per-host state:
- last_mtime: Map from remote hosts to last sync mtime
- last_sync: Map from remote hosts to the tag database as of the last sync

new_mtime = last_mtime[R]
for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
  my_tags = local tags of msgid
  last_tags = last_sync[R][msgid]
  for each tag that differs between my_tags and r_tags:
    if tag is in last_tags: remove tag locally
    else: add tag locally
  last_sync[R][msgid] = r_tags
  new_mtime = max(new_mtime, mtime)
Delete stale messages from last_sync[R] (using tombstones or something)
last_mtime[R] = new_mtime

This protocol requires significantly more state, but can also
reconstruct per-tag changes.  Conflict resolution is equivalent to
what git would do and is based solely on the current local and remote
state and the common ancestor state.  This can lead to unintuitive
results if a tag on a message has gone through multiple changes on
both hosts since the last sync (though, I argue, there are no
intuitive results in such situations).  Tombstones are only required
to garbage collect sync state (and other techniques could be used for
that).  This also does not depend on time synchronization (though,
like any mtime solution, it does depend on mtime monotonicity).  The
algorithm would work equally well with sequence numbers.
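A corresponding sketch of the three-way merge, using the same kind of hypothetical dict-based model (message-id -> (mtime, frozenset of tags)). The name merge_three_way, skipping messages absent locally, and treating a message with no recorded base as untagged are assumptions, not part of the pseudocode. The per-tag rule is the one above: a tag present in the base was removed by one side, a tag absent from it was added by one side.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical in-memory model: message-id -> (mtime, frozenset of tags).
TagDB = Dict[str, Tuple[int, FrozenSet[str]]]


def merge_three_way(local: TagDB, remote: TagDB,
                    last_mtime: Dict[str, int],
                    last_sync: Dict[str, Dict[str, FrozenSet[str]]],
                    r: str) -> None:
    """Merge per-tag changes from host `r` into `local` using a common base."""
    since = last_mtime.get(r, 0)
    new_mtime = since
    base = last_sync.setdefault(r, {})
    for msgid, (mtime, r_tags) in remote.items():
        if mtime < since:
            continue
        new_mtime = max(new_mtime, mtime)
        if msgid not in local:
            continue  # assumption: nothing to merge into on this host yet
        l_mtime, my_tags = local[msgid]
        last_tags = base.get(msgid, frozenset())  # no base: treat as untagged
        merged = set(my_tags)
        for tag in my_tags ^ r_tags:  # tags on which L and R disagree
            if tag in last_tags:
                merged.discard(tag)  # in the base, so one side removed it
            else:
                merged.add(tag)      # not in the base, so one side added it
        # The local mtime is left alone here; real tagging would bump it.
        local[msgid] = (l_mtime, frozenset(merged))
        base[msgid] = r_tags
    # Pruning stale entries from last_sync[r] (tombstones) is not shown.
    last_mtime[r] = new_mtime


local = {"m@x": (10, frozenset({"inbox"}))}                     # L removed "unread"
remote = {"m@x": (20, frozenset({"inbox", "unread", "work"}))}  # R added "work"
last_mtime = {"R": 5}
last_sync = {"R": {"m@x": frozenset({"inbox", "unread"})}}
merge_three_way(local, remote, last_mtime, last_sync, "R")
# Both changes survive: local["m@x"] now carries {"inbox", "work"}.
```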


I tried coming up with a third algorithm that used mtimes to resolve
tagging conflicts, but without per-tag mtimes it degenerated into the
first algorithm.

* Re: [PATCH 0/5] Store message modification times in the DB
  2011-12-19 19:48   ` Austin Clements
@ 2011-12-19 22:56     ` Tom Prince
  2011-12-20  8:32     ` David Edmondson
  1 sibling, 0 replies; 17+ messages in thread
From: Tom Prince @ 2011-12-19 22:56 UTC (permalink / raw)
  To: Austin Clements, David Edmondson; +Cc: notmuch

On Mon, 19 Dec 2011 14:48:21 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> This protocol requires significantly more state, but can also
> reconstruct per-tag changes.  Conflict resolution is equivalent to
> what git would do and is based solely on the current local and remote
> state and the common ancestor state.

This seems like exactly what one would get if one stored the tag state
in git, which seems like a reasonable thing to do anyway. 

> This can lead to unintuitive results if a tag on a message has gone
> through multiple changes on both hosts since the last sync (though, I
> argue, there are no intuitive results in such situations).

I certainly agree that there isn't a universally good resolution to
this. I suspect that the same person, making the same tag changes with
the same mtimes, will want different resolutions at different
times. This is because there is no good way to record the intent of the
changes.

> Tombstones are only required to garbage collect sync state (and other
> techniques could be used for that).

I wonder how many people using notmuch actually delete mail? I know I
don't bother to, anymore.

One use case that was mentioned is having a limited amount of mail on a
portable device, and syncing tags on those message present. Using git to
record the tag state, one would just need to record the state before
deleting files, to avoid the need for tombstones in the notmuch db.

  Tom

* Re: [PATCH 0/5] Store message modification times in the DB
  2011-12-19 19:48   ` Austin Clements
  2011-12-19 22:56     ` Tom Prince
@ 2011-12-20  8:32     ` David Edmondson
  2011-12-20 15:05       ` Austin Clements
  1 sibling, 1 reply; 17+ messages in thread
From: David Edmondson @ 2011-12-20  8:32 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

On Mon, 19 Dec 2011 14:48:21 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> Here are sketches for two sync algorithms with different properties.
> I haven't proven these to be correct, but I believe they are.  In
> both, R is the remote host and L is the local host.  They're both
> one-way (they only update tags on L), but should be symmetrically
> stable.

Thanks for these.

> == Two-way "merge" from host R to host L ==
> 
> Per-host state:
> - last_mtime: Map from remote hosts to last sync mtime

With the proposed changes it seems that the state required on each host
would live within the Xapian database (to be extracted with 'dump').

> new_mtime = last_mtime[R]
> For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
>   If mtime > local mtime of msgid:
>     Set local tags of msgid to tags
>   new_mtime = max(new_mtime, mtime)
> last_mtime[R] = new_mtime
> 
> This has the advantage of keeping very little state, but the
> synchronization is also quite primitive.  If two hosts change a
> message's tags in different ways between synchronizations, the more
> recent of the two will override the full set of tags on that message.
> This does not strictly require tombstones, though if you make a tag
> change and then delete the message before a sync, the tag change will
> be lost without some record of that state.

Does this matter? If the tag on a deleted message is changed, does
anyone care?

> Also, this obviously depends heavily on synchronized clocks.
> 
> 
> == Three-way merge from host R to host L ==
> 
> Per-host state:
> - last_mtime: Map from remote hosts to last sync mtime
> - last_sync: Map from remote hosts to the tag database as of the last sync

Any ideas where this state might be kept?

> new_mtime = last_mtime[R]
> for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
>   my_tags = local tags of msgid
>   last_tags = last_sync[R][msgid]
>   for each tag that differs between my_tags and r_tags:
>     if tag is in last_tags: remove tag locally
>     else: add tag locally
>   last_sync[R][msgid] = r_tags
>   new_mtime = max(new_mtime, mtime)
> Delete stale messages from last_sync[R] (using tombstones or something)
> last_mtime[R] = new_mtime
> 
> This protocol requires significantly more state, but can also
> reconstruct per-tag changes.  Conflict resolution is equivalent to
> what git would do and is based solely on the current local and remote
> state and the common ancestor state.  This can lead to unintuitive
> results if a tag on a message has gone through multiple changes on
> both hosts since the last sync (though, I argue, there are no
> intuitive results in such situations).  Tombstones are only required
> to garbage collect sync state (and other techniques could be used for
> that).  This also does not depend on time synchronization (though,
> like any mtime solution, it does depend on mtime monotonicity).  The
> algorithm would work equally well with sequence numbers.
> 
> I tried coming up with a third algorithm that used mtimes to resolve
> tagging conflicts, but without per-tag mtimes it degenerated into the
> first algorithm.

dme.
-- 
David Edmondson, http://dme.org

* Re: [PATCH 0/5] Store message modification times in the DB
  2011-12-20  8:32     ` David Edmondson
@ 2011-12-20 15:05       ` Austin Clements
  0 siblings, 0 replies; 17+ messages in thread
From: Austin Clements @ 2011-12-20 15:05 UTC (permalink / raw)
  To: David Edmondson; +Cc: notmuch

Quoth David Edmondson on Dec 20 at  8:32 am:
> > == Two-way "merge" from host R to host L ==
> > 
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
> 
> With the proposed changes it seems that the state required on each host
> would live within the Xapian database (to be extracted with 'dump').

It certainly could.  I haven't thought about how any of this would
integrate with dump, or if it necessarily should.  A related question
is how bootstrap should work.  For example, if you add another host,
what's the best way to bring it up to speed without, say, overwriting
your tags everywhere with your initial tags?  In general, when a new
message arrives, how do you get the hosts to agree on its tags and
what happens if one host tags it before another host sees it?

> > new_mtime = last_mtime[R]
> > For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
> >   If mtime > local mtime of msgid:
> >     Set local tags of msgid to tags
> >   new_mtime = max(new_mtime, mtime)
> > last_mtime[R] = new_mtime
> > 
> > This has the advantage of keeping very little state, but the
> > synchronization is also quite primitive.  If two hosts change a
> > message's tags in different ways between synchronizations, the more
> > recent of the two will override the full set of tags on that message.
> > This does not strictly require tombstones, though if you make a tag
> > change and then delete the message before a sync, the tag change will
> > be lost without some record of that state.
> 
> Does this matter? If the tag on a deleted message is changed, does
> anyone care?

That depends on what sort of synchronization model you're expecting.
If you're expecting git-style synchronization where all that matters
is the state and not the order things happened in, then this is
exactly what you'd expect.  If you're expecting something more nuanced
that knows about the order you did things in across hosts between
synchronizations (which I think can only lead to more unintuitive
corner-cases, but some people seem to expect), then this could be
surprising.

> > Also, this obviously depends heavily on synchronized clocks.
> > 
> > 
> > == Three-way merge from host R to host L ==
> > 
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
> > - last_sync: Map from remote hosts to the tag database as of the last sync
> 
> Any ideas where this state might be kept?

It could also be stored in Xapian (in user keys or as additional
message metadata).  That would certainly be simplest and would avoid
hairy atomicity issues.  OTOH, it's not the end of the world if
last_sync doesn't get updated atomically, especially if we can at
least guarantee last_sync is fully updated and on disk before we
update last_mtime.
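If last_sync is kept outside Xapian, the ordering constraint above can be enforced by writing it first and advancing last_mtime only afterwards. A minimal sketch, assuming the state lives in JSON files in a state directory; the file names and commit_sync_state are hypothetical:

```python
import json
import os
import tempfile
from typing import Dict, FrozenSet


def _atomic_write_json(path: str, obj) -> None:
    """Write `obj` as JSON via temp file + fsync + rename."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def commit_sync_state(state_dir: str, remote: str,
                      last_sync_r: Dict[str, FrozenSet[str]],
                      new_mtime: int) -> None:
    # 1. Persist the new common-ancestor tags first...
    _atomic_write_json(os.path.join(state_dir, f"last_sync.{remote}.json"),
                       {msgid: sorted(tags) for msgid, tags in last_sync_r.items()})
    # 2. ...and only then advance the high-water mark.  A crash between the
    # two steps leaves last_mtime at its old value, so the next run re-scans
    # those messages instead of skipping them.
    _atomic_write_json(os.path.join(state_dir, f"last_mtime.{remote}.json"),
                       new_mtime)
```

This only flushes file contents; a fully durable version would also fsync the containing directory after each rename.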

> > new_mtime = last_mtime[R]
> > for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
> >   my_tags = local tags of msgid
> >   last_tags = last_sync[R][msgid]
> >   for each tag that differs between my_tags and r_tags:
> >     if tag is in last_tags: remove tag locally
> >     else: add tag locally
> >   last_sync[R][msgid] = r_tags
> >   new_mtime = max(new_mtime, mtime)
> > Delete stale messages from last_sync[R] (using tombstones or something)
> > last_mtime[R] = new_mtime
> > 
> > This protocol requires significantly more state, but can also
> > reconstruct per-tag changes.  Conflict resolution is equivalent to
> > what git would do and is based solely on the current local and remote
> > state and the common ancestor state.  This can lead to unintuitive
> > results if a tag on a message has gone through multiple changes on
> > both hosts since the last sync (though, I argue, there are no
> > intuitive results in such situations).  Tombstones are only required
> > to garbage collect sync state (and other techniques could be used for
> > that).  This also does not depend on time synchronization (though,
> > like any mtime solution, it does depend on mtime monotonicity).  The
> > algorithm would work equally well with sequence numbers.
> > 
> > I tried coming up with a third algorithm that used mtimes to resolve
> > tagging conflicts, but without per-tag mtimes it degenerated into the
> > first algorithm.
> 
> dme.

* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-14 21:54   ` Mark Anderson
@ 2011-12-21  0:34     ` Thomas Jost
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-21  0:34 UTC (permalink / raw)
  To: Mark Anderson, notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Wed, 14 Dec 2011 14:54:10 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> On Tue, 13 Dec 2011 11:11:42 -0600, Thomas Jost <schnouki@schnouki.net> wrote:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> > 
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
> 
> I think it would be easier to write some testcases if the last modified
> time could be touched directly.  Perhaps they aren't in the set of "must
> have", but it's what comes to mind.

Well since I posted this, I found other good reasons to have a set_mtime
function. I'll post an updated series later which will include it -- and
possibly some tests too :)

Thanks,

-- 
Thomas/Schnouki

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-15  0:45   ` Austin Clements
@ 2011-12-21  1:00     ` Thomas Jost
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-21  1:00 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 8832 bytes --]

On Wed, 14 Dec 2011 19:45:07 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> A few minor comments below.
> 
> At a higher level, I'm curious what the tag synchronization protocol
> you're building on top of this is.  I can't think of one that doesn't
> have race conditions, but maybe I'm not thinking about it right.

The approach I've used is quite different from what you described in
id:"20111219194821.GA10376@mit.edu". I don't directly sync host A to
host B but I use a server in the middle. (A is my laptop --> not always
on, B is my work PC --> turned off when I'm out of office, so a direct
sync would be harder to do).

My nm-sync script is written in Python 2 (2.7, may work with 2.6) and is
present on both my PCs and on my server. It can operate in two modes:
client (when run from one of my PCs) or server (called *from the client*
through ssh, running on my server).

When running in server mode, the script manipulates a small DB kept in
a Python dictionary (and saved to disk with the pickle module). It does
not even need notmuch to be installed on the server. Here is what this
DB looks like:
  {
    "lastseen": {
      "pc_A": 1324428029,
      "pc_B": 1323952028
    },
    "messages": {
      "msgid_001": (mtime, tag1, tag2, ..., tagN),
      "msgid_002": (mtime, tag1, tag2, ..., tagM),
      ...
    }
  }
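The nm-sync code itself is not in this thread. As a rough sketch only, a dictionary with this layout could be loaded and saved with pickle like this; the function names, the file path and the temp-file-then-rename step are assumptions, not details of Thomas's script:

```python
import os
import pickle
import tempfile


def load_db(path: str) -> dict:
    """Load the server-side sync DB, or start empty if it does not exist yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Layout from above; messages map msgid -> (mtime, tag1, ..., tagN).
        return {"lastseen": {}, "messages": {}}


def save_db(db: dict, path: str) -> None:
    """Pickle the DB to a temp file, then rename it over the old one."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        pickle.dump(db, f)
    os.replace(tmp, path)
```

pickle should only be used on files the server itself wrote, since unpickling untrusted data can execute arbitrary code.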

So when running the client, here is what happens:
1. client starts a subprocess: "ssh myserver ~/nm-sync server"
2. client and server check that their sha1sum match (to avoid version
   mismatch)
3. client identifies itself with its hostname ("pc_A" in the example
   above), server replies with its "lastseen" value and updates it in
   the DB
4. server sends to client messages with mtime > lastseen (msgid + mtime
   + tags), client updates the notmuch DB with these values
5. client queries the notmuch DB for messages with mtime > lastseen and
   sends them (msgid + mtime + tags) to the server, which stores them in
   the DB
6. cleanup: server removes messages with mtime < min(lastseen) from its
   DB
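Steps 3 to 6 can be modelled in-process to make the data flow concrete. This is a hypothetical sketch, not nm-sync: the ssh transport and sha1sum check are left out, clients are plain dicts, and storing the sync start time as "lastseen" is an assumption, since the steps above do not say which value is recorded.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical model of a client's notmuch DB: msgid -> (mtime, frozenset of tags).
LocalDB = Dict[str, Tuple[int, FrozenSet[str]]]


class SyncServer:
    """In-process stand-in for the server side (no ssh, no sha1sum check)."""

    def __init__(self) -> None:
        # Same layout as above: msgid -> (mtime, tag1, ..., tagN).
        self.db = {"lastseen": {}, "messages": {}}

    def hello(self, host: str, now: int) -> int:
        # Step 3: return the host's previous lastseen, then record `now`.
        lastseen = self.db["lastseen"].get(host, 0)
        self.db["lastseen"][host] = now
        return lastseen

    def changes_since(self, lastseen: int) -> dict:
        # Step 4: everything modified after the client's last sync.
        return {m: e for m, e in self.db["messages"].items() if e[0] > lastseen}

    def store(self, updates: dict) -> None:
        # Step 5: record what the client sent.
        self.db["messages"].update(updates)

    def cleanup(self) -> None:
        # Step 6: drop entries with mtime < min(lastseen).
        oldest = min(self.db["lastseen"].values())
        self.db["messages"] = {m: e for m, e in self.db["messages"].items()
                               if e[0] >= oldest}


def client_sync(server: SyncServer, host: str, local: LocalDB, now: int) -> None:
    lastseen = server.hello(host, now)
    # Step 4: the server's version overwrites the local one, no conflict check.
    # Assumption: the remote mtime is copied (the posted library would use "now").
    for msgid, entry in server.changes_since(lastseen).items():
        if msgid in local:
            local[msgid] = (entry[0], frozenset(entry[1:]))
    # Step 5: send local messages modified since lastseen.
    outgoing = {m: (mt, *sorted(tags)) for m, (mt, tags) in local.items()
                if mt > lastseen}
    server.store(outgoing)
    server.cleanup()


server = SyncServer()
host_a = {"m1": (100, frozenset({"inbox"}))}
host_b = {"m1": (40, frozenset({"inbox", "unread"}))}
client_sync(server, "pc_B", host_b, now=50)    # B registers first
client_sync(server, "pc_A", host_a, now=150)   # A uploads m1 (mtime 100)
client_sync(server, "pc_B", host_b, now=200)   # B is handed m1 with A's tags
```

The trace shows the last-writer-wins behaviour: on its second run pc_B simply takes the server's copy of m1, and the entry is then pruned because both hosts have seen it.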

So basically this approach assumes that all clocks are synchronized
(everyone uses ntp, right?...) and does not even try to detect
conflicts: if a message has been modified both locally and remotely,
then the local version will be overwritten by the remote one, period. It
should also work with more than 2 hosts (but not tested yet). No sync
data is kept in the notmuch DB.

Right now all of this fits in about 250 lines of Python (could be made
shorter) and works quite well for me. I'll put it online after doing
some cleanup.


> Quoth Thomas Jost on Dec 13 at  6:11 pm:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> > 
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
> > ---
> >  lib/database.cc       |    7 ++++++-
> >  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
> >  lib/notmuch-private.h |    6 +++++-
> >  lib/notmuch.h         |    4 ++++
> >  4 files changed, 47 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/database.cc b/lib/database.cc
> > index 2025189..6dc6f73 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -81,7 +81,7 @@ typedef struct {
> >   *		        STRING is the name of a file within that
> >   *		        directory for this mail message.
> >   *
> > - *    A mail document also has four values:
> > + *    A mail document also has five values:
> >   *
> >   *	TIMESTAMP:	The time_t value corresponding to the message's
> >   *			Date header.
> > @@ -92,6 +92,9 @@ typedef struct {
> >   *
> >   *	SUBJECT:	The value of the "Subject" header
> >   *
> > + *	MTIME:		The time_t value corresponding to the last time
> > + *			a tag was added or removed on the message.
> > + *
> >   * In addition, terms from the content of the message are added with
> >   * "from", "to", "attachment", and "subject" prefixes for use by the
> >   * user in searching. Similarly, terms from the path of the mail
> > @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
> >  	    date = notmuch_message_file_get_header (message_file, "date");
> >  	    _notmuch_message_set_header_values (message, date, from, subject);
> >  
> > +            _notmuch_message_update_mtime (message);
> 
> Indentation.

Fixed, thanks.

> 
> > +
> >  	    _notmuch_message_index_file (message, filename);
> >  	} else {
> >  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> > diff --git a/lib/message.cc b/lib/message.cc
> > index 0075425..0c98589 100644
> > --- a/lib/message.cc
> > +++ b/lib/message.cc
> > @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> >  }
> >  
> > +/* Get the message mtime, i.e. when it was added or the last time a tag was
> > + * added/removed. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message)
> > +{
> > +    std::string value;
> > +
> > +    try {
> > +	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> > +    } catch (Xapian::Error &error) {
> > +	INTERNAL_ERROR ("Failed to read mtime value from document.");
> > +	return 0;
> > +    }
> 
> For compatibility, this should handle the case when
> NOTMUCH_VALUE_MTIME is missing, probably by just returning 0.  As it
> is, value will be an empty string and sortable_unserialise is
> undefined on strings that weren't produced by sortable_serialise.

Right. I think I rebuilt my DB just after implementing this, which
explains why I did not notice that myself. Thanks!

> > +
> > +    return Xapian::sortable_unserialise (value);
> > +}
> > +
> > +/* Set the message mtime to "now". */
> > +void
> > +_notmuch_message_update_mtime (notmuch_message_t *message)
> > +{
> > +    time_t time_value;
> > +
> > +    time_value = time (NULL);
> > +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> > +                            Xapian::sortable_serialise (time_value));
> 
> Indentation.

Noted too. It's really time I started using dtrt-indent.

> 
> > +}
> > +
> >  /* Synchronize changes made to message->doc out into the database. */
> >  void
> >  _notmuch_message_sync (notmuch_message_t *message)
> > @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> > index 60a932f..9859872 100644
> > --- a/lib/notmuch-private.h
> > +++ b/lib/notmuch-private.h
> > @@ -95,7 +95,8 @@ typedef enum {
> >      NOTMUCH_VALUE_TIMESTAMP = 0,
> >      NOTMUCH_VALUE_MESSAGE_ID,
> >      NOTMUCH_VALUE_FROM,
> > -    NOTMUCH_VALUE_SUBJECT
> > +    NOTMUCH_VALUE_SUBJECT,
> > +    NOTMUCH_VALUE_MTIME
> >  } notmuch_value_t;
> >  
> >  /* Xapian (with flint backend) complains if we provide a term longer
> > @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >  				    const char *from,
> >  				    const char *subject);
> >  void
> > +_notmuch_message_update_mtime (notmuch_message_t *message);
> > +
> > +void
> >  _notmuch_message_sync (notmuch_message_t *message);
> >  
> >  notmuch_status_t
> > diff --git a/lib/notmuch.h b/lib/notmuch.h
> > index 9f23a10..643ebce 100644
> > --- a/lib/notmuch.h
> > +++ b/lib/notmuch.h
> > @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
> >  time_t
> >  notmuch_message_get_date  (notmuch_message_t *message);
> >  
> > +/* Get the mtime of 'message' as a time_t value. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message);
> > +
> >  /* Get the value of the specified header from 'message'.
> >   *
> >   * The value will be read from the actual message file, not from the

-- 
Thomas/Schnouki

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

* Re: [PATCH 1/5] Fix comments about what is stored in the database
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
@ 2011-12-23 19:10   ` David Bremner
  0 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2011-12-23 19:10 UTC (permalink / raw)
  To: Thomas Jost, notmuch

On Tue, 13 Dec 2011 18:11:41 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> Commit 567bcbc2 introduced two new values for each message (content of the
> "From" and "Subject" headers), but the comments about the database schema had
> not been updated accordingly.

Pushed this one.

d

* Re: [PATCH 5/5] python: add get_mtime() to the Message class
  2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
@ 2012-01-02 14:56   ` Sebastian Spaeth
  0 siblings, 0 replies; 17+ messages in thread
From: Sebastian Spaeth @ 2012-01-02 14:56 UTC (permalink / raw)
  To: Thomas Jost, notmuch

[-- Attachment #1: Type: text/plain, Size: 304 bytes --]

On Tue, 13 Dec 2011 18:11:45 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> ---
>  bindings/python/notmuch/message.py |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)

The patch looks good, so once this goes into libnotmuch, +1 for also
applying this one.

Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

end of thread

Thread overview: 17+ messages
2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
2011-12-23 19:10   ` David Bremner
2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
2011-12-14 21:54   ` Mark Anderson
2011-12-21  0:34     ` Thomas Jost
2011-12-15  0:45   ` Austin Clements
2011-12-21  1:00     ` Thomas Jost
2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost
2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
2012-01-02 14:56   ` Sebastian Spaeth
2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
2011-12-19 19:48   ` Austin Clements
2011-12-19 22:56     ` Tom Prince
2011-12-20  8:32     ` David Edmondson
2011-12-20 15:05       ` Austin Clements

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
