* [PATCH 0/5] Store message modification times in the DB
@ 2011-12-13 17:11 Thomas Jost
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw)
  To: notmuch

Hello world,

This is a patch series I've been working on for some time in order to be able to
sync my tags on several computers. I'm posting it now, but please consider it as
an RFC rather than something that is ready to be pushed.

The basic idea is to store the last time each message was modified, i.e. "the
message was added to the DB", "a tag was added" or "a tag was removed". This
mtime is accessible through a library function and in the JSON output of
"notmuch show". It is also searchable with the "mtime:" prefix and with
timestamp ranges, like for searching messages by date:

    notmuch search mtime:$(date -d 2011-12-01 +%s)..$(date +%s)

This can then be used in scripts or helper programs to do incremental dumps or
tags synchronization. (I already have a script to do incremental backups, but it
needs some cleaning, and I'm still working on something for sync'ing tags, but
it's starting to work really well; I'll post them later).

This can be seen as an alternative to David Bremner's jlog branch, but with
several differences:
  + no external dependency
  + everything is stored in the notmuch DB: atomicity for free!
  - when a message is removed we lose everything about it, which makes the sync
    process more complicated
  - for a human, it's harder to manipulate timestamps than log messages
  - this can store much less data than a proper log system

On IRC amdragon suggested using a simple sequence number instead of a timestamp.
This would indeed eliminate the need for proper time synchronization between
computers one would want to keep in sync, and it would reduce the risk of
time-going-backward problems, but IMHO it would cause more problems: no global
clock --> no simple way to tell if DB #A is more recent than DB #B.

So, here are the patches:
- first a little fix to the comments describing the DB schema (not specific to
  this patch series at all, I just noticed it when rebasing this series)
- the second commit adds the MTIME value to the database schema, and creates the
  functions used to update and access this value.
- the third commit makes the MTIME value searchable with a range syntax.
- the fourth commit adds the MTIME to the JSON output of "notmuch show".
- the fifth and last commit adds Message.get_mtime() to the Python bindings.

Please tell me what you think of this.

Best regards,
Thomas

Thomas Jost (5):
  Fix comments about what is stored in the database
  lib: Add a MTIME value to every mail document
  lib: Make MTIME values searchable
  show: include mtime in JSON output
  python: add get_mtime() to the Message class

 bindings/python/notmuch/message.py |   20 ++++++++++++++++++++
 lib/database-private.h             |    1 +
 lib/database.cc                    |   14 +++++++++++++-
 lib/message.cc                     |   32 ++++++++++++++++++++++++++++++++
 lib/notmuch-private.h              |    6 +++++-
 lib/notmuch.h                      |    4 ++++
 notmuch-show.c                     |    7 ++++---
 notmuch.1                          |   14 ++++++++++++--
 notmuch.c                          |   13 ++++++++++---
 9 files changed, 101 insertions(+), 10 deletions(-)

-- 
1.7.8

^ permalink raw reply	[flat|nested] 17+ messages in thread
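The "mtime:" range search described in the cover letter above could be driven
from a helper script along these lines. This is only a sketch: the function
names to_epoch() and mtime_query() are made up for illustration, and dates are
interpreted as midnight UTC, whereas the shell "date" command in the cover
letter uses the local timezone.

```python
from datetime import datetime, timezone


def to_epoch(day: str) -> int:
    """Convert a YYYY-MM-DD date (taken as midnight UTC) to Unix seconds."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def mtime_query(since: int, until: int) -> str:
    """Build an "mtime:" range term for messages modified in [since, until]."""
    if since > until:
        raise ValueError("since must not be later than until")
    return f"mtime:{since}..{until}"


if __name__ == "__main__":
    # The resulting string would be passed to "notmuch search" as one argument.
    print(mtime_query(to_epoch("2011-12-01"), to_epoch("2011-12-13")))
```

An incremental dump script would remember the "until" value of its last run and
use it as the next "since", so that only messages touched in between are listed.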
* [PATCH 1/5] Fix comments about what is stored in the database
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-23 19:10   ` David Bremner
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw)
  To: notmuch

Commit 567bcbc2 introduced two new values for each message (content of the
"From" and "Subject" headers), but the comments about the database schema had
not been updated accordingly.
---
 lib/database.cc |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 98f101e..2025189 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -81,13 +81,17 @@ typedef struct {
  *                      STRING is the name of a file within that
  *                      directory for this mail message.
  *
- * A mail document also has two values:
+ * A mail document also has four values:
  *
  *      TIMESTAMP:      The time_t value corresponding to the message's
  *                      Date header.
  *
  *      MESSAGE_ID:     The unique ID of the mail mess (see "id" above)
  *
+ *      FROM:           The value of the "From" header
+ *
+ *      SUBJECT:        The value of the "Subject" header
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
-- 
1.7.8

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 1/5] Fix comments about what is stored in the database
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
@ 2011-12-23 19:10   ` David Bremner
  0 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2011-12-23 19:10 UTC (permalink / raw)
  To: Thomas Jost, notmuch

On Tue, 13 Dec 2011 18:11:41 +0100, Thomas Jost <schnouki@schnouki.net> wrote:
> Commit 567bcbc2 introduced two new values for each message (content of the
> "From" and "Subject" headers), but the comments about the database schema had
> not been updated accordingly.

Pushed this one.

d

^ permalink raw reply	[flat|nested] 17+ messages in thread
* [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
  2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
@ 2011-12-13 17:11 ` Thomas Jost
  2011-12-14 21:54   ` Mark Anderson
  2011-12-15 0:45   ` Austin Clements
  2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw)
  To: notmuch

This is a time_t value, similar to the message date (TIMESTAMP). It is first set
when the message is added to the database, and is then updated every time a tag
is added or removed. It can thus be used for doing incremental dumps of the
database or for synchronizing it between several computers.

This value can be read freely (with notmuch_message_get_mtime()) but for now it
can't be set to an arbitrary value: it can only be set to "now" when updated.
There's no specific reason for this except that I don't really see a real use
case for setting it to an arbitrary value.
---
 lib/database.cc       |    7 ++++++-
 lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
 lib/notmuch-private.h |    6 +++++-
 lib/notmuch.h         |    4 ++++
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 2025189..6dc6f73 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -81,7 +81,7 @@ typedef struct {
  *                      STRING is the name of a file within that
  *                      directory for this mail message.
  *
- * A mail document also has four values:
+ * A mail document also has five values:
  *
  *      TIMESTAMP:      The time_t value corresponding to the message's
  *                      Date header.
@@ -92,6 +92,9 @@ typedef struct {
  *
  *      SUBJECT:        The value of the "Subject" header
  *
+ *      MTIME:          The time_t value corresponding to the last time
+ *                      a tag was added or removed on the message.
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
@@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             date = notmuch_message_file_get_header (message_file, "date");
             _notmuch_message_set_header_values (message, date, from, subject);
 
+            _notmuch_message_update_mtime (message);
+
             _notmuch_message_index_file (message, filename);
         } else {
             ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
diff --git a/lib/message.cc b/lib/message.cc
index 0075425..0c98589 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
     message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
 }
 
+/* Get the message mtime, i.e. when it was added or the last time a tag was
+ * added/removed. */
+time_t
+notmuch_message_get_mtime (notmuch_message_t *message)
+{
+    std::string value;
+
+    try {
+        value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
+    } catch (Xapian::Error &error) {
+        INTERNAL_ERROR ("Failed to read mtime value from document.");
+        return 0;
+    }
+
+    return Xapian::sortable_unserialise (value);
+}
+
+/* Set the message mtime to "now". */
+void
+_notmuch_message_update_mtime (notmuch_message_t *message)
+{
+    time_t time_value;
+
+    time_value = time (NULL);
+    message->doc.add_value (NOTMUCH_VALUE_MTIME,
+                            Xapian::sortable_serialise (time_value));
+}
+
 /* Synchronize changes made to message->doc out into the database. */
 void
 _notmuch_message_sync (notmuch_message_t *message)
@@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
                         private_status);
     }
 
+    _notmuch_message_update_mtime (message);
+
     if (! message->frozen)
         _notmuch_message_sync (message);
 
@@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
                         private_status);
     }
 
+    _notmuch_message_update_mtime (message);
+
     if (! message->frozen)
         _notmuch_message_sync (message);
 
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 60a932f..9859872 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -95,7 +95,8 @@ typedef enum {
     NOTMUCH_VALUE_TIMESTAMP = 0,
     NOTMUCH_VALUE_MESSAGE_ID,
     NOTMUCH_VALUE_FROM,
-    NOTMUCH_VALUE_SUBJECT
+    NOTMUCH_VALUE_SUBJECT,
+    NOTMUCH_VALUE_MTIME
 } notmuch_value_t;
 
 /* Xapian (with flint backend) complains if we provide a term longer
@@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
                                     const char *from,
                                     const char *subject);
 void
+_notmuch_message_update_mtime (notmuch_message_t *message);
+
+void
 _notmuch_message_sync (notmuch_message_t *message);
 
 notmuch_status_t
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 9f23a10..643ebce 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
 time_t
 notmuch_message_get_date (notmuch_message_t *message);
 
+/* Get the mtime of 'message' as a time_t value. */
+time_t
+notmuch_message_get_mtime (notmuch_message_t *message);
+
 /* Get the value of the specified header from 'message'.
  *
  * The value will be read from the actual message file, not from the
-- 
1.7.8

^ permalink raw reply related	[flat|nested] 17+ messages in thread
* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document 2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost @ 2011-12-14 21:54 ` Mark Anderson 2011-12-21 0:34 ` Thomas Jost 2011-12-15 0:45 ` Austin Clements 1 sibling, 1 reply; 17+ messages in thread From: Mark Anderson @ 2011-12-14 21:54 UTC (permalink / raw) To: Thomas Jost, notmuch@notmuchmail.org On Tue, 13 Dec 2011 11:11:42 -0600, Thomas Jost <schnouki@schnouki.net> wrote: > This is a time_t value, similar to the message date (TIMESTAMP). It is first set > when the message is added to the database, and is then updated every time a tag > is added or removed. It can thus be used for doing incremental dumps of the > database or for synchronizing it between several computers. > > This value can be read freely (with notmuch_message_get_mtime()) but for now it > can't be set to an arbitrary value: it can only be set to "now" when updated. > There's no specific reason for this except that I don't really see a real use > case for setting it to an arbitrary value. I think it would be easier to write some testcases if the last modified time could be touched directly. Perhaps they aren't in the set of "must have", but it's what comes to mind. -Mark > --- > lib/database.cc | 7 ++++++- > lib/message.cc | 32 ++++++++++++++++++++++++++++++++ > lib/notmuch-private.h | 6 +++++- > lib/notmuch.h | 4 ++++ > 4 files changed, 47 insertions(+), 2 deletions(-) > > diff --git a/lib/database.cc b/lib/database.cc > index 2025189..6dc6f73 100644 > --- a/lib/database.cc > +++ b/lib/database.cc > @@ -81,7 +81,7 @@ typedef struct { > * STRING is the name of a file within that > * directory for this mail message. > * > - * A mail document also has four values: > + * A mail document also has five values: > * > * TIMESTAMP: The time_t value corresponding to the message's > * Date header. 
> @@ -92,6 +92,9 @@ typedef struct { > * > * SUBJECT: The value of the "Subject" header > * > + * MTIME: The time_t value corresponding to the last time > + * a tag was added or removed on the message. > + * > * In addition, terms from the content of the message are added with > * "from", "to", "attachment", and "subject" prefixes for use by the > * user in searching. Similarly, terms from the path of the mail > @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch, > date = notmuch_message_file_get_header (message_file, "date"); > _notmuch_message_set_header_values (message, date, from, subject); > > + _notmuch_message_update_mtime (message); > + > _notmuch_message_index_file (message, filename); > } else { > ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; > diff --git a/lib/message.cc b/lib/message.cc > index 0075425..0c98589 100644 > --- a/lib/message.cc > +++ b/lib/message.cc > @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message, > message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject); > } > > +/* Get the message mtime, i.e. when it was added or the last time a tag was > + * added/removed. */ > +time_t > +notmuch_message_get_mtime (notmuch_message_t *message) > +{ > + std::string value; > + > + try { > + value = message->doc.get_value (NOTMUCH_VALUE_MTIME); > + } catch (Xapian::Error &error) { > + INTERNAL_ERROR ("Failed to read mtime value from document."); > + return 0; > + } > + > + return Xapian::sortable_unserialise (value); > +} > + > +/* Set the message mtime to "now". */ > +void > +_notmuch_message_update_mtime (notmuch_message_t *message) > +{ > + time_t time_value; > + > + time_value = time (NULL); > + message->doc.add_value (NOTMUCH_VALUE_MTIME, > + Xapian::sortable_serialise (time_value)); > +} > + > /* Synchronize changes made to message->doc out into the database. 
*/ > void > _notmuch_message_sync (notmuch_message_t *message) > @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag) > private_status); > } > > + _notmuch_message_update_mtime (message); > + > if (! message->frozen) > _notmuch_message_sync (message); > > @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag) > private_status); > } > > + _notmuch_message_update_mtime (message); > + > if (! message->frozen) > _notmuch_message_sync (message); > > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h > index 60a932f..9859872 100644 > --- a/lib/notmuch-private.h > +++ b/lib/notmuch-private.h > @@ -95,7 +95,8 @@ typedef enum { > NOTMUCH_VALUE_TIMESTAMP = 0, > NOTMUCH_VALUE_MESSAGE_ID, > NOTMUCH_VALUE_FROM, > - NOTMUCH_VALUE_SUBJECT > + NOTMUCH_VALUE_SUBJECT, > + NOTMUCH_VALUE_MTIME > } notmuch_value_t; > > /* Xapian (with flint backend) complains if we provide a term longer > @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message, > const char *from, > const char *subject); > void > +_notmuch_message_update_mtime (notmuch_message_t *message); > + > +void > _notmuch_message_sync (notmuch_message_t *message); > > notmuch_status_t > diff --git a/lib/notmuch.h b/lib/notmuch.h > index 9f23a10..643ebce 100644 > --- a/lib/notmuch.h > +++ b/lib/notmuch.h > @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message, > time_t > notmuch_message_get_date (notmuch_message_t *message); > > +/* Get the mtime of 'message' as a time_t value. */ > +time_t > +notmuch_message_get_mtime (notmuch_message_t *message); > + > /* Get the value of the specified header from 'message'. 
> * > * The value will be read from the actual message file, not from the > -- > 1.7.8 > > _______________________________________________ > notmuch mailing list > notmuch@notmuchmail.org > http://notmuchmail.org/mailman/listinfo/notmuch > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-14 21:54   ` Mark Anderson
@ 2011-12-21 0:34     ` Thomas Jost
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-21 0:34 UTC (permalink / raw)
  To: Mark Anderson, notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Wed, 14 Dec 2011 14:54:10 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> On Tue, 13 Dec 2011 11:11:42 -0600, Thomas Jost <schnouki@schnouki.net> wrote:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> >
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
>
> I think it would be easier to write some testcases if the last modified
> time could be touched directly. Perhaps they aren't in the set of "must
> have", but it's what comes to mind.

Well, since I posted this, I found other good reasons to have a set_mtime
function. I'll post an updated series later which will include it -- and
possibly some tests too :)

Thanks,

-- 
Thomas/Schnouki

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread
* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
  2011-12-14 21:54   ` Mark Anderson
@ 2011-12-15 0:45   ` Austin Clements
  2011-12-21 1:00     ` Thomas Jost
  1 sibling, 1 reply; 17+ messages in thread
From: Austin Clements @ 2011-12-15 0:45 UTC (permalink / raw)
  To: Thomas Jost; +Cc: notmuch

A few minor comments below.

At a higher level, I'm curious what the tag synchronization protocol
you're building on top of this is. I can't think of one that doesn't
have race conditions, but maybe I'm not thinking about it right.

Quoth Thomas Jost on Dec 13 at 6:11 pm:
> This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> when the message is added to the database, and is then updated every time a tag
> is added or removed. It can thus be used for doing incremental dumps of the
> database or for synchronizing it between several computers.
>
> This value can be read freely (with notmuch_message_get_mtime()) but for now it
> can't be set to an arbitrary value: it can only be set to "now" when updated.
> There's no specific reason for this except that I don't really see a real use
> case for setting it to an arbitrary value.
> ---
>  lib/database.cc       |    7 ++++++-
>  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
>  lib/notmuch-private.h |    6 +++++-
>  lib/notmuch.h         |    4 ++++
>  4 files changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 2025189..6dc6f73 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -81,7 +81,7 @@ typedef struct {
>   *                      STRING is the name of a file within that
>   *                      directory for this mail message.
>   *
> - * A mail document also has four values:
> + * A mail document also has five values:
>   *
>   *      TIMESTAMP:      The time_t value corresponding to the message's
>   *                      Date header.
> @@ -92,6 +92,9 @@ typedef struct {
>   *
>   *      SUBJECT:        The value of the "Subject" header
>   *
> + *      MTIME:          The time_t value corresponding to the last time
> + *                      a tag was added or removed on the message.
> + *
>   * In addition, terms from the content of the message are added with
>   * "from", "to", "attachment", and "subject" prefixes for use by the
>   * user in searching. Similarly, terms from the path of the mail
> @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>              date = notmuch_message_file_get_header (message_file, "date");
>              _notmuch_message_set_header_values (message, date, from, subject);
>
> +            _notmuch_message_update_mtime (message);

Indentation.

> +
>              _notmuch_message_index_file (message, filename);
>          } else {
>              ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> diff --git a/lib/message.cc b/lib/message.cc
> index 0075425..0c98589 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
>  }
>
> +/* Get the message mtime, i.e. when it was added or the last time a tag was
> + * added/removed. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +        value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> +    } catch (Xapian::Error &error) {
> +        INTERNAL_ERROR ("Failed to read mtime value from document.");
> +        return 0;
> +    }

For compatibility, this should handle the case when
NOTMUCH_VALUE_MTIME is missing, probably by just returning 0. As it
is, value will be an empty string and sortable_unserialise is
undefined on strings that weren't produced by sortable_serialise.

> +
> +    return Xapian::sortable_unserialise (value);
> +}
> +
> +/* Set the message mtime to "now". */
> +void
> +_notmuch_message_update_mtime (notmuch_message_t *message)
> +{
> +    time_t time_value;
> +
> +    time_value = time (NULL);
> +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> +                            Xapian::sortable_serialise (time_value));

Indentation.

> +}
> +
>  /* Synchronize changes made to message->doc out into the database. */
>  void
>  _notmuch_message_sync (notmuch_message_t *message)
> @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
>                          private_status);
>      }
>
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>          _notmuch_message_sync (message);
>
> @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
>                          private_status);
>      }
>
> +    _notmuch_message_update_mtime (message);
> +
>      if (! message->frozen)
>          _notmuch_message_sync (message);
>
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 60a932f..9859872 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -95,7 +95,8 @@ typedef enum {
>      NOTMUCH_VALUE_TIMESTAMP = 0,
>      NOTMUCH_VALUE_MESSAGE_ID,
>      NOTMUCH_VALUE_FROM,
> -    NOTMUCH_VALUE_SUBJECT
> +    NOTMUCH_VALUE_SUBJECT,
> +    NOTMUCH_VALUE_MTIME
>  } notmuch_value_t;
>
>  /* Xapian (with flint backend) complains if we provide a term longer
> @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
>                                      const char *from,
>                                      const char *subject);
>  void
> +_notmuch_message_update_mtime (notmuch_message_t *message);
> +
> +void
>  _notmuch_message_sync (notmuch_message_t *message);
>
>  notmuch_status_t
> diff --git a/lib/notmuch.h b/lib/notmuch.h
> index 9f23a10..643ebce 100644
> --- a/lib/notmuch.h
> +++ b/lib/notmuch.h
> @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
>  time_t
>  notmuch_message_get_date (notmuch_message_t *message);
>
> +/* Get the mtime of 'message' as a time_t value. */
> +time_t
> +notmuch_message_get_mtime (notmuch_message_t *message);
> +
>  /* Get the value of the specified header from 'message'.
>   *
>   * The value will be read from the actual message file, not from the

^ permalink raw reply	[flat|nested] 17+ messages in thread
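The compatibility concern raised in the review above (a document indexed before
this series has no MTIME value, so the getter receives an empty string) can be
modelled outside Xapian. The sketch below is a toy model, not the library code:
the value store is a plain dict, encode_mtime() is a stand-in encoder that does
not reproduce Xapian's sortable_serialise format, and slot number 4 is simply
the position of NOTMUCH_VALUE_MTIME in the enum from the patch.

```python
import struct

# Position of NOTMUCH_VALUE_MTIME in the patch's enum (TIMESTAMP = 0, ...).
MTIME_SLOT = 4


def encode_mtime(seconds: int) -> bytes:
    """Stand-in encoder (NOT Xapian's format): 8-byte big-endian double."""
    return struct.pack(">d", float(seconds))


def get_mtime(values: dict) -> int:
    """Return the stored mtime, or 0 when the slot is absent or empty."""
    raw = values.get(MTIME_SLOT, b"")
    if not raw:
        # Old documents have no MTIME; decoding an empty value would fail,
        # so report 0 as suggested in the review.
        return 0
    return int(struct.unpack(">d", raw)[0])


if __name__ == "__main__":
    print(get_mtime({}))                                      # old document
    print(get_mtime({MTIME_SLOT: encode_mtime(1323796260)}))  # new document
```

The point of the check is its placement: the empty case is handled before the
decoder is called, so the decoder only ever sees values its encoder produced.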
* Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
  2011-12-15 0:45   ` Austin Clements
@ 2011-12-21 1:00     ` Thomas Jost
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Jost @ 2011-12-21 1:00 UTC (permalink / raw)
  To: Austin Clements; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 8832 bytes --]

On Wed, 14 Dec 2011 19:45:07 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> A few minor comments below.
>
> At a higher level, I'm curious what the tag synchronization protocol
> you're building on top of this is. I can't think of one that doesn't
> have race conditions, but maybe I'm not thinking about it right.

The approach I've used is quite different from what you described in
id:"20111219194821.GA10376@mit.edu". I don't directly sync host A to host B but
I use a server in the middle. (A is my laptop --> not always on, B is my work PC
--> turned off when I'm out of office, so a direct sync would be harder to do).

My nm-sync script is written in Python 2 (2.7, may work with 2.6) and is present
on both my PCs and on my server. It can operate in two modes: client (when run
from one of my PCs) or server (called *from the client* through ssh, running on
my server).

When running in server mode, the script manipulates a small DB stored as a
Python dictionary (and stored on disk with the pickle module). It does not even
need notmuch to be installed on the server. Here is what this DB looks like:

    {
      "lastseen": {
        "pc_A": 1324428029,
        "pc_B": 1323952028
      },
      "messages": {
        "msgid_001": (mtime, tag1, tag2, ..., tagN),
        "msgid_002": (mtime, tag1, tag2, ..., tagM),
        ...
      }
    }

So when running the client, here is what happens:
  1. client starts a subprocess: "ssh myserver ~/nm-sync server"
  2. client and server check that their sha1sum match (to avoid version
     mismatch)
  3. client identifies itself with its hostname ("pc_A" in the example above),
     server replies with its "lastseen" value and updates it in the DB
  4. server sends to client messages with mtime > lastseen (msgid + mtime +
     tags), client updates the notmuch DB with these values
  5. client queries the notmuch DB for messages with mtime > lastseen and sends
     them (msgid + mtime + tags) to the server, which stores them in the DB
  6. cleanup: server removes messages with mtime < min(lastseen) from its DB

So basically this approach assumes that all clocks are synchronized (everyone
uses ntp, right?...) and does not even try to detect conflicts: if a message has
been modified both locally and remotely, then the local version will be
overwritten by the remote one, period. It should also work with more than 2
hosts (but not tested yet). No sync data is kept in the notmuch DB.

Right now all of this fits in about 250 lines of Python (could be made shorter)
and works quite well for me. I'll put it online after doing some cleanup.

> Quoth Thomas Jost on Dec 13 at 6:11 pm:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> >
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
> > --- > > lib/database.cc | 7 ++++++- > > lib/message.cc | 32 ++++++++++++++++++++++++++++++++ > > lib/notmuch-private.h | 6 +++++- > > lib/notmuch.h | 4 ++++ > > 4 files changed, 47 insertions(+), 2 deletions(-) > > > > diff --git a/lib/database.cc b/lib/database.cc > > index 2025189..6dc6f73 100644 > > --- a/lib/database.cc > > +++ b/lib/database.cc > > @@ -81,7 +81,7 @@ typedef struct { > > * STRING is the name of a file within that > > * directory for this mail message. > > * > > - * A mail document also has four values: > > + * A mail document also has five values: > > * > > * TIMESTAMP: The time_t value corresponding to the message's > > * Date header. > > @@ -92,6 +92,9 @@ typedef struct { > > * > > * SUBJECT: The value of the "Subject" header > > * > > + * MTIME: The time_t value corresponding to the last time > > + * a tag was added or removed on the message. > > + * > > * In addition, terms from the content of the message are added with > > * "from", "to", "attachment", and "subject" prefixes for use by the > > * user in searching. Similarly, terms from the path of the mail > > @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch, > > date = notmuch_message_file_get_header (message_file, "date"); > > _notmuch_message_set_header_values (message, date, from, subject); > > > > + _notmuch_message_update_mtime (message); > > Indentation. Fixed, thanks. > > > + > > _notmuch_message_index_file (message, filename); > > } else { > > ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID; > > diff --git a/lib/message.cc b/lib/message.cc > > index 0075425..0c98589 100644 > > --- a/lib/message.cc > > +++ b/lib/message.cc > > @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message, > > message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject); > > } > > > > +/* Get the message mtime, i.e. when it was added or the last time a tag was > > + * added/removed. 
*/ > > +time_t > > +notmuch_message_get_mtime (notmuch_message_t *message) > > +{ > > + std::string value; > > + > > + try { > > + value = message->doc.get_value (NOTMUCH_VALUE_MTIME); > > + } catch (Xapian::Error &error) { > > + INTERNAL_ERROR ("Failed to read mtime value from document."); > > + return 0; > > + } > > For compatibility, this should handle the case when > NOTMUCH_VALUE_MTIME is missing, probably by just returning 0. As it > is, value will be an empty string and sortable_unserialise is > undefined on strings that weren't produced by sortable_serialise. Right. I think I rebuilt my DB just after implementing this, which explains why I did not notice that myself. Thanks! > > + > > + return Xapian::sortable_unserialise (value); > > +} > > + > > +/* Set the message mtime to "now". */ > > +void > > +_notmuch_message_update_mtime (notmuch_message_t *message) > > +{ > > + time_t time_value; > > + > > + time_value = time (NULL); > > + message->doc.add_value (NOTMUCH_VALUE_MTIME, > > + Xapian::sortable_serialise (time_value)); > > Indentation. Noted too. It's really time I start using dtrt-indent. > > > +} > > + > > /* Synchronize changes made to message->doc out into the database. */ > > void > > _notmuch_message_sync (notmuch_message_t *message) > > @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag) > > private_status); > > } > > > > + _notmuch_message_update_mtime (message); > > + > > if (! message->frozen) > > _notmuch_message_sync (message); > > > > @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag) > > private_status); > > } > > > > + _notmuch_message_update_mtime (message); > > + > > if (! 
message->frozen) > > _notmuch_message_sync (message); > > > > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h > > index 60a932f..9859872 100644 > > --- a/lib/notmuch-private.h > > +++ b/lib/notmuch-private.h > > @@ -95,7 +95,8 @@ typedef enum { > > NOTMUCH_VALUE_TIMESTAMP = 0, > > NOTMUCH_VALUE_MESSAGE_ID, > > NOTMUCH_VALUE_FROM, > > - NOTMUCH_VALUE_SUBJECT > > + NOTMUCH_VALUE_SUBJECT, > > + NOTMUCH_VALUE_MTIME > > } notmuch_value_t; > > > > /* Xapian (with flint backend) complains if we provide a term longer > > @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message, > > const char *from, > > const char *subject); > > void > > +_notmuch_message_update_mtime (notmuch_message_t *message); > > + > > +void > > _notmuch_message_sync (notmuch_message_t *message); > > > > notmuch_status_t > > diff --git a/lib/notmuch.h b/lib/notmuch.h > > index 9f23a10..643ebce 100644 > > --- a/lib/notmuch.h > > +++ b/lib/notmuch.h > > @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message, > > time_t > > notmuch_message_get_date (notmuch_message_t *message); > > > > +/* Get the mtime of 'message' as a time_t value. */ > > +time_t > > +notmuch_message_get_mtime (notmuch_message_t *message); > > + > > /* Get the value of the specified header from 'message'. > > * > > * The value will be read from the actual message file, not from the -- Thomas/Schnouki [-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
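The server-side bookkeeping of the nm-sync exchange described in the reply above
(steps 3 to 6) could look roughly like the following. This is a sketch built
only from that description, not the actual nm-sync script, which had not been
posted at the time; every function name here is hypothetical, Python 3 is used
rather than the Python 2 mentioned in the message, and the ssh transport, the
sha1sum check, pickle persistence and the notmuch side are all left out.

```python
import time


def new_db() -> dict:
    """Empty server-side state, in the shape shown in the message."""
    return {"lastseen": {}, "messages": {}}


def handshake(db: dict, host: str, now=None) -> int:
    """Step 3: return the host's previous lastseen (0 if new) and update it."""
    now = int(time.time()) if now is None else now
    previous = db["lastseen"].get(host, 0)
    db["lastseen"][host] = now
    return previous


def changes_since(db: dict, lastseen: int) -> dict:
    """Step 4: entries (mtime, tag1, ...) with mtime strictly after lastseen."""
    return {msgid: entry for msgid, entry in db["messages"].items()
            if entry[0] > lastseen}


def store(db: dict, updates: dict) -> None:
    """Step 5: record the client's entries, overwriting without conflict checks."""
    db["messages"].update(updates)


def cleanup(db: dict) -> None:
    """Step 6: drop entries older than the least recently seen host."""
    if not db["lastseen"]:
        return
    oldest = min(db["lastseen"].values())
    db["messages"] = {msgid: entry for msgid, entry in db["messages"].items()
                      if entry[0] >= oldest}
```

As in the description, nothing here detects conflicting edits: whichever entry
is stored last wins, and correctness depends on the hosts' clocks agreeing.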
* [PATCH 3/5] lib: Make MTIME values searchable 2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost 2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost 2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost @ 2011-12-13 17:11 ` Thomas Jost 2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost ` (2 subsequent siblings) 5 siblings, 0 replies; 17+ messages in thread From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw) To: notmuch Tag modification times are now searchable as ranges (just like regular message dates) with the "mtime:" prefix. --- lib/database-private.h | 1 + lib/database.cc | 3 +++ notmuch.1 | 14 ++++++++++++-- notmuch.c | 13 ++++++++++--- 4 files changed, 26 insertions(+), 5 deletions(-) diff --git a/lib/database-private.h b/lib/database-private.h index 88532d5..e71c8e4 100644 --- a/lib/database-private.h +++ b/lib/database-private.h @@ -52,6 +52,7 @@ struct _notmuch_database { Xapian::QueryParser *query_parser; Xapian::TermGenerator *term_gen; Xapian::ValueRangeProcessor *value_range_processor; + Xapian::ValueRangeProcessor *mtime_value_range_processor; }; /* Return the list of terms from the given iterator matching a prefix. 
diff --git a/lib/database.cc b/lib/database.cc index 6dc6f73..cc970c1 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -677,12 +677,14 @@ notmuch_database_open (const char *path, notmuch->term_gen = new Xapian::TermGenerator; notmuch->term_gen->set_stemmer (Xapian::Stem ("english")); notmuch->value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_TIMESTAMP); + notmuch->mtime_value_range_processor = new Xapian::NumberValueRangeProcessor (NOTMUCH_VALUE_MTIME, "mtime:"); notmuch->query_parser->set_default_op (Xapian::Query::OP_AND); notmuch->query_parser->set_database (*notmuch->xapian_db); notmuch->query_parser->set_stemmer (Xapian::Stem ("english")); notmuch->query_parser->set_stemming_strategy (Xapian::QueryParser::STEM_SOME); notmuch->query_parser->add_valuerangeprocessor (notmuch->value_range_processor); + notmuch->query_parser->add_valuerangeprocessor (notmuch->mtime_value_range_processor); for (i = 0; i < ARRAY_SIZE (BOOLEAN_PREFIX_EXTERNAL); i++) { prefix_t *prefix = &BOOLEAN_PREFIX_EXTERNAL[i]; @@ -726,6 +728,7 @@ notmuch_database_close (notmuch_database_t *notmuch) delete notmuch->query_parser; delete notmuch->xapian_db; delete notmuch->value_range_processor; + delete notmuch->mtime_value_range_processor; talloc_free (notmuch); } diff --git a/notmuch.1 b/notmuch.1 index 3dbd67e..2235096 100644 --- a/notmuch.1 +++ b/notmuch.1 @@ -644,6 +644,8 @@ terms to match against specific portions of an email, (where folder:<directory-path> + mtime:<timestamp-range> + The .B from: prefix is used to match the name or address of the sender of an email @@ -707,8 +709,8 @@ operators, but will have to be protected from interpretation by the shell, (such as by putting quotation marks around any parenthesized expression). 
-Finally, results can be restricted to only messages within a -particular time range, (based on the Date: header) with a syntax of: +Results can be restricted to only messages within a particular time range, +(based on the Date: header) with a syntax of: <initial-timestamp>..<final-timestamp> @@ -721,6 +723,14 @@ specify a date range to return messages from 2009\-10\-01 until the current time: $(date +%s \-d 2009\-10\-01)..$(date +%s) + +Finally, the +.B mtime: +prefix can be used to search for messages which were modified (e.g. tags were +added or removed) within a particular time range, with the same syntax as +before: + + mtime:<initial-timestamp>..<final-timestamp> .SH HOOKS Hooks are scripts (or arbitrary executables or symlinks to such) that notmuch invokes before and after certain actions. These scripts reside in diff --git a/notmuch.c b/notmuch.c index c0ce026..443cf59 100644 --- a/notmuch.c +++ b/notmuch.c @@ -71,6 +71,7 @@ static const char search_terms_help[] = "\t\tid:<message-id>\n" "\t\tthread:<thread-id>\n" "\t\tfolder:<directory-path>\n" + "\t\tmtime:<timestamp-range>\n" "\n" "\tThe from: prefix is used to match the name or address of\n" "\tthe sender of an email message.\n" @@ -112,8 +113,8 @@ static const char search_terms_help[] = "\tinterpretation by the shell, (such as by putting quotation\n" "\tmarks around any parenthesized expression).\n" "\n" - "\tFinally, results can be restricted to only messages within a\n" - "\tparticular time range, (based on the Date: header) with:\n" + "\tResults can be restricted to only messages within a particular\n" + "\ttime range, (based on the Date: header) with:\n" "\n" "\t\t<intial-timestamp>..<final-timestamp>\n" "\n" @@ -125,7 +126,13 @@ static const char search_terms_help[] = "\tfollowing syntax would specify a date range to return messages\n" "\tfrom 2009-10-01 until the current time:\n" "\n" - "\t\t$(date +%%s -d 2009-10-01)..$(date +%%s)\n\n"; + "\t\t$(date +%%s -d 2009-10-01)..$(date +%%s)\n\n" + "\n" 
+ "\tFinally, the mtime: prefix can be used to search for messages\n" + "\twhich were modified (e.g. tags were added or removed) within a\n" + "\tparticular time range, with the same syntax as before:\n" + "\n" + "\t\tmtime:<initial-timestamp>..<final-timestamp>\n"; static const char hooks_help[] = "\tHooks are scripts (or arbitrary executables or symlinks to such) that\n" -- 1.7.8 ^ permalink raw reply related [flat|nested] 17+ messages in thread
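The range syntax documented in this patch is easy to assemble from a script. The following is a minimal sketch, assuming the proposed "mtime:" prefix is available in the notmuch build being queried; `mtime_query` is a hypothetical helper name, not part of notmuch, and it only formats the search term.

```python
import calendar
import time


def mtime_query(since, until=None):
    """Build an 'mtime:<initial>..<final>' search term from Unix timestamps.

    Hypothetical helper: it formats the term proposed in the patch and
    does not talk to notmuch itself.
    """
    if until is None:
        until = int(time.time())
    since, until = int(since), int(until)
    if since > until:
        raise ValueError("initial timestamp must not exceed final timestamp")
    return "mtime:%d..%d" % (since, until)


# 2011-12-01 00:00:00 UTC as seconds since the epoch.
start = calendar.timegm((2011, 12, 1, 0, 0, 0))

# One-day window starting at that instant.
print(mtime_query(start, start + 86400))
```

The resulting string could be passed as a search term to "notmuch search" in the same way as the shell example in the man page text above.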
* [PATCH 4/5] show: include mtime in JSON output 2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost ` (2 preceding siblings ...) 2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost @ 2011-12-13 17:11 ` Thomas Jost 2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost 2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson 5 siblings, 0 replies; 17+ messages in thread From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw) To: notmuch This could be used by a UI implementation somehow. --- notmuch-show.c | 7 ++++--- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/notmuch-show.c b/notmuch-show.c index 873a7c4..7279601 100644 --- a/notmuch-show.c +++ b/notmuch-show.c @@ -202,17 +202,18 @@ format_message_json (const void *ctx, notmuch_message_t *message, unused (int in notmuch_tags_t *tags; int first = 1; void *ctx_quote = talloc_new (ctx); - time_t date; + time_t date, mtime; const char *relative_date; date = notmuch_message_get_date (message); relative_date = notmuch_time_relative_date (ctx, date); + mtime = notmuch_message_get_mtime (message); - printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"tags\": [", + printf ("\"id\": %s, \"match\": %s, \"filename\": %s, \"timestamp\": %ld, \"date_relative\": \"%s\", \"mtime\": %ld, \"tags\": [", json_quote_str (ctx_quote, notmuch_message_get_message_id (message)), notmuch_message_get_flag (message, NOTMUCH_MESSAGE_FLAG_MATCH) ? "true" : "false", json_quote_str (ctx_quote, notmuch_message_get_filename (message)), - date, relative_date); + date, relative_date, mtime); for (tags = notmuch_message_get_tags (message); notmuch_tags_valid (tags); -- 1.7.8 ^ permalink raw reply related [flat|nested] 17+ messages in thread
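A script consuming this output could use the new field to pick out recently modified messages. The following is a minimal sketch, assuming each message object carries the "mtime" key added by this patch. The sample data is made up for illustration; real "notmuch show --format=json" output is a nested thread structure that would have to be walked first. `modified_since` is a hypothetical name.

```python
import json


def modified_since(messages, threshold):
    """Return the ids of messages whose 'mtime' is at or after threshold.

    Messages without an 'mtime' key (for example, output produced before
    this patch) are treated as mtime 0; that default is an assumption.
    """
    return [m["id"] for m in messages if m.get("mtime", 0) >= threshold]


# Made-up sample data, not real notmuch output.
sample = json.loads("""
[
  {"id": "a@example.org", "timestamp": 1323796300, "mtime": 1323796300,
   "tags": ["inbox"]},
  {"id": "b@example.org", "timestamp": 1323796300, "mtime": 1324300000,
   "tags": ["inbox", "todo"]},
  {"id": "c@example.org", "timestamp": 1323796300, "tags": []}
]
""")

print(modified_since(sample, 1324000000))
```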
* [PATCH 5/5] python: add get_mtime() to the Message class 2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost ` (3 preceding siblings ...) 2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost @ 2011-12-13 17:11 ` Thomas Jost 2012-01-02 14:56 ` Sebastian Spaeth 2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson 5 siblings, 1 reply; 17+ messages in thread From: Thomas Jost @ 2011-12-13 17:11 UTC (permalink / raw) To: notmuch --- bindings/python/notmuch/message.py | 20 ++++++++++++++++++++ 1 files changed, 20 insertions(+), 0 deletions(-) diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py index ce8e718..56f56c2 100644 --- a/bindings/python/notmuch/message.py +++ b/bindings/python/notmuch/message.py @@ -293,6 +293,10 @@ class Message(object): _get_date.argtypes = [NotmuchMessageP] _get_date.restype = c_long + _get_mtime = nmlib.notmuch_message_get_mtime + _get_mtime.argtypes = [NotmuchMessageP] + _get_mtime.restype = c_long + _get_header = nmlib.notmuch_message_get_header _get_header.argtypes = [NotmuchMessageP, c_char_p] _get_header.restype = c_char_p @@ -401,6 +405,22 @@ class Message(object): raise NotmuchError(STATUS.NOT_INITIALIZED) return Message._get_date(self._msg) + def get_mtime(self): + """Returns time_t of the message mtime + + The mtime is the timestamp of the last time the message was modified, + e.g. the time it was added to the database or the last time a tag was + added or removed. + + :returns: A time_t timestamp. + :rtype: c_long + :exception: :exc:`NotmuchError` STATUS.NOT_INITIALIZED if the message + is not initialized. + """ + if self._msg is None: + raise NotmuchError(STATUS.NOT_INITIALIZED) + return Message._get_mtime(self._msg) + def get_header(self, header): """Get the value of the specified header. -- 1.7.8 ^ permalink raw reply related [flat|nested] 17+ messages in thread
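The cover letter mentions incremental dumps as one use for this accessor. Below is a rough sketch of that idea, assuming message objects expose get_mtime() as proposed here alongside the existing get_message_id() and get_tags(). `incremental_dump` and `FakeMessage` are hypothetical names; the stand-in class exists only so the example runs without a notmuch database, and the line format is loosely modelled on the "msgid (tags)" style of "notmuch dump".

```python
class FakeMessage:
    """Stand-in for notmuch.Message, used only to make the sketch runnable."""

    def __init__(self, msgid, mtime, tags):
        self._msgid, self._mtime, self._tags = msgid, mtime, list(tags)

    def get_message_id(self):
        return self._msgid

    def get_mtime(self):
        return self._mtime

    def get_tags(self):
        return iter(self._tags)


def incremental_dump(messages, last_dump_time):
    """Yield 'msgid (tag1 tag2)' lines for messages modified since last_dump_time."""
    for msg in messages:
        if msg.get_mtime() >= last_dump_time:
            tags = " ".join(sorted(str(t) for t in msg.get_tags()))
            yield "%s (%s)" % (msg.get_message_id(), tags)


msgs = [
    FakeMessage("old@example.org", 100, ["inbox"]),
    FakeMessage("new@example.org", 500, ["unread", "inbox"]),
]
for line in incremental_dump(msgs, 200):
    print(line)
```

With a real database, the message list would more likely come from a query using the "mtime:" prefix from patch 3, so that unmodified messages are never loaded at all.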
* Re: [PATCH 5/5] python: add get_mtime() to the Message class 2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost @ 2012-01-02 14:56 ` Sebastian Spaeth 0 siblings, 0 replies; 17+ messages in thread From: Sebastian Spaeth @ 2012-01-02 14:56 UTC (permalink / raw) To: Thomas Jost, notmuch [-- Attachment #1: Type: text/plain, Size: 304 bytes --] On Tue, 13 Dec 2011 18:11:45 +0100, Thomas Jost <schnouki@schnouki.net> wrote: > --- > bindings/python/notmuch/message.py | 20 ++++++++++++++++++++ > 1 files changed, 20 insertions(+), 0 deletions(-) The patch looks good, so once this goes into libnotmuch, +1 for also applying this one. Sebastian [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Store message modification times in the DB 2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost ` (4 preceding siblings ...) 2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost @ 2011-12-19 16:34 ` David Edmondson 2011-12-19 19:48 ` Austin Clements 5 siblings, 1 reply; 17+ messages in thread From: David Edmondson @ 2011-12-19 16:34 UTC (permalink / raw) To: Thomas Jost, notmuch On Tue, 13 Dec 2011 18:11:40 +0100, Thomas Jost <schnouki@schnouki.net> wrote: > This is a patch series I've been working on for some time in order to be > able to sync my tags on several computers. I'm posting it now, but > please consider it as a RFC rather than something that is ready to be > pushed. > > The basic idea is to store the last time each message was modified, i.e. "the > message was added to the DB", "a tag was added" or "a tag was removed". Thomas, this is interesting. Do you have a (back of the envelope?) design for how you will use this information to implement tag sync? My gut feeling is that we need a log of when a change occurred rather than the last modification time, but I haven't really thought that all through properly. dme. -- David Edmondson, http://dme.org ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Store message modification times in the DB 2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson @ 2011-12-19 19:48 ` Austin Clements 2011-12-19 22:56 ` Tom Prince 2011-12-20 8:32 ` David Edmondson 0 siblings, 2 replies; 17+ messages in thread From: Austin Clements @ 2011-12-19 19:48 UTC (permalink / raw) To: David Edmondson; +Cc: notmuch Quoth David Edmondson on Dec 19 at 4:34 pm: > On Tue, 13 Dec 2011 18:11:40 +0100, Thomas Jost <schnouki@schnouki.net> wrote: > > This is a patch series I've been working on for some time in order to be > > able to sync my tags on several computers. I'm posting it now, but > > please consider it as a RFC rather than something that is ready to be > > pushed. > > > > The basic idea is to store the last time each message was modified, i.e. "the > > message was added to the DB", "a tag was added" or "a tag was removed". > > Thomas, this is interesting. Do you have a (back of the envelope?) > design for how you will use this information to implement tag sync? > > My gut feeling is that we need a log of when a change occurred rather > than the last modification time, but I haven't really thought that all > through properly. Here are sketches for two sync algorithms with different properties. I haven't proven these to be correct, but I believe they are. In both, R is the remote host and L is the local host. They're both one-way (they only update tags on L), but should be symmetrically stable. == Two-way "merge" from host R to host L == Per-host state: - last_mtime: Map from remote hosts to last sync mtime new_mtime = last_mtime[R] For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]: If mtime > local mtime of msgid: Set local tags of msgid to tags new_mtime = max(new_mtime, mtime) last_mtime[R] = new_mtime This has the advantage of keeping very little state, but the synchronization is also quite primitive.
If two hosts change a message's tags in different ways between synchronizations, the more recent of the two will override the full set of tags on that message. This does not strictly require tombstones, though if you make a tag change and then delete the message before a sync, the tag change will be lost without some record of that state. Also, this obviously depends heavily on synchronized clocks. == Three-way merge from host R to host L == Per-host state: - last_mtime: Map from remote hosts to last sync mtime - last_sync: Map from remote hosts to the tag database as of the last sync new_mtime = last_mtime[R] for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]: my_tags = local tags of msgid last_tags = last_sync[R][msgid] for each tag that differs between my_tags and r_tags: if tag is in last_tags: remove tag locally else: add tag locally last_sync[R][msgid] = tags new_mtime = max(new_mtime, mtime) Delete stale messages from last_sync[R] (using tombstones or something) last_mtime[R] = new_mtime This protocol requires significantly more state, but can also reconstruct per-tag changes. Conflict resolution is equivalent to what git would do and is based solely on the current local and remote state and the common ancestor state. This can lead to unintuitive results if a tag on a message has gone through multiple changes on both hosts since the last sync (though, I argue, there are no intuitive results in such situations). Tombstones are only required to garbage collect sync state (and other techniques could be used for that). This also does not depend on time synchronization (though, like any mtime solution, it does depend on mtime monotonicity). The algorithm would work equally well with sequence numbers. I tried coming up with a third algorithm that used mtimes to resolve tagging conflicts, but without per-tag mtimes it degenerated into the first algorithm. ^ permalink raw reply [flat|nested] 17+ messages in thread
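The two algorithms sketched in this message can be written out as runnable Python. This is only an illustration under several assumptions: each host is modelled as a plain dict instead of a notmuch database; messages unknown to the local host are skipped; the two-way variant copies the remote mtime onto the local message instead of stamping "now"; and the `tags` stored into last_sync in the three-way pseudocode is read as the remote tag set (r_tags). The function names are hypothetical.

```python
def two_way_merge(local, remote, last_mtime):
    """Most-recent-writer-wins merge of tags from remote into local.

    local, remote: dict msgid -> (mtime, set of tags).
    Returns the new last_mtime to remember for this remote host.
    """
    new_mtime = last_mtime
    for msgid, (mtime, tags) in remote.items():
        if mtime < last_mtime or msgid not in local:
            continue
        if mtime > local[msgid][0]:
            # Assumption: keep the remote mtime rather than "now".
            local[msgid] = (mtime, set(tags))
        new_mtime = max(new_mtime, mtime)
    return new_mtime


def three_way_merge(local_tags, remote, last_sync, last_mtime):
    """Per-tag merge using the tag state at the last sync as common ancestor.

    local_tags: dict msgid -> set of tags (modified in place).
    remote:     dict msgid -> (mtime, set of tags).
    last_sync:  dict msgid -> set of tags as of the last sync with remote.
    """
    new_mtime = last_mtime
    for msgid, (mtime, r_tags) in remote.items():
        if mtime < last_mtime or msgid not in local_tags:
            continue
        my_tags = local_tags[msgid]
        last_tags = last_sync.get(msgid, set())
        # The symmetric difference is a new set, so my_tags may be mutated.
        for tag in my_tags ^ r_tags:
            if tag in last_tags:
                my_tags.discard(tag)   # one side removed it since the ancestor
            else:
                my_tags.add(tag)       # one side added it since the ancestor
        last_sync[msgid] = set(r_tags)
        new_mtime = max(new_mtime, mtime)
    return new_mtime


# Locally "unread" was removed and "todo" added; remotely "work" was added.
local_tags = {"m1": {"inbox", "todo"}}
remote = {"m1": (200, {"inbox", "unread", "work"})}
last_sync = {"m1": {"inbox", "unread"}}
three_way_merge(local_tags, remote, last_sync, 100)
print(sorted(local_tags["m1"]))
```

In the usage example the three-way merge keeps both the local and the remote change, whereas the two-way variant would have replaced the whole local tag set with whichever side was modified later.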
* Re: [PATCH 0/5] Store message modification times in the DB 2011-12-19 19:48 ` Austin Clements @ 2011-12-19 22:56 ` Tom Prince 2011-12-20 8:32 ` David Edmondson 1 sibling, 0 replies; 17+ messages in thread From: Tom Prince @ 2011-12-19 22:56 UTC (permalink / raw) To: Austin Clements, David Edmondson; +Cc: notmuch On Mon, 19 Dec 2011 14:48:21 -0500, Austin Clements <amdragon@MIT.EDU> wrote: > This protocol requires significantly more state, but can also > reconstruct per-tag changes. Conflict resolution is equivalent to > what git would do and is based solely on the current local and remote > state and the common ancestor state. This seems like exactly what one would get if one stored the tag state in git, which seems like a reasonable thing to do anyway. > This can lead to unintuitive results if a tag on a message has gone > through multiple changes on both hosts since the last sync (though, I > argue, there are no intuitive results in such situations). I certainly agree that there isn't a universally good resolution to this. I suspect that the same person, making the same tag changes with the same mtimes, will want different resolutions at different times. This is because there is no good way to record the intent of the changes. > Tombstones are only required to garbage collect sync state (and other > techniques could be used for that). I wonder how many people using notmuch actually delete mail? I know I don't bother to, anymore. One use case that was mentioned, is having a limited amount of mail on a portable device, and syncing tags on those message present. Using git to record the tag state, one would just need to record the state before deleting files, to avoid the need for tombstones in the notmuch db. Tom ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Store message modification times in the DB 2011-12-19 19:48 ` Austin Clements 2011-12-19 22:56 ` Tom Prince @ 2011-12-20 8:32 ` David Edmondson 2011-12-20 15:05 ` Austin Clements 1 sibling, 1 reply; 17+ messages in thread From: David Edmondson @ 2011-12-20 8:32 UTC (permalink / raw) To: Austin Clements; +Cc: notmuch On Mon, 19 Dec 2011 14:48:21 -0500, Austin Clements <amdragon@MIT.EDU> wrote: > Here are sketches for two sync algorithms with different properties. > I haven't proven these to be correct, but I believe they are. In > both, R is the remote host and L is the local host. They're both > one-way (they only update tags on L), but should be symmetrically > stable. Thanks for these. > == Two-way "merge" from host R to host L == > > Per-host state: > - last_mtime: Map from remote hosts to last sync mtime With the proposed changes it seems that the state required on each host would live within the Xapian database (to be extracted with 'dump'). > new_mtime = last_mtime[R] > For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]: > If mtime > local mtime of msgid: > Set local tags of msgid to tags > new_mtime = max(new_mtime, mtime) > last_mtime[R] = new_mtime > > This has the advantage of keeping very little state, but the > synchronization is also quite primitive. If two hosts change a > message's tags in different ways between synchronizations, the more > recent of the two will override the full set of tags on that message. > This does not strictly require tombstones, though if you make a tag > change and then delete the message before a sync, the tag change will > be lost without some record of that state. Does this matter? If the tag on a deleted message is changed, does anyone care? > Also, this obviously depends heavily on synchronized clocks. 
> > > == Three-way merge from host R to host L == > > Per-host state: > - last_mtime: Map from remote hosts to last sync mtime > - last_sync: Map from remote hosts to the tag database as of the last sync Any ideas where this state might be kept? > new_mtime = last_mtime[R] > for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]: > my_tags = local tags of msgid > last_tags = last_sync[R][msgid] > for each tag that differs between my_tags and r_tags: > if tag is in last_tags: remove tag locally > else: add tag locally > last_sync[R][msgid] = tags > new_mtime = max(new_mtime, mtime) > Delete stale messages from last_sync[R] (using tombstones or something) > last_mtime[R] = new_mtime > > This protocol requires significantly more state, but can also > reconstruct per-tag changes. Conflict resolution is equivalent to > what git would do and is based solely on the current local and remote > state and the common ancestor state. This can lead to unintuitive > results if a tag on a message has gone through multiple changes on > both hosts since the last sync (though, I argue, there are no > intuitive results in such situations). Tombstones are only required > to garbage collect sync state (and other techniques could be used for > that). This also does not depend on time synchronization (though, > like any mtime solution, it does depend on mtime monotonicity). The > algorithm would work equally well with sequence numbers. > > I tried coming up with a third algorithm that used mtimes to resolve > tagging conflicts, but without per-tag mtimes it degenerated into the > first algorithm. dme. -- David Edmondson, http://dme.org ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/5] Store message modification times in the DB 2011-12-20 8:32 ` David Edmondson @ 2011-12-20 15:05 ` Austin Clements 0 siblings, 0 replies; 17+ messages in thread From: Austin Clements @ 2011-12-20 15:05 UTC (permalink / raw) To: David Edmondson; +Cc: notmuch Quoth David Edmondson on Dec 20 at 8:32 am: > > == Two-way "merge" from host R to host L == > > > > Per-host state: > > - last_mtime: Map from remote hosts to last sync mtime > > With the proposed changes it seems that the state required on each host > would live within the Xapian database (to be extracted with 'dump'). It certainly could. I haven't thought about how any of this would integrate with dump, or if it necessarily should. A related question is how bootstrap should work. For example, if you add another host, what's the best way to bring it up to speed without, say, overwriting your tags everywhere with your initial tags? In general, when a new message arrives, how do you get the hosts to agree on its tags and what happens if one host tags it before another host sees it? > > new_mtime = last_mtime[R] > > For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]: > > If mtime > local mtime of msgid: > > Set local tags of msgid to tags > > new_mtime = max(new_mtime, mtime) > > last_mtime[R] = new_mtime > > > > This has the advantage of keeping very little state, but the > > synchronization is also quite primitive. If two hosts change a > > message's tags in different ways between synchronizations, the more > > recent of the two will override the full set of tags on that message. > > This does not strictly require tombstones, though if you make a tag > > change and then delete the message before a sync, the tag change will > > be lost without some record of that state. > > Does this matter? If the tag on a deleted message is changed, does > anyone care? That depends on what sort of synchronization model you're expecting. 
If you're expecting git-style synchronization where all that matters is the state and not the order things happened in, then this is exactly what you'd expect. If you're expecting something more nuanced that knows about the order you did things in across hosts between synchronizations (which I think can only lead to more unintuitive corner-cases, but some people seem to expect), then this could be surprising. > > Also, this obviously depends heavily on synchronized clocks. > > > > > > == Three-way merge from host R to host L == > > > > Per-host state: > > - last_mtime: Map from remote hosts to last sync mtime > > - last_sync: Map from remote hosts to the tag database as of the last sync > > Any ideas where this state might be kept? It could also be stored in Xapian (in user keys or as additional message metadata). That would certainly be simplest and would avoid hairy atomicity issues. OTOH, it's not the end of the world if last_sync doesn't get updated atomically, especially if we can at least guarantee last_sync is fully updated and on disk before we update last_mtime. > > new_mtime = last_mtime[R] > > for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]: > > my_tags = local tags of msgid > > last_tags = last_sync[R][msgid] > > for each tag that differs between my_tags and r_tags: > > if tag is in last_tags: remove tag locally > > else: add tag locally > > last_sync[R][msgid] = tags > > new_mtime = max(new_mtime, mtime) > > Delete stale messages from last_sync[R] (using tombstones or something) > > last_mtime[R] = new_mtime > > > > This protocol requires significantly more state, but can also > > reconstruct per-tag changes. Conflict resolution is equivalent to > > what git would do and is based solely on the current local and remote > > state and the common ancestor state. 
This can lead to unintuitive > > results if a tag on a message has gone through multiple changes on > > both hosts since the last sync (though, I argue, there are no > > intuitive results in such situations). Tombstones are only required > > to garbage collect sync state (and other techniques could be used for > > that). This also does not depend on time synchronization (though, > > like any mtime solution, it does depend on mtime monotonicity). The > > algorithm would work equally well with sequence numbers. > > > > I tried coming up with a third algorithm that used mtimes to resolve > > tagging conflicts, but without per-tag mtimes it degenerated into the > > first algorithm. > > dme. ^ permalink raw reply [flat|nested] 17+ messages in thread
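The ordering constraint mentioned in this reply (last_sync fully updated and on disk before last_mtime is updated) can be illustrated outside Xapian. The sketch below deliberately swaps in flat JSON files instead of the Xapian storage suggested above, purely to show the write ordering; the file names and the functions `save_sync_state` and `load_sync_state` are hypothetical. It also omits syncing the containing directory, which a fully durable implementation would likely need.

```python
import json
import os
import tempfile


def _write_durably(path, data):
    """Write JSON to a temp file, fsync it, then atomically rename into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def save_sync_state(state_dir, remote, last_sync, last_mtime):
    """Persist last_sync first and last_mtime second, never the reverse."""
    sync_path = os.path.join(state_dir, remote + ".last_sync.json")
    mtime_path = os.path.join(state_dir, remote + ".last_mtime.json")
    _write_durably(sync_path, {m: sorted(t) for m, t in last_sync.items()})
    _write_durably(mtime_path, last_mtime)


def load_sync_state(state_dir, remote):
    """Return (last_sync, last_mtime); a missing last_mtime means 'never synced'."""
    mtime_path = os.path.join(state_dir, remote + ".last_mtime.json")
    sync_path = os.path.join(state_dir, remote + ".last_sync.json")
    if not os.path.exists(mtime_path):
        return {}, 0
    with open(mtime_path) as f:
        last_mtime = json.load(f)
    with open(sync_path) as f:
        last_sync = {m: set(t) for m, t in json.load(f).items()}
    return last_sync, last_mtime


with tempfile.TemporaryDirectory() as state_dir:
    save_sync_state(state_dir, "hostR", {"m1": {"inbox", "todo"}}, 1234)
    print(load_sync_state(state_dir, "hostR"))
```

If the process dies between the two writes, the next sync starts from the old last_mtime and re-examines messages it has already merged, which the merge loop tolerates because re-applying the same remote state is idempotent.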
end of thread, other threads:[~2012-01-02 14:57 UTC | newest]

Thread overview: 17+ messages
2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
2011-12-23 19:10 ` David Bremner
2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
2011-12-14 21:54 ` Mark Anderson
2011-12-21 0:34 ` Thomas Jost
2011-12-15 0:45 ` Austin Clements
2011-12-21 1:00 ` Thomas Jost
2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost
2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
2012-01-02 14:56 ` Sebastian Spaeth
2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
2011-12-19 19:48 ` Austin Clements
2011-12-19 22:56 ` Tom Prince
2011-12-20 8:32 ` David Edmondson
2011-12-20 15:05 ` Austin Clements