unofficial mirror of notmuch@notmuchmail.org
From: Thomas Jost <schnouki@schnouki.net>
To: Austin Clements <amdragon@MIT.EDU>
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH 2/5] lib: Add a MTIME value to every mail document
Date: Wed, 21 Dec 2011 02:00:53 +0100
Message-ID: <878vm6agca.fsf@schnouki.net>
In-Reply-To: <20111215004507.GF2760@mit.edu>

On Wed, 14 Dec 2011 19:45:07 -0500, Austin Clements <amdragon@MIT.EDU> wrote:
> A few minor comments below.
> 
> At a higher level, I'm curious what the tag synchronization protocol
> you're building on top of this is.  I can't think of one that doesn't
> have race conditions, but maybe I'm not thinking about it right.

The approach I've used is quite different from what you described in
id:"20111219194821.GA10376@mit.edu". I don't directly sync host A to
host B; instead I use a server in the middle. (A is my laptop, which is
not always on, and B is my work PC, which is turned off when I'm out of
the office, so a direct sync would be harder to do.)

My nm-sync script is written in Python 2 (2.7, may work with 2.6) and is
present on both my PCs and on my server. It can operate in two modes:
client (when run from one of my PCs) or server (called *from the client*
through ssh, running on my server).

When running in server mode, the script manipulates a small DB kept as
a Python dictionary (and saved to disk with the pickle module). It does
not even need notmuch to be installed on the server. Here is what this
DB looks like:
  {
    "lastseen": {
      "pc_A": 1324428029,
      "pc_B": 1323952028
    },
    "messages": {
      "msgid_001": (mtime, tag1, tag2, ..., tagN),
      "msgid_002": (mtime, tag1, tag2, ..., tagM),
      ...
    }
  }
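
To make the layout concrete, here is a minimal sketch of how such a
dictionary could be loaded and saved with pickle. This is not the actual
nm-sync code: the function names (empty_db, load_db, save_db) and the
write-to-a-temporary-file step are assumptions made for illustration.

```python
import os
import pickle


def empty_db():
    """Return a fresh DB with the two top-level keys shown above."""
    return {"lastseen": {}, "messages": {}}


def load_db(path):
    """Load the pickled DB from 'path', or start empty if it does not exist."""
    if not os.path.exists(path):
        return empty_db()
    with open(path, "rb") as f:
        return pickle.load(f)


def save_db(db, path):
    """Write the DB to a temporary file, then rename it over the old one.

    The rename replaces the old file in one step on POSIX systems, so a
    crash in the middle of a save should not leave a truncated pickle.
    """
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(db, f, pickle.HIGHEST_PROTOCOL)
    os.rename(tmp, path)
```

Each value under "messages" is a plain tuple whose first element is the
mtime and whose remaining elements are the tags, so nothing notmuch-specific
is needed to read or write it.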

So when running the client, here is what happens:
1. client starts a subprocess: "ssh myserver ~/nm-sync server"
2. client and server check that their sha1sums match (to avoid a version
   mismatch)
3. client identifies itself with its hostname ("pc_A" in the example
   above), server replies with that host's "lastseen" value and then
   updates it in the DB
4. server sends to client messages with mtime > lastseen (msgid + mtime
   + tags), client updates the notmuch DB with these values
5. client queries the notmuch DB for messages with mtime > lastseen and
   sends them (msgid + mtime + tags) to the server, which stores them in
   the DB
6. cleanup: server removes messages with mtime < min(lastseen) from its
   DB
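
The server side of steps 3 to 6 can be sketched as below. This is only an
illustration under assumptions, not the real script: the function names
(handshake, changed_since, store, cleanup) are hypothetical, a host that
has never been seen is assumed to get a lastseen of 0, and the ssh
transport and sha1sum check are left out entirely.

```python
import time


def handshake(db, host, now=None):
    """Step 3: return the host's previous lastseen, then record the new one."""
    now = int(time.time()) if now is None else now
    previous = db["lastseen"].get(host, 0)  # assumption: unknown host -> 0
    db["lastseen"][host] = now
    return previous


def changed_since(db, lastseen):
    """Step 4: select the messages whose mtime is newer than 'lastseen'."""
    return dict((msgid, entry)
                for msgid, entry in db["messages"].items()
                if entry[0] > lastseen)


def store(db, updates):
    """Step 5: store the (mtime, tag1, ..., tagN) tuples sent by the client."""
    db["messages"].update(updates)


def cleanup(db):
    """Step 6: drop messages that every known host has already seen."""
    if not db["lastseen"]:
        return
    oldest = min(db["lastseen"].values())
    db["messages"] = dict((msgid, entry)
                          for msgid, entry in db["messages"].items()
                          if entry[0] >= oldest)
```

A server-mode run would then call handshake(), send the result of
changed_since() to the client, pass whatever the client sends back to
store(), and finish with cleanup().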

So basically this approach assumes that all clocks are synchronized
(everyone uses ntp, right?...) and does not even try to detect
conflicts: if a message has been modified both locally and remotely,
then the local version will be overwritten by the remote one, period. It
should also work with more than 2 hosts (though I have not tested that
yet). No sync data is kept in the notmuch DB.
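
To illustrate the "remote wins" behaviour, here is a small sketch in which
a plain dictionary (msgid -> set of tags) stands in for the notmuch DB.
The function name apply_remote and the returned list of overwritten
message IDs are assumptions added for illustration; the real script talks
to notmuch and does not report conflicts.

```python
def apply_remote(local_tags, remote_updates):
    """Replace local tags with the remote ones, without any conflict check.

    'local_tags' maps msgid -> set of tags (a stand-in for the notmuch DB).
    'remote_updates' maps msgid -> (mtime, tag1, ..., tagN), as sent by the
    server. Returns the sorted msgids whose differing local tags were lost.
    """
    overwritten = []
    for msgid in sorted(remote_updates):
        remote = set(remote_updates[msgid][1:])  # drop the leading mtime
        if msgid in local_tags and local_tags[msgid] != remote:
            overwritten.append(msgid)
        local_tags[msgid] = remote
    return overwritten
```

Any local change to a message that also appears in the remote updates is
simply discarded, which is exactly the trade-off described above.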

Right now all of this fits in about 250 lines of Python (could be made
shorter) and works quite well for me. I'll put it online after doing
some cleanup.


> Quoth Thomas Jost on Dec 13 at  6:11 pm:
> > This is a time_t value, similar to the message date (TIMESTAMP). It is first set
> > when the message is added to the database, and is then updated every time a tag
> > is added or removed. It can thus be used for doing incremental dumps of the
> > database or for synchronizing it between several computers.
> > 
> > This value can be read freely (with notmuch_message_get_mtime()) but for now it
> > can't be set to an arbitrary value: it can only be set to "now" when updated.
> > There's no specific reason for this except that I don't really see a real use
> > case for setting it to an arbitrary value.
> > ---
> >  lib/database.cc       |    7 ++++++-
> >  lib/message.cc        |   32 ++++++++++++++++++++++++++++++++
> >  lib/notmuch-private.h |    6 +++++-
> >  lib/notmuch.h         |    4 ++++
> >  4 files changed, 47 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/database.cc b/lib/database.cc
> > index 2025189..6dc6f73 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -81,7 +81,7 @@ typedef struct {
> >   *		        STRING is the name of a file within that
> >   *		        directory for this mail message.
> >   *
> > - *    A mail document also has four values:
> > + *    A mail document also has five values:
> >   *
> >   *	TIMESTAMP:	The time_t value corresponding to the message's
> >   *			Date header.
> > @@ -92,6 +92,9 @@ typedef struct {
> >   *
> >   *	SUBJECT:	The value of the "Subject" header
> >   *
> > + *	MTIME:		The time_t value corresponding to the last time
> > + *			a tag was added or removed on the message.
> > + *
> >   * In addition, terms from the content of the message are added with
> >   * "from", "to", "attachment", and "subject" prefixes for use by the
> >   * user in searching. Similarly, terms from the path of the mail
> > @@ -1735,6 +1738,8 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
> >  	    date = notmuch_message_file_get_header (message_file, "date");
> >  	    _notmuch_message_set_header_values (message, date, from, subject);
> >  
> > +            _notmuch_message_update_mtime (message);
> 
> Indentation.

Fixed, thanks.

> 
> > +
> >  	    _notmuch_message_index_file (message, filename);
> >  	} else {
> >  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> > diff --git a/lib/message.cc b/lib/message.cc
> > index 0075425..0c98589 100644
> > --- a/lib/message.cc
> > +++ b/lib/message.cc
> > @@ -830,6 +830,34 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >      message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> >  }
> >  
> > +/* Get the message mtime, i.e. when it was added or the last time a tag was
> > + * added/removed. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message)
> > +{
> > +    std::string value;
> > +
> > +    try {
> > +	value = message->doc.get_value (NOTMUCH_VALUE_MTIME);
> > +    } catch (Xapian::Error &error) {
> > +	INTERNAL_ERROR ("Failed to read mtime value from document.");
> > +	return 0;
> > +    }
> 
> For compatibility, this should handle the case when
> NOTMUCH_VALUE_MTIME is missing, probably by just returning 0.  As it
> is, value will be an empty string and sortable_unserialise is
> undefined on strings that weren't produced by sortable_serialise.

Right. I think I rebuilt my DB just after implementing this, which
explains why I did not notice that myself. Thanks!

> > +
> > +    return Xapian::sortable_unserialise (value);
> > +}
> > +
> > +/* Set the message mtime to "now". */
> > +void
> > +_notmuch_message_update_mtime (notmuch_message_t *message)
> > +{
> > +    time_t time_value;
> > +
> > +    time_value = time (NULL);
> > +    message->doc.add_value (NOTMUCH_VALUE_MTIME,
> > +                            Xapian::sortable_serialise (time_value));
> 
> Indentation.

Noted too. It's really time I start using dtrt-indent.

> 
> > +}
> > +
> >  /* Synchronize changes made to message->doc out into the database. */
> >  void
> >  _notmuch_message_sync (notmuch_message_t *message)
> > @@ -994,6 +1022,8 @@ notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > @@ -1022,6 +1052,8 @@ notmuch_message_remove_tag (notmuch_message_t *message, const char *tag)
> >  			private_status);
> >      }
> >  
> > +    _notmuch_message_update_mtime (message);
> > +
> >      if (! message->frozen)
> >  	_notmuch_message_sync (message);
> >  
> > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> > index 60a932f..9859872 100644
> > --- a/lib/notmuch-private.h
> > +++ b/lib/notmuch-private.h
> > @@ -95,7 +95,8 @@ typedef enum {
> >      NOTMUCH_VALUE_TIMESTAMP = 0,
> >      NOTMUCH_VALUE_MESSAGE_ID,
> >      NOTMUCH_VALUE_FROM,
> > -    NOTMUCH_VALUE_SUBJECT
> > +    NOTMUCH_VALUE_SUBJECT,
> > +    NOTMUCH_VALUE_MTIME
> >  } notmuch_value_t;
> >  
> >  /* Xapian (with flint backend) complains if we provide a term longer
> > @@ -276,6 +277,9 @@ _notmuch_message_set_header_values (notmuch_message_t *message,
> >  				    const char *from,
> >  				    const char *subject);
> >  void
> > +_notmuch_message_update_mtime (notmuch_message_t *message);
> > +
> > +void
> >  _notmuch_message_sync (notmuch_message_t *message);
> >  
> >  notmuch_status_t
> > diff --git a/lib/notmuch.h b/lib/notmuch.h
> > index 9f23a10..643ebce 100644
> > --- a/lib/notmuch.h
> > +++ b/lib/notmuch.h
> > @@ -910,6 +910,10 @@ notmuch_message_set_flag (notmuch_message_t *message,
> >  time_t
> >  notmuch_message_get_date  (notmuch_message_t *message);
> >  
> > +/* Get the mtime of 'message' as a time_t value. */
> > +time_t
> > +notmuch_message_get_mtime (notmuch_message_t *message);
> > +
> >  /* Get the value of the specified header from 'message'.
> >   *
> >   * The value will be read from the actual message file, not from the

-- 
Thomas/Schnouki
