unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
From: Austin Clements <amdragon@MIT.EDU>
To: David Edmondson <dme@dme.org>
Cc: notmuch@notmuchmail.org
Subject: Re: [PATCH 0/5] Store message modification times in the DB
Date: Tue, 20 Dec 2011 10:05:03 -0500	[thread overview]
Message-ID: <20111220150503.GE10376@mit.edu> (raw)
In-Reply-To: <cun8vm7zlqv.fsf@hotblack-desiato.hh.sledj.net>

Quoth David Edmondson on Dec 20 at  8:32 am:
> > == Two-way "merge" from host R to host L ==
> > 
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
> 
> With the proposed changes it seems that the state required on each host
> would live within the Xapian database (to be extracted with 'dump').

It certainly could.  I haven't thought about how any of this would
integrate with dump, or if it necessarily should.  A related question
is how bootstrap should work.  For example, if you add another host,
what's the best way to bring it up to speed without, say, overwriting
your tags everywhere with your initial tags?  In general, when a new
message arrives, how do you get the hosts to agree on its tags and
what happens if one host tags it before another host sees it?

> > new_mtime = last_mtime[R]
> > For msgid, mtime, tags in messages on host R with mtime >= last_mtime[R]:
> >   If mtime > local mtime of msgid:
> >     Set local tags of msgid to tags
> >   new_mtime = max(new_mtime, mtime)
> > last_mtime[R] = new_mtime
> > 
> > This has the advantage of keeping very little state, but the
> > synchronization is also quite primitive.  If two hosts change a
> > message's tags in different ways between synchronizations, the more
> > recent of the two will override the full set of tags on that message.
> > This does not strictly require tombstones, though if you make a tag
> > change and then delete the message before a sync, the tag change will
> > be lost without some record of that state.
> 
> Does this matter? If the tag on a deleted message is changed, does
> anyone care?

That depends on what sort of synchronization model you're expecting.
If you're expecting git-style synchronization where all that matters
is the state and not the order things happened in, then this is
exactly what you'd expect.  If you're expecting something more nuanced
that knows about the order you did things in across hosts between
synchronizations (which I think can only lead to more unintuitive
corner-cases, but some people seem to expect), then this could be
surprising.

> > Also, this obviously depends heavily on synchronized clocks.
> > 
> > 
> > == Three-way merge from host R to host L ==
> > 
> > Per-host state:
> > - last_mtime: Map from remote hosts to last sync mtime
> > - last_sync: Map from remote hosts to the tag database as of the last sync
> 
> Any ideas where this state might be kept?

It could also be stored in Xapian (in user keys or as additional
message metadata).  That would certainly be simplest and would avoid
hairy atomicity issues.  OTOH, it's not the end of the world if
last_sync doesn't get updated atomically, especially if we can at
least guarantee last_sync is fully updated and on disk before we
update last_mtime.

> > new_mtime = last_mtime[R]
> > for msgid, mtime, r_tags in messages on host R with mtime >= last_mtime[R]:
> >   my_tags = local tags of msgid
> >   last_tags = last_sync[R][msgid]
> >   for each tag that differs between my_tags and r_tags:
> >     if tag is in last_tags: remove tag locally
> >     else: add tag locally
> >   last_sync[R][msgid] = tags
> >   new_mtime = max(new_mtime, mtime)
> > Delete stale messages from last_sync[R] (using tombstones or something)
> > last_mtime[R] = new_mtime
> > 
> > This protocol requires significantly more state, but can also
> > reconstruct per-tag changes.  Conflict resolution is equivalent to
> > what git would do and is based solely on the current local and remote
> > state and the common ancestor state.  This can lead to unintuitive
> > results if a tag on a message has gone through multiple changes on
> > both hosts since the last sync (though, I argue, there are no
> > intuitive results in such situations).  Tombstones are only required
> > to garbage collect sync state (and other techniques could be used for
> > that).  This also does not depend on time synchronization (though,
> > like any mtime solution, it does depend on mtime monotonicity).  The
> > algorithm would work equally well with sequence numbers.
> > 
> > I tried coming up with a third algorithm that used mtimes to resolve
> > tagging conflicts, but without per-tag mtimes it degenerated into the
> > first algorithm.
> 
> dme.

      reply	other threads:[~2011-12-20 15:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-13 17:11 [PATCH 0/5] Store message modification times in the DB Thomas Jost
2011-12-13 17:11 ` [PATCH 1/5] Fix comments about what is stored in the database Thomas Jost
2011-12-23 19:10   ` David Bremner
2011-12-13 17:11 ` [PATCH 2/5] lib: Add a MTIME value to every mail document Thomas Jost
2011-12-14 21:54   ` Mark Anderson
2011-12-21  0:34     ` Thomas Jost
2011-12-15  0:45   ` Austin Clements
2011-12-21  1:00     ` Thomas Jost
2011-12-13 17:11 ` [PATCH 3/5] lib: Make MTIME values searchable Thomas Jost
2011-12-13 17:11 ` [PATCH 4/5] show: include mtime in JSON output Thomas Jost
2011-12-13 17:11 ` [PATCH 5/5] python: add get_mtime() to the Message class Thomas Jost
2012-01-02 14:56   ` Sebastian Spaeth
2011-12-19 16:34 ` [PATCH 0/5] Store message modification times in the DB David Edmondson
2011-12-19 19:48   ` Austin Clements
2011-12-19 22:56     ` Tom Prince
2011-12-20  8:32     ` David Edmondson
2011-12-20 15:05       ` Austin Clements [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://notmuchmail.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111220150503.GE10376@mit.edu \
    --to=amdragon@mit.edu \
    --cc=dme@dme.org \
    --cc=notmuch@notmuchmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).