unofficial mirror of notmuch@notmuchmail.org
* Synchronization success stories?
@ 2014-04-10 18:51 Brian Sniffen
  2014-04-11 11:21 ` David Bremner
  2014-04-13 11:53 ` Tilmann Singer
  0 siblings, 2 replies; 10+ messages in thread
From: Brian Sniffen @ 2014-04-10 18:51 UTC (permalink / raw)
  To: notmuch

I'm thrilled by using notmuch to manage my mail.  Low-latency search is
very important to me.  But I use computers in a couple of
places---several of which are laptops.  Has anyone stories to share of
successful multi-computer notmuch sync, for a corpus of a
quarter-million messages or so?  

I've tried offlineimap---it (and my Exchange server) get grouchy with
mailboxes of that size.  I tried keeping ~/Maildir/ in Google Drive; it
took weeks to do the initial sync and I gave up.

I'm trying bittorrent-sync now, with no obvious failures.

-Brian

-- 
Brian Sniffen
Information Security
Akamai Technologies


* Re: Synchronization success stories?
  2014-04-10 18:51 Synchronization success stories? Brian Sniffen
@ 2014-04-11 11:21 ` David Bremner
  2014-04-11 18:02   ` David Mazieres
  2014-04-13 11:53 ` Tilmann Singer
  1 sibling, 1 reply; 10+ messages in thread
From: David Bremner @ 2014-04-11 11:21 UTC (permalink / raw)
  To: Brian Sniffen, notmuch

Brian Sniffen <bsniffen@akamai.com> writes:

> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
> very important to me.  But I use computers in a couple of
> places---several of which are laptops.  Has anyone stories to share of
> successful multi-computer notmuch sync, for a corpus of a
> quarter-million messages or so?  

I use syncmaildir to sync the actual messages, and a copy of the output
of "notmuch dump" in git to sync the metadata.

It works OK. A bit slow; depends how often you need to fetch new mail.

d


* Re: Synchronization success stories?
  2014-04-11 11:21 ` David Bremner
@ 2014-04-11 18:02   ` David Mazieres
  2014-04-14  3:56     ` Brian Sniffen
  0 siblings, 1 reply; 10+ messages in thread
From: David Mazieres @ 2014-04-11 18:02 UTC (permalink / raw)
  To: David Bremner, Brian Sniffen, notmuch

David Bremner <david@tethera.net> writes:

> Brian Sniffen <bsniffen@akamai.com> writes:
>
>> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
>> very important to me.  But I use computers in a couple of
>> places---several of which are laptops.  Has anyone stories to share of
>> successful multi-computer notmuch sync, for a corpus of a
>> quarter-million messages or so?  
>
> I use syncmaildir to sync the actual messages, and a copy of the output
> of "notmuch dump" in git to sync the metadata.
>
> It works OK. A bit slow; depends how often you need to fetch new mail.

If you want to see my solution, it is here:

        http://www.scs.stanford.edu/~dm/muchsync-0.tar.gz

I'm a little embarrassed by this code, as I just started to test it a
week ago then instantly became completely dependent on it.  I will
probably change the name (from muchsync to syncmuch) and the database
format before releasing.  But if you feel like beta-testing and giving
me feedback, have a look.

Beware that if you have been using notmuch dump, you may become
instantly hooked on my solution...

David


* Re: Synchronization success stories?
  2014-04-10 18:51 Synchronization success stories? Brian Sniffen
  2014-04-11 11:21 ` David Bremner
@ 2014-04-13 11:53 ` Tilmann Singer
  2014-04-13 12:51   ` David Bremner
  2014-04-13 15:23   ` David Mazieres
  1 sibling, 2 replies; 10+ messages in thread
From: Tilmann Singer @ 2014-04-13 11:53 UTC (permalink / raw)
  To: Brian Sniffen, notmuch


I have experimented with offlineimap, syncmaildir and rsync.  The
append-only approach of notmuch makes synchronization of the mail corpus
simpler, so there are lots of options.  With ssh access to the server, I
found rsync to be conceptually the simplest, but it turned out to be too
slow for me (with ~110k mails) when frequently checking for new mails.

What I have settled on is a hacked-together Ruby script that uses the
notmuch command line both on the server and on the client to determine
unsynced mails, and then runs rsync explicitly for the necessary files.

The notmuch index on the server is only used to find new files for this
synchronization process, and is different from the notmuch indexes I
have on my client machines.

A prerequisite for this is of course ssh access and the ability to set
up notmuch on the server.

The steps performed on a sync run are roughly like this:

- local: notmuch new
- local: notmuch search --output=messages <some time ago>..<now>
- remote: notmuch new
- remote: notmuch search --output=messages <some time ago>..<now>
- compare search results
- run rsync for mails that only exist locally
  (using notmuch search --output=files to get the filenames)
- run rsync for mails that only exist remotely
  (using notmuch search --output=files to get the filenames)
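
A minimal sketch of the "compare search results" step above, in Python
rather than the Ruby of the actual script, so it should be read as an
illustration and not as what notmuch-rsync does.  The function names and
the optional ssh host parameter are assumptions; the only notmuch
invocation used is the "notmuch search --output=messages" call listed in
the steps, and the date-range query is left as a parameter.

```python
import shlex
import subprocess


def notmuch_message_ids(query, ssh_host=None):
    """Return the message IDs matching `query`, locally or on `ssh_host`.

    Runs `notmuch search --output=messages <query>`, which prints one
    id:... term per line.
    """
    cmd = ["notmuch", "search", "--output=messages", query]
    if ssh_host is not None:
        # ssh hands the command to a remote shell, so quote it as one string.
        cmd = ["ssh", ssh_host, shlex.join(cmd)]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.split()


def plan_transfers(local_ids, remote_ids):
    """Split two ID lists into (only local, only remote), each sorted."""
    local, remote = set(local_ids), set(remote_ids)
    return sorted(local - remote), sorted(remote - local)
```

The first list would be pushed to the server and the second pulled from
it, after turning each ID into filenames with "notmuch search
--output=files" as in the steps above.  Anything whose Date: header
falls outside the queried window appears in neither list, so it is never
compared.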

With a reused ssh connection this is sufficiently fast for me (<2s).  If
there is interest I can clean up the script of hardcoded paths etc. and
put it on github.

Synchronization of the notmuch tags database is only necessary when I
switch between different client computers, which happens less
frequently. Like David I have a dump file committed to git for that. I
found it useful to sort the output before adding it to git, to avoid
huge unreadable diffs:

notmuch dump --format=batch-tag | sort > /path/to/notmuch.dump


Til


* Re: Synchronization success stories?
  2014-04-13 11:53 ` Tilmann Singer
@ 2014-04-13 12:51   ` David Bremner
  2014-04-15 22:55     ` Tilmann Singer
  2014-04-13 15:23   ` David Mazieres
  1 sibling, 1 reply; 10+ messages in thread
From: David Bremner @ 2014-04-13 12:51 UTC (permalink / raw)
  To: Tilmann Singer, Brian Sniffen, notmuch

Tilmann Singer <tils@tils.net> writes:

>
> With a reused ssh connection this is sufficiently fast for me (<2s).  If
> there is interest I can clean up the script of hardcoded paths etc. and
> put it on github.

Sure, sounds at least as good as what I am using. Also, syncmaildir
recently did something pretty annoying for upward compatibility, so in
the long term I'm interested in alternatives.

Maybe we should start a notmuchmail.org wiki page with the various
solutions on it.

> Synchronization of the notmuch tags database is only necessary when I
> switch between different client computers, which happens less
> frequently. Like David I have a dump file committed to git for that. I
> found it useful to sort the output before adding it to git, to avoid
> huge unreadable diffs:
>
> notmuch dump --format=batch-tag | sort > /path/to/notmuch.dump

same here.


* Re: Synchronization success stories?
  2014-04-13 11:53 ` Tilmann Singer
  2014-04-13 12:51   ` David Bremner
@ 2014-04-13 15:23   ` David Mazieres
  2014-04-13 19:08     ` Tilmann Singer
  1 sibling, 1 reply; 10+ messages in thread
From: David Mazieres @ 2014-04-13 15:23 UTC (permalink / raw)
  To: Tilmann Singer, Brian Sniffen, notmuch

Tilmann Singer <tils@tils.net> writes:

> The steps performed on a sync run are roughly like this:
>
> - local: notmuch new
> - local: notmuch search --output=messages <some time ago>..<now>
> - remote: notmuch new
> - remote: notmuch search --output=messages <some time ago>..<now>
> - compare search results
> - run rsync for mails that only exist locally
>   (using notmuch search --output=files to get the filenames)
> - run rsync for mails that only exist remotely
>   (using notmuch search --output=files to get the filenames)

What happens if you get a message that's been stuck in a queue for a few
days and has an old Date: header?  Or if you get new messages that have
the same Message-ID as old ones?

> Synchronization of the notmuch tags database is only necessary when I
> switch between different client computers, which happens less
> frequently.

Do you use a laptop everywhere?  I've found that for switching between
my desktop machine at home, my laptop on the train, and my desktop at
work (which amounts to five switches a day), the notmuch dump time is
painfully slow--like well over 10 seconds for 100,000 messages.  Hook
that into notmuch-poll and you have a recipe for hanging emacs every
time you type "G".

Of course, I'm also experiencing the problem of "notmuch new" itself
being painfully slow, but at least that's now my bottleneck in switching
machines.  I suspect the source of the notmuch new problem is that I
have some huge, huge mailboxes.  Some of my maildir/cur directories are
multiple megabytes on a BSD FFS file system (no hashing, so linear
filename lookups in something that doesn't fit in the dcache).  On linux
ext4 things are much faster.  I intend to reorganize my maildir so that
there is a top-level directory with the year and hence no single
directory ever contains mail from more than one year.

David


* Re: Synchronization success stories?
  2014-04-13 15:23   ` David Mazieres
@ 2014-04-13 19:08     ` Tilmann Singer
  2014-04-13 19:52       ` dm-list-email-notmuch
  0 siblings, 1 reply; 10+ messages in thread
From: Tilmann Singer @ 2014-04-13 19:08 UTC (permalink / raw)
  To: David Mazieres expires 2014-07-12 PDT, Brian Sniffen, notmuch


David Mazieres <dm-list-email-notmuch@scs.stanford.edu> writes:
> What happens if you get a message that's been stuck in a queue for a few
> days and has an old Date: header?

It would be missed.  I have set the timespan to look backwards for new
mail to one month to be a bit safer against the stuck-in-queue cases,
but mails with older Date: headers would definitely get missed.

The current output of notmuch count "*" is the same on both the client
and the server, so it seems I didn't run into this problem yet (maybe I
was just lucky).

> Or if you get new messages that have
> the same Message-ID as old ones?

Is that even possible?  I thought that notmuch guarantees the uniqueness
of indexed message ids.  The only reference I could find without trying
to read the code was this thread id:87mwyz3s9d.fsf@star.eba from 2012,
which supports the assumption.

>> Synchronization of the notmuch tags database is only necessary when I
>> switch between different client computers, which happens less
>> frequently.
>
> Do you use a laptop everywhere?  I've found that for switching between
> my desktop machine at home, my laptop on the train, and my desktop at
> work (which amounts to five switches a day), the notmuch dump time is
> painfully slow--like well over 10 seconds for 100,000 messages.  Hook
> that into notmuch-poll and you have a recipe for hanging emacs every
> time you type "G".

I have one laptop and one desktop and switch between them almost daily,
and run a hibernate script that does notmuch dump + git push, and a
resume script that does git pull + notmuch restore.  For hibernate /
resume the speed of those operations is acceptable, but I wouldn't want
to incur that wait for every time checking for new mail.

Here is how long they take (on a machine with an SSD, which certainly
helps):

$ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
real    0m3.643s
user    0m3.593s
sys     0m0.140s
$ time notmuch restore < /tmp/notmuch.dump
real    0m3.719s
user    0m3.357s
sys     0m0.357s
$ notmuch count 
117118
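
For illustration, the hibernate and resume steps described above might
look roughly like the following Python sketch.  This is not the actual
script: the function names, the repository layout, and the commit
message are assumptions, and it presumes a git repository with a
configured remote.  The notmuch calls are the same dump and restore
commands timed above.

```python
import subprocess
from pathlib import Path


def sorted_dump(dump_text):
    """Sort dump lines so that git diffs between dumps stay small."""
    return "".join(line + "\n" for line in sorted(dump_text.splitlines()))


def hibernate(repo, dump_name="notmuch.dump"):
    """Dump tags, commit the sorted dump, and push it (hypothetical layout)."""
    repo = Path(repo)
    dump = subprocess.run(
        ["notmuch", "dump", "--format=batch-tag"],
        check=True, capture_output=True, text=True,
    ).stdout
    (repo / dump_name).write_text(sorted_dump(dump))
    subprocess.run(["git", "add", dump_name], cwd=repo, check=True)
    # git commit exits non-zero when nothing changed, so do not treat
    # that as an error.
    subprocess.run(["git", "commit", "-q", "-m", "notmuch tag dump"],
                   cwd=repo, check=False)
    subprocess.run(["git", "push", "-q"], cwd=repo, check=True)


def resume(repo, dump_name="notmuch.dump"):
    """Pull the latest dump and feed it to notmuch restore."""
    repo = Path(repo)
    subprocess.run(["git", "pull", "-q"], cwd=repo, check=True)
    with open(repo / dump_name) as dump_file:
        subprocess.run(["notmuch", "restore"], stdin=dump_file, check=True)
```

Python's sorted() orders by code point, which can differ from what
sort(1) produces under some locales; that only matters if both are used
against the same repository.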



Til


* Re: Synchronization success stories?
  2014-04-13 19:08     ` Tilmann Singer
@ 2014-04-13 19:52       ` dm-list-email-notmuch
  0 siblings, 0 replies; 10+ messages in thread
From: dm-list-email-notmuch @ 2014-04-13 19:52 UTC (permalink / raw)
  To: Tilmann Singer, Brian Sniffen, notmuch

Tilmann Singer <tils@tils.net> writes:

> David Mazieres <dm-list-email-notmuch@scs.stanford.edu> writes:
>> What happens if you get a message that's been stuck in a queue for a few
>> days and has an old Date: header?
>
> It would be missed.  I have set the timespan to look backwards for new
> mail to one month to be a bit safer against the stuck-in-queue cases,
> but mails with older Date: headers would definitely get missed.
>
> The current output of notmuch count "*" is the same on both the client
> and the server, so it seems I didn't run into this problem yet (maybe I
> was just lucky).

I've been playing around with reorganizing my maildir, and found a
couple of messages (on mailing lists) with clearly invalid dates years
in the past.  But checking with notmuch count is a good idea.  Then you
can always fall back to the slow path in the unlikely event that your
counts don't match up.  Well, except that A) count is just unique
message-IDs, not messages, and B) when synchronizing in both directions
you could still miss something.  You have to assume that the invalid
dates are only ever going to occur at one end of a synchronization
event.

>> Or if you get new messages that have
>> the same Message-ID as old ones?
>
> Is that even possible?  I thought that notmuch guarantees the uniqueness
> of indexed message ids.  The only reference I could find without trying
> to read the code was this thread id:87mwyz3s9d.fsf@star.eba from 2012,
> which supports the assumption.

Sadly, yes it is quite possible, and even opens up a slight security
issue.  Suppose I know you are on a mailing list, and some message
appears on that mailing list that I don't want you to see.  I can send
you an innocuous-looking message that just happens to have the same
message-id, and you may never see the original mailing list message.
Even better, depending on how your spam filtering is set up, if I include
the GTUBE string in my message you may never see mine or the original.

That's why with muchsync, I replicate actual mail messages, rather than
message-IDs.  Then you can always periodically check for message-IDs
that appear in more than one file.  (In fact, though I haven't
published an interface for this, the SQL database kept by muchsync makes
it trivial to check for this and detect certain attacks.)
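
Without muchsync, a rough version of that check can be run directly on
a maildir.  The Python sketch below is an assumption-laden illustration
and has nothing to do with muchsync's SQL database: it walks the cur/
and new/ directories under a root, reads only the headers of each file,
and reports every Message-ID found in more than one file.  The function
names are made up for this example.

```python
import os
from collections import defaultdict
from email.parser import BytesHeaderParser


def files_by_message_id(maildir_root):
    """Map each Message-ID header value to the files that carry it."""
    parser = BytesHeaderParser()
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(maildir_root):
        # Only cur/ and new/ hold delivered messages in a maildir.
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fp:
                headers = parser.parse(fp)
            message_id = headers.get("Message-ID")
            if message_id is not None:
                groups[str(message_id).strip()].append(path)
    return groups


def duplicated_message_ids(maildir_root):
    """Return only the Message-IDs that occur in more than one file."""
    return {
        message_id: sorted(paths)
        for message_id, paths in files_by_message_id(maildir_root).items()
        if len(paths) > 1
    }
```

Most hits will be harmless, such as a direct copy plus a list copy of
the same message, so the output is a list of candidates to compare by
hand and not a list of attacks.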

I understand why notmuch went with message IDs.  For instance you have
sent this reply both directly to me and to a mailing list I am
subscribed to.  So I will get two slightly different copies of the
message (one will have the standard notmuch mailing list signature, the
other won't).  And this way once I've marked it read, the message will
be read even once the second copy comes in.  But personally I'd rather
see the occasional duplicate message than risk not seeing messages.  In
particular, if the goal is to see fewer unread messages, some sort of
feature that pro-actively skips all future messages in a thread or
subthread would be more useful...

> Here is how long they take (on a machine with an SSD, which certainly
> helps):
>
> $ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
> real    0m3.643s
> user    0m3.593s
> sys     0m0.140s
> $ time notmuch restore < /tmp/notmuch.dump
> real    0m3.719s
> user    0m3.357s
> sys     0m0.357s
> $ notmuch count 
> 117118

That's crazy.  I'm jealous.  Then again, this is how fast muchsync runs
(including a full database scan to detect changed messages and tags)
when there is no new mail:

$ time ./muchsync -v
[notmuch] No new mail.
synchronizing muchsync database with Xapian... 0.038506 (+0.038506)
starting scan of Xapian database... 0.039069 (+0.000563)
opened Xapian... 0.040851 (+0.001782)
scanned message IDs... 0.137647 (+0.096796)
scanned tags... 0.170404 (+0.032757)
scanned directories in xapian... 0.172100 (+0.001696)
scanned filenames in xapian... 0.172376 (+0.000276)
adjusted link counts... 0.199461 (+0.027085)
finished synchronizing muchsync database with Xapian... 0.212965 (+0.013505)

real    0m0.220s
user    0m0.173s
sys     0m0.023s

David


* Re: Synchronization success stories?
  2014-04-11 18:02   ` David Mazieres
@ 2014-04-14  3:56     ` Brian Sniffen
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Sniffen @ 2014-04-14  3:56 UTC (permalink / raw)
  To: David Mazieres expires 2014-07-10 PDT, David Bremner,
	notmuch@notmuchmail.org

David Mazieres <dm-list-email-notmuch@scs.stanford.edu> writes:

> David Bremner <david@tethera.net> writes:
>
>> Brian Sniffen <bsniffen@akamai.com> writes:
>>
>>> I'm thrilled by using notmuch to manage my mail.  Low-latency search is
>>> very important to me.  But I use computers in a couple of
>>> places---several of which are laptops.  Has anyone stories to share of
>>> successful multi-computer notmuch sync, for a corpus of a
>>> quarter-million messages or so?  
>>
>> I use syncmaildir to sync the actual messages, and a copy of the output
>> of "notmuch dump" in git to sync the metadata.
>>
>> It works OK. A bit slow; depends how often you need to fetch new mail.
>
> If you want to see my solution, it is here:
>
>         http://www.scs.stanford.edu/~dm/muchsync-0.tar.gz

Thanks!  Much sync.  "Wow."

It sounds like you're paying very careful attention to correctness and
performance, so I'm very glad to be able to start from that basis.

-Brian

> I'm a little embarrassed by this code, as I just started to test it a
> week ago then instantly became completely dependent on it.  I will
> probably change the name (from muchsync to syncmuch) and the database
> format before releasing.  But if you feel like beta-testing and giving
> me feedback, have a look.
>
> Beware that if you have been using notmuch dump, you may become
> instantly hooked on my solution...
>
> David

-- 
Brian Sniffen
Information Security
Akamai Technologies


* Re: Synchronization success stories?
  2014-04-13 12:51   ` David Bremner
@ 2014-04-15 22:55     ` Tilmann Singer
  0 siblings, 0 replies; 10+ messages in thread
From: Tilmann Singer @ 2014-04-15 22:55 UTC (permalink / raw)
  To: David Bremner, Brian Sniffen, notmuch


David Bremner <david@tethera.net> writes:
>> With a reused ssh connection this is sufficiently fast for me (<2s).  If
>> there is interest I can clean up the script of hardcoded paths etc. and
>> put it on github.
>
> Sure, sounds at least as good as what I am using. Also, syncmaildir
> recently did something pretty annoying for upward compatibility, so in
> the long term I'm interested in alternatives.

I've put the ruby script with a README on github:
https://github.com/til/notmuch-rsync


Til
