unofficial mirror of notmuch@notmuchmail.org
From: dm-list-email-notmuch@scs.stanford.edu
To: Tilmann Singer <tils@tils.net>,
	Brian Sniffen <bsniffen@akamai.com>,
	notmuch@notmuchmail.org
Subject: Re: Synchronization success stories?
Date: Sun, 13 Apr 2014 12:52:47 -0700
Message-ID: <87ppklknw0.fsf@ta.scs.stanford.edu>
In-Reply-To: <8738hhvygu.fsf@tils.net>

Tilmann Singer <tils@tils.net> writes:

> David Mazieres <dm-list-email-notmuch@scs.stanford.edu> writes:
>> What happens if you get a message that's been stuck in a queue for a few
>> days and has an old Date: header?
>
> It would be missed.  I have set the timespan to look backwards for new
> mail to one month to be a bit safer against the stuck-in-queue cases,
> but mails with older Date: headers would definitely get missed.
>
> The current output of notmuch count "*" is the same on both the client
> and the server, so it seems I didn't run into this problem yet (maybe I
> was just lucky).

I've been playing around with reorganizing my maildir, and found a
couple of messages (on mailing lists) with clearly invalid dates years
in the past.  But checking with notmuch count is a good idea.  Then you
can always fall back to the slow path in the unlikely event that your
counts don't match up.  Well, except that A) count is just unique
message-IDs, not messages, and B) when synchronizing in both directions
you could still miss something.  You have to assume that the invalid
dates are only ever going to occur at one end of a synchronization
event.
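
To make caveat A concrete, here is a minimal Python sketch. It is a toy model, not notmuch or muchsync code: the replica contents and the helper names `unique_id_count` and `file_set` are made up for the example, and it assumes a count behaves like counting distinct message-IDs, as described above. It shows two replicas whose counts agree even though one holds a file the other lacks.

```python
# Toy model: each replica is a list of (message_id, file_contents) pairs.
# Assumption: the count being compared is the number of distinct
# message-IDs, not the number of files.

def unique_id_count(replica):
    """Number of distinct message-IDs, ignoring how many files carry each."""
    return len({message_id for message_id, _ in replica})

def file_set(replica):
    """The actual set of (message_id, contents) pairs stored."""
    return set(replica)

client = [
    ("a@example.org", "hello"),
    ("b@example.org", "list copy of b"),
]
server = [
    ("a@example.org", "hello"),
    ("b@example.org", "list copy of b"),
    ("b@example.org", "different message reusing b's ID"),
]

counts_match = unique_id_count(client) == unique_id_count(server)
files_match = file_set(client) == file_set(server)
print(counts_match, files_match)
```

Matching counts are therefore a useful hint that nothing was missed, but not proof.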

>> Or if you get new messages that have
>> the same Message-ID as old ones?
>
> Is that even possible?  I thought that notmuch guarantees the uniqueness
> of indexed message ids.  The only reference I could find without trying
> to read the code was this thread id:87mwyz3s9d.fsf@star.eba from 2012,
> which supports the assumption.

Sadly, yes it is quite possible, and even opens up a slight security
issue.  Suppose I know you are on a mailing list, and some message
appears on that mailing list that I don't want you to see.  I can send
you an innocuous-looking message that just happens to have the same
message-id, and you may never see the original mailing list message.
Even better, depending on how your spam filtering is set up, if I include
the GTUBE string in my message you may never see mine or the original.
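
The shadowing effect can be sketched with a toy index keyed by Message-ID. This is an illustration under a stated assumption (the first file seen for an ID supplies what the reader is shown, and later files with that ID are only recorded as extra filenames); it is not a description of notmuch's actual code, and `ToyIndex` and the addresses are hypothetical.

```python
# Toy index keyed by Message-ID. Assumption for illustration only:
# the first arrival for an ID decides what the reader sees.

class ToyIndex:
    def __init__(self):
        self.shown = {}   # message_id -> body presented to the reader
        self.files = {}   # message_id -> list of all filenames seen

    def add(self, message_id, filename, body):
        # Every file is remembered under its message-ID...
        self.files.setdefault(message_id, []).append(filename)
        # ...but only the first arrival sets what is displayed.
        self.shown.setdefault(message_id, body)

index = ToyIndex()
# The decoy arrives first, the genuine list message second.
index.add("x@list.example", "cur/1", "innocuous decoy")
index.add("x@list.example", "cur/2", "the real list message")

shadowed = index.shown["x@list.example"]
print(shadowed)
print(len(index.files["x@list.example"]))
```

Both files are on disk, yet a reader who works purely by message-ID is only ever shown one of them.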

That's why with muchsync, I replicate actual mail messages, rather than
message-IDs.  Then you can always periodically check for message-IDs
that appear in more than one file.  (In fact, though I haven't
published an interface for this, the SQL database kept by muchsync makes
it trivial to check for this and detect certain attacks.)
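
For readers without muchsync, a periodic check of this kind could look roughly like the Python sketch below. It does not use muchsync's SQL database or any notmuch API; it simply walks a maildir, parses each file with the standard `email` module, and reports message-IDs that occur with more than one distinct file content. The function name `ids_with_multiple_contents` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict
from email import message_from_bytes

def ids_with_multiple_contents(maildir):
    """Return {message_id: [paths]} for IDs whose files differ in content."""
    seen = defaultdict(dict)  # message_id -> {content hash: first path seen}
    for root, _dirs, names in os.walk(maildir):
        for name in names:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                data = f.read()
            message_id = message_from_bytes(data).get("Message-ID")
            if message_id is None:
                continue  # no Message-ID header, nothing to compare on
            digest = hashlib.sha256(data).hexdigest()
            # Byte-identical copies collapse onto one hash entry.
            seen[str(message_id).strip()].setdefault(digest, path)
    return {mid: sorted(paths.values())
            for mid, paths in seen.items() if len(paths) > 1}
```

Calling it on a maildir path returns a dictionary to review by hand. Legitimate duplicates, such as a direct copy and a list copy that differ only in headers or a list signature, will show up too, so the output is a list of candidates rather than a list of attacks.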

I understand why notmuch went with message IDs.  For instance you have
sent this reply both directly to me and to a mailing list I am
subscribed to.  So I will get two slightly different copies of the
message (one will have the standard notmuch mailing list signature, the
other won't).  And this way once I've marked it read, the message will
stay read even after the second copy comes in.  But personally I'd rather
see the occasional duplicate message than risk not seeing messages.  In
particular, if the goal is to see fewer unread messages, some sort of
feature that pro-actively skips all future messages in a thread or
subthread would be more useful...

> Here is how long they take (on a machine with an SSD, which certainly
> helps):
>
> $ time notmuch dump --format=batch-tag | sort > /tmp/notmuch.dump
> real    0m3.643s
> user    0m3.593s
> sys     0m0.140s
> $ time notmuch restore < /tmp/notmuch.dump
> real    0m3.719s
> user    0m3.357s
> sys     0m0.357s
> $ notmuch count 
> 117118

That's crazy.  I'm jealous.  Then again, this is how fast muchsync runs
(including a full database scan to detect changed messages and tags)
when there is no new mail:

$ time ./muchsync -v
[notmuch] No new mail.
synchronizing muchsync database with Xapian... 0.038506 (+0.038506)
starting scan of Xapian database... 0.039069 (+0.000563)
opened Xapian... 0.040851 (+0.001782)
scanned message IDs... 0.137647 (+0.096796)
scanned tags... 0.170404 (+0.032757)
scanned directories in xapian... 0.172100 (+0.001696)
scanned filenames in xapian... 0.172376 (+0.000276)
adjusted link counts... 0.199461 (+0.027085)
finished synchronizing muchsync database with Xapian... 0.212965 (+0.013505)

real    0m0.220s
user    0m0.173s
sys     0m0.023s

David


Thread overview: 10+ messages
2014-04-10 18:51 Synchronization success stories? Brian Sniffen
2014-04-11 11:21 ` David Bremner
2014-04-11 18:02   ` David Mazieres
2014-04-14  3:56     ` Brian Sniffen
2014-04-13 11:53 ` Tilmann Singer
2014-04-13 12:51   ` David Bremner
2014-04-15 22:55     ` Tilmann Singer
2014-04-13 15:23   ` David Mazieres
2014-04-13 19:08     ` Tilmann Singer
2014-04-13 19:52       ` dm-list-email-notmuch [this message]
