unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: index multiple files per message-id, add reindex command
Date: Fri, 14 Apr 2017 03:14:38 -0000
Message-ID: <20170414025004.5334-1-david@tethera.net>

WARNING: reindexing is an intrusive operation. I don't think this will
corrupt your database, but previous versions thrashed threading pretty
well. notmuch-dump is your friend.

[PATCH 01/10] lib: isolate n_d_add_message and helper functions into own file
[PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost messages
[PATCH 03/10] lib: factor out message-id parsing to separate file.
[PATCH 04/10] lib: refactor notmuch_database_add_message header parsing

The first 4 patches are just code movement. database.cc has gotten too
large to understand (for me), so this is mainly an attempt to group
functions together in some logical way.

[PATCH 06/10] lib: index message files with duplicate message-ids

The diff here has grown a bit, but the idea is still simple: add terms
and values for all files with a given message-id.
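
As a rough illustration of that idea (a toy model only, not notmuch's
actual schema or code), the sketch below keeps one record per
message-id and merges in the terms from every file that carries that
id. The class name ToyIndex, the file paths, and the whitespace
tokenizer are all assumptions made up for the example.

```python
from collections import defaultdict


class ToyIndex:
    """Toy term index keyed by message-id (hypothetical, not notmuch's schema)."""

    def __init__(self):
        self.terms = defaultdict(set)   # message-id -> set of indexed words
        self.files = defaultdict(list)  # message-id -> list of file paths

    def add_file(self, message_id, path, body):
        # Record the file and merge its words into the shared record,
        # instead of ignoring every file after the first one.
        self.files[message_id].append(path)
        self.terms[message_id].update(body.lower().split())

    def search(self, word):
        # Return the message-ids whose merged terms contain the word.
        return sorted(mid for mid, terms in self.terms.items()
                      if word.lower() in terms)


index = ToyIndex()
index.add_file("abc@example.org", "inbox/1", "original body text")
index.add_file("abc@example.org", "list/7", "body text with list footer")
footer_hits = index.search("footer")
```

In this model a search for a word that appears only in the second file
still finds the message, which is the behaviour the patch is after.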

[PATCH 07/10] WIP: Add message count to summary output

This patch gives the user some hints about the existence of multiple
files per message-id.

[PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms

This just iterates over the terms and removes any that are recoverable,
i.e. that can be regenerated from the message files.
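
A minimal sketch of that filtering step, under stated assumptions: terms
are modelled as plain strings, and the prefixes treated as recoverable
("body:", "subject:", "from:") are invented for the example. Which
prefixes the library really treats as recoverable is not spelled out in
this message.

```python
# Hypothetical prefixes for terms that could be regenerated from the file.
RECOVERABLE_PREFIXES = ("body:", "subject:", "from:")


def remove_indexed_terms(terms):
    """Return the terms that survive, i.e. those not marked recoverable."""
    return {t for t in terms if not t.startswith(RECOVERABLE_PREFIXES)}


terms = {"body:hello", "subject:greetings", "tag:inbox", "tag:unread"}
kept = remove_indexed_terms(terms)
```

Here only the two "tag:" terms remain, since nothing in the message file
could be used to rebuild them.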

[PATCH 09/10] lib: add notmuch_message_reindex

This is the trickiest code here. It ends up using several of the
functions called by notmuch_database_add_message, rather than calling
notmuch_database_add_message directly.
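
The overall shape of a reindex can be sketched roughly as follows,
again as a toy model rather than the library's code: drop the terms
that can be rebuilt, re-read every file recorded for the message, and
keep everything else (here, tags). The dict layout, the "body:" prefix,
and the read_file callback are assumptions made for the example.

```python
def index_file(body):
    """Turn a message body into a set of prefixed terms (toy tokenizer)."""
    return {"body:" + word for word in body.lower().split()}


def reindex(doc, read_file):
    """Rebuild the recoverable terms of doc from all of its files."""
    # Keep terms that cannot be regenerated from the files (e.g. tags).
    kept = {t for t in doc["terms"] if not t.startswith("body:")}
    fresh = set()
    for path in doc["files"]:
        fresh |= index_file(read_file(path))
    doc["terms"] = kept | fresh
    return doc


# Hypothetical file contents and a document with one stale indexed term.
files = {"inbox/1": "hello world", "list/7": "hello footer"}
doc = {"files": ["inbox/1", "list/7"], "terms": {"tag:inbox", "body:stale"}}
reindex(doc, files.__getitem__)
```

After the call, the stale term is gone, the tag is preserved, and the
words from both files are present.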

[PATCH 10/10] add "notmuch reindex" subcommand

This should probably have at least a few more tests: in particular,
preservation of message properties is not tested yet. Also, more tests
involving threading are needed, since it turned out to be surprisingly
hard to trigger some bugs (i.e. there were bugs triggered only by one
of the two corpora, and only by one of Xapian 1.2 vs 1.4).

The good news is that there really seems to be a speed payoff for this
extra complication. Reindexing all messages went from taking about twice
as long as the initial notmuch new to about 60% of that time.

I'm a little skeptical about the peak memory use, but so far I haven't
seen any serious-looking memory leaks.


Thread overview: 11+ messages
2017-04-14  3:14 David Bremner [this message]
2017-04-14  3:14 ` [PATCH 05/10] test: add known broken tests for duplicate message id David Bremner
2017-04-14  3:14 ` [PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms David Bremner
2017-04-14  3:14 ` [PATCH 10/10] add "notmuch reindex" subcommand David Bremner
2017-04-14  3:14 ` [PATCH 04/10] lib: refactor notmuch_database_add_message header parsing David Bremner
2017-04-14  3:14 ` [PATCH 01/10] lib: isolate n_d_add_message and helper functions into own file David Bremner
2017-04-14  3:14 ` [PATCH 07/10] WIP: Add message count to summary output David Bremner
2017-04-14  3:14 ` [PATCH 09/10] lib: add notmuch_message_reindex David Bremner
2017-04-14  3:14 ` [PATCH 06/10] lib: index message files with duplicate message-ids David Bremner
2017-04-14  3:14 ` [PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost messages David Bremner
2017-04-14  3:14 ` [PATCH 03/10] lib: factor out message-id parsing to separate file David Bremner
