unofficial mirror of notmuch@notmuchmail.org
* index multiple files per message-id, add reindex command
@ 2017-04-14  3:14 David Bremner
From: David Bremner @ 2017-04-14  3:14 UTC (permalink / raw)
  To: notmuch

WARNING: reindexing is an intrusive operation. I don't think this will
corrupt your database, but previous versions thrashed threading pretty
well. notmuch-dump is your friend.

[PATCH 01/10] lib: isolate n_d_add_message and helper functions into own file
[PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost messages
[PATCH 03/10] lib: factor out message-id parsing to separate file
[PATCH 04/10] lib: refactor notmuch_database_add_message header parsing

The first 4 patches are just code movement. database.cc has gotten too
large to understand (for me), so this is mainly trying to group
functions together in some logical way.

[PATCH 06/10] lib: index message files with duplicate message-ids

the diff here has grown a bit, but the idea is still simple: add terms
and values for all files with a given message id.
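
To make the idea concrete, here is a toy sketch in Python. It is not the
notmuch or Xapian code: the ToyIndex class, its methods, and the sample
message-id and paths are all made up for illustration. It only shows the
shape of "one document per message-id, with terms from every file".

```python
from collections import defaultdict


class ToyIndex:
    """Toy stand-in for the database: one document per message-id."""

    def __init__(self):
        self.files = defaultdict(list)  # message-id -> filenames
        self.terms = defaultdict(set)   # message-id -> indexed terms

    def add_file(self, message_id, filename, text):
        """Index one file; a duplicate message-id extends the same document."""
        is_duplicate = message_id in self.files
        self.files[message_id].append(filename)
        self.terms[message_id].update(text.lower().split())
        return is_duplicate

    def search(self, word):
        """Return the message-ids whose document contains the word."""
        return sorted(m for m, t in self.terms.items() if word.lower() in t)


index = ToyIndex()
index.add_file("id1@example", "inbox/1", "hello from the list")
index.add_file("id1@example", "archive/1", "hello with footer added")
```

With this model, a word that occurs only in the second file still finds
the message, which is the behaviour the patch is after.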

[PATCH 07/10] WIP: Add message count to summary output

This patch gives the user some hints about the existence of multiple
files per message-id.

[PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms

this just iterates over the terms and removes any that are recoverable,
i.e. that can be regenerated by indexing the message files again.
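
Roughly, in toy Python form. The prefixes and the function name below are
hypothetical and do not match the real term prefixes in the library. The
assumption is simply that terms derived from file content can be dropped,
while things like tags and properties must survive.

```python
# Hypothetical prefixes: only these are assumed to be derived from file content.
RECOVERABLE_PREFIXES = ("from:", "subject:", "body:")


def remove_indexed_terms(terms):
    """Return the terms that survive: those not re-derivable from the files."""
    return {t for t in terms if not t.startswith(RECOVERABLE_PREFIXES)}


doc_terms = {
    "tag:inbox",
    "property:foo=bar",
    "from:alice",
    "subject:hello",
    "body:world",
}
kept = remove_indexed_terms(doc_terms)
```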

[PATCH 09/10] lib: add notmuch_message_reindex

this is the trickiest code here, and it ends up using several of the
functions called by notmuch_database_add_message, rather than calling
it directly.
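
As a sketch of that structure only (toy Python, with hypothetical helper
names and a made-up "body:" prefix, not the library code): reindexing
drops the recoverable terms and then runs the per-file indexing step over
every file of the message, instead of going through the full add-message
path.

```python
def index_file(text):
    """Hypothetical helper: derive content terms from one file's text."""
    return {"body:" + word for word in text.lower().split()}


def reindex(terms, file_texts):
    """Drop recoverable terms, then re-add terms from every file of the message."""
    kept = {t for t in terms if not t.startswith("body:")}
    for text in file_texts:
        kept |= index_file(text)
    return kept


old = {"tag:inbox", "body:stale"}
new = reindex(old, ["Hello list", "hello footer"])
```

The tag survives, the stale content term is gone, and terms from both
files are present afterwards.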

[PATCH 10/10] add "notmuch reindex" subcommand

This should probably have at least a few more tests: in particular,
preservation of message properties is not tested yet. Also, more tests
involving threading are needed, since it turned out to be surprisingly
hard to trigger some bugs (i.e. there were bugs triggered only by one
of the two corpora, and only by one of xapian 1.2 vs 1.4).

The good news is that there really seems to be a speed payoff for this
extra complication. Reindexing all messages went from about twice as
long as the initial notmuch new, to about 60% of that speed.

I'm a little skeptical about the peak memory use, but so far I haven't
seen any serious-looking memory leaks.
