From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: index multiple files per message-id, add reindex command
Date: Fri, 14 Apr 2017 03:14:38 -0000
Message-ID: <20170414025004.5334-1-david@tethera.net>
WARNING: reindexing is an intrusive operation. I don't think this will
corrupt your database, but previous versions thrashed threading pretty
well. notmuch-dump is your friend.
[PATCH 01/10] lib: isolate n_d_add_message and helper functions into own file
[PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost messages
[PATCH 03/10] lib: factor out message-id parsing to separate file
[PATCH 04/10] lib: refactor notmuch_database_add_message header parsing
The first 4 patches are just code movement. database.cc has gotten too
large to understand (for me), so this is mainly trying to group
functions together in some logical way.
[PATCH 06/10] lib: index message files with duplicate message-ids
the diff here has grown a bit, but the idea is still simple: add terms
and values for all files with a given message id.
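To illustrate the idea (not the actual lib code), here is a toy model in Python. Everything in it is an assumption made for the example: the `index_files` name, the tuple layout, and the use of plain sets in place of Xapian terms. It only shows that one document per message-id ends up with the union of the terms from all of its files.

```python
from collections import defaultdict


def index_files(files):
    """Build one document per message-id from (message_id, filename, terms).

    Hypothetical model: each document collects every filename seen for the
    message-id and the union of the terms extracted from those files.
    """
    index = defaultdict(lambda: {"filenames": [], "terms": set()})
    for message_id, filename, terms in files:
        doc = index[message_id]
        doc["filenames"].append(filename)
        doc["terms"].update(terms)  # terms from every file, not just the first
    return dict(index)


files = [
    ("id1@example", "inbox/cur/1", {"hello", "world"}),
    ("id1@example", "archive/cur/7", {"hello", "footer"}),
    ("id2@example", "inbox/cur/2", {"other"}),
]
index = index_files(files)
```

With this model a search for "footer" would match id1@example even though only the second file contains it.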
[PATCH 07/10] WIP: Add message count to summary output
This patch gives the user some hints about the existence of multiple
files per message-id.
[PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms
this just iterates over terms, and kills any that are recoverable
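A rough sketch of that loop, again as a toy Python model rather than the real function. The prefixes in `KEEP_PREFIXES` and the sample terms are assumptions chosen for the example; the point is only that terms which cannot be rebuilt from the message files (tags and properties here) survive, and everything else is dropped.

```python
# Assumed prefixes for terms that cannot be recovered from the files.
KEEP_PREFIXES = ("K", "XPROPERTY")


def remove_indexed_terms(terms):
    """Remove, in place, every term that could be regenerated by indexing."""
    for term in list(terms):  # iterate over a copy while mutating the set
        if not term.startswith(KEEP_PREFIXES):
            terms.discard(term)


terms = {"Kinbox", "XPROPERTYsession=abc", "XFROMalice", "hello", "world"}
remove_indexed_terms(terms)
```

After the call only the tag-like and property-like terms are left, which is the state a reindex would start from.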
[PATCH 09/10] lib: add notmuch_message_reindex
this is the trickiest code here, and it ends up using several of the
functions called by notmuch_database_add_message, rather than calling
it directly.
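The shape of that code sharing might look like the following toy Python model. All names here (`index_file`, `add_message`, `reindex`, the dict layout) are hypothetical stand-ins, and tags are kept in a separate set purely to keep the sketch short. Both entry points go through the same per-file helper, and reindexing clears the recoverable data before walking every known filename.

```python
def index_file(doc, file_terms):
    """Shared helper: add the terms extracted from one message file."""
    doc["terms"].update(file_terms)


def add_message(db, message_id, filename, file_terms):
    """Create the document if needed, record the file, and index it."""
    doc = db.setdefault(
        message_id, {"filenames": [], "terms": set(), "tags": set()}
    )
    doc["filenames"].append(filename)
    index_file(doc, file_terms)
    return doc


def reindex(db, message_id, read_terms):
    """Drop file-derived terms, then re-index every file of the message."""
    doc = db[message_id]
    doc["terms"].clear()  # tags are stored separately and left untouched
    for filename in doc["filenames"]:
        index_file(doc, read_terms(filename))


db = {}
add_message(db, "id1@example", "inbox/cur/1", {"hello"})
add_message(db, "id1@example", "archive/cur/7", {"footer"})
db["id1@example"]["tags"].add("inbox")

# Pretend the first file changed on disk since it was indexed.
current = {"inbox/cur/1": {"hello", "edited"}, "archive/cur/7": {"footer"}}
reindex(db, "id1@example", lambda name: current[name])
```

In this model reindex never calls add_message, since the document and its filename list already exist; it only reuses the helper underneath.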
[PATCH 10/10] add "notmuch reindex" subcommand
This should probably have at least a few more tests: in particular
preservation of message properties is not tested yet. Also, more tests
involving threading are needed, since it turned out to be surprisingly
hard to trigger some bugs (i.e. there were bugs triggered only by one
of the two corpora, and only by one of xapian 1.2 vs 1.4).
The good news is that there really seems to be a speed payoff for this
extra complication. Reindexing all messages went from about twice as
long as the initial notmuch new, to about 60% of that speed.
I'm a little skeptical about the peak memory use, but so far I haven't
seen any serious-looking memory leaks.
Thread overview: 11+ messages
2017-04-14 3:14 David Bremner [this message]
2017-04-14 3:14 ` [PATCH 05/10] test: add known broken tests for duplicate message id David Bremner
2017-04-14 3:14 ` [PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms David Bremner
2017-04-14 3:14 ` [PATCH 10/10] add "notmuch reindex" subcommand David Bremner
2017-04-14 3:14 ` [PATCH 04/10] lib: refactor notmuch_database_add_message header parsing David Bremner
2017-04-14 3:14 ` [PATCH 01/10] lib: isolate n_d_add_message and helper functions into own file David Bremner
2017-04-14 3:14 ` [PATCH 07/10] WIP: Add message count to summary output David Bremner
2017-04-14 3:14 ` [PATCH 09/10] lib: add notmuch_message_reindex David Bremner
2017-04-14 3:14 ` [PATCH 06/10] lib: index message files with duplicate message-ids David Bremner
2017-04-14 3:14 ` [PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost messages David Bremner
2017-04-14 3:14 ` [PATCH 03/10] lib: factor out message-id parsing to separate file David Bremner