unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: [PATCH 2/2] n_m_remove_indexed_terms: reduce number of Xapian API calls.
Date: Mon, 15 Apr 2019 22:46:16 -0300
Message-ID: <20190416014616.31623-3-david@tethera.net>
In-Reply-To: <20190416014616.31623-1-david@tethera.net>

Previously this function scanned every term attached to a given
Xapian document. It turns out we know how to read only the terms we
need to preserve (and we might have already done so). This commit
replaces many calls to Xapian::Document::remove_term with one call to
::clear_terms, and a (typically much smaller) number of calls to
::add_term. Roughly speaking this is based on the assumption that most
messages have more text than they have tags.

According to the performance test suite, this yields a roughly 40%
speedup on "notmuch reindex '*'".
---
 lib/message.cc | 66 +++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..3e33d8b8 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -716,6 +716,8 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 
 /* Remove all terms generated by indexing, i.e. not tags or
  * properties, along with any automatic tags*/
+/* According to Xapian API docs, none of these calls throw
+ * exceptions */
 notmuch_private_status_t
 _notmuch_message_remove_indexed_terms (notmuch_message_t *message)
 {
@@ -727,45 +729,53 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t *message)
 	tag_prefix = _find_prefix ("tag"),
 	type_prefix = _find_prefix ("type");
 
-    for (i = message->doc.termlist_begin ();
-	 i != message->doc.termlist_end (); i++) {
+    /* Make sure we have the data to restore to Xapian*/
+    _notmuch_message_ensure_metadata (message,NULL);
 
-	const std::string term = *i;
+    /* Empirically, it turns out to be faster to remove all the terms,
+     * and add back the ones we want. */
+    message->doc.clear_terms ();
+    message->modified = true;
 
-	if (term.compare (0, type_prefix.size (), type_prefix) == 0)
-	    continue;
+    /* still a mail message */
+    message->doc.add_term (type_prefix + "mail");
 
-	if (term.compare (0, id_prefix.size (), id_prefix) == 0)
-	    continue;
+    /* Put back message-id */
+    message->doc.add_term (id_prefix + message->message_id);
 
-	if (term.compare (0, property_prefix.size (), property_prefix) == 0)
-	    continue;
+    /* Put back non-automatic tags */
+    for (notmuch_tags_t *tags = notmuch_message_get_tags (message);
+	 notmuch_tags_valid (tags);
+	 notmuch_tags_move_to_next (tags)) {
 
-	if (term.compare (0, tag_prefix.size (), tag_prefix) == 0 &&
-	    term.compare (1, strlen("encrypted"), "encrypted") != 0 &&
-	    term.compare (1, strlen("signed"), "signed") != 0 &&
-	    term.compare (1, strlen("attachment"), "attachment") != 0)
-	    continue;
+	const char *tag = notmuch_tags_get (tags);
 
-	try {
-	    message->doc.remove_term ((*i));
-	    message->modified = true;
-	} catch (const Xapian::InvalidArgumentError) {
-	    /* Ignore failure to remove non-existent term. */
-	} catch (const Xapian::Error &error) {
-	    notmuch_database_t *notmuch = message->notmuch;
-
-	    if (!notmuch->exception_reported) {
-		_notmuch_database_log(notmuch_message_get_database (message), "A Xapian exception occurred creating message: %s\n",
-				      error.get_msg().c_str());
-		notmuch->exception_reported = true;
-	    }
-	    return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+	if (STRNCMP_LITERAL (tag, "encrypted") != 0 &&
+	    STRNCMP_LITERAL (tag, "signed") != 0 &&
+	    STRNCMP_LITERAL (tag, "attachment") != 0) {
+	    std::string term = tag_prefix + tag;
+	    message->doc.add_term(term);
 	}
     }
+
+    /* Put back properties */
+    notmuch_message_properties_t *list;
+
+    for (list = notmuch_message_get_properties (message, "", false);
+	 notmuch_message_properties_valid (list); notmuch_message_properties_move_to_next (list)) {
+	std::string term = property_prefix +
+	    notmuch_message_properties_key(list) + "=" +
+	    notmuch_message_properties_value(list);
+
+	message->doc.add_term(term);
+    }
+
+    notmuch_message_properties_destroy (list);
+
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
 {
-- 
2.20.1
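
The trade-off described in the commit message can be illustrated with a
toy model. This is only a sketch: ToyDoc is a hypothetical stand-in that
counts mutating calls, not the Xapian::Document API, and the term strings
and helper names (has_prefix, scan_and_remove, clear_and_restore,
make_doc) are made up for illustration. It shows that the old strategy
costs one call per removed term, while the new one costs one clear plus
one add per preserved term.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a Xapian document: a set of terms plus a counter of
// mutating calls. This is NOT the Xapian API, only a model of it.
struct ToyDoc {
    std::set<std::string> terms;
    std::size_t calls = 0;

    void add_term (const std::string &t) { terms.insert (t); ++calls; }
    void remove_term (const std::string &t) { terms.erase (t); ++calls; }
    void clear_terms () { terms.clear (); ++calls; }
};

static bool has_prefix (const std::string &s, const std::string &p)
{
    return s.compare (0, p.size (), p) == 0;
}

// Old strategy: visit every term and remove the ones not worth keeping.
static void scan_and_remove (ToyDoc &doc,
                             const std::vector<std::string> &keep_prefixes)
{
    // Iterate over a snapshot so removal does not invalidate the iteration.
    const std::vector<std::string> snapshot (doc.terms.begin (),
                                             doc.terms.end ());
    for (const std::string &term : snapshot) {
        bool keep = false;
        for (const std::string &p : keep_prefixes)
            keep = keep || has_prefix (term, p);
        if (! keep)
            doc.remove_term (term);
    }
}

// New strategy: drop everything, then re-add the terms we already know.
static void clear_and_restore (ToyDoc &doc,
                               const std::vector<std::string> &keep_terms)
{
    doc.clear_terms ();
    for (const std::string &term : keep_terms)
        doc.add_term (term);
}

// Build a document with four preserved terms and n_body_terms text terms.
// The term strings are illustrative only.
static ToyDoc make_doc (std::size_t n_body_terms)
{
    ToyDoc doc;
    doc.terms = { "Tmail", "Qid@example", "Kinbox", "Kunread" };
    for (std::size_t i = 0; i < n_body_terms; i++)
        doc.terms.insert ("body" + std::to_string (i));
    return doc;
}
```

With a document built by make_doc (1000), scan_and_remove makes one
remove_term call per body term, whereas clear_and_restore makes one
clear_terms call plus one add_term call per preserved term. Both leave
the same set of terms behind. The model says nothing about the real cost
of each Xapian call; the speedup figure above comes from the performance
test suite, not from this sketch.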


Thread overview: 7+ messages
2019-04-16  1:46 reindex improvements David Bremner
2019-04-16  1:46 ` [PATCH 1/2] CLI/reindex: fix memory leak David Bremner
2019-04-17 17:04   ` Tomi Ollila
2019-04-19  2:22   ` David Bremner
2019-04-16  1:46 ` David Bremner [this message]
2019-05-22 11:58   ` [PATCH 2/2] n_m_remove_indexed_terms: reduce number of Xapian API calls David Bremner
2019-05-23 11:40   ` David Bremner
