unofficial mirror of notmuch@notmuchmail.org
* Deletion speed improvements for notmuch-new
From: David Bremner @ 2021-04-14  2:16 UTC
  To: notmuch

These are a bit rough around the edges, but I wanted to see if we
could get them into shape for the 0.32 release, since the performance
improvement for me is pretty drastic. I would appreciate testing,
since I suspect the performance test is something of a best case: it
avoids long threads.

[PATCH 1/2] WIP: add performance test for removing files.
-- this needs someone (Tomi?) to fix my GNU-centric tar usage

[PATCH 2/2] WIP: replace use of queries in n_m_delete with postlist access

This gives at least a 30x improvement on the test introduced above
with the medium corpus (a lower bound: I ran out of patience waiting
for the unpatched version to finish).


* [PATCH 1/2] WIP: add performance test for removing files.
From: David Bremner @ 2021-04-14  2:16 UTC
  To: notmuch; +Cc: David Bremner

No doubt this is a non-portable use of tar.
---
 performance-test/T00-new.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/performance-test/T00-new.sh b/performance-test/T00-new.sh
index a14dd13f..1eeac6d0 100755
--- a/performance-test/T00-new.sh
+++ b/performance-test/T00-new.sh
@@ -26,6 +26,16 @@ perl -nle 'rename "$_.renamed", $_' $manifest
 
 time_run "new ($count mv back)" 'notmuch new'
 
+tar --create --file backup.tar --files-from=$manifest
+
+perl -nle 'unlink $_; unlink $_.copy' $manifest
+
+time_run "new ($count rm)" 'notmuch new'
+
+tar --extract --file backup.tar
+
+time_run "new ($count restore)" 'notmuch new'
+
 perl -nle 'link $_, "$_.copy"' $manifest
 
 time_run "new ($count cp)" 'notmuch new'
-- 
2.30.2


* [PATCH 2/2] WIP: replace use of queries in n_m_delete with postlist access
From: David Bremner @ 2021-04-14  2:16 UTC
  To: notmuch; +Cc: David Bremner

This improves performance because it removes some interleaving of
queries and deletions, which was causing a lot of time to be spent
checking for position information in the glass backend.
---
 lib/message.cc | 68 +++++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 0c2eeab5..42d56acb 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1356,11 +1356,10 @@ notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
-    const char *mid, *tid, *query_string;
+    const char *mid, *tid;
     notmuch_message_t *ghost;
     notmuch_private_status_t private_status;
     notmuch_database_t *notmuch;
-    notmuch_query_t *query;
     unsigned int count = 0;
     bool is_ghost;
 
@@ -1382,16 +1381,33 @@ _notmuch_message_delete (notmuch_message_t *message)
     if (is_ghost)
 	return NOTMUCH_STATUS_SUCCESS;
 
-    query_string = talloc_asprintf (message, "thread:%s", tid);
-    query = notmuch_query_create (notmuch, query_string);
-    if (query == NULL)
-	return NOTMUCH_STATUS_OUT_OF_MEMORY;
-    status = notmuch_query_count_messages (query, &count);
-    if (status) {
-	notmuch_query_destroy (query);
-	return status;
+    /* look for a non-ghost message in the same thread */
+    try {
+	Xapian::PostingIterator thread_doc, thread_doc_end;
+	Xapian::PostingIterator mail_doc, mail_doc_end;
+
+	_notmuch_database_find_doc_ids (message->notmuch, "thread", tid, &thread_doc,
+					&thread_doc_end);
+	_notmuch_database_find_doc_ids (message->notmuch, "type", "mail", &mail_doc, &mail_doc_end);
+
+	while (count == 0 &&
+	       thread_doc != thread_doc_end &&
+	       mail_doc != mail_doc_end) {
+	    thread_doc.skip_to (*mail_doc);
+	    if (thread_doc != thread_doc_end) {
+		if (*thread_doc == *mail_doc) {
+		    count++;
+		} else {
+		    mail_doc.skip_to (*thread_doc);
+		    if (mail_doc != mail_doc_end && *thread_doc == *mail_doc)
+			count++;
+		}
+	    }
+	}
+    } catch (Xapian::Error &error) {
+	LOG_XAPIAN_EXCEPTION (message, error);
+	return NOTMUCH_STATUS_XAPIAN_EXCEPTION;
     }
-
     if (count > 0) {
 	/* reintroduce a ghost in its place because there are still
 	 * other active messages in this thread: */
@@ -1410,27 +1426,21 @@ _notmuch_message_delete (notmuch_message_t *message)
 	notmuch_message_destroy (ghost);
 	status = COERCE_STATUS (private_status, "Error converting to ghost message");
     } else {
-	/* the thread is empty; drop all ghost messages from it */
-	notmuch_messages_t *messages;
-	status = _notmuch_query_search_documents (query,
-						  "ghost",
-						  &messages);
-	if (status == NOTMUCH_STATUS_SUCCESS) {
-	    notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
-	    while (notmuch_messages_valid (messages)) {
-		message = notmuch_messages_get (messages);
-		status = _notmuch_message_delete (message);
-		if (status) /* we'll report the last failure we see;
-					 * if there is more than one failure, we
-					 * forget about previous ones */
-		    last_error = status;
-		notmuch_message_destroy (message);
-		notmuch_messages_move_to_next (messages);
+	/* the thread now contains only ghosts: delete them */
+	try {
+	    Xapian::PostingIterator doc, doc_end;
+
+	    _notmuch_database_find_doc_ids (message->notmuch, "thread", tid, &doc, &doc_end);
+
+	    for (; doc != doc_end; doc++) {
+		message->notmuch->writable_xapian_db->delete_document (*doc);
 	    }
-	    status = last_error;
+	} catch (Xapian::Error &error) {
+	    LOG_XAPIAN_EXCEPTION (message, error);
+	    return NOTMUCH_STATUS_XAPIAN_EXCEPTION;
 	}
+
     }
-    notmuch_query_destroy (query);
     return status;
 }
 
-- 
2.30.2

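The posting-list walk in patch 2 is a leapfrog intersection of the `thread:` and `type:mail` posting lists: whichever iterator is behind is advanced with `skip_to` until both sit on the same document id (a surviving non-ghost message) or one list runs out. Below is a minimal standalone sketch, assuming sorted `std::vector<unsigned>` lists in place of a Xapian database. `PostingCursor` and `has_common_doc` are hypothetical names, not notmuch or Xapian API, and the loop is a slightly restructured form that computes the same yes/no answer as the one in the patch, not the patch itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Xapian::PostingIterator over a sorted list of
// document ids. skip_to(did) moves to the first id >= did and never moves
// backwards, mirroring how the patch uses PostingIterator::skip_to.
class PostingCursor {
public:
    explicit PostingCursor(const std::vector<unsigned> &ids) : ids_(ids), pos_(0) {}

    bool at_end() const { return pos_ >= ids_.size(); }
    unsigned operator*() const { return ids_[pos_]; }

    void skip_to(unsigned did) {
        auto first = ids_.begin() + static_cast<std::ptrdiff_t>(pos_);
        auto it = std::lower_bound(first, ids_.end(), did);
        pos_ = static_cast<std::size_t>(it - ids_.begin());
    }

private:
    const std::vector<unsigned> &ids_;
    std::size_t pos_;
};

// True if the two sorted posting lists share a document id, i.e. the thread
// still contains at least one real ("type:mail") message. Stops at the first
// match instead of counting every message in the thread.
bool has_common_doc(const std::vector<unsigned> &thread_ids,
                    const std::vector<unsigned> &mail_ids) {
    PostingCursor thread_doc(thread_ids);
    PostingCursor mail_doc(mail_ids);
    while (!thread_doc.at_end() && !mail_doc.at_end()) {
        if (*thread_doc == *mail_doc)
            return true;                    // a non-ghost message survives
        if (*thread_doc < *mail_doc)
            thread_doc.skip_to(*mail_doc);  // leapfrog the lagging cursor
        else
            mail_doc.skip_to(*thread_doc);
    }
    return false;                           // only ghosts remain in the thread
}
```

For example, `has_common_doc({3, 8, 15, 42}, {1, 2, 15, 99})` is true because document 15 is in both lists, while `has_common_doc({4, 9}, {1, 2, 15, 99})` is false, which corresponds to the branch that deletes every remaining ghost in the thread. Like the patched code, it stops at the first match, whereas the removed code ran and counted a full `thread:` query.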

* Re: [PATCH 1/2] WIP: add performance test for removing files.
From: Tomi Ollila @ 2021-04-15  7:36 UTC
  To: David Bremner, notmuch

On Tue, Apr 13 2021, David Bremner wrote:

> No doubt this is non-portable use of tar.

Portable alternative(s)?

(We can probably trust that no file names start with '-'.)

> ---
>  performance-test/T00-new.sh | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/performance-test/T00-new.sh b/performance-test/T00-new.sh
> index a14dd13f..1eeac6d0 100755
> --- a/performance-test/T00-new.sh
> +++ b/performance-test/T00-new.sh
> @@ -26,6 +26,16 @@ perl -nle 'rename "$_.renamed", $_' $manifest
>  
>  time_run "new ($count mv back)" 'notmuch new'
>  
> +tar --create --file backup.tar --files-from=$manifest

xargs tar cf backup.tar < $manifest

> +perl -nle 'unlink $_; unlink $_.copy' $manifest
> +
> +time_run "new ($count rm)" 'notmuch new'
> +
> +tar --extract --file backup.tar

tar xf backup.tar

> +
> +time_run "new ($count restore)" 'notmuch new'
> +
>  perl -nle 'link $_, "$_.copy"' $manifest
>  
>  time_run "new ($count cp)" 'notmuch new'
> -- 
> 2.30.2


