* Deletion speed improvements for notmuch-new
From: David Bremner @ 2021-04-14  2:16 UTC
To: notmuch

These are a bit rough around the edges, but I wanted to see if we could
get them into shape for the 0.32 release, since the performance
improvement for me is pretty drastic. I would appreciate testing, since
I think the performance test may be kind of a best case in that it
avoids long threads.

[PATCH 1/2] WIP: add performance test for removing files.
  -- this needs someone (Tomi?) to fix my GNU-centric tar usage

[PATCH 2/2] WIP: replace use of queries in n_m_delete with postlist
  -- this leads to (at least) a 30x improvement in the test introduced
  above with the medium corpus; I ran out of patience waiting for the
  unpatched version.

* [PATCH 1/2] WIP: add performance test for removing files.
From: David Bremner @ 2021-04-14  2:16 UTC
To: notmuch; +Cc: David Bremner

No doubt this is non-portable use of tar.
---
 performance-test/T00-new.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/performance-test/T00-new.sh b/performance-test/T00-new.sh
index a14dd13f..1eeac6d0 100755
--- a/performance-test/T00-new.sh
+++ b/performance-test/T00-new.sh
@@ -26,6 +26,16 @@ perl -nle 'rename "$_.renamed", $_' $manifest
 
 time_run "new ($count mv back)" 'notmuch new'
 
+tar --create --file backup.tar --files-from=$manifest
+
+perl -nle 'unlink $_; unlink $_.copy' $manifest
+
+time_run "new ($count rm)" 'notmuch new'
+
+tar --extract --file backup.tar
+
+time_run "new ($count restore)" 'notmuch new'
+
 perl -nle 'link $_, "$_.copy"' $manifest
 
 time_run "new ($count cp)" 'notmuch new'
-- 
2.30.2

* Re: [PATCH 1/2] WIP: add performance test for removing files.
From: Tomi Ollila @ 2021-04-15  7:36 UTC
To: David Bremner, notmuch

On Tue, Apr 13 2021, David Bremner wrote:

> No doubt this is non-portable use of tar.

portable alternative(s) (?)
(we probably can trust no file names start with '-')

> ---
>  performance-test/T00-new.sh | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/performance-test/T00-new.sh b/performance-test/T00-new.sh
> index a14dd13f..1eeac6d0 100755
> --- a/performance-test/T00-new.sh
> +++ b/performance-test/T00-new.sh
> @@ -26,6 +26,16 @@ perl -nle 'rename "$_.renamed", $_' $manifest
>
>  time_run "new ($count mv back)" 'notmuch new'
>
> +tar --create --file backup.tar --files-from=$manifest

xargs tar cf backup.tar < $manifest

> +perl -nle 'unlink $_; unlink $_.copy' $manifest
> +
> +time_run "new ($count rm)" 'notmuch new'
> +
> +tar --extract --file backup.tar

tar xf backup.tar

> +
> +time_run "new ($count restore)" 'notmuch new'
> +
>  perl -nle 'link $_, "$_.copy"' $manifest
>
>  time_run "new ($count cp)" 'notmuch new'
> --
> 2.30.2

* [PATCH 2/2] WIP: replace use of queries in n_m_delete with postlist access
From: David Bremner @ 2021-04-14  2:16 UTC
To: notmuch; +Cc: David Bremner

This improves performance because it removes some interleaving of
queries and deletions, which is causing a lot of time to be spent
checking for position information in the glass backend.
---
 lib/message.cc | 68 +++++++++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 29 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 0c2eeab5..42d56acb 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1356,11 +1356,10 @@ notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
-    const char *mid, *tid, *query_string;
+    const char *mid, *tid;
     notmuch_message_t *ghost;
     notmuch_private_status_t private_status;
     notmuch_database_t *notmuch;
-    notmuch_query_t *query;
     unsigned int count = 0;
     bool is_ghost;
 
@@ -1382,16 +1381,33 @@ _notmuch_message_delete (notmuch_message_t *message)
     if (is_ghost)
         return NOTMUCH_STATUS_SUCCESS;
 
-    query_string = talloc_asprintf (message, "thread:%s", tid);
-    query = notmuch_query_create (notmuch, query_string);
-    if (query == NULL)
-        return NOTMUCH_STATUS_OUT_OF_MEMORY;
-    status = notmuch_query_count_messages (query, &count);
-    if (status) {
-        notmuch_query_destroy (query);
-        return status;
+    /* look for a non-ghost message in the same thread */
+    try {
+        Xapian::PostingIterator thread_doc, thread_doc_end;
+        Xapian::PostingIterator mail_doc, mail_doc_end;
+
+        _notmuch_database_find_doc_ids (message->notmuch, "thread", tid, &thread_doc,
+                                        &thread_doc_end);
+        _notmuch_database_find_doc_ids (message->notmuch, "type", "mail", &mail_doc, &mail_doc_end);
+
+        while (count == 0 &&
+               thread_doc != thread_doc_end &&
+               mail_doc != mail_doc_end) {
+            thread_doc.skip_to (*mail_doc);
+            if (thread_doc != thread_doc_end) {
+                if (*thread_doc == *mail_doc) {
+                    count++;
+                } else {
+                    mail_doc.skip_to (*thread_doc);
+                    if (mail_doc != mail_doc_end && *thread_doc == *mail_doc)
+                        count++;
+                }
+            }
+        }
+    } catch (Xapian::Error &error) {
+        LOG_XAPIAN_EXCEPTION (message, error);
+        return NOTMUCH_STATUS_XAPIAN_EXCEPTION;
     }
-
     if (count > 0) {
         /* reintroduce a ghost in its place because there are still
          * other active messages in this thread: */
@@ -1410,27 +1426,21 @@ _notmuch_message_delete (notmuch_message_t *message)
         notmuch_message_destroy (ghost);
         status = COERCE_STATUS (private_status, "Error converting to ghost message");
     } else {
-        /* the thread is empty; drop all ghost messages from it */
-        notmuch_messages_t *messages;
-        status = _notmuch_query_search_documents (query,
-                                                  "ghost",
-                                                  &messages);
-        if (status == NOTMUCH_STATUS_SUCCESS) {
-            notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
-            while (notmuch_messages_valid (messages)) {
-                message = notmuch_messages_get (messages);
-                status = _notmuch_message_delete (message);
-                if (status) /* we'll report the last failure we see;
-                             * if there is more than one failure, we
-                             * forget about previous ones */
-                    last_error = status;
-                notmuch_message_destroy (message);
-                notmuch_messages_move_to_next (messages);
+        /* the thread now contains only ghosts: delete them */
+        try {
+            Xapian::PostingIterator doc, doc_end;
+
+            _notmuch_database_find_doc_ids (message->notmuch, "thread", tid, &doc, &doc_end);
+
+            for (; doc != doc_end; doc++) {
+                message->notmuch->writable_xapian_db->delete_document (*doc);
             }
-            status = last_error;
+        } catch (Xapian::Error &error) {
+            LOG_XAPIAN_EXCEPTION (message, error);
+            return NOTMUCH_STATUS_XAPIAN_EXCEPTION;
         }
+
     }
-    notmuch_query_destroy (query);
     return status;
 }
 
-- 
2.30.2
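
Editorial note on [PATCH 2/2]: the loop that looks for a non-ghost message in the thread is, in effect, a leapfrog intersection of two sorted posting lists that stops at the first common document id. The following is only a standalone sketch of that idea, not notmuch or Xapian code. `PostingCursor` and `lists_intersect` are hypothetical names, a sorted `std::vector` stands in for a posting list, and `skip_to` is assumed to advance to the first id that is not less than its argument.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a posting list iterator: a cursor over a
// sorted vector of document ids. Not part of notmuch or Xapian.
struct PostingCursor {
    const std::vector<unsigned> &ids;
    std::size_t pos = 0;

    explicit PostingCursor (const std::vector<unsigned> &sorted_ids) : ids (sorted_ids) {}

    bool at_end () const { return pos == ids.size (); }
    unsigned current () const { return ids[pos]; }

    // Advance to the first id >= target (assumed skip_to semantics).
    void skip_to (unsigned target)
    {
        auto first = ids.begin () + static_cast<std::ptrdiff_t> (pos);
        pos = static_cast<std::size_t> (
            std::lower_bound (first, ids.end (), target) - ids.begin ());
    }
};

// Return true as soon as the two sorted lists share a document id.
// Each cursor jumps forward to the other's position, so neither list
// is scanned entry by entry.
bool lists_intersect (const std::vector<unsigned> &a, const std::vector<unsigned> &b)
{
    PostingCursor x (a), y (b);

    while (! x.at_end () && ! y.at_end ()) {
        x.skip_to (y.current ());
        if (x.at_end ())
            return false;
        if (x.current () == y.current ())
            return true;
        // x is now strictly ahead of y, so y must move forward.
        y.skip_to (x.current ());
    }
    return false;
}
```

In the patch the two lists are the documents carrying the thread term and the documents of type "mail"; the early return here plays the role of the `count == 0` loop condition, since one shared id is enough to decide that the thread still has a real message.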