unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* Speedup for deleting files
@ 2023-07-20 12:07 David Bremner
  2023-07-20 12:08 ` [PATCH 1/2] lib/message: check message type before deleting document David Bremner
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Bremner @ 2023-07-20 12:07 UTC (permalink / raw)
  To: notmuch; +Cc: ukleinek

Thanks to discussion with Olly Betts and some perf runs, I realized
the current clean up of deleted files from the database is somewhat
wasteful since it modifies the message documents (by deleting the
filename) before in most cases deleting the whole record. While trying
this out, I triggered what seems to be a bug in existing notmuch; the
code checks for the existence of a certain Xapian term after deleting
the document.

Compared to git master [1], cleaning up 50k of 200k messages now takes
about 44s versus 80. So not quite a 50% improvement, but not bad. I
would expect a larger relative improvement on larger databases.

[1]: https://notmuchmail.org/perf-test-results/2023-07-18-minkowski/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] lib/message: check message type before deleting document
  2023-07-20 12:07 Speedup for deleting files David Bremner
@ 2023-07-20 12:08 ` David Bremner
  2023-07-20 12:08 ` [PATCH 2/2] lib/n_d_remove_message: do not remove unique filename David Bremner
  2023-07-22  9:02 ` Speedup for deleting files Tomi Ollila
  2 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2023-07-20 12:08 UTC (permalink / raw)
  To: notmuch; +Cc: ukleinek

It isn't really clear how this worked before. Traversing the terms of
a document after deleting it from the database seems likely to be
undefined behaviour at best
---
 lib/message.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 53f35dd1..46638f80 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1397,14 +1397,15 @@ _notmuch_message_delete (notmuch_message_t *message)
 	Xapian::PostingIterator thread_doc, thread_doc_end;
 	Xapian::PostingIterator mail_doc, mail_doc_end;
 
-	message->notmuch->writable_xapian_db->delete_document (message->doc_id);
-
 	/* look for a non-ghost message in the same thread */
 	/* if this was a ghost to begin with, we are done */
 	private_status = _notmuch_message_has_term (message, "type", "ghost", &is_ghost);
 	if (private_status)
 	    return COERCE_STATUS (private_status,
 				  "Error trying to determine whether message was a ghost");
+
+	message->notmuch->writable_xapian_db->delete_document (message->doc_id);
+
 	if (is_ghost)
 	    return NOTMUCH_STATUS_SUCCESS;
 
-- 
2.40.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] lib/n_d_remove_message: do not remove unique filename
  2023-07-20 12:07 Speedup for deleting files David Bremner
  2023-07-20 12:08 ` [PATCH 1/2] lib/message: check message type before deleting document David Bremner
@ 2023-07-20 12:08 ` David Bremner
  2023-07-22  9:02 ` Speedup for deleting files Tomi Ollila
  2 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2023-07-20 12:08 UTC (permalink / raw)
  To: notmuch; +Cc: ukleinek

It is wasteful to remove a filename term when the whole message
document is about to be removed from the database. Profiling with perf
shows this takes a significant portion of the time when cleaning up
removed files in the database.

The logic of n_d_remove_message becomes a bit more convoluted here in
order to make the change minimal.

It is possible that this function can be further optimized, since the
expansion of filename terms into filenames is probably not needed
here.
---
 lib/database.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 6987e2f4..737a3f30 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1454,7 +1454,9 @@ notmuch_database_remove_message (notmuch_database_t *notmuch,
 							&message);
 
     if (status == NOTMUCH_STATUS_SUCCESS && message) {
-	status = _notmuch_message_remove_filename (message, filename);
+	if (notmuch_message_count_files (message) > 1) {
+	    status = _notmuch_message_remove_filename (message, filename);
+	}
 	if (status == NOTMUCH_STATUS_SUCCESS)
 	    status = _notmuch_message_delete (message);
 	else if (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
-- 
2.40.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: Speedup for deleting files
  2023-07-20 12:07 Speedup for deleting files David Bremner
  2023-07-20 12:08 ` [PATCH 1/2] lib/message: check message type before deleting document David Bremner
  2023-07-20 12:08 ` [PATCH 2/2] lib/n_d_remove_message: do not remove unique filename David Bremner
@ 2023-07-22  9:02 ` Tomi Ollila
  2023-08-20 15:56   ` David Bremner
  2 siblings, 1 reply; 5+ messages in thread
From: Tomi Ollila @ 2023-07-22  9:02 UTC (permalink / raw)
  To: David Bremner, notmuch; +Cc: ukleinek

On Thu, Jul 20 2023, David Bremner wrote:

> Thanks to discussion with Olly Betts and some perf runs, I realized
> the current clean up of deleted files from the database is somewhat
> wasteful since it modifies the message documents (by deleting the
> filename) before in most cases deleting the whole record. While trying
> this out, I triggered what seems to be a bug in existing notmuch; the
> code checks for the existence of a certain Xapian term after deleting
> the document.
>
> Compared to git master [1], cleaning up 50k of 200k messages now takes
> about 44s versus 80. So not quite a 50% improvement, but not bad. I
> would expect a larger relative improvement on larger databases.
>
> [1]: https://notmuchmail.org/perf-test-results/2023-07-18-minkowski/

Series LGTM.

Tomi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Speedup for deleting files
  2023-07-22  9:02 ` Speedup for deleting files Tomi Ollila
@ 2023-08-20 15:56   ` David Bremner
  0 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2023-08-20 15:56 UTC (permalink / raw)
  To: Tomi Ollila, notmuch; +Cc: ukleinek

Tomi Ollila <tomi.ollila@iki.fi> writes:

>
> Series LGTM.
>
> Tomi

Series applied to master.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-20 15:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-20 12:07 Speedup for deleting files David Bremner
2023-07-20 12:08 ` [PATCH 1/2] lib/message: check message type before deleting document David Bremner
2023-07-20 12:08 ` [PATCH 2/2] lib/n_d_remove_message: do not remove unique filename David Bremner
2023-07-22  9:02 ` Speedup for deleting files Tomi Ollila
2023-08-20 15:56   ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).