unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* Commit after some number of transactions
@ 2021-05-22  1:20 David Bremner
  2021-05-22  1:20 ` [PATCH 1/5] database/close: remove misleading code / comment David Bremner
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch

The main rational is explained in the commit message to 

[PATCH 4/5] lib: autocommit after some number of completed

I'm not super-happy with the documentation in [5/5], as it explains
things in terms of database concepts the user shouldn't really need to
understand.

[PATCH 5/5] doc: document database.autocommit variable

The default value of 8000 was chose not to cause any noticable
slowdown when indexing the "large" corpus of about 200k messages.  The
test machine is a recent Xeon with fast spinning rust drives; the
whole index takes about 8.5 minutes on this machine. I'd be curious if
other people notice a performance impact.  On the same machine the
threshold of 8000 means that less than 30 seconds worth of work would
be discarded.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5] database/close: remove misleading code / comment
  2021-05-22  1:20 Commit after some number of transactions David Bremner
@ 2021-05-22  1:20 ` David Bremner
  2021-05-22  1:20 ` [PATCH 2/5] lib/config: add NOTMUCH_CONFIG_AUTOCOMMIT David Bremner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

Unfortunately, it doesn't make a difference if we call
cancel_transaction or not, all uncommited changes are discarded if
there is an open (unflushed) transaction.
---
 lib/database.cc | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 96458f6f..db7feca2 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -502,17 +502,9 @@ notmuch_database_close (notmuch_database_t *notmuch)
      * close it.  Thus, we explicitly close it here. */
     if (notmuch->open) {
 	try {
-	    /* If there's an outstanding transaction, it's unclear if
-	     * closing the Xapian database commits everything up to
-	     * that transaction, or may discard committed (but
-	     * unflushed) transactions.  To be certain, explicitly
-	     * cancel any outstanding transaction before closing. */
-	    if (_notmuch_database_mode (notmuch) == NOTMUCH_DATABASE_MODE_READ_WRITE &&
-		notmuch->atomic_nesting)
-		notmuch->writable_xapian_db->cancel_transaction ();
-
 	    /* Close the database.  This implicitly flushes
-	     * outstanding changes. */
+	     * outstanding changes. If there is an open (non-flushed)
+	     * transaction, ALL pending changes will be discarded */
 	    notmuch->xapian_db->close ();
 	} catch (const Xapian::Error &error) {
 	    status = NOTMUCH_STATUS_XAPIAN_EXCEPTION;
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] lib/config: add NOTMUCH_CONFIG_AUTOCOMMIT
  2021-05-22  1:20 Commit after some number of transactions David Bremner
  2021-05-22  1:20 ` [PATCH 1/5] database/close: remove misleading code / comment David Bremner
@ 2021-05-22  1:20 ` David Bremner
  2021-05-22  1:20 ` [PATCH 3/5] test: add known broken test for closing with open transaction David Bremner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

This will be used to control how often atomic transactions are
committed.
---
 lib/config.cc            | 4 ++++
 lib/notmuch.h            | 1 +
 test/T030-config.sh      | 1 +
 test/T055-path-config.sh | 1 +
 test/T590-libconfig.sh   | 4 ++++
 5 files changed, 11 insertions(+)

diff --git a/lib/config.cc b/lib/config.cc
index 0ec66372..13eab5f1 100644
--- a/lib/config.cc
+++ b/lib/config.cc
@@ -593,6 +593,8 @@ _notmuch_config_key_to_string (notmuch_config_key_t key)
 	return "user.other_email";
     case NOTMUCH_CONFIG_USER_NAME:
 	return "user.name";
+    case NOTMUCH_CONFIG_AUTOCOMMIT:
+	return "database.autocommit";
     default:
 	return NULL;
     }
@@ -638,6 +640,8 @@ _notmuch_config_default (notmuch_database_t *notmuch, notmuch_config_key_t key)
 	return email;
     case NOTMUCH_CONFIG_NEW_IGNORE:
 	return "";
+    case NOTMUCH_CONFIG_AUTOCOMMIT:
+	return "8000";
     case NOTMUCH_CONFIG_HOOK_DIR:
     case NOTMUCH_CONFIG_BACKUP_DIR:
     case NOTMUCH_CONFIG_OTHER_EMAIL:
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 4b053932..5c3be342 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -2520,6 +2520,7 @@ typedef enum _notmuch_config_key {
     NOTMUCH_CONFIG_PRIMARY_EMAIL,
     NOTMUCH_CONFIG_OTHER_EMAIL,
     NOTMUCH_CONFIG_USER_NAME,
+    NOTMUCH_CONFIG_AUTOCOMMIT,
     NOTMUCH_CONFIG_LAST
 } notmuch_config_key_t;
 
diff --git a/test/T030-config.sh b/test/T030-config.sh
index 7a1660e9..751feaf3 100755
--- a/test/T030-config.sh
+++ b/test/T030-config.sh
@@ -51,6 +51,7 @@ cat <<EOF > EXPECTED
 built_with.compact=something
 built_with.field_processor=something
 built_with.retry_lock=something
+database.autocommit=8000
 database.mail_root=MAIL_DIR
 database.path=MAIL_DIR
 foo.list=this;is another;list value;
diff --git a/test/T055-path-config.sh b/test/T055-path-config.sh
index 8ef76aed..bb3bf665 100755
--- a/test/T055-path-config.sh
+++ b/test/T055-path-config.sh
@@ -259,6 +259,7 @@ EOF
 built_with.compact=true
 built_with.field_processor=true
 built_with.retry_lock=true
+database.autocommit=8000
 database.backup_dir
 database.hook_dir
 database.mail_root=MAIL_DIR
diff --git a/test/T590-libconfig.sh b/test/T590-libconfig.sh
index 745e1bb4..d922c3af 100755
--- a/test/T590-libconfig.sh
+++ b/test/T590-libconfig.sh
@@ -400,6 +400,7 @@ true
 USERNAME@FQDN
 NULL
 USER_FULL_NAME
+8000
 == stderr ==
 EOF
 unset MAILDIR
@@ -711,6 +712,7 @@ true
 test_suite@notmuchmail.org
 test_suite_other@notmuchmail.org;test_suite@otherdomain.org
 Notmuch Test Suite
+8000
 == stderr ==
 EOF
 test_expect_equal_file EXPECTED OUTPUT
@@ -742,6 +744,7 @@ true
 USERNAME@FQDN
 NULL
 USER_FULL_NAME
+8000
 == stderr ==
 EOF
 test_expect_equal_file EXPECTED OUTPUT.clean
@@ -808,6 +811,7 @@ EOF
 cat <<'EOF' >EXPECTED
 == stdout ==
 aaabefore beforeval
+database.autocommit 8000
 database.backup_dir MAIL_DIR/.notmuch/backups
 database.hook_dir MAIL_DIR/.notmuch/hooks
 database.mail_root MAIL_DIR
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/5] test: add known broken test for closing with open transaction
  2021-05-22  1:20 Commit after some number of transactions David Bremner
  2021-05-22  1:20 ` [PATCH 1/5] database/close: remove misleading code / comment David Bremner
  2021-05-22  1:20 ` [PATCH 2/5] lib/config: add NOTMUCH_CONFIG_AUTOCOMMIT David Bremner
@ 2021-05-22  1:20 ` David Bremner
  2021-05-22  1:20 ` [PATCH 4/5] lib: autocommit after some number of completed transactions David Bremner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

The expected output may need adjusting, but what is clear is that
saving none of the changes is not desirable.
---
 test/T385-transactions.sh | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)
 create mode 100755 test/T385-transactions.sh

diff --git a/test/T385-transactions.sh b/test/T385-transactions.sh
new file mode 100755
index 00000000..ebfec2ed
--- /dev/null
+++ b/test/T385-transactions.sh
@@ -0,0 +1,36 @@
+#!/usr/bin/env bash
+test_description='transactions'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+make_shim no-close <<EOF
+#include <notmuch.h>
+#include <stdio.h>
+notmuch_status_t
+notmuch_database_close (notmuch_database_t *notmuch)
+{
+  return notmuch_database_begin_atomic (notmuch);
+}
+EOF
+
+for i in `seq 1 1024`
+do
+    generate_message '[subject]="'"subject $i"'"' \
+	             '[body]="'"body $i"'"'
+done
+
+test_begin_subtest "initial new"
+NOTMUCH_NEW > OUTPUT
+cat <<EOF > EXPECTED
+Added 1024 new messages to the database.
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Some changes saved with open transaction"
+test_subtest_known_broken
+notmuch config set database.autocommit 1000
+rm -r ${MAIL_DIR}/.notmuch
+notmuch_with_shim no-close new
+output=$(notmuch count '*')
+test_expect_equal "$output" "1000"
+
+test_done
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/5] lib: autocommit after some number of completed transactions
  2021-05-22  1:20 Commit after some number of transactions David Bremner
                   ` (2 preceding siblings ...)
  2021-05-22  1:20 ` [PATCH 3/5] test: add known broken test for closing with open transaction David Bremner
@ 2021-05-22  1:20 ` David Bremner
  2021-05-22  1:20 ` [PATCH 5/5] doc: document database.autocommit variable David Bremner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

This change addresses two known issues with large sets of changes to
the database.  The first is that as reported by Steven Allen [1],
notmuch commits are not "flushed" when they complete, which means that
if there is an open transaction when the database closes (or e.g. the
program crashes) then all changes since the last commit will be
discarded (nothing is irrecoverably lost for "notmuch new", as the
indexing process just restarts next time it is run).  This does not
really "fix" the issue reported in [1]; that seems rather difficult
given how transactions work in Xapian. On the other hand, with the
default settings, this should mean one only loses less than a minutes
worth of work.  The second issue is the occasionally reported "storm"
of disk writes when notmuch finishes. I don't yet have a test for
this, but I think committing as we go should reduce the amount of work
when finalizing the database.

[1]: id:20151025210215.GA3754@stebalien.com
---
 lib/database-private.h    |  5 +++++
 lib/database.cc           | 18 +++++++++++++-----
 lib/open.cc               | 12 ++++++++++++
 test/T385-transactions.sh |  1 -
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 1a73dacc..9706c17e 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -212,6 +212,11 @@ struct _notmuch_database {
     char thread_id_str[17];
     uint64_t last_thread_id;
 
+    /* How many transactions have successfully completed since we last committed */
+    int transaction_count;
+    /* when to commit and reset the counter */
+    int transaction_threshold;
+
     /* error reporting; this value persists only until the
      * next library call. May be NULL */
     char *status_string;
diff --git a/lib/database.cc b/lib/database.cc
index db7feca2..dafea4ce 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -1125,13 +1125,21 @@ notmuch_database_end_atomic (notmuch_database_t *notmuch)
     db = notmuch->writable_xapian_db;
     try {
 	db->commit_transaction ();
-
-	/* This is a hack for testing.  Xapian never flushes on a
-	 * non-flushed commit, even if the flush threshold is 1.
-	 * However, we rely on flushing to test atomicity. */
+	notmuch->transaction_count++;
+
+	/* Xapian never flushes on a non-flushed commit, even if the
+	 * flush threshold is 1.  However, we rely on flushing to test
+	 * atomicity. On the other hand, we can't straight replace
+	 * XAPIAN_FLUSH_THRESHOLD with our autocommit counter, because
+	 * the former also applies outside notmuch atomic
+	 * commits. Hence the follow complicated  test */
 	const char *thresh = getenv ("XAPIAN_FLUSH_THRESHOLD");
-	if (thresh && atoi (thresh) == 1)
+	if ((notmuch->transaction_threshold > 0 &&
+	     notmuch->transaction_count >= notmuch->transaction_threshold) ||
+	    (thresh && atoi (thresh) == 1)) {
 	    db->commit ();
+	    notmuch->transaction_count = 0;
+	}
     } catch (const Xapian::Error &error) {
 	_notmuch_database_log (notmuch, "A Xapian exception occurred committing transaction: %s.\n",
 			       error.get_msg ().c_str ());
diff --git a/lib/open.cc b/lib/open.cc
index 1ca69665..706a23e1 100644
--- a/lib/open.cc
+++ b/lib/open.cc
@@ -256,6 +256,8 @@ _alloc_notmuch ()
     notmuch->writable_xapian_db = NULL;
     notmuch->config_path = NULL;
     notmuch->atomic_nesting = 0;
+    notmuch->transaction_count = 0;
+    notmuch->transaction_threshold = 0;
     notmuch->view = 1;
     return notmuch;
 }
@@ -365,6 +367,8 @@ _finish_open (notmuch_database_t *notmuch,
     notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
     char *incompat_features;
     char *message = NULL;
+    const char *autocommit_str;
+    char *autocommit_end;
     unsigned int version;
     const char *database_path = notmuch_database_get_path (notmuch);
 
@@ -461,6 +465,14 @@ _finish_open (notmuch_database_t *notmuch,
 	if (status)
 	    goto DONE;
 
+	autocommit_str = notmuch_config_get (notmuch, NOTMUCH_CONFIG_AUTOCOMMIT);
+	if (unlikely (!autocommit_str)) {
+	    INTERNAL_ERROR ("missing configuration for autocommit");
+	}
+	notmuch->transaction_threshold = strtoul (autocommit_str, &autocommit_end, 10);
+	if (*autocommit_end != '\0')
+	    INTERNAL_ERROR ("Malformed database database.autocommit value: %s", autocommit_str);
+
 	status = _notmuch_database_setup_standard_query_fields (notmuch);
 	if (status)
 	    goto DONE;
diff --git a/test/T385-transactions.sh b/test/T385-transactions.sh
index ebfec2ed..d8bb502d 100755
--- a/test/T385-transactions.sh
+++ b/test/T385-transactions.sh
@@ -26,7 +26,6 @@ EOF
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest "Some changes saved with open transaction"
-test_subtest_known_broken
 notmuch config set database.autocommit 1000
 rm -r ${MAIL_DIR}/.notmuch
 notmuch_with_shim no-close new
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/5] doc: document database.autocommit variable
  2021-05-22  1:20 Commit after some number of transactions David Bremner
                   ` (3 preceding siblings ...)
  2021-05-22  1:20 ` [PATCH 4/5] lib: autocommit after some number of completed transactions David Bremner
@ 2021-05-22  1:20 ` David Bremner
  2021-05-22 11:10   ` [PATCH] lib: update transaction documentation David Bremner
  2021-05-24 17:14 ` Commit after some number of transactions David Bremner
  2021-06-27 17:11 ` David Bremner
  6 siblings, 1 reply; 9+ messages in thread
From: David Bremner @ 2021-05-22  1:20 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner

This exposes some database internals that most users will probably not
understand.
---
 doc/man1/notmuch-config.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/doc/man1/notmuch-config.rst b/doc/man1/notmuch-config.rst
index 75c59ff9..7d709bfb 100644
--- a/doc/man1/notmuch-config.rst
+++ b/doc/man1/notmuch-config.rst
@@ -78,6 +78,16 @@ paths are presumed relative to `$HOME` for items in section
     Directory containing hooks run by notmuch commands. See
     **notmuch-hooks(5)**.
 
+    History: this configuration value was introduced in notmuch 0.32.
+
+**database.autocommit**
+
+    How often to commit transactions to disk. `0` means wait until
+    command completes, otherwise an integer `n` specifies to commit to
+    disk after every `n` completed transactions.
+
+    History: this configuration value was introduced in notmuch 0.33.
+
 **user.name**
     Your full name.
 
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] lib: update transaction documentation
  2021-05-22  1:20 ` [PATCH 5/5] doc: document database.autocommit variable David Bremner
@ 2021-05-22 11:10   ` David Bremner
  0 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-22 11:10 UTC (permalink / raw)
  To: David Bremner, notmuch

Partly this is to recognize the semantics we inherit from Xapian,
partly to mention the new autocommit feature.
---
 lib/notmuch.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index 5c3be342..3b28bea3 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -529,11 +529,11 @@ notmuch_database_status_string (const notmuch_database_t *notmuch);
  * have no effect.
  *
  * For writable databases, notmuch_database_close commits all changes
- * to disk before closing the database.  If the caller is currently in
- * an atomic section (there was a notmuch_database_begin_atomic
- * without a matching notmuch_database_end_atomic), this will discard
- * changes made in that atomic section (but still commit changes made
- * prior to entering the atomic section).
+ * to disk before closing the database, unless the caller is currently
+ * in an atomic section (there was a notmuch_database_begin_atomic
+ * without a matching notmuch_database_end_atomic). In this case
+ * changes since the last commit are discarded. @see
+ * notmuch_database_end_atomic for more information.
  *
  * Return value:
  *
@@ -670,7 +670,10 @@ notmuch_status_t
 notmuch_database_begin_atomic (notmuch_database_t *notmuch);
 
 /**
- * Indicate the end of an atomic database operation.
+ * Indicate the end of an atomic database operation.  If repeated
+ * (with matching notmuch_database_begin_atomic) "database.autocommit"
+ * times, commit the the transaction and all previous (non-cancelled)
+ * transactions to the database.
  *
  * Return value:
  *
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Commit after some number of transactions
  2021-05-22  1:20 Commit after some number of transactions David Bremner
                   ` (4 preceding siblings ...)
  2021-05-22  1:20 ` [PATCH 5/5] doc: document database.autocommit variable David Bremner
@ 2021-05-24 17:14 ` David Bremner
  2021-06-27 17:11 ` David Bremner
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-05-24 17:14 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> The main rational is explained in the commit message to 
>
> [PATCH 4/5] lib: autocommit after some number of completed
>
> I'm not super-happy with the documentation in [5/5], as it explains
> things in terms of database concepts the user shouldn't really need to
> understand.
>
> [PATCH 5/5] doc: document database.autocommit variable
>
> The default value of 8000 was chose not to cause any noticable
> slowdown when indexing the "large" corpus of about 200k messages.  The
> test machine is a recent Xeon with fast spinning rust drives; the
> whole index takes about 8.5 minutes on this machine. I'd be curious if
> other people notice a performance impact.  On the same machine the
> threshold of 8000 means that less than 30 seconds worth of work would
> be discarded.

An alternate approach would be to expose Xapian's commit() method as
notmuch_database_commit(). That would not really conflict with this
series, but arguably makes it unnecessary, if the clients (e.g. the
notmuch CLI) add appropriate calls to notmuch_database_commit.

I'm not 100% sure, but I think I may want to add notmuch_database_commit
in any case, for some other cases where automatic committing doesn't
work so well.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Commit after some number of transactions
  2021-05-22  1:20 Commit after some number of transactions David Bremner
                   ` (5 preceding siblings ...)
  2021-05-24 17:14 ` Commit after some number of transactions David Bremner
@ 2021-06-27 17:11 ` David Bremner
  6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2021-06-27 17:11 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> The main rational is explained in the commit message to 
>
> [PATCH 4/5] lib: autocommit after some number of completed
>
> I'm not super-happy with the documentation in [5/5], as it explains
> things in terms of database concepts the user shouldn't really need to
> understand.
>

I have applied the series to master. Documentation improvements welcome.

d

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-27 17:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-05-22  1:20 Commit after some number of transactions David Bremner
2021-05-22  1:20 ` [PATCH 1/5] database/close: remove misleading code / comment David Bremner
2021-05-22  1:20 ` [PATCH 2/5] lib/config: add NOTMUCH_CONFIG_AUTOCOMMIT David Bremner
2021-05-22  1:20 ` [PATCH 3/5] test: add known broken test for closing with open transaction David Bremner
2021-05-22  1:20 ` [PATCH 4/5] lib: autocommit after some number of completed transactions David Bremner
2021-05-22  1:20 ` [PATCH 5/5] doc: document database.autocommit variable David Bremner
2021-05-22 11:10   ` [PATCH] lib: update transaction documentation David Bremner
2021-05-24 17:14 ` Commit after some number of transactions David Bremner
2021-06-27 17:11 ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).