unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [DRAFT 0/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
@ 2023-02-25 17:46 Kevin Boulain
  2023-02-25 17:46 ` [DRAFT 1/1] " Kevin Boulain
  2023-02-26 12:08 ` [DRAFT 0/1] " David Bremner
  0 siblings, 2 replies; 6+ messages in thread
From: Kevin Boulain @ 2023-02-25 17:46 UTC (permalink / raw)
  To: notmuch; +Cc: Kevin Boulain

Hey,

Here's a draft patch previously discussed in
https://nmbug.notmuchmail.org/nmweb/show/87ilfvq3bc.fsf%40boula.in

See the commit's message for more information.

Unfortunaly this only covers two third of the usages of
Xapian::Query::MatchAll, in lib/query.cc and lib/regexp-fields.cc.
lib/parse-sexp.cc has the rest of it but I don't believe it's worth to
change until sfsexp also becomes thread-safe (and s-expressions are
currently optional).

For example, globals are used:
https://github.com/mjsottile/sfsexp/blob/master/src/parser.c#L81
Which will lead to segfaults (null pointer dereference) in
pd_allocate:
  #0  0x00007fc19ce20861 in pd_allocate () at parser.c:283
  #1  0x00007fc19ce21778 in cparse_sexp (str=0x7fc194018c40 "(from )", len=7, lc=0x0) at parser.c:863
  #2  0x00007fc19ce20c50 in parse_sexp (s=0x7fc194018c40 "(from )", len=7) at parser.c:486
  #3  0x00007fc19ce6f8a8 in _notmuch_sexp_string_to_xapian_query (notmuch=0x7fc194003120, querystr=0x7fc194018bd0 "(from )", output=...) at lib/parse-sexp.cc:723
  #4  0x00007fc19ce5e93f in _notmuch_query_ensure_parsed_sexpr (query=0x7fc1940147b0) at lib/query.cc:240
  #5  0x00007fc19ce5e997 in _notmuch_query_ensure_parsed (query=0x7fc1940147b0) at lib/query.cc:258
  #6  0x00007fc19ce5ed89 in _notmuch_query_search_documents (query=0x7fc1940147b0, type=0x7fc19ce78af6 "mail", out=0x7fc19bd3bcc0) at lib/query.cc:362
  #7  0x00007fc19ce5ed2f in notmuch_query_search_messages (query=0x7fc1940147b0, out=0x7fc19bd3bcc0) at lib/query.cc:350
  ...
And looking at the function clearly shows this is due to a race
because it segfaults on an empty stack (l = NULL):
https://github.com/mjsottile/sfsexp/blob/master/src/parser.c#L283
When it just checked it wasn't:
https://github.com/mjsottile/sfsexp/blob/master/src/parser.c#L270

Thoughts? Should clients using the library gate every function call to
Notmuch behind a lock instead? I can also start a discussion in
sfsexp's issue tracker.

Kevin Boulain (1):
  lib: replace some uses of Query::MatchAll with thread-safe alternative

 configure            |  15 +++++-
 lib/query.cc         |   2 +-
 lib/regexp-fields.cc |   2 +-
 test/T810-tsan.sh    | 106 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100755 test/T810-tsan.sh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [DRAFT 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
  2023-02-25 17:46 [DRAFT 0/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative Kevin Boulain
@ 2023-02-25 17:46 ` Kevin Boulain
  2023-02-26 12:18   ` David Bremner
  2023-02-26 12:08 ` [DRAFT 0/1] " David Bremner
  1 sibling, 1 reply; 6+ messages in thread
From: Kevin Boulain @ 2023-02-25 17:46 UTC (permalink / raw)
  To: notmuch; +Cc: Kevin Boulain

This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.

None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
https://github.com/xapian/xapian/blob/4715de3a9fcee741587439dc3cc1d2ff01ffeaf2/xapian-core/include/xapian/query.h#L65

While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.

Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).

First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
  #0  0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141
  #1  0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176
  #2  0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225
  #3  0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260
  #4  0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361
  #5  0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349
  #6  0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934
  #7  0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754
  #8  0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653
  #9  0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
  ...

Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
  #0  0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #1  0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
  #2  0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #3  0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166
  #4  0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
  #5  0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
  #6  0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
  #7  0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353
  #8  0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569
  #9  0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937
  #10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447
  #11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
  ...

Printing Xapian::Query::MatchAll->internal.px->_refs in these
circumstances can help quickly identifying this scenario.

This is motivated by some test frameworks (like Rust's Cargo) that
runs unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.

This is what can be expected from the tests when they fail:
   == stderr ==
  +==================
  +WARNING: ThreadSanitizer: data race (pid=207931)
  +  Read of size 1 at 0x7b10000001a0 by thread T2:
  +    #0 memcpy <null> (libtsan.so.2+0x62506)
  +    #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3)
  +
  +  Previous write of size 8 at 0x7b10000001a0 by thread T1:
  +    #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83)
  +    #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd)
  ...

---
 configure            |  15 +++++-
 lib/query.cc         |   2 +-
 lib/regexp-fields.cc |   2 +-
 test/T810-tsan.sh    | 106 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 122 insertions(+), 3 deletions(-)
 create mode 100755 test/T810-tsan.sh

diff --git a/configure b/configure
index c3629a73..152cc8be 100755
--- a/configure
+++ b/configure
@@ -422,6 +422,18 @@ else
 fi
 unset test_cmdline
 
+printf "C compiler supports address sanitizer... "
+test_cmdline="${CC} ${CFLAGS} ${CPPFLAGS} -fsanitize=thread minimal.c ${LDFLAGS} -o minimal"
+if ${test_cmdline} >/dev/null 2>&1 && ./minimal
+then
+    printf "Yes.\n"
+    have_tsan=1
+else
+    printf "Nope, skipping those tests.\n"
+    have_tsan=0
+fi
+unset test_cmdline
+
 printf "Reading libnotmuch version from source... "
 cat > _libversion.c <<EOF
 #include <stdio.h>
@@ -1590,8 +1602,9 @@ NOTMUCH_GMIME_VERIFY_WITH_SESSION_KEY=${gmime_verify_with_session_key}
 NOTMUCH_ZLIB_CFLAGS="${zlib_cflags}"
 NOTMUCH_ZLIB_LDFLAGS="${zlib_ldflags}"
 
-# Does the C compiler support the address sanitizer
+# Does the C compiler support the sanitizers
 NOTMUCH_HAVE_ASAN=${have_asan}
+NOTMUCH_HAVE_TSAN=${have_tsan}
 
 # do we have man pages?
 NOTMUCH_HAVE_MAN=$((have_sphinx))
diff --git a/lib/query.cc b/lib/query.cc
index 707f6222..348e1ec2 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -186,7 +186,7 @@ _notmuch_query_string_to_xapian_query (notmuch_database_t *notmuch,
 {
     try {
 	if (query_string == "" || query_string == "*") {
-	    output = Xapian::Query::MatchAll;
+	    output = Xapian::Query(std::string());
 	} else {
 	    output =
 		notmuch->query_parser->
diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 539915d8..89a88a4b 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -200,7 +200,7 @@ RegexpFieldProcessor::operator() (const std::string & str)
     if (str.empty ()) {
 	if (options & NOTMUCH_FIELD_PROBABILISTIC) {
 	    return Xapian::Query (Xapian::Query::OP_AND_NOT,
-				  Xapian::Query::MatchAll,
+				  Xapian::Query(std::string()),
 				  Xapian::Query (Xapian::Query::OP_WILDCARD, term_prefix));
 	} else {
 	    return Xapian::Query (term_prefix);
diff --git a/test/T810-tsan.sh b/test/T810-tsan.sh
new file mode 100755
index 00000000..129b7437
--- /dev/null
+++ b/test/T810-tsan.sh
@@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+
+test_description='run code with TSan enabled against the library'
+# Note it is hard to ensure race conditions are deterministic so this
+# only provides best effort detection.
+
+. $(dirname "$0")/test-lib.sh || exit 1
+
+if [ $NOTMUCH_HAVE_TSAN -ne 1 ]; then
+    printf "Skipping due to missing TSan support\n"
+    test_done
+fi
+
+TEST_CFLAGS="-fsanitize=thread"
+
+cp -r ${MAIL_DIR} ${MAIL_DIR}-2
+
+test_begin_subtest "create"
+test_C ${MAIL_DIR} ${MAIL_DIR}-2 <<EOF
+#include <assert.h>
+#include <notmuch.h>
+#include <pthread.h>
+#include <stdio.h>
+
+void *thread (void *arg) {
+  char *mail_dir = arg;
+  /*
+   * Calls into notmuch_query_search_messages which was using the thread-unsafe
+   * Xapian::Query::MatchAll.
+   */
+  notmuch_status_t st = notmuch_database_create (mail_dir, NULL);
+  assert (st == NOTMUCH_STATUS_SUCCESS);
+  return NULL;
+}
+
+int main (int argc, char **argv) {
+  int ret;
+  pthread_t t1, t2;
+  ret = pthread_create (&t1, NULL, thread, argv[1]);
+  assert (ret == 0);
+  ret = pthread_create (&t2, NULL, thread, argv[2]);
+  assert (ret == 0);
+  ret = pthread_join (t1, NULL);
+  assert (ret == 0);
+  ret = pthread_join (t2, NULL);
+  assert (ret == 0);
+  return 0;
+}
+EOF
+cat <<EOF > EXPECTED
+== stdout ==
+== stderr ==
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+add_email_corpus
+rm -r ${MAIL_DIR}-2
+cp -r ${MAIL_DIR} ${MAIL_DIR}-2
+
+test_begin_subtest "query"
+test_C ${MAIL_DIR} ${MAIL_DIR}-2 <<EOF
+#include <assert.h>
+#include <notmuch.h>
+#include <pthread.h>
+#include <stdio.h>
+
+void *thread (void *arg) {
+  char *mail_dir = arg;
+  notmuch_database_t *db;
+  /*
+   * 'from' is NOTMUCH_FIELD_PROBABILISTIC | NOTMUCH_FIELD_PROCESSOR and an
+   * empty string gets us to RegexpFieldProcessor::operator which was using
+   * the tread-unsafe Xapian::Query::MatchAll.
+   */
+  notmuch_status_t st = notmuch_database_open_with_config (mail_dir,
+                                                           NOTMUCH_DATABASE_MODE_READ_ONLY,
+                                                           NULL, NULL, &db, NULL);
+  assert (st == NOTMUCH_STATUS_SUCCESS);
+  notmuch_query_t *query = notmuch_query_create (db, "from:\"\"");
+  notmuch_messages_t *messages;
+  st = notmuch_query_search_messages (query, &messages);
+  assert (st == NOTMUCH_STATUS_SUCCESS);
+  return NULL;
+}
+
+int main (int argc, char **argv) {
+  int ret;
+  pthread_t t1, t2;
+  ret = pthread_create (&t1, NULL, thread, argv[1]);
+  assert (ret == 0);
+  ret = pthread_create (&t2, NULL, thread, argv[2]);
+  assert (ret == 0);
+  ret = pthread_join (t1, NULL);
+  assert (ret == 0);
+  ret = pthread_join (t2, NULL);
+  assert (ret == 0);
+  return 0;
+}
+EOF
+cat <<EOF > EXPECTED
+== stdout ==
+== stderr ==
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [DRAFT 0/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
  2023-02-25 17:46 [DRAFT 0/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative Kevin Boulain
  2023-02-25 17:46 ` [DRAFT 1/1] " Kevin Boulain
@ 2023-02-26 12:08 ` David Bremner
  2023-03-02 17:59   ` [PATCH 1/1] " Kevin Boulain
  1 sibling, 1 reply; 6+ messages in thread
From: David Bremner @ 2023-02-26 12:08 UTC (permalink / raw)
  To: Kevin Boulain, notmuch; +Cc: Kevin Boulain

Kevin Boulain <kevin@boula.in> writes:

>
> Thoughts? Should clients using the library gate every function call to
> Notmuch behind a lock instead?

Ideally that should not be necessary. On the other hand, it may take some time to fix
issues with concurrent access.

> I can also start a discussion in sfsexp's issue tracker.

Yes please. I guess in worst case we can implment come kind of
concurrency control to sfsexp (wrap calls in libnotmuch inside some
mutex), although that sounds pretty far from ideal.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [DRAFT 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
  2023-02-25 17:46 ` [DRAFT 1/1] " Kevin Boulain
@ 2023-02-26 12:18   ` David Bremner
  0 siblings, 0 replies; 6+ messages in thread
From: David Bremner @ 2023-02-26 12:18 UTC (permalink / raw)
  To: Kevin Boulain, notmuch; +Cc: Kevin Boulain

Kevin Boulain <kevin@boula.in> writes:

> This is what can be expected from the tests when they fail:
>    == stderr ==
>   +==================
>   +WARNING: ThreadSanitizer: data race (pid=207931)
>   +  Read of size 1 at 0x7b10000001a0 by thread T2:
>   +    #0 memcpy <null> (libtsan.so.2+0x62506)
>   +    #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3)
>   +
>   +  Previous write of size 8 at 0x7b10000001a0 by thread T1:
>   +    #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83)
>   +    #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd)
>   ...

I don't know what the difference between our environments is, but these
tests are failing in glib for me.

      +==================
	+WARNING: ThreadSanitizer: data race (pid=392122)
	+  Atomic read of size 1 at 0x7b10000025c0 by thread T2:
	+    #0 pthread_rwlock_rdlock ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1406 (libtsan.so.2+0x447f1)
	+    #1 g_rw_lock_reader_lock <null> (libglib-2.0.so.0+0xa8b84)
	+
	+  Previous write of size 8 at 0x7b10000025c0 by thread T1:
	+    #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:647 (libtsan.so.2+0x3ebb8)
	+    #1 <null> <null> (libglib-2.0.so.0+0xa880a)
	+
        ...

FWIW I have libglib2.0-0 version 2.74.5-1 on Debian.

> +printf "C compiler supports address sanitizer... "
s/address/thread/

> +test_cmdline="${CC} ${CFLAGS} ${CPPFLAGS} -fsanitize=thread minimal.c ${LDFLAGS} -o minimal"
> +if ${test_cmdline} >/dev/null 2>&1 && ./minimal

> @@ -186,7 +186,7 @@ _notmuch_query_string_to_xapian_query (notmuch_database_t *notmuch,
>  {
>      try {
>  	if (query_string == "" || query_string == "*") {
> -	    output = Xapian::Query::MatchAll;
> +	    output = Xapian::Query(std::string());

I think this probably deserves a brief comment like "use thread-safe
alternative to Query::MatchAll.

That is why I was thinking a preprocessor macro / inline-function might
be appropriate, to only need that comment in one place.

> +test_begin_subtest "create"
> +test_C ${MAIL_DIR} ${MAIL_DIR}-2 <<EOF
> +#include <assert.h>
> +#include <notmuch.h>
> +#include <pthread.h>
> +#include <stdio.h>

I guess we will find out if any changes are needed to test harness to
portably support pthreads.  BSD and macOS are the usual places that have
slightly different expectations with respect to include files and
libraries; I don't know if that applies here.

d

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
  2023-02-26 12:08 ` [DRAFT 0/1] " David Bremner
@ 2023-03-02 17:59   ` Kevin Boulain
  2023-03-31 11:20     ` David Bremner
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Boulain @ 2023-03-02 17:59 UTC (permalink / raw)
  To: notmuch; +Cc: Kevin Boulain

This replaces two instances of Xapian::Query::MatchAll with the
equivalent but thread-safe alternative Xapian::Query(std::string()).
Xapian::Query::MatchAll maintains an internal pointer to a refcounted
Xapian::Internal::QueryTerm.

None of this is thread-safe but that wouldn't be an issue if
Xapian::Query::MatchAll wasn't static. Because it's static, the
refcounting goes awry when Notmuch is called from multiple threads.
This is actually documented by Xapian:
https://github.com/xapian/xapian/blob/4715de3a9fcee741587439dc3cc1d2ff01ffeaf2/xapian-core/include/xapian/query.h#L65

While static, Xapian::Query::MatchNothing is safe because it doesn't
maintain an internal object and as such, doesn't use references.

Two best-effort tests making use of TSan were added to showcase the
issue (I couldn't figure out a way to deterministically reproduce it
without making an unmaintainable mess).

First, when two databases are created in parallel, a query that uses
Xapian::Query::MatchAll is made (lib/query.cc), resulting in the
following backtrace on a segfault:
  #0  0x00007ffff76822af in Xapian::Query::get_terms_begin (this=0x7fffe80137f0) at api/query.cc:141
  #1  0x00007ffff7f933f5 in _notmuch_query_cache_terms (query=0x7fffe80137c0) at lib/query.cc:176
  #2  0x00007ffff7f93784 in _notmuch_query_ensure_parsed_xapian (query=0x7fffe80137c0) at lib/query.cc:225
  #3  0x00007ffff7f9381a in _notmuch_query_ensure_parsed (query=0x7fffe80137c0) at lib/query.cc:260
  #4  0x00007ffff7f93bfe in _notmuch_query_search_documents (query=0x7fffe80137c0, type=0x7ffff7fa9b1e "mail", out=0x7ffff666da18) at lib/query.cc:361
  #5  0x00007ffff7f93ba4 in notmuch_query_search_messages (query=0x7fffe80137c0, out=0x7ffff666da18) at lib/query.cc:349
  #6  0x00007ffff7f83d98 in notmuch_database_upgrade (notmuch=0x7fffe8000bd0, progress_notify=0x0, closure=0x0) at lib/database.cc:934
  #7  0x00007ffff7fa110f in notmuch_database_create_with_config (database_path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", config_path=0x7ffff7faab3c "", profile=0x0, database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:754
  #8  0x00007ffff7fa0d6f in notmuch_database_create_verbose (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0, status_string=0x7ffff666dc90) at lib/open.cc:653
  #9  0x00007ffff7fa0ceb in notmuch_database_create (path=0x7ffff666dcb0 "/tmp/notmuch.MZ2AGr", database=0x0) at lib/open.cc:637
  ...

Second, some queries would make use of Xapian::Query::MatchAll
(lib/regexp-fields.cc), resulting in the following backtrace on a
segfault:
  #0  0x00007f629828b690 in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800def0, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #1  0x00007f629828c260 in Xapian::Internal::QueryScaleWeight::gather_terms (this=0x7f628800df70, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1434
  #2  0x00007f629828b69f in Xapian::Internal::QueryBranch::gather_terms (this=0x7f628800dd90, void_terms=0x7f629726d5a0) at api/queryinternal.cc:1245
  #3  0x00007f6298282571 in Xapian::Query::get_unique_terms_begin (this=0x7f628800dcd8) at api/query.cc:166
  #4  0x00007f629841a59b in Xapian::Weight::Internal::accumulate_stats (this=0x7f628800dca0, subdb=..., rset=...) at weight/weightinternal.cc:86
  #5  0x00007f62983c15ba in LocalSubMatch::prepare_match (this=0x7f628800df20, nowait=true, total_stats=...) at matcher/localsubmatch.cc:172
  #6  0x00007f62983c8fcc in prepare_sub_matches (leaves=std::vector of length 1, capacity 1 = {...}, stats=...) at matcher/multimatch.cc:237
  #7  0x00007f62983c98a3 in MultiMatch::MultiMatch (this=0x7f629726d9a0, db_=..., query_=..., qlen=3, omrset=0x0, collapse_max_=0, collapse_key_=4294967295, percent_cutoff_=0, weight_cutoff_=0, order_=Xapian::Enquire::ASCENDING, sort_key_=0, sort_by_=Xapian::Enquire::Internal::VAL, sort_value_forward_=true, time_limit_=0, stats=..., weight_=0x7f6288008d50, matchspies_=std::vector of length 0, capacity 0, have_sorter=false, have_mdecider=false) at matcher/multimatch.cc:353
  #8  0x00007f629826fcba in Xapian::Enquire::Internal::get_mset (this=0x7f628800e0b0, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:569
  #9  0x00007f629827181c in Xapian::Enquire::get_mset (this=0x7f629726db80, first=0, maxitems=0, check_at_least=0, rset=0x0, mdecider=0x0) at api/omenquire.cc:937
  #10 0x00007f6298be529a in _notmuch_query_search_documents (query=0x7f6288009750, type=0x7f6298bfaafe "mail", out=0x7f629726dcc0) at lib/query.cc:447
  #11 0x00007f6298be4ae8 in notmuch_query_search_messages (query=0x7f6288009750, out=0x7f629726dcc0) at lib/query.cc:349
  ...

Printing Xapian::Query::MatchAll->internal.px->_refs in these
circumstances can help quickly identifying this scenario.

This is motivated by some test frameworks (like Rust's Cargo) that
runs unit tests in parallel and would easily encounter this issue,
unless client code gates every call to Notmuch behind a lock.

This is what can be expected from the tests when they fail:
   == stderr ==
  +==================
  +WARNING: ThreadSanitizer: data race (pid=207931)
  +  Read of size 1 at 0x7b10000001a0 by thread T2:
  +    #0 memcpy <null> (libtsan.so.2+0x62506)
  +    #1 void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) [clone .isra.0] <null> (libxapian.so.30+0x872b3)
  +
  +  Previous write of size 8 at 0x7b10000001a0 by thread T1:
  +    #0 operator new(unsigned long) <null> (libtsan.so.2+0x8ba83)
  +    #1 Xapian::Query::Query(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int) <null> (libxapian.so.30+0x855cd)
  ...

---

> I don't know what the difference between our environments is, but
> these tests are failing in glib for me.

I also got these failures a few times but I can't reproduce them
anymore... Fortunately I wrote a suppressions file when I still had
them so I've added that to the patch.

> That is why I was thinking a preprocessor macro / inline-function
> might be appropriate, to only need that comment in one place.

Sorry, done.

> I guess we will find out if any changes are needed to test harness
> to portably support pthreads. BSD and macOS are the usual places
> that have slightly different expectations with respect to include
> files and libraries; I don't know if that applies here.

I don't have a macOS handy but I ran the test on FreeBSD. I also fixed
T800-asan.sh so that it runs on FreeBSD too as they pass TEST_CFLAGS:
https://github.com/freebsd/freebsd-ports/blob/eabf244707d70a31c594dfa3953320267725514c/mail/notmuch/Makefile#L49
Should I try something else?

> > I can also start a discussion in sfsexp's issue tracker.
> Yes please.

It seems upstream is willing to fix that:
https://github.com/mjsottile/sfsexp/issues/21
Should we revisit later once some progress is made?

 configure                   | 15 +++++-
 lib/query.cc                |  3 +-
 lib/regexp-fields.cc        |  3 +-
 test/T800-asan.sh           |  2 +-
 test/T810-tsan.sh           | 92 +++++++++++++++++++++++++++++++++++++
 test/T810-tsan.suppressions |  5 ++
 util/xapian-extra.h         | 15 ++++++
 7 files changed, 131 insertions(+), 4 deletions(-)
 create mode 100755 test/T810-tsan.sh
 create mode 100644 test/T810-tsan.suppressions
 create mode 100644 util/xapian-extra.h

diff --git a/configure b/configure
index c3629a73..7afd08c7 100755
--- a/configure
+++ b/configure
@@ -422,6 +422,18 @@ else
 fi
 unset test_cmdline
 
+printf "C compiler supports thread sanitizer... "
+test_cmdline="${CC} ${CFLAGS} ${CPPFLAGS} -fsanitize=thread minimal.c ${LDFLAGS} -o minimal"
+if ${test_cmdline} >/dev/null 2>&1 && ./minimal
+then
+    printf "Yes.\n"
+    have_tsan=1
+else
+    printf "Nope, skipping those tests.\n"
+    have_tsan=0
+fi
+unset test_cmdline
+
 printf "Reading libnotmuch version from source... "
 cat > _libversion.c <<EOF
 #include <stdio.h>
@@ -1590,8 +1602,9 @@ NOTMUCH_GMIME_VERIFY_WITH_SESSION_KEY=${gmime_verify_with_session_key}
 NOTMUCH_ZLIB_CFLAGS="${zlib_cflags}"
 NOTMUCH_ZLIB_LDFLAGS="${zlib_ldflags}"
 
-# Does the C compiler support the address sanitizer
+# Does the C compiler support the sanitizers
 NOTMUCH_HAVE_ASAN=${have_asan}
+NOTMUCH_HAVE_TSAN=${have_tsan}
 
 # do we have man pages?
 NOTMUCH_HAVE_MAN=$((have_sphinx))
diff --git a/lib/query.cc b/lib/query.cc
index 707f6222..1c60c122 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -20,6 +20,7 @@
 
 #include "notmuch-private.h"
 #include "database-private.h"
+#include "xapian-extra.h"
 
 #include <glib.h> /* GHashTable, GPtrArray */
 
@@ -186,7 +187,7 @@ _notmuch_query_string_to_xapian_query (notmuch_database_t *notmuch,
 {
     try {
 	if (query_string == "" || query_string == "*") {
-	    output = Xapian::Query::MatchAll;
+	    output = xapian_query_match_all ();
 	} else {
 	    output =
 		notmuch->query_parser->
diff --git a/lib/regexp-fields.cc b/lib/regexp-fields.cc
index 539915d8..3a775261 100644
--- a/lib/regexp-fields.cc
+++ b/lib/regexp-fields.cc
@@ -25,6 +25,7 @@
 #include "regexp-fields.h"
 #include "notmuch-private.h"
 #include "database-private.h"
+#include "xapian-extra.h"
 
 notmuch_status_t
 compile_regex (regex_t &regexp, const char *str, std::string &msg)
@@ -200,7 +201,7 @@ RegexpFieldProcessor::operator() (const std::string & str)
     if (str.empty ()) {
 	if (options & NOTMUCH_FIELD_PROBABILISTIC) {
 	    return Xapian::Query (Xapian::Query::OP_AND_NOT,
-				  Xapian::Query::MatchAll,
+				  xapian_query_match_all (),
 				  Xapian::Query (Xapian::Query::OP_WILDCARD, term_prefix));
 	} else {
 	    return Xapian::Query (term_prefix);
diff --git a/test/T800-asan.sh b/test/T800-asan.sh
index 8607732e..5055c93e 100755
--- a/test/T800-asan.sh
+++ b/test/T800-asan.sh
@@ -9,7 +9,7 @@ fi
 
 add_email_corpus
 
-TEST_CFLAGS="-fsanitize=address"
+TEST_CFLAGS="${TEST_CFLAGS:-} -fsanitize=address"
 
 test_begin_subtest "open and destroy"
 test_C ${MAIL_DIR} ${NOTMUCH_CONFIG} <<EOF
diff --git a/test/T810-tsan.sh b/test/T810-tsan.sh
new file mode 100755
index 00000000..7e877b27
--- /dev/null
+++ b/test/T810-tsan.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+
+test_directory=$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)
+
+test_description='run code with TSan enabled against the library'
+# Note it is hard to ensure race conditions are deterministic so this
+# only provides best effort detection.
+
+. "$test_directory"/test-lib.sh || exit 1
+
+if [ $NOTMUCH_HAVE_TSAN -ne 1 ]; then
+    printf "Skipping due to missing TSan support\n"
+    test_done
+fi
+
+export TSAN_OPTIONS="suppressions=$test_directory/T810-tsan.suppressions"
+TEST_CFLAGS="${TEST_CFLAGS:-} -fsanitize=thread"
+
+cp -r ${MAIL_DIR} ${MAIL_DIR}-2
+
+test_begin_subtest "create"
+test_C ${MAIL_DIR} ${MAIL_DIR}-2 <<EOF
+#include <notmuch-test.h>
+#include <pthread.h>
+
+void *thread (void *arg) {
+  char *mail_dir = arg;
+  /*
+   * Calls into notmuch_query_search_messages which was using the thread-unsafe
+   * Xapian::Query::MatchAll.
+   */
+  EXPECT0(notmuch_database_create (mail_dir, NULL));
+  return NULL;
+}
+
+int main (int argc, char **argv) {
+  pthread_t t1, t2;
+  EXPECT0(pthread_create (&t1, NULL, thread, argv[1]));
+  EXPECT0(pthread_create (&t2, NULL, thread, argv[2]));
+  EXPECT0(pthread_join (t1, NULL));
+  EXPECT0(pthread_join (t2, NULL));
+  return 0;
+}
+EOF
+cat <<EOF > EXPECTED
+== stdout ==
+== stderr ==
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+add_email_corpus
+rm -r ${MAIL_DIR}-2
+cp -r ${MAIL_DIR} ${MAIL_DIR}-2
+
+test_begin_subtest "query"
+test_C ${MAIL_DIR} ${MAIL_DIR}-2 <<EOF
+#include <notmuch-test.h>
+#include <pthread.h>
+
+void *thread (void *arg) {
+  char *mail_dir = arg;
+  notmuch_database_t *db;
+  /*
+   * 'from' is NOTMUCH_FIELD_PROBABILISTIC | NOTMUCH_FIELD_PROCESSOR and an
+   * empty string gets us to RegexpFieldProcessor::operator which was using
+   * the tread-unsafe Xapian::Query::MatchAll.
+   */
+  EXPECT0(notmuch_database_open_with_config (mail_dir,
+                                             NOTMUCH_DATABASE_MODE_READ_ONLY,
+                                             NULL, NULL, &db, NULL));
+  notmuch_query_t *query = notmuch_query_create (db, "from:\"\"");
+  notmuch_messages_t *messages;
+  EXPECT0(notmuch_query_search_messages (query, &messages));
+  return NULL;
+}
+
+int main (int argc, char **argv) {
+  pthread_t t1, t2;
+  EXPECT0(pthread_create (&t1, NULL, thread, argv[1]));
+  EXPECT0(pthread_create (&t2, NULL, thread, argv[2]));
+  EXPECT0(pthread_join (t1, NULL));
+  EXPECT0(pthread_join (t2, NULL));
+  return 0;
+}
+EOF
+cat <<EOF > EXPECTED
+== stdout ==
+== stderr ==
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
diff --git a/test/T810-tsan.suppressions b/test/T810-tsan.suppressions
new file mode 100644
index 00000000..dbd16a94
--- /dev/null
+++ b/test/T810-tsan.suppressions
@@ -0,0 +1,5 @@
+# It's unclear how TSan-friendly GLib is:
+# https://gitlab.gnome.org/GNOME/glib/-/issues/1672
+race:g_rw_lock_reader_lock
+# https://gitlab.gnome.org/GNOME/glib/-/issues/1952
+race:g_slice_alloc0
diff --git a/util/xapian-extra.h b/util/xapian-extra.h
new file mode 100644
index 00000000..39c7f48f
--- /dev/null
+++ b/util/xapian-extra.h
@@ -0,0 +1,15 @@
+#ifndef _XAPIAN_EXTRA_H
+#define _XAPIAN_EXTRA_H
+
+#include <string>
+#include <xapian.h>
+
+inline Xapian::Query
+xapian_query_match_all (void)
+{
+    // Xapian::Query::MatchAll isn't thread safe (a static object with reference
+    // counting) so instead reconstruct the equivalent on demand.
+    return Xapian::Query (std::string ());
+}
+
+#endif

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative
  2023-03-02 17:59   ` [PATCH 1/1] " Kevin Boulain
@ 2023-03-31 11:20     ` David Bremner
  0 siblings, 0 replies; 6+ messages in thread
From: David Bremner @ 2023-03-31 11:20 UTC (permalink / raw)
  To: Kevin Boulain, notmuch

Kevin Boulain <kevin@boula.in> writes:

> This replaces two instances of Xapian::Query::MatchAll with the
> equivalent but thread-safe alternative Xapian::Query(std::string()).
> Xapian::Query::MatchAll maintains an internal pointer to a refcounted
> Xapian::Internal::QueryTerm.
>

Applied to master, thanks, and sorry for the delay.

By the way I see sfsexp has merged your related PR last week.

d

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-31 11:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-25 17:46 [DRAFT 0/1] lib: replace some uses of Query::MatchAll with a thread-safe alternative Kevin Boulain
2023-02-25 17:46 ` [DRAFT 1/1] " Kevin Boulain
2023-02-26 12:18   ` David Bremner
2023-02-26 12:08 ` [DRAFT 0/1] " David Bremner
2023-03-02 17:59   ` [PATCH 1/1] " Kevin Boulain
2023-03-31 11:20     ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).