unofficial mirror of notmuch@notmuchmail.org
* Thread subqueries
@ 2018-05-05 16:05 David Bremner
  2018-05-05 16:05 ` [PATCH 1/4] lib: add thread subqueries David Bremner
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: David Bremner @ 2018-05-05 16:05 UTC (permalink / raw)
  To: notmuch

This is the first non-WIP version of this series. It adds a small
optimization (something like a 10% speedup on SSD), and some
documentation and tests.

* [PATCH 1/4] lib: add thread subqueries.
  2018-05-05 16:05 Thread subqueries David Bremner
@ 2018-05-05 16:05 ` David Bremner
  2018-05-06 17:59   ` Jani Nikula
  2018-05-05 16:05 ` [PATCH 2/4] perf-test: add simple test for " David Bremner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2018-05-05 16:05 UTC (permalink / raw)
  To: notmuch

This change allows queries of the form

 thread:{from:me} and thread:{from:jian} and not thread:{from:dave}

This is still somewhat brute-force, but it's a big improvement over
both the shell script solution and the previous proposal [1], because it
does not build the whole thread structure just to generate a
query. A further potential optimization is to replace the calls to
notmuch with more specialized Xapian code; in particular it's not
likely that reading all of the message metadata is a win here.
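
As a rough illustration of the expansion step (not part of the patch),
the sketch below models how the messages matching a subquery collapse
into one term per distinct thread. The Message struct and the
expand_thread_subquery name are hypothetical stand-ins for the notmuch
message API; only the "G" prefix and the use of a std::set to
deduplicate terms come from the patch itself.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a message returned by the subquery; in
// notmuch this information would come from a notmuch_message_t.
struct Message {
    std::string from;
    std::string thread_id;
};

// Collect one prefixed term per distinct thread that contains a
// message from `from`. "G" is the term prefix that the prefix table
// assigns to "thread". The std::set drops duplicates, so a thread
// with many matching messages still contributes a single term.
std::set<std::string>
expand_thread_subquery (const std::vector<Message> &corpus,
                        const std::string &from)
{
    std::set<std::string> terms;

    for (const Message &m : corpus) {
        if (m.from == from)
            terms.insert ("G" + m.thread_id);
    }
    return terms;
}
```

In the patch, the equivalent set of terms is what gets combined with
OP_OR into the query returned by the field processor.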

[1]: id:20170820213240.20526-1-david@tethera.net
---
 lib/Makefile.local           |  3 +-
 lib/database.cc              |  6 +++-
 lib/thread-fp.cc             | 67 ++++++++++++++++++++++++++++++++++++
 lib/thread-fp.h              | 42 ++++++++++++++++++++++
 test/T585-thread-subquery.sh | 46 +++++++++++++++++++++++++
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 lib/thread-fp.cc
 create mode 100644 lib/thread-fp.h
 create mode 100755 test/T585-thread-subquery.sh

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8aa03891..5dc057c0 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -58,7 +58,8 @@ libnotmuch_cxx_srcs =		\
 	$(dir)/query-fp.cc      \
 	$(dir)/config.cc	\
 	$(dir)/regexp-fields.cc	\
-	$(dir)/thread.cc
+	$(dir)/thread.cc \
+	$(dir)/thread-fp.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
 
diff --git a/lib/database.cc b/lib/database.cc
index 02444e09..9cf8062c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "thread-fp.h"
 #include "regexp-fields.h"
 #include "string-util.h"
 
@@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
-    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL },
+    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "is",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 	    fp = (new DateFieldProcessor())->release ();
 	else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
 	    fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
+	else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
+	    fp = (new ThreadFieldProcessor (*notmuch->query_parser, notmuch))->release ();
 	else
 	    fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
 					    *notmuch->query_parser, notmuch))->release ();
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
new file mode 100644
index 00000000..dd292bf6
--- /dev/null
+++ b/lib/thread-fp.cc
@@ -0,0 +1,67 @@
+/* thread-fp.cc - "thread:" field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2018 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#include "database-private.h"
+#include "thread-fp.h"
+#include <iostream>
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+
+Xapian::Query
+ThreadFieldProcessor::operator() (const std::string & str)
+{
+    notmuch_status_t status;
+    const char *thread_prefix = _find_prefix ("thread");
+
+    if (str.at (0) == '{') {
+	if (str.length () > 1 && str.at (str.size () - 1) == '}') {
+	    std::string subquery_str = str.substr (1, str.size () - 2);
+	    notmuch_query_t *subquery = notmuch_query_create (notmuch, subquery_str.c_str ());
+	    notmuch_messages_t *messages;
+	    std::set<std::string> terms;
+
+	    if (! subquery)
+		throw Xapian::QueryParserError ("failed to create subquery for '" + subquery_str + "'");
+
+	    status = notmuch_query_search_messages (subquery, &messages);
+	    if (status)
+		throw Xapian::QueryParserError ("failed to search messages for '" + subquery_str + "'");
+
+	    for (; notmuch_messages_valid (messages); notmuch_messages_move_to_next (messages)) {
+		std::string term = thread_prefix;
+		notmuch_message_t *message;
+		message = notmuch_messages_get (messages);
+		term += notmuch_message_get_thread_id (message);
+		terms.insert (term);
+	    }
+	    return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
+	} else {
+	    throw Xapian::QueryParserError ("missing } in '" + str + "'");
+	}
+    } else {
+	/* literal thread id */
+	std::string term = thread_prefix + str;
+	return Xapian::Query (term);
+    }
+
+}
+#endif
diff --git a/lib/thread-fp.h b/lib/thread-fp.h
new file mode 100644
index 00000000..13725978
--- /dev/null
+++ b/lib/thread-fp.h
@@ -0,0 +1,42 @@
+/* thread-fp.h - thread field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 David Bremner
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ *
+ * Author: David Bremner <david@tethera.net>
+ */
+
+#ifndef NOTMUCH_THREAD_FP_H
+#define NOTMUCH_THREAD_FP_H
+
+#include <xapian.h>
+#include "notmuch.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+ protected:
+    Xapian::QueryParser &parser;
+    notmuch_database_t *notmuch;
+
+ public:
+    ThreadFieldProcessor (Xapian::QueryParser &parser_, notmuch_database_t *notmuch_)
+	: parser(parser_), notmuch(notmuch_) { };
+
+    Xapian::Query operator()(const std::string & str);
+};
+#endif
+#endif /* NOTMUCH_THREAD_FP_H */
diff --git a/test/T585-thread-subquery.sh b/test/T585-thread-subquery.sh
new file mode 100755
index 00000000..71ced149
--- /dev/null
+++ b/test/T585-thread-subquery.sh
@@ -0,0 +1,46 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2018 David Bremner
+#
+
+test_description='test of searching by using thread subqueries'
+
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_email_corpus
+
+test_begin_subtest "Basic query that matches no messages"
+count=$(notmuch count from:keithp and to:keithp)
+test_expect_equal 0 "$count"
+
+test_begin_subtest "Same query against threads"
+notmuch search thread:{from:keithp} and thread:{to:keithp} | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Mix thread and non-threads query"
+notmuch search thread:{from:keithp} and to:keithp | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [1/7] Lars Kellogg-Stedman| Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Compound subquery"
+notmuch search 'thread:"{from:keithp and date:2009}" and thread:{to:keithp}' | notmuch_search_sanitize > OUTPUT
+cat<<EOF > EXPECTED
+thread:XXX   2009-11-18 [7/7] Lars Kellogg-Stedman, Mikhail Gusarov, Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "Syntax/quoting error in subquery"
+notmuch search 'thread:{from:keithp and date:2009} and thread:{to:keithp}' 1>OUTPUT 2>&1
+cat<<EOF > EXPECTED
+notmuch search: A Xapian exception occurred
+A Xapian exception occurred parsing query: missing } in '{from:keithp'
+Query string was: thread:{from:keithp and date:2009} and thread:{to:keithp}
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.17.0

* [PATCH 2/4] perf-test: add simple test for thread subqueries
  2018-05-05 16:05 Thread subqueries David Bremner
  2018-05-05 16:05 ` [PATCH 1/4] lib: add thread subqueries David Bremner
@ 2018-05-05 16:05 ` David Bremner
  2018-05-05 16:05 ` [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery David Bremner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: David Bremner @ 2018-05-05 16:05 UTC (permalink / raw)
  To: notmuch

This is not a particularly sensible query, but thread:{date:2010} is a
good way to generate fairly large intermediate queries.
---
 performance-test/T04-thread-subquery.sh | 13 +++++++++++++
 1 file changed, 13 insertions(+)
 create mode 100755 performance-test/T04-thread-subquery.sh

diff --git a/performance-test/T04-thread-subquery.sh b/performance-test/T04-thread-subquery.sh
new file mode 100755
index 00000000..665d5a64
--- /dev/null
+++ b/performance-test/T04-thread-subquery.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+test_description='thread subqueries'
+
+. $(dirname "$0")/perf-test-lib.sh || exit 1
+
+time_start
+
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+time_run "search thread:{} ..." "notmuch search thread:{date:2010} and thread:{from:linus}"
+
+time_done
-- 
2.17.0

* [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery
  2018-05-05 16:05 Thread subqueries David Bremner
  2018-05-05 16:05 ` [PATCH 1/4] lib: add thread subqueries David Bremner
  2018-05-05 16:05 ` [PATCH 2/4] perf-test: add simple test for " David Bremner
@ 2018-05-05 16:05 ` David Bremner
  2018-05-06 18:03   ` Jani Nikula
  2018-05-05 16:05 ` [PATCH 4/4] doc: document thread subqueries David Bremner
  2018-05-07 12:09 ` Thread subqueries David Bremner
  4 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2018-05-05 16:05 UTC (permalink / raw)
  To: notmuch

The observation is that we are only using the messages to get their
thread_id, which is kind of a pessimal access pattern for the current
notmuch_message_get_thread_id, which reads in all of the message
metadata.
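
A minimal sketch of the idea (not the code in this patch): if a
document's terms are kept in sorted order, the thread id can be found
by jumping straight to the first term with the thread prefix, without
touching any other metadata. Here a std::set is assumed as a stand-in
for the Xapian term list, and first_term_value is a hypothetical
helper name.

```cpp
#include <set>
#include <string>

// Look up the first term that starts with `prefix` and store the rest
// of that term in `value`. Returns false if no such term exists. The
// sorted std::set stands in for a document's term list, and
// lower_bound plays the role of skipping ahead to the prefix.
bool
first_term_value (const std::set<std::string> &terms,
                  const std::string &prefix,
                  std::string &value)
{
    std::set<std::string>::const_iterator it = terms.lower_bound (prefix);

    if (it == terms.end () || it->compare (0, prefix.size (), prefix) != 0)
        return false;

    value = it->substr (prefix.size ());
    return true;
}
```

Calling it with the prefix "G" on a term set that also holds tag or
message-id terms would return only the thread id.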
---
 lib/message.cc        | 17 +++++++++++++++++
 lib/notmuch-private.h |  4 ++++
 lib/thread-fp.cc      |  2 +-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index d5db89b6..b2067076 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -318,6 +318,23 @@ _notmuch_message_get_term (notmuch_message_t *message,
     return value;
 }
 
+/*
+ * For special applications where we only want the thread id, reading
+ * in all metadata is a heavy I/O penalty.
+ */
+const char *
+_notmuch_message_get_thread_id_only (notmuch_message_t *message)
+{
+
+    Xapian::TermIterator i = message->doc.termlist_begin ();
+    Xapian::TermIterator end = message->doc.termlist_end ();
+
+    message->thread_id = _notmuch_message_get_term (message, i, end,
+						    _find_prefix ("thread"));
+    return message->thread_id;
+}
+
+
 static void
 _notmuch_message_ensure_metadata (notmuch_message_t *message, void *field)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 1093429c..4598577f 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -537,6 +537,10 @@ _notmuch_message_database (notmuch_message_t *message);
 
 void
 _notmuch_message_remove_unprefixed_terms (notmuch_message_t *message);
+
+const char *
+_notmuch_message_get_thread_id_only(notmuch_message_t *message);
+
 /* sha1.c */
 
 char *
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
index dd292bf6..661d00dd 100644
--- a/lib/thread-fp.cc
+++ b/lib/thread-fp.cc
@@ -50,7 +50,7 @@ ThreadFieldProcessor::operator() (const std::string & str)
 		std::string term = thread_prefix;
 		notmuch_message_t *message;
 		message = notmuch_messages_get (messages);
-		term += notmuch_message_get_thread_id (message);
+		term += _notmuch_message_get_thread_id_only (message);
 		terms.insert (term);
 	    }
 	    return Xapian::Query (Xapian::Query::OP_OR, terms.begin (), terms.end ());
-- 
2.17.0

* [PATCH 4/4] doc: document thread subqueries
  2018-05-05 16:05 Thread subqueries David Bremner
                   ` (2 preceding siblings ...)
  2018-05-05 16:05 ` [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery David Bremner
@ 2018-05-05 16:05 ` David Bremner
  2018-05-06 18:05   ` Jani Nikula
  2018-05-07 12:09 ` Thread subqueries David Bremner
  4 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2018-05-05 16:05 UTC (permalink / raw)
  To: notmuch

Mention both performance and quoting issues.
---
 doc/man7/notmuch-search-terms.rst | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 248444e3..ec999eed 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -83,6 +83,22 @@ thread:<thread-id>
     messages). These thread ID values can be seen in the first column
     of output from **notmuch search**
 
+thread:{<notmuch query>}
+    If notmuch is built with **Xapian Field Processors** (see below),
+    threads may be searched for indirectly by providing an arbitrary
+    notmuch query in **{}**. For example, the following returns
+    threads containing a message from mallory and one (not neccesarily
+    the same message) with Subject containing the word "crypto".
+
+    ::
+
+       % notmuch search 'thread:"{from:mallory}" and thread:"{subject:crypto}"'
+
+    The performance of such queries can vary wildly. To understand
+    this, the user should think of the query **thread:{<something>}**
+    as expanding to all of the thread IDs which match **<something>**;
+    notmuch then performs a second search using the expanded query.
+
 path:<directory-path> or path:<directory-path>/** or path:/<regex>/
     The **path:** prefix searches for email messages that are in
     particular directories within the mail store. The directory must
@@ -277,8 +293,8 @@ Quoting
 -------
 
 Double quotes are also used by the notmuch query parser to protect
-boolean terms or regular expressions containing spaces or other
-special characters, e.g.
+boolean terms, regular expressions, or subqueries containing spaces or
+other special characters, e.g.
 
 ::
 
@@ -288,12 +304,17 @@ special characters, e.g.
 
    folder:"/^.*/(Junk|Spam)$/"
 
+::
+
+   thread:"{from:mallory and date:2009}"
+
 As with phrases, you need to protect the double quotes from the shell
 e.g.
 
 ::
 
    % notmuch search 'folder:"/^.*/(Junk|Spam)$/"'
+   % notmuch search 'thread:"{from:mallory and date:2009}" and thread:{to:mallory}'
 
 DATE AND TIME SEARCH
 ====================
@@ -435,6 +456,7 @@ Currently the following features require field processor support:
 - non-range date queries, e.g. "date:today"
 - named queries e.g. "query:my_special_query"
 - regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
+- thread subqueries, e.g. "thread:{from:bob}"
 
 SEE ALSO
 ========
-- 
2.17.0

* Re: [PATCH 1/4] lib: add thread subqueries.
  2018-05-05 16:05 ` [PATCH 1/4] lib: add thread subqueries David Bremner
@ 2018-05-06 17:59   ` Jani Nikula
  0 siblings, 0 replies; 15+ messages in thread
From: Jani Nikula @ 2018-05-06 17:59 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sat, 05 May 2018, David Bremner <david@tethera.net> wrote:
> This change allows queries of the form
>
>  thread:{from:me} and thread:{from:jian} and not thread:{from:dave}
>
> This is still somewhat brute-force, but it's a big improvement over
> both the shell script solution and the previous proposal [1], because it
> does not build the whole thread structure just generate a
> query. A further potential optimization is to replace the calls to
> notmuch with more specialized Xapian code; in particular it's not
> likely that reading all of the message metadata is a win here.
>
> [1]: id:20170820213240.20526-1-david@tethera.net
> ---
>  lib/Makefile.local           |  3 +-
>  lib/database.cc              |  6 +++-
>  lib/thread-fp.cc             | 67 ++++++++++++++++++++++++++++++++++++
>  lib/thread-fp.h              | 42 ++++++++++++++++++++++
>  test/T585-thread-subquery.sh | 46 +++++++++++++++++++++++++
>  5 files changed, 162 insertions(+), 2 deletions(-)
>  create mode 100644 lib/thread-fp.cc
>  create mode 100644 lib/thread-fp.h
>  create mode 100755 test/T585-thread-subquery.sh
>
> diff --git a/lib/Makefile.local b/lib/Makefile.local
> index 8aa03891..5dc057c0 100644
> --- a/lib/Makefile.local
> +++ b/lib/Makefile.local
> @@ -58,7 +58,8 @@ libnotmuch_cxx_srcs =		\
>  	$(dir)/query-fp.cc      \
>  	$(dir)/config.cc	\
>  	$(dir)/regexp-fields.cc	\
> -	$(dir)/thread.cc
> +	$(dir)/thread.cc \
> +	$(dir)/thread-fp.cc
>  
>  libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
>  
> diff --git a/lib/database.cc b/lib/database.cc
> index 02444e09..9cf8062c 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -21,6 +21,7 @@
>  #include "database-private.h"
>  #include "parse-time-vrp.h"
>  #include "query-fp.h"
> +#include "thread-fp.h"
>  #include "regexp-fields.h"
>  #include "string-util.h"
>  
> @@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
>      { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
>      { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
>      { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
> -    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL },
> +    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
> +						NOTMUCH_FIELD_PROCESSOR },
>      { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
>  						NOTMUCH_FIELD_PROCESSOR },
>      { "is",			"K",		NOTMUCH_FIELD_EXTERNAL |
> @@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
>  	    fp = (new DateFieldProcessor())->release ();
>  	else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
>  	    fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
> +	else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
> +	    fp = (new ThreadFieldProcessor (*notmuch->query_parser, notmuch))->release ();
>  	else
>  	    fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
>  					    *notmuch->query_parser, notmuch))->release ();
> diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
> new file mode 100644
> index 00000000..dd292bf6
> --- /dev/null
> +++ b/lib/thread-fp.cc
> @@ -0,0 +1,67 @@
> +/* thread-fp.cc - "thread:" field processor glue
> + *
> + * This file is part of notmuch.
> + *
> + * Copyright © 2018 David Bremner
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 3 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see https://www.gnu.org/licenses/ .
> + *
> + * Author: David Bremner <david@tethera.net>
> + */
> +
> +#include "database-private.h"
> +#include "thread-fp.h"
> +#include <iostream>
> +
> +#if HAVE_XAPIAN_FIELD_PROCESSOR
> +
> +Xapian::Query
> +ThreadFieldProcessor::operator() (const std::string & str)
> +{
> +    notmuch_status_t status;
> +    const char *thread_prefix = _find_prefix ("thread");
> +
> +    if (str.at (0) == '{') {
> +	if (str.length () > 1 && str.at (str.size () - 1) == '}') {

IIUC .length() and .size() are the same thing, but it's confusing to see
them both used on the same line.

Nitpick, I always favor dealing with error cases first, so you can do
the happy day scenario with less indent. So I'd check the opposite,
throw the error, and continue without the else. YMMV.

Otherwise, LGTM.
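
For illustration only, the error-first shape suggested above could
look roughly like the sketch below. It is a standalone model, not the
patch: strip_braces is a hypothetical helper, std::invalid_argument
stands in for Xapian::QueryParserError, and the caller is assumed to
have already checked that the string starts with '{'. It also uses
only .size (), which avoids mixing it with .length ().

```cpp
#include <stdexcept>
#include <string>

// Return the text between the braces of a "{...}" subquery. The
// missing-brace case is rejected first, so the normal path needs no
// else branch and no extra indentation. The size check comes before
// back () so that back () is never called on an empty string.
std::string
strip_braces (const std::string &str)
{
    if (str.size () <= 1 || str.back () != '}')
        throw std::invalid_argument ("missing } in '" + str + "'");

    return str.substr (1, str.size () - 2);
}
```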

* Re: [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery
  2018-05-05 16:05 ` [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery David Bremner
@ 2018-05-06 18:03   ` Jani Nikula
  0 siblings, 0 replies; 15+ messages in thread
From: Jani Nikula @ 2018-05-06 18:03 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sat, 05 May 2018, David Bremner <david@tethera.net> wrote:
> The observation is that we are only using the messages to get there
> thread_id, which is kindof a pessimal access pattern for the current
> notmuch_message_get_thread_id

LGTM.

* Re: [PATCH 4/4] doc: document thread subqueries
  2018-05-05 16:05 ` [PATCH 4/4] doc: document thread subqueries David Bremner
@ 2018-05-06 18:05   ` Jani Nikula
  0 siblings, 0 replies; 15+ messages in thread
From: Jani Nikula @ 2018-05-06 18:05 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sat, 05 May 2018, David Bremner <david@tethera.net> wrote:
> Mention both performance and quoting issues.
> ---
>  doc/man7/notmuch-search-terms.rst | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
> index 248444e3..ec999eed 100644
> --- a/doc/man7/notmuch-search-terms.rst
> +++ b/doc/man7/notmuch-search-terms.rst
> @@ -83,6 +83,22 @@ thread:<thread-id>
>      messages). These thread ID values can be seen in the first column
>      of output from **notmuch search**
>  
> +thread:{<notmuch query>}
> +    If notmuch is built with **Xapian Field Processors** (see below),
> +    threads may be searched for indirectly by providing an arbitrary
> +    notmuch query in **{}**. For example, the following returns
> +    threads containing a message from mallory and one (not neccesarily

neccesarily typo.

Otherwise LGTM.

> +    the same message) with Subject containing the word "crypto".
> +
> +    ::
> +
> +       % notmuch search 'thread:"{from:mallory}" and thread:"{subject:crypto}"'
> +
> +    The performance of such queries can vary wildly. To understand
> +    this, the user should think of the query **thread:{<something>}**
> +    as expanding to all of the thread IDs which match **<something>**;
> +    notmuch then performs a second search using the expanded query.
> +
>  path:<directory-path> or path:<directory-path>/** or path:/<regex>/
>      The **path:** prefix searches for email messages that are in
>      particular directories within the mail store. The directory must
> @@ -277,8 +293,8 @@ Quoting
>  -------
>  
>  Double quotes are also used by the notmuch query parser to protect
> -boolean terms or regular expressions containing spaces or other
> -special characters, e.g.
> +boolean terms, regular expressions, or subqueries containing spaces or
> +other special characters, e.g.
>  
>  ::
>  
> @@ -288,12 +304,17 @@ special characters, e.g.
>  
>     folder:"/^.*/(Junk|Spam)$/"
>  
> +::
> +
> +   thread:"{from:mallory and date:2009}"
> +
>  As with phrases, you need to protect the double quotes from the shell
>  e.g.
>  
>  ::
>  
>     % notmuch search 'folder:"/^.*/(Junk|Spam)$/"'
> +   % notmuch search 'thread:"{from:mallory and date:2009}" and thread:{to:mallory}'
>  
>  DATE AND TIME SEARCH
>  ====================
> @@ -435,6 +456,7 @@ Currently the following features require field processor support:
>  - non-range date queries, e.g. "date:today"
>  - named queries e.g. "query:my_special_query"
>  - regular expression searches, e.g. "subject:/^\\[SPAM\\]/"
> +- thread subqueries, e.g. "thread:{from:bob}"
>  
>  SEE ALSO
>  ========
> -- 
> 2.17.0

* Re: Thread subqueries
  2018-05-05 16:05 Thread subqueries David Bremner
                   ` (3 preceding siblings ...)
  2018-05-05 16:05 ` [PATCH 4/4] doc: document thread subqueries David Bremner
@ 2018-05-07 12:09 ` David Bremner
  2018-05-07 12:39   ` Gaute Hope
  2018-05-11  8:43   ` Daniel Kahn Gillmor
  4 siblings, 2 replies; 15+ messages in thread
From: David Bremner @ 2018-05-07 12:09 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> This is the first non-WIP version of this series. It adds a small
> optimization (something like a 10% speedup on SSD), and some
> documentation and tests.

pushed to master, with Jani's suggestions.

d

* Re: Thread subqueries
  2018-05-07 12:09 ` Thread subqueries David Bremner
@ 2018-05-07 12:39   ` Gaute Hope
  2018-05-11  8:43   ` Daniel Kahn Gillmor
  1 sibling, 0 replies; 15+ messages in thread
From: Gaute Hope @ 2018-05-07 12:39 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On Mon, 7 May 2018 at 14:09, David Bremner <david@tethera.net> wrote:

> David Bremner <david@tethera.net> writes:
>
> > This is the first non-WIP version of this series. It adds a small
> > optimization (something like a 10% speedup on SSD), and some
> > documentation and tests.
>
> pushed to master, with Jani's suggestions.


Looking forward to testing this! Great effort!

Gaute

* Re: Thread subqueries
  2018-05-07 12:09 ` Thread subqueries David Bremner
  2018-05-07 12:39   ` Gaute Hope
@ 2018-05-11  8:43   ` Daniel Kahn Gillmor
  2018-05-11 10:15     ` David Bremner
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Kahn Gillmor @ 2018-05-11  8:43 UTC (permalink / raw)
  To: David Bremner, notmuch

On Mon 2018-05-07 09:09:35 -0300, David Bremner wrote:
> David Bremner <david@tethera.net> writes:
>
>> This is the first non-WIP version of this series. It adds a small
>> optimization (something like a 10% speedup on SSD), and some
>> documentation and tests.
>
> pushed to master, with Jani's suggestions.

this is awesome.  thank you for pushing it forward!

I'm testing it out now and i am having trouble getting it to be properly
generic when the subquery has multiple terms.

0 dkg@alice:~$ notmuch count 'date:1month..now tag:dkg'
258
0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
notmuch count: A Xapian exception occurred
A Xapian exception occurred parsing query: missing } in '{date:1month..now'
Query string was: thread:{date:1month..now tag:dkg}
1 dkg@alice:~$ 

What i really want is of course something like:

    thread:{date:1month..now tag:dkg} tag:inbox

to find all the replies to threads i've recently participated in, but
that fails with the same error.

What am i missing?

   --dkg

* Re: Thread subqueries
  2018-05-11  8:43   ` Daniel Kahn Gillmor
@ 2018-05-11 10:15     ` David Bremner
  2018-05-11 15:41       ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 15+ messages in thread
From: David Bremner @ 2018-05-11 10:15 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, notmuch

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
> notmuch count: A Xapian exception occurred
> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
> Query string was: thread:{date:1month..now tag:dkg}
> 1 dkg@alice:~$ 

Pretty sure what you want here is

        $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

There is some related discussion in QUOTING in notmuch-search-terms(7),
and the thread:{} examples there all use double quoting so they still work
if the terms are replaced by terms with spaces.

d

* Re: Thread subqueries
  2018-05-11 10:15     ` David Bremner
@ 2018-05-11 15:41       ` Daniel Kahn Gillmor
  2018-05-12 13:24         ` Tomi Ollila
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Kahn Gillmor @ 2018-05-11 15:41 UTC (permalink / raw)
  To: David Bremner, notmuch

On Fri 2018-05-11 07:15:41 -0300, David Bremner wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
>> notmuch count: A Xapian exception occurred
>> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
>> Query string was: thread:{date:1month..now tag:dkg}
>> 1 dkg@alice:~$ 
>
> Pretty sure what you want here is
>
>         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

Thanks, yes, that's it.  I still find the quoting/assembling rules for
notmuch queries non-intuitive, but maybe i'll wrap my head
around them some day.  I certainly don't have any specific suggestions
for improvement.

This is a really useful feature, much appreciated!

        --dkg

* Re: Thread subqueries
  2018-05-11 15:41       ` Daniel Kahn Gillmor
@ 2018-05-12 13:24         ` Tomi Ollila
  2018-05-12 14:01           ` David Bremner
  0 siblings, 1 reply; 15+ messages in thread
From: Tomi Ollila @ 2018-05-12 13:24 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, David Bremner, notmuch

On Fri, May 11 2018, Daniel Kahn Gillmor wrote:

> On Fri 2018-05-11 07:15:41 -0300, David Bremner wrote:
>> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>>
>>> 0 dkg@alice:~$ notmuch count 'thread:{date:1month..now tag:dkg}'
>>> notmuch count: A Xapian exception occurred
>>> A Xapian exception occurred parsing query: missing } in '{date:1month..now'
>>> Query string was: thread:{date:1month..now tag:dkg}
>>> 1 dkg@alice:~$ 
>>
>> Pretty sure what you want here is
>>
>>         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

question: how do these differ (processing-wise):

         $ notmuch count  'thread:"date:1month..now tag:dkg"'
         $ notmuch count  'thread:{date:1month..now tag:dkg}'
         $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

understanding the reasons behind these might help to use these in desired
ways (or we could just say use "{...}" to get this to work).

Tomi

> Thanks, yes, that's it.  I still find the quoting/assembling rules for
> notmuch queries non-intuitive, but maybe i'll wrap my head
> around them some day.  I certainly don't have any specific suggestions
> for improvement.
>
> This is a really useful feature, much appreciated!
>
>         --dkg

* Re: Thread subqueries
  2018-05-12 13:24         ` Tomi Ollila
@ 2018-05-12 14:01           ` David Bremner
  0 siblings, 0 replies; 15+ messages in thread
From: David Bremner @ 2018-05-12 14:01 UTC (permalink / raw)
  To: Tomi Ollila, Daniel Kahn Gillmor, notmuch

Tomi Ollila <tomi.ollila@iki.fi> writes:
>
> question: how do these differ (processing-wise):
>
>          $ notmuch count  'thread:"date:1month..now tag:dkg"'

the thread field processor receives the string "date:1month..now tag:dkg"
(without the quotes) which it treats as a thread id, and doesn't match
anything

>          $ notmuch count  'thread:{date:1month..now tag:dkg}'

The t.f.p. (thread field processor) receives the string
"{date:1month..now" (without quotes) because the top level query parser
splits at spaces, unless prevented by "". It considers this
syntactically invalid and reports an error, rather than silently
dropping the second term.

>          $ notmuch count 'thread:"{date:1month..now tag:dkg}"'

The t.f.p. receives the string "{date:1month..now tag:dkg}" (without
quotes). It notes the first and last character, and triggers a subquery
expansion.
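
To make the three cases above concrete, here is a minimal sketch of the
decision as described in this message. It is not the code from
lib/thread-fp.cc: the names ThreadArg, classify_thread_arg and
subquery_body are hypothetical, and the real field processor builds a
Xapian query (or reports a parse error) rather than returning an enum.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the string handed to the thread
// field processor (illustration only, not notmuch's actual code).
enum class ThreadArg { ThreadId, Subquery, Malformed };

ThreadArg classify_thread_arg (const std::string &str)
{
    // No leading '{': treat the whole string as a literal thread ID.
    if (str.empty () || str.front () != '{')
        return ThreadArg::ThreadId;

    // Leading '{' without a closing '}' as the last character is the
    // "missing }" case, e.g. "{date:1month..now".
    if (str.size () < 2 || str.back () != '}')
        return ThreadArg::Malformed;

    // First character '{' and last character '}': a subquery.
    return ThreadArg::Subquery;
}

// Strip the surrounding braces to obtain the inner query string.
std::string subquery_body (const std::string &str)
{
    assert (classify_thread_arg (str) == ThreadArg::Subquery);
    return str.substr (1, str.size () - 2);
}
```

In this sketch only the first and last characters are inspected, which
matches the description above; whatever sits between the braces is left
for the subquery parse to interpret.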

The thing to keep in mind is that we have no control over the top level
"tokenization" by Xapian, except for using "".
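
A toy model of that top-level tokenization may help, assuming only the
rule stated above (split at spaces unless inside ""). This is not
Xapian's query parser, which also handles operators, prefixes and much
more; split_top_level is a hypothetical name used for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a query string at spaces, except inside double quotes.
// The quotes themselves are dropped from the resulting tokens.
std::vector<std::string> split_top_level (const std::string &query)
{
    std::vector<std::string> tokens;
    std::string current;
    bool in_quotes = false;

    for (char c : query) {
        if (c == '"') {
            // Toggle quoting; the quote character is not kept.
            in_quotes = ! in_quotes;
        } else if (c == ' ' && ! in_quotes) {
            // An unquoted space ends the current token.
            if (! current.empty ()) {
                tokens.push_back (current);
                current.clear ();
            }
        } else {
            current += c;
        }
    }
    if (! current.empty ())
        tokens.push_back (current);
    return tokens;
}
```

Under this model, thread:{date:1month..now tag:dkg} splits into the two
tokens thread:{date:1month..now and tag:dkg}, while
thread:"{date:1month..now tag:dkg}" stays a single token, so the braces
arrive at the field processor intact.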

Thread overview: 15+ messages
2018-05-05 16:05 Thread subqueries David Bremner
2018-05-05 16:05 ` [PATCH 1/4] lib: add thread subqueries David Bremner
2018-05-06 17:59   ` Jani Nikula
2018-05-05 16:05 ` [PATCH 2/4] perf-test: add simple test for " David Bremner
2018-05-05 16:05 ` [PATCH 3/4] lib: define specialized get_thread_id for use in thread subquery David Bremner
2018-05-06 18:03   ` Jani Nikula
2018-05-05 16:05 ` [PATCH 4/4] doc: document thread subqueries David Bremner
2018-05-06 18:05   ` Jani Nikula
2018-05-07 12:09 ` Thread subqueries David Bremner
2018-05-07 12:39   ` Gaute Hope
2018-05-11  8:43   ` Daniel Kahn Gillmor
2018-05-11 10:15     ` David Bremner
2018-05-11 15:41       ` Daniel Kahn Gillmor
2018-05-12 13:24         ` Tomi Ollila
2018-05-12 14:01           ` David Bremner
