unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] lib: add support for thread:<message-id> queries
@ 2017-10-16 19:07 Jani Nikula
  2017-10-17 18:33 ` Daniel Kahn Gillmor
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jani Nikula @ 2017-10-16 19:07 UTC (permalink / raw)
  To: notmuch

Add support for querying threads using message-ids in addition to
thread-ids. The main benefit is that thread queries via message-ids
are portable across databases, re-indexing, and thread joining, while
thread ids can be somewhat transient. A thread:<message-id> query can
be shared between notmuch users, while a thread:<thread-id> query may
cease to work after the regular delivery of a message that joins
threads.

What previously required:

$ notmuch search $(notmuch search --output=threads id:<message-id>)

can now be reduced to:

$ notmuch search thread:<message-id>

We limit the query to message-ids that have @ to optimize regular
thread-id queries and to avoid collisions with thread-ids which are
guaranteed (or which we can guarantee) to not contain @. This seems
reasonable, as valid message-ids will have @.

The performance penalty for regular thread-id queries seems to be
neglible, and thread queries via message-ids seem to be about 10%
slower than the direct thread-id query counterpart.
---
 doc/man7/notmuch-search-terms.rst |  9 ++++---
 lib/Makefile.local                |  1 +
 lib/database.cc                   |  6 ++++-
 lib/thread-fp.cc                  | 53 +++++++++++++++++++++++++++++++++++++++
 lib/thread-fp.h                   | 39 ++++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 4 deletions(-)
 create mode 100644 lib/thread-fp.cc
 create mode 100644 lib/thread-fp.h

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index b27f31f7545c..f3723ceb959b 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -52,7 +52,7 @@ indicate user-supplied values):
 
 -  id:<message-id>
 
--  thread:<thread-id>
+-  thread:<message-id> or thread:<thread-id>
 
 -  folder:<maildir-folder>
 
@@ -101,10 +101,13 @@ For **id:**, message ID values are the literal contents of the
 Message-ID: header of email messages, but without the '<', '>'
 delimiters.
 
-The **thread:** prefix can be used with the thread ID values that are
+The **thread:** prefix can be used with either the message ID of any
+of the messages in the thread or the thread ID values that are
 generated internally by notmuch (and do not appear in email messages).
 These thread ID values can be seen in the first column of output from
-**notmuch search**
+**notmuch search**. The thread ID values are transient. Thread queries
+by message ID are only available if notmuch is built with **Xapian
+Field Processors** (see below).
 
 The **path:** prefix searches for email messages that are in
 particular directories within the mail store. The directory must be
diff --git a/lib/Makefile.local b/lib/Makefile.local
index 8aa03891d775..123ef04c64a9 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -58,6 +58,7 @@ libnotmuch_cxx_srcs =		\
 	$(dir)/query-fp.cc      \
 	$(dir)/config.cc	\
 	$(dir)/regexp-fields.cc	\
+	$(dir)/thread-fp.cc	\
 	$(dir)/thread.cc
 
 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
diff --git a/lib/database.cc b/lib/database.cc
index 35c66939bdcf..82d2dcf22aa0 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -21,6 +21,7 @@
 #include "database-private.h"
 #include "parse-time-vrp.h"
 #include "query-fp.h"
+#include "thread-fp.h"
 #include "regexp-fields.h"
 #include "string-util.h"
 
@@ -258,7 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
-    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL },
+    { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "is",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -317,6 +319,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 	    fp = (new DateFieldProcessor())->release ();
 	else if (STRNCMP_LITERAL(prefix->name, "query") == 0)
 	    fp = (new QueryFieldProcessor (*notmuch->query_parser, notmuch))->release ();
+	else if (STRNCMP_LITERAL(prefix->name, "thread") == 0)
+	    fp = (new ThreadFieldProcessor (notmuch))->release ();
 	else
 	    fp = (new RegexpFieldProcessor (prefix->name, prefix->flags,
 					    *notmuch->query_parser, notmuch))->release ();
diff --git a/lib/thread-fp.cc b/lib/thread-fp.cc
new file mode 100644
index 000000000000..f15aabba18c0
--- /dev/null
+++ b/lib/thread-fp.cc
@@ -0,0 +1,53 @@
+/* thread-fp.cc - "thread:" field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ */
+
+#include "database-private.h"
+#include "thread-fp.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+Xapian::Query
+ThreadFieldProcessor::operator() (const std::string & thread)
+{
+    std::string term_prefix = _find_prefix ("thread");
+    std::string thread_id;
+    notmuch_message_t *message = NULL;
+
+    /*
+     * Only support thread queries via message-ids that have @. The
+     * reason for this is twofold: First, it's an optimization for
+     * regular thread-id queries. Second, and more importantly, it
+     * prevents (potentially malicious) message-id and thread-id
+     * collisions. Regular thread-ids are guaranteed to not have @ in
+     * them, and we only lose the ability to query threads by invalid
+     * message-ids that don't have @ in them.
+     */
+    if (strchr (thread.c_str (), '@'))
+	notmuch_database_find_message (notmuch, thread.c_str (), &message);
+
+    if (message) {
+	thread_id = notmuch_message_get_thread_id (message);
+	notmuch_message_destroy (message);
+    } else {
+	thread_id = thread;
+    }
+
+    return Xapian::Query (term_prefix + thread_id);
+}
+#endif
diff --git a/lib/thread-fp.h b/lib/thread-fp.h
new file mode 100644
index 000000000000..3f88a5831b6b
--- /dev/null
+++ b/lib/thread-fp.h
@@ -0,0 +1,39 @@
+/* thread-fp.h - thread field processor glue
+ *
+ * This file is part of notmuch.
+ *
+ * Copyright © 2017 Jani Nikula
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see https://www.gnu.org/licenses/ .
+ */
+
+#ifndef NOTMUCH_THREAD_FP_H
+#define NOTMUCH_THREAD_FP_H
+
+#include <xapian.h>
+#include "notmuch.h"
+
+#if HAVE_XAPIAN_FIELD_PROCESSOR
+class ThreadFieldProcessor : public Xapian::FieldProcessor {
+ protected:
+    notmuch_database_t *notmuch;
+
+ public:
+    ThreadFieldProcessor (notmuch_database_t *notmuch_)
+	: notmuch(notmuch_) { };
+
+    Xapian::Query operator()(const std::string & str);
+};
+#endif
+#endif /* NOTMUCH_THREAD_FP_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-16 19:07 [PATCH] lib: add support for thread:<message-id> queries Jani Nikula
@ 2017-10-17 18:33 ` Daniel Kahn Gillmor
  2017-10-18  2:00 ` David Bremner
  2017-10-20 10:20 ` Mark Walters
  2 siblings, 0 replies; 8+ messages in thread
From: Daniel Kahn Gillmor @ 2017-10-17 18:33 UTC (permalink / raw)
  To: Jani Nikula, notmuch

[-- Attachment #1: Type: text/plain, Size: 1686 bytes --]

On Mon 2017-10-16 22:07:54 +0300, Jani Nikula wrote:
> Add support for querying threads using message-ids in addition to
> thread-ids. The main benefit is that thread queries via message-ids
> are portable across databases, re-indexing, and thread joining, while
> thread ids can be somewhat transient. A thread:<message-id> query can
> be shared between notmuch users, while a thread:<thread-id> query may
> cease to work after the regular delivery of a message that joins
> threads.
>
> What previously required:
>
> $ notmuch search $(notmuch search --output=threads id:<message-id>)
>
> can now be reduced to:
>
> $ notmuch search thread:<message-id>
>
> We limit the query to message-ids that have @ to optimize regular
> thread-id queries and to avoid collisions with thread-ids which are
> guaranteed (or which we can guarantee) to not contain @. This seems
> reasonable, as valid message-ids will have @.
>
> The performance penalty for regular thread-id queries seems to be
> neglible, and thread queries via message-ids seem to be about 10%
> slower than the direct thread-id query counterpart.

The rationale for this patch really good.  This is functionality we
should make available directly without the shell substitution
shenanigans described above, and i like the elegance of determining
whether it's a message-id or a thread id based on the presence of an '@'
symbol.

i have read the patch and it seems reasonable to me, though i'm not
particularly confident in my ability to catch any subtle bugs in logic
about manipulating Xapian.

I think we should adopt this proposal, though i welcome additional
review from people with more Xapian-fu than myself.

       --dkg

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-16 19:07 [PATCH] lib: add support for thread:<message-id> queries Jani Nikula
  2017-10-17 18:33 ` Daniel Kahn Gillmor
@ 2017-10-18  2:00 ` David Bremner
  2017-10-21 11:16   ` Jani Nikula
  2017-10-20 10:20 ` Mark Walters
  2 siblings, 1 reply; 8+ messages in thread
From: David Bremner @ 2017-10-18  2:00 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:


> +    if (strchr (thread.c_str (), '@'))
> +	notmuch_database_find_message (notmuch, thread.c_str (), &message);
> +
> +    if (message) {
> +	thread_id = notmuch_message_get_thread_id (message);
> +	notmuch_message_destroy (message);
> +    } else {
> +	thread_id = thread;
> +    }
> +
> +    return Xapian::Query (term_prefix + thread_id);
> +}

The other field processors throw Xapian::QueryParserErrors in case bad
things happen (e.g. failure to read from the database). This seems to be
better than nothing as it allows transmitting some detail to the
user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
of course, speaking of tests, this should have one.

d

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-16 19:07 [PATCH] lib: add support for thread:<message-id> queries Jani Nikula
  2017-10-17 18:33 ` Daniel Kahn Gillmor
  2017-10-18  2:00 ` David Bremner
@ 2017-10-20 10:20 ` Mark Walters
  2017-10-20 10:35   ` David Bremner
  2 siblings, 1 reply; 8+ messages in thread
From: Mark Walters @ 2017-10-20 10:20 UTC (permalink / raw)
  To: Jani Nikula, notmuch


Hi

On Mon, 16 Oct 2017, Jani Nikula <jani@nikula.org> wrote:
> Add support for querying threads using message-ids in addition to
> thread-ids. The main benefit is that thread queries via message-ids
> are portable across databases, re-indexing, and thread joining, while
> thread ids can be somewhat transient. A thread:<message-id> query can
> be shared between notmuch users, while a thread:<thread-id> query may
> cease to work after the regular delivery of a message that joins
> threads.

I like the idea.

Just one thought -- is any form of recursion allowed in this sort of
operator so could we have thread:'some query' which would expand to all
threads matching the query? I am guessing this may not be possible now
but would it be worth requiring the thread form to be
thread:id:<message id> rather than thread:<message id>?

Best wishes

Mark

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-20 10:20 ` Mark Walters
@ 2017-10-20 10:35   ` David Bremner
  0 siblings, 0 replies; 8+ messages in thread
From: David Bremner @ 2017-10-20 10:35 UTC (permalink / raw)
  To: Mark Walters, Jani Nikula, notmuch

Mark Walters <markwalters1009@gmail.com> writes:

> Hi
>
> On Mon, 16 Oct 2017, Jani Nikula <jani@nikula.org> wrote:
>> Add support for querying threads using message-ids in addition to
>> thread-ids. The main benefit is that thread queries via message-ids
>> are portable across databases, re-indexing, and thread joining, while
>> thread ids can be somewhat transient. A thread:<message-id> query can
>> be shared between notmuch users, while a thread:<thread-id> query may
>> cease to work after the regular delivery of a message that joins
>> threads.
>
> I like the idea.
>
> Just one thought -- is any form of recursion allowed in this sort of
> operator so could we have thread:'some query' which would expand to all
> threads matching the query? I am guessing this may not be possible now
> but would it be worth requiring the thread form to be
> thread:id:<message id> rather than thread:<message id>?

In id:20170820213240.20526-1-david@tethera.net I used thread:{} to
introduce an arbitrary query. That syntax seems ok to me, and doesn't
conflict with Jani's patch. I think Jani mentioned at the time that
thread:{id:foo} was a too much typing.

d

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-18  2:00 ` David Bremner
@ 2017-10-21 11:16   ` Jani Nikula
  2017-10-21 11:53     ` David Bremner
  0 siblings, 1 reply; 8+ messages in thread
From: Jani Nikula @ 2017-10-21 11:16 UTC (permalink / raw)
  To: David Bremner, notmuch

On Tue, 17 Oct 2017, David Bremner <david@tethera.net> wrote:
> Jani Nikula <jani@nikula.org> writes:
>
>
>> +    if (strchr (thread.c_str (), '@'))
>> +	notmuch_database_find_message (notmuch, thread.c_str (), &message);
>> +
>> +    if (message) {
>> +	thread_id = notmuch_message_get_thread_id (message);
>> +	notmuch_message_destroy (message);
>> +    } else {
>> +	thread_id = thread;
>> +    }
>> +
>> +    return Xapian::Query (term_prefix + thread_id);
>> +}
>
> The other field processors throw Xapian::QueryParserErrors in case bad
> things happen (e.g. failure to read from the database). This seems to be
> better than nothing as it allows transmitting some detail to the
> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
> of course, speaking of tests, this should have one.

I'm not sure what you're suggesting. Do you mean I should open-code
notmuch_database_find_message() or notmuch_message_get_thread_id() to
get at the exceptions thrown within them?

BR,
Jani.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-21 11:16   ` Jani Nikula
@ 2017-10-21 11:53     ` David Bremner
  2017-10-21 12:07       ` Jani Nikula
  0 siblings, 1 reply; 8+ messages in thread
From: David Bremner @ 2017-10-21 11:53 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> On Tue, 17 Oct 2017, David Bremner <david@tethera.net> wrote:
>> Jani Nikula <jani@nikula.org> writes:
>>
>> The other field processors throw Xapian::QueryParserErrors in case bad
>> things happen (e.g. failure to read from the database). This seems to be
>> better than nothing as it allows transmitting some detail to the
>> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
>> of course, speaking of tests, this should have one.
>
> I'm not sure what you're suggesting. Do you mean I should open-code
> notmuch_database_find_message() or notmuch_message_get_thread_id() to
> get at the exceptions thrown within them?
>
> BR,
> Jani.

nothing so fancy. Consider the following bit of query-fp.cc

    status = notmuch_database_get_config (notmuch, key.c_str (), &expansion);
    if (status) {
	throw Xapian::QueryParserError ("error looking up key" + name);
    }


That could no doubt be improved to report the specifics of `status`, but
it's already better than silently ignoring the error, as this will
eventually be reported back to library clients. I agree the exception ->
error-code -> exception -> error-code  path is pretty gross, but thats
the price we pay for mostly pretending we're not writing C++.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] lib: add support for thread:<message-id> queries
  2017-10-21 11:53     ` David Bremner
@ 2017-10-21 12:07       ` Jani Nikula
  0 siblings, 0 replies; 8+ messages in thread
From: Jani Nikula @ 2017-10-21 12:07 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sat, 21 Oct 2017, David Bremner <david@tethera.net> wrote:
> Jani Nikula <jani@nikula.org> writes:
>
>> On Tue, 17 Oct 2017, David Bremner <david@tethera.net> wrote:
>>> Jani Nikula <jani@nikula.org> writes:
>>>
>>> The other field processors throw Xapian::QueryParserErrors in case bad
>>> things happen (e.g. failure to read from the database). This seems to be
>>> better than nothing as it allows transmitting some detail to the
>>> user. See e.g. "regex error reporting" in test/T650-regexp-query.sh. And
>>> of course, speaking of tests, this should have one.
>>
>> I'm not sure what you're suggesting. Do you mean I should open-code
>> notmuch_database_find_message() or notmuch_message_get_thread_id() to
>> get at the exceptions thrown within them?
>>
>> BR,
>> Jani.
>
> nothing so fancy. Consider the following bit of query-fp.cc
>
>     status = notmuch_database_get_config (notmuch, key.c_str (), &expansion);
>     if (status) {
> 	throw Xapian::QueryParserError ("error looking up key" + name);
>     }
>
>
> That could no doubt be improved to report the specifics of `status`, but
> it's already better than silently ignoring the error, as this will
> eventually be reported back to library clients. I agree the exception ->
> error-code -> exception -> error-code  path is pretty gross, but thats
> the price we pay for mostly pretending we're not writing C++.

Okay. Seems like we are in agreement we want this feature, so I'll fix
that up and write some tests when I have a moment.

Thanks,
Jani.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-10-21 12:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-16 19:07 [PATCH] lib: add support for thread:<message-id> queries Jani Nikula
2017-10-17 18:33 ` Daniel Kahn Gillmor
2017-10-18  2:00 ` David Bremner
2017-10-21 11:16   ` Jani Nikula
2017-10-21 11:53     ` David Bremner
2017-10-21 12:07       ` Jani Nikula
2017-10-20 10:20 ` Mark Walters
2017-10-20 10:35   ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).