unofficial mirror of notmuch@notmuchmail.org
* [PATCH] WIP: add searching by body:
@ 2019-02-18 11:56 David Bremner
  2019-02-18 13:06 ` David Bremner
  2019-03-04  2:29 ` [PATCH] lib: add 'body:' field, stop indexing headers twice David Bremner
  0 siblings, 2 replies; 16+ messages in thread
From: David Bremner @ 2019-02-18 11:56 UTC (permalink / raw)
  To: notmuch

---
this basically implements a suggestion of Olly Betts on IRC.

I don't _think_ it requires reindexing, since the new queries work on the old database.

In principle this should result in smaller indexes and somewhat faster
indexing, as it doesn't add terms twice anymore.
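The size argument can be illustrated with a toy model. This is only a sketch, not notmuch or Xapian code: the "index" is a plain std::set of strings, the tokenizer splits on whitespace, and "XSUBJECT" is a hypothetical stand-in for a header prefix.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Toy stand-in for a Xapian document: just the set of stored terms.
using TermSet = std::set<std::string>;

// Split text on whitespace and store each word with the given prefix.
static void index_text (TermSet &doc, const std::string &text,
                        const std::string &prefix)
{
    std::istringstream words (text);
    std::string word;
    while (words >> word)
        doc.insert (prefix + word);
}

// double_index_headers == true models the old behaviour (header words
// stored both prefixed and unprefixed); false models the behaviour
// after this patch (header words stored prefixed only).
TermSet index_message (const std::string &subject, const std::string &body,
                       bool double_index_headers)
{
    TermSet doc;
    index_text (doc, subject, "XSUBJECT");   // hypothetical prefix
    if (double_index_headers)
        index_text (doc, subject, "");
    index_text (doc, body, "");              // body is always unprefixed
    return doc;
}
```

Calling index_message with a two-word subject and a two-word body stores six terms in the old model and four in the new one; the saving in this toy grows with the number of header words.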

 lib/database.cc   |  6 ++++++
 lib/message.cc    | 10 +++++-----
 test/T730-body.sh | 28 ++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 5 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..27c2d042 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -259,6 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -302,6 +304,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -326,6 +330,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..64349f83 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1443,13 +1443,13 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	message->termpos = term_gen->get_termpos () + 100;
 
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+    } else {
+	term_gen->set_termpos (message->termpos);
+	term_gen->index_text (text);
+	/* Create a term gap, as above. */
+	message->termpos = term_gen->get_termpos () + 100;
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
-    message->termpos = term_gen->get_termpos () + 100;
-
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..8318f9af
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+
+test_done
-- 
2.20.1


* Re: [PATCH] WIP: add searching by body:
  2019-02-18 11:56 [PATCH] WIP: add searching by body: David Bremner
@ 2019-02-18 13:06 ` David Bremner
  2019-03-04  2:29 ` [PATCH] lib: add 'body:' field, stop indexing headers twice David Bremner
  1 sibling, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-02-18 13:06 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> ---
> this basically implements a suggestion of Olly Betts on IRC.
>
> I don't _think_ it requires reindexing, since the new queries work on the old database.
>
> In principle this should result in smaller indexes and somewhat faster
> indexing, as it doesn't add terms twice anymore.

To clarify, this is not supposed to change the behaviour of unprefixed
terms in queries: they should still match occurrences in headers.  So
"notmuch search foo" should still match a message with "Subject: foo".
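A rough sketch of the intended matching rules, as a toy model rather than the real query parser: a document is a set of stored terms, the empty prefix stands for the body, and "XSUBJECT" and "XFROM" are hypothetical placeholder prefixes.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a Xapian document: the set of stored terms.
using TermSet = std::set<std::string>;

// Prefixes that an unprefixed query term is expanded to at query time.
// The empty string is the body; the other names are hypothetical.
static const std::vector<std::string> kAllPrefixes = { "", "XSUBJECT", "XFROM" };

// body:term looks only for the unprefixed (body) term.
bool matches_body (const TermSet &doc, const std::string &term)
{
    return doc.count (term) > 0;
}

// A bare term behaves like an OR over every known prefix.
bool matches_unprefixed (const TermSet &doc, const std::string &term)
{
    for (const std::string &prefix : kAllPrefixes)
        if (doc.count (prefix + term) > 0)
            return true;
    return false;
}
```

For a document holding only { "XSUBJECTfoo" }, matches_unprefixed finds "foo" while matches_body does not, which is the behaviour described above for "Subject: foo".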

d


* [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-02-18 11:56 [PATCH] WIP: add searching by body: David Bremner
  2019-02-18 13:06 ` David Bremner
@ 2019-03-04  2:29 ` David Bremner
  2019-03-05  1:26   ` Matt Armstrong
  1 sibling, 1 reply; 16+ messages in thread
From: David Bremner @ 2019-03-04  2:29 UTC (permalink / raw)
  To: David Bremner, notmuch

The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for negated 'body:' searches to work
correctly.
---
 doc/man7/notmuch-search-terms.rst |  5 +++-
 lib/database.cc                   |  6 +++++
 lib/message.cc                    | 10 +++----
 test/T730-body.sh                 | 43 +++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
    notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:<word-or-quoted-phrase>
+    Match terms in the body of messages.
+
 from:<name-or-address> or from:/<regex>/
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
    **from:**, **query:**, **subject:**
 
diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..27c2d042 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -259,6 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -302,6 +304,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -326,6 +330,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..64349f83 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1443,13 +1443,13 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	message->termpos = term_gen->get_termpos () + 100;
 
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+    } else {
+	term_gen->set_termpos (message->termpos);
+	term_gen->index_text (text);
+	/* Create a term gap, as above. */
+	message->termpos = term_gen->get_termpos () + 100;
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
-    message->termpos = term_gen->get_termpos () + 100;
-
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'negated body: prefix'
+notmuch search thebody and not body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search unprefixed for prefixed term'
+notmuch search subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.20.1


* Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-04  2:29 ` [PATCH] lib: add 'body:' field, stop indexing headers twice David Bremner
@ 2019-03-05  1:26   ` Matt Armstrong
  2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Armstrong @ 2019-03-05  1:26 UTC (permalink / raw)
  To: David Bremner, notmuch

David, interesting idea.  I'm not very familiar with this code or its
conventions so my feedback should be taken with that in mind.  More
below.


David Bremner <david@tethera.net> writes:

> diff --git a/lib/database.cc b/lib/database.cc
> index 9cf8062c..27c2d042 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -259,6 +259,8 @@ prefix_t prefix_table[] = {
>      { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
>      { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
>      { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
> +    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
> +						NOTMUCH_FIELD_PROBABILISTIC},
>      { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
>  						NOTMUCH_FIELD_PROCESSOR },
>      { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |

Above this new code in database.cc there is a comment describing the
schema.  E.g. "Mail document" describes id:, thread:, etc.  Add a
description of body: there?

Also, near those comments there is a double-space in the phrase
'uniquely identified by its "id" field' that you might fix while you're
nearby.


> diff --git a/lib/message.cc b/lib/message.cc
> index 6f2f6345..64349f83 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -1443,13 +1443,13 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
>  	message->termpos = term_gen->get_termpos () + 100;
>  
>  	_notmuch_message_invalidate_metadata (message, prefix_name);
> +    } else {
> +	term_gen->set_termpos (message->termpos);
> +	term_gen->index_text (text);
> +	/* Create a term gap, as above. */
> +	message->termpos = term_gen->get_termpos () + 100;
>      }
>  
> -    term_gen->set_termpos (message->termpos);
> -    term_gen->index_text (text);
> -    /* Create a term gap, as above. */
> -    message->termpos = term_gen->get_termpos () + 100;
> -
>      return NOTMUCH_PRIVATE_STATUS_SUCCESS;
>  }

Instead of the above, I find what follows clearer.  It makes it
obvious which logic depends on the presence of a prefix and which
does not, which was a question I immediately had reading the code.

    term_gen->set_termpos(message->termpos);
    if (prefix_name) {
      term_gen->index_text (text, 1, _find_prefix (prefix_name));
    } else {
      term_gen->index_text (text);
    }
    /* Create a gap between this and the next terms so they don't appear to be
     * a phrase. */
    message->termpos = term_gen->get_termpos () + 100;
    if (prefix_name) {
      _notmuch_message_invalidate_metadata (message, prefix_name);
    }
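To illustrate why the gap of 100 positions matters, here is a toy positional index. It is only a sketch under simplifying assumptions: each word occurs once, words are split on whitespace, and a "phrase" means two words at adjacent positions. ToyIndexer and its members are made up for this example and are not part of notmuch or Xapian.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy positional index: word -> position of its (single) occurrence.
struct ToyIndexer {
    std::map<std::string, unsigned> positions;
    unsigned termpos = 0;

    // Index whitespace-separated words, then skip `gap` positions
    // before whatever is indexed next.
    void index_text (const std::string &text, unsigned gap)
    {
        std::istringstream words (text);
        std::string word;
        while (words >> word)
            positions[word] = ++termpos;
        termpos += gap;
    }

    // Two words form a phrase only if they sit at adjacent positions.
    bool is_phrase (const std::string &first, const std::string &second) const
    {
        auto a = positions.find (first);
        auto b = positions.find (second);
        return a != positions.end () && b != positions.end ()
               && b->second == a->second + 1;
    }
};
```

Indexing "hello world" and then "good morning" with a gap of 100 keeps "world good" from looking like a phrase; with a gap of 0 the last word of one field and the first word of the next become adjacent.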


* v2. add body: / drop double indexing of headers
  2019-03-05  1:26   ` Matt Armstrong
@ 2019-03-13  0:47     ` David Bremner
  2019-03-13  0:47       ` [PATCH 1/4] lib: drop comment about only indexing one file David Bremner
                         ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: David Bremner @ 2019-03-13  0:47 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, notmuch

It turns out that the database schema comments Matt referred to
are pretty out of date, so the first 3 commits here are cleanup for
that commentary, and could go in independently.


* [PATCH 1/4] lib: drop comment about only indexing one file.
  2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
@ 2019-03-13  0:47       ` David Bremner
  2019-03-13  0:47       ` [PATCH 2/4] lib: add clarification about the use of "prefix" in the docs David Bremner
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-03-13  0:47 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, notmuch

Although the situation is complicated by the value fields (which are
taken from a single file), this comment is now more false than true.
---
 lib/database.cc | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..fc42c4ba 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -66,11 +66,10 @@ typedef struct {
  * Mail document
  * -------------
  * A mail document is associated with a particular email message. It
- * is stored in one or more files on disk (though only one has its
- * content indexed) and is uniquely identified  by its "id" field
- * (which is generally the message ID). It is indexed with the
- * following prefixed terms which the database uses to construct
- * threads, etc.:
+ * is stored in one or more files on disk and is uniquely identified
+ * by its "id" field (which is generally the message ID). It is
+ * indexed with the following prefixed terms which the database uses
+ * to construct threads, etc.:
  *
  *    Single terms of given prefix:
  *
-- 
2.20.1


* [PATCH 2/4] lib: add clarification about the use of "prefix" in the docs.
  2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
  2019-03-13  0:47       ` [PATCH 1/4] lib: drop comment about only indexing one file David Bremner
@ 2019-03-13  0:47       ` David Bremner
  2019-03-13  0:47       ` [PATCH 3/4] lib: update commentary about path/folder terms David Bremner
  2019-03-13  0:47       ` [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice David Bremner
  3 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-03-13  0:47 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, notmuch

---
 lib/database.cc | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/database.cc b/lib/database.cc
index fc42c4ba..f33f0af6 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -63,6 +63,12 @@ typedef struct {
  * We currently have three different types of documents (mail, ghost,
  * and directory) and also some metadata.
  *
+ * There are two kinds of prefixes used in notmuch. There are the
+ * human friendly 'prefix names' like "thread:", which are also used
+ * in the query parser, and the actual prefix terms in the database
+ * (e.g. "G"). The correspondence is maintained in the file scope data
+ * structure 'prefix_table'.
+ *
  * Mail document
  * -------------
  * A mail document is associated with a particular email message. It
-- 
2.20.1


* [PATCH 3/4] lib: update commentary about path/folder terms
  2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
  2019-03-13  0:47       ` [PATCH 1/4] lib: drop comment about only indexing one file David Bremner
  2019-03-13  0:47       ` [PATCH 2/4] lib: add clarification about the use of "prefix" in the docs David Bremner
@ 2019-03-13  0:47       ` David Bremner
  2019-03-31 17:53         ` David Bremner
  2019-03-13  0:47       ` [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice David Bremner
  3 siblings, 1 reply; 16+ messages in thread
From: David Bremner @ 2019-03-13  0:47 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, notmuch

We missed this when we changed to binary fields.
---
 lib/database.cc | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index f33f0af6..09ab9cb0 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -124,9 +124,11 @@ typedef struct {
  *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
- * user in searching. Similarly, terms from the path of the mail
- * message are added with "folder" and "path" prefixes. But the
- * database doesn't really care itself about any of these.
+ * user in searching.
+ *
+ * The path of the containing folder is added with the "folder" prefix
+ * (see _notmuch_message_add_folder_terms).  Sub-paths of the the path
+ * of the mail message are added with the "path" prefix.
  *
  * The data portion of a mail document is empty.
  *
-- 
2.20.1


* [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice.
  2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
                         ` (2 preceding siblings ...)
  2019-03-13  0:47       ` [PATCH 3/4] lib: update commentary about path/folder terms David Bremner
@ 2019-03-13  0:47       ` David Bremner
  2019-03-13  5:30         ` David Bremner
  3 siblings, 1 reply; 16+ messages in thread
From: David Bremner @ 2019-03-13  0:47 UTC (permalink / raw)
  To: Matt Armstrong, David Bremner, notmuch

The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for negated 'body:' searches to work
correctly.
---
 doc/man7/notmuch-search-terms.rst |  5 +++-
 lib/database.cc                   | 15 ++++++++---
 lib/message.cc                    | 22 +++++++---------
 test/T730-body.sh                 | 43 +++++++++++++++++++++++++++++++
 4 files changed, 68 insertions(+), 17 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
    notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:<word-or-quoted-phrase>
+    Match terms in the body of messages.
+
 from:<name-or-address> or from:/<regex>/
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
    **from:**, **query:**, **subject:**
 
diff --git a/lib/database.cc b/lib/database.cc
index 09ab9cb0..50c0d233 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -122,9 +122,12 @@ typedef struct {
  *	LAST_MOD:	The revision number as of the last tag or
  *			filename change.
  *
- * In addition, terms from the content of the message are added with
- * "from", "to", "attachment", and "subject" prefixes for use by the
- * user in searching.
+ * The prefixed terms described above are also searchable without an
+ * explicit field name, but as of notmuch 0.29 this is due to
+ * query-parser setup, not extra terms in the database.  In addition,
+ * terms from the content of the message are added without a prefix
+ * for use by the user in searching. Note that the prefix name "body"
+ * is used to refer to the empty prefix string in the database.
  *
  * The path of the containing folder is added with the "folder" prefix
  * (see _notmuch_message_add_folder_terms).  Sub-paths of the the path
@@ -266,6 +269,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -309,6 +314,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -333,6 +340,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..38a48933 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1419,8 +1419,9 @@ _notmuch_message_add_term (notmuch_message_t *message,
 }
 
 /* Parse 'text' and add a term to 'message' for each parsed word. Each
- * term will be added both prefixed (if prefix_name is not NULL) and
- * also non-prefixed). */
+ * term will be added with the appropriate prefix if prefix_name is
+ * non-NULL.
+ */
 notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
@@ -1432,22 +1433,17 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
 
     term_gen->set_document (message->doc);
+    term_gen->set_termpos (message->termpos);
 
     if (prefix_name) {
-	const char *prefix = _find_prefix (prefix_name);
-
-	term_gen->set_termpos (message->termpos);
-	term_gen->index_text (text, 1, prefix);
-	/* Create a gap between this an the next terms so they don't
-	 * appear to be a phrase. */
-	message->termpos = term_gen->get_termpos () + 100;
-
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+	term_gen->index_text (text, 1, _find_prefix (prefix_name));
+    } else {
+	term_gen->index_text (text);
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
+    /* Create a gap between this an the next terms so they don't
+     * appear to be a phrase. */
     message->termpos = term_gen->get_termpos () + 100;
 
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'negated body: prefix'
+notmuch search thebody and not body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search unprefixed for prefixed term'
+notmuch search subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.20.1


* Re: [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice.
  2019-03-13  0:47       ` [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice David Bremner
@ 2019-03-13  5:30         ` David Bremner
  2019-03-13 11:44           ` [PATCH] " David Bremner
  0 siblings, 1 reply; 16+ messages in thread
From: David Bremner @ 2019-03-13  5:30 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

>
> Reindexing will be needed for negated 'body:' searches to work
> correctly.

I guess whether or not this needs a forced upgrade, there should still
be a database feature defined (see feature_names in database.cc).
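As a sketch of what such a feature could look like, assuming a plain bit-flag scheme: the names below (ToyFeatures, TOY_FEATURE_*, and both helper functions) are hypothetical and chosen for illustration only; they are not the identifiers used in notmuch.

```cpp
#include <cassert>

// Hypothetical feature bits in the style of a bit-flag enum.
enum ToyFeatures : unsigned {
    TOY_FEATURE_LAST_MOD = 1u << 6,
    TOY_FEATURE_UNPREFIXED_BODY_ONLY = 1u << 7,
};

// Negated body: queries can only be trusted when the unprefixed terms
// in the database come from the body alone.
bool negated_body_is_reliable (unsigned features)
{
    return (features & TOY_FEATURE_UNPREFIXED_BODY_ONLY) != 0;
}

// Newly created databases get the flag; existing databases keep their
// current bits until they are reindexed.
unsigned features_for_new_database (unsigned base_features)
{
    return base_features | TOY_FEATURE_UNPREFIXED_BODY_ONLY;
}
```

With a flag like this, a database created before the change can be recognised as one where "not body:foo" may give wrong answers, without forcing an upgrade for every other query.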


* [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-13  5:30         ` David Bremner
@ 2019-03-13 11:44           ` David Bremner
  2019-03-19  0:39             ` David Bremner
  0 siblings, 1 reply; 16+ messages in thread
From: David Bremner @ 2019-03-13 11:44 UTC (permalink / raw)
  To: David Bremner, notmuch

The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for negated 'body:' searches to work
correctly.
---
 doc/man7/notmuch-search-terms.rst |  5 +++-
 lib/database-private.h            |  6 +++++
 lib/database.cc                   | 20 +++++++++++---
 lib/message.cc                    | 22 +++++++---------
 test/T730-body.sh                 | 43 +++++++++++++++++++++++++++++++
 5 files changed, 79 insertions(+), 17 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
    notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:<word-or-quoted-phrase>
+    Match terms in the body of messages.
+
 from:<name-or-address> or from:/<regex>/
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
    **from:**, **query:**, **subject:**
 
diff --git a/lib/database-private.h b/lib/database-private.h
index a499b259..293f2db4 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -108,6 +108,12 @@ enum _notmuch_features {
      *
      * Introduced: version 3. */
     NOTMUCH_FEATURE_LAST_MOD = 1 << 6,
+
+    /* If set, unprefixed terms are stored only for the message body,
+     * not for headers.
+     *
+     * Introduced: version 3. */
+    NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY = 1 << 7,
 };
 
 /* In C++, a named enum is its own type, so define bitwise operators
diff --git a/lib/database.cc b/lib/database.cc
index 09ab9cb0..508dc94c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -122,9 +122,12 @@ typedef struct {
  *	LAST_MOD:	The revision number as of the last tag or
  *			filename change.
  *
- * In addition, terms from the content of the message are added with
- * "from", "to", "attachment", and "subject" prefixes for use by the
- * user in searching.
+ * The prefixed terms described above are also searchable without an
+ * explicit field name, but as of notmuch 0.29 this is due to
+ * query-parser setup, not extra terms in the database.  In addition,
+ * terms from the content of the message are added without a prefix
+ * for use by the user in searching. Note that the prefix name "body"
+ * is used to refer to the empty prefix string in the database.
  *
  * The path of the containing folder is added with the "folder" prefix
  * (see _notmuch_message_add_folder_terms).  Sub-paths of the the path
@@ -266,6 +269,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -309,6 +314,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -333,6 +340,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
@@ -390,6 +399,10 @@ static const struct {
       "indexed MIME types", "w"},
     { NOTMUCH_FEATURE_LAST_MOD,
       "modification tracking", "w"},
+    /* Existing databases will work fine for all queries not involving
+     * 'body:' */
+    { NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY,
+      "index body and headers separately", "w"},
 };
 
 const char *
@@ -663,6 +676,7 @@ notmuch_database_create_verbose (const char *path,
      * new databases have them. */
     notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
     notmuch->features |= NOTMUCH_FEATURE_INDEXED_MIMETYPES;
+    notmuch->features |= NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY;
 
     status = notmuch_database_upgrade (notmuch, NULL, NULL);
     if (status) {
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..38a48933 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1419,8 +1419,9 @@ _notmuch_message_add_term (notmuch_message_t *message,
 }
 
 /* Parse 'text' and add a term to 'message' for each parsed word. Each
- * term will be added both prefixed (if prefix_name is not NULL) and
- * also non-prefixed). */
+ * term will be added with the appropriate prefix if prefix_name is
+ * non-NULL.
+ */
 notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
@@ -1432,22 +1433,17 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
 
     term_gen->set_document (message->doc);
+    term_gen->set_termpos (message->termpos);
 
     if (prefix_name) {
-	const char *prefix = _find_prefix (prefix_name);
-
-	term_gen->set_termpos (message->termpos);
-	term_gen->index_text (text, 1, prefix);
-	/* Create a gap between this an the next terms so they don't
-	 * appear to be a phrase. */
-	message->termpos = term_gen->get_termpos () + 100;
-
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+	term_gen->index_text (text, 1, _find_prefix (prefix_name));
+    } else {
+	term_gen->index_text (text);
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
+    /* Create a gap between this an the next terms so they don't
+     * appear to be a phrase. */
     message->termpos = term_gen->get_termpos () + 100;
 
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'negated body: prefix'
+notmuch search thebody and not body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search unprefixed for prefixed term'
+notmuch search subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.20.1

* [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-13 11:44           ` [PATCH] " David Bremner
@ 2019-03-19  0:39             ` David Bremner
  2019-03-29 13:17               ` David Bremner
                                 ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: David Bremner @ 2019-03-19  0:39 UTC (permalink / raw)
  To: David Bremner, notmuch

The new `body:` field (in Xapian terms), or prefix (in slightly
sloppier notmuch terms), allows matching terms that occur only in
the body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for 'body:' searches to work correctly;
otherwise they will also match messages where the term occurs in
headers (demonstrated by the new tests in T530-upgrade.sh).
---

Compared to the previous version this adds a couple more tests and
clarifies the commit message comment about reindexing.
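
For readers who want to see the query-time expansion in isolation, here
is a toy model in plain C++. It is only a sketch and does not use Xapian:
the Index type and the helpers index_word, search and build_sample are
hypothetical names made up for illustration, and the two sample messages
mirror the ones added in T730-body.sh (hyphenated words are simplified
to single terms). Setting double_index to true mimics the old scheme,
where header terms were also stored unprefixed.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy inverted index: stored term -> ids of the documents containing it.
using Index = std::map<std::string, std::set<int>>;

// Hypothetical helper: store `word` for document `doc` under `prefix`.
// With double_index == true, prefixed (header) words are stored a second
// time without a prefix, as older databases did.
void index_word (Index &idx, int doc, const std::string &prefix,
                 const std::string &word, bool double_index)
{
    idx[prefix + word].insert (doc);
    if (double_index && ! prefix.empty ())
        idx[word].insert (doc);
}

// Query-time expansion: look the word up under every prefix in
// `prefixes` and return the union of the matching documents.
std::set<int> search (const Index &idx,
                      const std::vector<std::string> &prefixes,
                      const std::string &word)
{
    std::set<int> hits;
    for (const auto &p : prefixes) {
        auto it = idx.find (p + word);
        if (it != idx.end ())
            hits.insert (it->second.begin (), it->second.end ());
    }
    return hits;
}

// Two messages modelled on T730-body.sh:
//   doc 1: body "thebody", subject "subject"
//   doc 2: body "nothing", subject "thebody"
Index build_sample (bool double_index)
{
    Index idx;
    index_word (idx, 1, "", "thebody", double_index);
    index_word (idx, 1, "XSUBJECT", "subject", double_index);
    index_word (idx, 2, "", "nothing", double_index);
    index_word (idx, 2, "XSUBJECT", "thebody", double_index);
    return idx;
}
```

In this model a 'body:' query searches only the empty prefix, while an
unprefixed query searches the empty prefix plus every header prefix.
On an index built with double_index == false, body:thebody finds only
doc 1; on one built with double_index == true it also finds doc 2,
which is the behaviour the T530-upgrade.sh tests check before and after
reindexing.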

 doc/man7/notmuch-search-terms.rst |  5 +++-
 lib/database-private.h            |  6 +++++
 lib/database.cc                   | 20 +++++++++++---
 lib/message.cc                    | 22 +++++++---------
 test/T530-upgrade.sh              | 16 ++++++++++++
 test/T730-body.sh                 | 43 +++++++++++++++++++++++++++++++
 6 files changed, 95 insertions(+), 17 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
    notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:<word-or-quoted-phrase>
+    Match terms in the body of messages.
+
 from:<name-or-address> or from:/<regex>/
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
    **from:**, **query:**, **subject:**
 
diff --git a/lib/database-private.h b/lib/database-private.h
index a499b259..293f2db4 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -108,6 +108,12 @@ enum _notmuch_features {
      *
      * Introduced: version 3. */
     NOTMUCH_FEATURE_LAST_MOD = 1 << 6,
+
+    /* If set, unprefixed terms are stored only for the message body,
+     * not for headers.
+     *
+     * Introduced: version 3. */
+    NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY = 1 << 7,
 };
 
 /* In C++, a named enum is its own type, so define bitwise operators
diff --git a/lib/database.cc b/lib/database.cc
index 09ab9cb0..508dc94c 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -122,9 +122,12 @@ typedef struct {
  *	LAST_MOD:	The revision number as of the last tag or
  *			filename change.
  *
- * In addition, terms from the content of the message are added with
- * "from", "to", "attachment", and "subject" prefixes for use by the
- * user in searching.
+ * The prefixed terms described above are also searchable without an
+ * explicit field name, but as of notmuch 0.29 this is due to
+ * query-parser setup, not extra terms in the database.  In addition,
+ * terms from the content of the message are added without a prefix
+ * for use by the user in searching. Note that the prefix name "body"
+ * is used to refer to the empty prefix string in the database.
  *
  * The path of the containing folder is added with the "folder" prefix
  * (see _notmuch_message_add_folder_terms).  Sub-paths of the the path
@@ -266,6 +269,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -309,6 +314,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -333,6 +340,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
@@ -390,6 +399,10 @@ static const struct {
       "indexed MIME types", "w"},
     { NOTMUCH_FEATURE_LAST_MOD,
       "modification tracking", "w"},
+    /* Existing databases will work fine for all queries not involving
+     * 'body:' */
+    { NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY,
+      "index body and headers separately", "w"},
 };
 
 const char *
@@ -663,6 +676,7 @@ notmuch_database_create_verbose (const char *path,
      * new databases have them. */
     notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
     notmuch->features |= NOTMUCH_FEATURE_INDEXED_MIMETYPES;
+    notmuch->features |= NOTMUCH_FEATURE_UNPREFIX_BODY_ONLY;
 
     status = notmuch_database_upgrade (notmuch, NULL, NULL);
     if (status) {
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..38a48933 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1419,8 +1419,9 @@ _notmuch_message_add_term (notmuch_message_t *message,
 }
 
 /* Parse 'text' and add a term to 'message' for each parsed word. Each
- * term will be added both prefixed (if prefix_name is not NULL) and
- * also non-prefixed). */
+ * term will be added with the appropriate prefix if prefix_name is
+ * non-NULL.
+ */
 notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
@@ -1432,22 +1433,17 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
 
     term_gen->set_document (message->doc);
+    term_gen->set_termpos (message->termpos);
 
     if (prefix_name) {
-	const char *prefix = _find_prefix (prefix_name);
-
-	term_gen->set_termpos (message->termpos);
-	term_gen->index_text (text, 1, prefix);
-	/* Create a gap between this an the next terms so they don't
-	 * appear to be a phrase. */
-	message->termpos = term_gen->get_termpos () + 100;
-
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+	term_gen->index_text (text, 1, _find_prefix (prefix_name));
+    } else {
+	term_gen->index_text (text);
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
+    /* Create a gap between this an the next terms so they don't
+     * appear to be a phrase. */
     message->termpos = term_gen->get_termpos () + 100;
 
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
diff --git a/test/T530-upgrade.sh b/test/T530-upgrade.sh
index 69ebec68..2124dde2 100755
--- a/test/T530-upgrade.sh
+++ b/test/T530-upgrade.sh
@@ -117,4 +117,20 @@ MAIL_DIR/bar/new/21:2,
 MAIL_DIR/bar/new/22:2,
 MAIL_DIR/cur/51:2,"
 
+test_begin_subtest "body: same as unprefixed before reindex"
+notmuch search --output=messages body:close > OUTPUT
+notmuch search --output=messages close  > EXPECTED
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest "body: subset of unprefixed after reindex"
+notmuch reindex '*'
+notmuch search --output=messages body:close | sort > BODY
+notmuch search --output=messages close | sort > UNPREFIXED
+diff -e UNPREFIXED BODY | cut -c2- > OUTPUT
+cat <<EOF > EXPECTED
+d
+d
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'negated body: prefix'
+notmuch search thebody and not body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search unprefixed for prefixed term'
+notmuch search subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.20.1
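
As an aside on the term-position gap kept in _notmuch_message_gen_terms:
the following is a minimal stand-alone sketch of why a gap between
fields prevents accidental phrase matches. It does not call Xapian;
ToyMessage, Posting, gen_terms and adjacent are hypothetical names, and
"adjacent" is only a simplified stand-in for a two-word phrase match.

```cpp
#include <string>
#include <vector>

// One stored term together with its position in the document.
struct Posting {
    std::string term;
    unsigned pos;
};

// Hypothetical stand-in for a message being indexed.
struct ToyMessage {
    std::vector<Posting> postings;
    unsigned termpos = 0;   // last position handed out, plus any gap
};

// Index the words of one field exactly once (with its prefix), then
// advance the position counter by `gap` so the next field starts far away.
void gen_terms (ToyMessage &msg, const std::string &prefix,
                const std::vector<std::string> &words, unsigned gap = 100)
{
    unsigned pos = msg.termpos;
    for (const auto &w : words)
        msg.postings.push_back ({ prefix + w, ++pos });
    msg.termpos = pos + gap;
}

// Simplified phrase check: true if `b` is stored at the position
// immediately after some occurrence of `a`.
bool adjacent (const ToyMessage &msg, const std::string &a,
               const std::string &b)
{
    for (const auto &x : msg.postings)
        for (const auto &y : msg.postings)
            if (x.term == a && y.term == b && y.pos == x.pos + 1)
                return true;
    return false;
}
```

With the default gap, the last word of one field and the first word of
the next are never adjacent; calling gen_terms with a gap of 0 makes
them adjacent, which is the false phrase match the comment in the patch
is guarding against.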

* Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-19  0:39             ` David Bremner
@ 2019-03-29 13:17               ` David Bremner
  2019-04-14 11:32               ` David Bremner
  2019-04-17 11:55               ` David Bremner
  2 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-03-29 13:17 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> This follows a suggestion of Olly Betts to use the facility (since
> Xapian 1.0.4) to add the same field with multiple prefixes. The double
> indexing of previous versions is thus replaced with a query-time
> expansion of unprefixed query terms to the various prefixed
> equivalents.

This patch leads to approximately a 10% decrease in database size on
our performance suite (2.1G -> 1.9G) before compaction. After
compaction, the old -> new sizes are 1.4G -> 1.3G.

With the caveat that the benchmark machine was not completely idle, it
also leads to a roughly 10% speedup.

Existing indexing:

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   565.17	534.82	28.22	474632	0/13854576
  notmuch new #2        0.03	0.00	0.00	9512	0/160
  notmuch new #3        0.00	0.00	0.00	9368	0/8
  notmuch new #4        0.00	0.00	0.00	9412	0/8
  notmuch new #5        0.00	0.00	0.00	9384	0/8
  notmuch new #6        0.00	0.00	0.00	9388	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       16.25	2.65	3.05	12668	104/40104
  dump *                3.90	3.79	0.10	26048	0/27928
  restore *             4.51	4.10	0.41	9564	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        374.69	197.56	169.55	118644	0/1818656
  tag * +existing_tag   0.00	0.00	0.00	9232	0/0
  tag * -existing_tag   318.47	151.46	164.56	36260	0/1819584
  tag * -missing_tag    0.00	0.00	0.00	9336	0/0

T03-reindex.sh: Testing tagging                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  reindex *             688.27	488.02	197.59	11142680	0/4908120
  reindex *             648.04	456.06	191.78	11139124	0/2696120
  reindex *             650.70	459.08	191.48	11139088	0/2696680

T04-thread-subquery.sh: Testing thread subqueries       [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  search thread:{} ...  2.45	2.29	0.15	94696	0/144
  search thread:{} ...  2.43	2.23	0.20	94228	0/144
  search thread:{} ...  2.46	2.26	0.20	94224	0/144

With new indexing:

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   494.31	466.96	24.28	447428	0/12093344
  notmuch new #2        0.03	0.00	0.00	9356	0/144
  notmuch new #3        0.01	0.01	0.00	9420	0/8
  notmuch new #4        0.00	0.00	0.00	9388	0/8
  notmuch new #5        0.00	0.00	0.00	9416	0/8
  notmuch new #6        0.01	0.00	0.01	9424	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       14.21	2.41	2.71	12664	0/38952
  dump *                3.70	3.57	0.12	26092	0/27928
  restore *             4.19	3.78	0.41	9412	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        353.31	183.89	161.49	111244	0/1693872
  tag * +existing_tag   0.00	0.00	0.00	9316	0/0
  tag * -existing_tag   284.07	137.15	144.33	36712	0/1659200
  tag * -missing_tag    0.00	0.00	0.00	9240	0/0

T03-reindex.sh: Testing tagging                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  reindex *             640.19	431.23	196.99	10214564	1510/4504024
  reindex *             611.46	412.37	193.07	10211852	1056/2557688
  reindex *             612.95	415.40	194.97	10211848	0/2555032

T04-thread-subquery.sh: Testing thread subqueries       [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  search thread:{} ...  2.34	2.12	0.21	96452	0/144
  search thread:{} ...  2.35	2.17	0.18	96208	0/144
  search thread:{} ...  2.33	2.08	0.25	94740	0/144
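
The percentages quoted above can be rechecked from the tables with a
small helper. This is just an arithmetic sketch; percent_reduction is a
hypothetical name, and the inputs are the wall-clock times and sizes
reported in this message (the sizes are rounded to one decimal, so the
size percentages are only approximate).

```cpp
// Relative reduction from `before` to `after`, expressed in percent.
// A positive result means `after` is smaller (faster or more compact).
double percent_reduction (double before, double after)
{
    return 100.0 * (before - after) / before;
}

// Figures taken from the tables above:
//   initial "notmuch new" wall time: 565.17 s -> 494.31 s
//   first "reindex *" wall time:     688.27 s -> 640.19 s
//   database size before compaction: 2.1 G    -> 1.9 G
//   database size after compaction:  1.4 G    -> 1.3 G
```

For example, percent_reduction (565.17, 494.31) comes out at roughly
12.5, percent_reduction (688.27, 640.19) at roughly 7, and
percent_reduction (2.1, 1.9) at roughly 9.5.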

* Re: [PATCH 3/4] lib: update commentary about path/folder terms
  2019-03-13  0:47       ` [PATCH 3/4] lib: update commentary about path/folder terms David Bremner
@ 2019-03-31 17:53         ` David Bremner
  0 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-03-31 17:53 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> We missed this when we changed to binary fields.

pushed these 3 documentation patches.

d

* Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-19  0:39             ` David Bremner
  2019-03-29 13:17               ` David Bremner
@ 2019-04-14 11:32               ` David Bremner
  2019-04-17 11:55               ` David Bremner
  2 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-04-14 11:32 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> The new `body:` field (in Xapian terms), or prefix (in slightly
> sloppier notmuch terms), allows matching terms that occur only in
> the body.
>
> Unprefixed query terms should continue to match anywhere (header or
> body) in the message.

Last call for review. I'll push this change sometime in the next week
unless I am convinced otherwise.

d

* Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
  2019-03-19  0:39             ` David Bremner
  2019-03-29 13:17               ` David Bremner
  2019-04-14 11:32               ` David Bremner
@ 2019-04-17 11:55               ` David Bremner
  2 siblings, 0 replies; 16+ messages in thread
From: David Bremner @ 2019-04-17 11:55 UTC (permalink / raw)
  To: notmuch

David Bremner <david@tethera.net> writes:

> The new `body:` field (in Xapian terms), or prefix (in slightly
> sloppier notmuch terms), allows matching terms that occur only in
> the body.

as promised / threatened, pushed to master.

d

end of thread, other threads:[~2019-04-17 11:55 UTC | newest]

Thread overview: 16+ messages
2019-02-18 11:56 [PATCH] WIP: add searching by body: David Bremner
2019-02-18 13:06 ` David Bremner
2019-03-04  2:29 ` [PATCH] lib: add 'body:' field, stop indexing headers twice David Bremner
2019-03-05  1:26   ` Matt Armstrong
2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
2019-03-13  0:47       ` [PATCH 1/4] lib: drop comment about only indexing one file David Bremner
2019-03-13  0:47       ` [PATCH 2/4] lib: add clarification about the use of "prefix" in the docs David Bremner
2019-03-13  0:47       ` [PATCH 3/4] lib: update commentary about path/folder terms David Bremner
2019-03-31 17:53         ` David Bremner
2019-03-13  0:47       ` [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice David Bremner
2019-03-13  5:30         ` David Bremner
2019-03-13 11:44           ` [PATCH] " David Bremner
2019-03-19  0:39             ` David Bremner
2019-03-29 13:17               ` David Bremner
2019-04-14 11:32               ` David Bremner
2019-04-17 11:55               ` David Bremner
