From: David Bremner <david@tethera.net>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Subject: [PATCH] lib: add 'body:' field, stop indexing headers twice.
Date: Sun,  3 Mar 2019 22:29:12 -0400
Message-ID: <20190304022912.13924-1-david@tethera.net>
In-Reply-To: <20190218115622.31466-1-david@tethera.net>

The new `body:` field (in Xapian terms) or prefix (in slightly
sloppier notmuch terms) allows matching terms that occur only in the
body.

Unprefixed query terms should continue to match anywhere (header or
body) in the message.

This follows a suggestion of Olly Betts to use the facility (since
Xapian 1.0.4) to add the same field with multiple prefixes. The double
indexing of previous versions is thus replaced with a query-time
expansion of unprefixed query terms to the various prefixed
equivalents.

Reindexing will be needed for negated 'body:' searches to work
correctly.
---
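As a reading aid, here is a minimal standalone sketch of the query-time
expansion described above. It is a toy model, not notmuch or Xapian code:
`PrefixTable`, `make_table`, `expand` and `matches` are hypothetical names,
the two-entry field table and the "XSUBJECT" prefix string are illustrative
assumptions, and a message is reduced to a plain set of terms.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy field table: one user-visible field name may map to several term
// prefixes. The empty field name stands for an unprefixed query term.
using PrefixTable = std::multimap<std::string, std::string>;

// A message is modelled as the set of terms indexed for it.
using Terms = std::set<std::string>;

// Illustrative table (an assumption): body terms carry no prefix, subject
// terms carry "XSUBJECT", and the unprefixed field expands to both.
PrefixTable make_table ()
{
    return {
        {"body", ""},
        {"subject", "XSUBJECT"},
        {"", ""},
        {"", "XSUBJECT"},
    };
}

// Expand field:word into every prefixed term it may stand for.
std::vector<std::string> expand (const PrefixTable &table,
                                 const std::string &field,
                                 const std::string &word)
{
    std::vector<std::string> terms;
    auto range = table.equal_range (field);
    for (auto it = range.first; it != range.second; ++it)
        terms.push_back (it->second + word);
    return terms;
}

// A message matches if any one of the expanded terms is indexed for it.
bool matches (const PrefixTable &table, const Terms &message,
              const std::string &field, const std::string &word)
{
    for (const std::string &term : expand (table, field, word))
        if (message.count (term))
            return true;
    return false;
}
```

In this model a message indexed as {"nothing", "XSUBJECTthebody"} matches
the unprefixed term `thebody` but not `body:thebody`. A message indexed the
old way, with the header text also stored unprefixed ({"nothing",
"XSUBJECTthebody", "thebody"}), does match `body:thebody`, which is why
negated 'body:' searches need a reindex.
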
 doc/man7/notmuch-search-terms.rst |  5 +++-
 lib/database.cc                   |  6 +++++
 lib/message.cc                    | 10 +++----
 test/T730-body.sh                 | 43 +++++++++++++++++++++++++++++++
 4 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100755 test/T730-body.sh

diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index f7a39ceb..fd8bf634 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -44,6 +44,9 @@ results to those whose value matches a regular expression (see
 
    notmuch search 'from:"/bob@.*[.]example[.]com/"'
 
+body:<word-or-quoted-phrase>
+    Match terms in the body of messages.
+
 from:<name-or-address> or from:/<regex>/
     The **from:** prefix is used to match the name or address of
     the sender of an email message.
@@ -249,7 +252,7 @@ follows.
 Boolean
    **tag:**, **id:**, **thread:**, **folder:**, **path:**, **property:**
 Probabilistic
-  **to:**, **attachment:**, **mimetype:**
+  **body:**, **to:**, **attachment:**, **mimetype:**
 Special
    **from:**, **query:**, **subject:**
 
diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..27c2d042 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -259,6 +259,8 @@ prefix_t prefix_table[] = {
     { "directory",		"XDIRECTORY",	NOTMUCH_FIELD_NO_FLAGS },
     { "file-direntry",		"XFDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
     { "directory-direntry",	"XDDIRENTRY",	NOTMUCH_FIELD_NO_FLAGS },
+    { "body",			"",		NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC},
     { "thread",			"G",		NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROCESSOR },
     { "tag",			"K",		NOTMUCH_FIELD_EXTERNAL |
@@ -302,6 +304,8 @@ prefix_t prefix_table[] = {
 static void
 _setup_query_field_default (const prefix_t *prefix, notmuch_database_t *notmuch)
 {
+    if (prefix->prefix)
+	notmuch->query_parser->add_prefix("",prefix->prefix);
     if (prefix->flags & NOTMUCH_FIELD_PROBABILISTIC)
 	notmuch->query_parser->add_prefix (prefix->name, prefix->prefix);
     else
@@ -326,6 +330,8 @@ _setup_query_field (const prefix_t *prefix, notmuch_database_t *notmuch)
 					    *notmuch->query_parser, notmuch))->release ();
 
 	/* we treat all field-processor fields as boolean in order to get the raw input */
+	if (prefix->prefix)
+	    notmuch->query_parser->add_prefix("",prefix->prefix);
 	notmuch->query_parser->add_boolean_prefix (prefix->name, fp);
     } else {
 	_setup_query_field_default (prefix, notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index 6f2f6345..64349f83 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1443,13 +1443,13 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
 	message->termpos = term_gen->get_termpos () + 100;
 
 	_notmuch_message_invalidate_metadata (message, prefix_name);
+    } else {
+	term_gen->set_termpos (message->termpos);
+	term_gen->index_text (text);
+	/* Create a term gap, as above. */
+	message->termpos = term_gen->get_termpos () + 100;
     }
 
-    term_gen->set_termpos (message->termpos);
-    term_gen->index_text (text);
-    /* Create a term gap, as above. */
-    message->termpos = term_gen->get_termpos () + 100;
-
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
diff --git a/test/T730-body.sh b/test/T730-body.sh
new file mode 100755
index 00000000..548b30a4
--- /dev/null
+++ b/test/T730-body.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+test_description='search body'
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_message "[body]=thebody-1" "[subject]=subject-1"
+add_message "[body]=nothing-to-see-here-1" "[subject]=thebody-1"
+
+test_begin_subtest 'search with body: prefix'
+notmuch search body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search without body: prefix'
+notmuch search thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'negated body: prefix'
+notmuch search thebody and not body:thebody | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; thebody-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search unprefixed for prefixed term'
+notmuch search subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+thread:XXX   2001-01-05 [1/1] Notmuch Test Suite; subject-1 (inbox unread)
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'search with body: prefix for term only in subject'
+notmuch search body:subject | notmuch_search_sanitize > OUTPUT
+cat <<EOF > EXPECTED
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.20.1

Thread overview: 16+ messages
2019-02-18 11:56 [PATCH] WIP: add searching by body: David Bremner
2019-02-18 13:06 ` David Bremner
2019-03-04  2:29 ` David Bremner [this message]
2019-03-05  1:26   ` [PATCH] lib: add 'body:' field, stop indexing headers twice Matt Armstrong
2019-03-13  0:47     ` v2. add body: / drop double indexing of headers David Bremner
2019-03-13  0:47       ` [PATCH 1/4] lib: drop comment about only indexing one file David Bremner
2019-03-13  0:47       ` [PATCH 2/4] lib: add clarification about the use of "prefix" in the docs David Bremner
2019-03-13  0:47       ` [PATCH 3/4] lib: update commentary about path/folder terms David Bremner
2019-03-31 17:53         ` David Bremner
2019-03-13  0:47       ` [PATCH 4/4] lib: add 'body:' field, stop indexing headers twice David Bremner
2019-03-13  5:30         ` David Bremner
2019-03-13 11:44           ` [PATCH] " David Bremner
2019-03-19  0:39             ` David Bremner
2019-03-29 13:17               ` David Bremner
2019-04-14 11:32               ` David Bremner
2019-04-17 11:55               ` David Bremner