unofficial mirror of notmuch@notmuchmail.org
* [PATCH RFC] index: add body: search query term
@ 2018-10-10  5:53 William Casarin
  2018-10-10 10:43 ` David Bremner
  0 siblings, 1 reply; 4+ messages in thread
From: William Casarin @ 2018-10-10  5:53 UTC (permalink / raw)
  To: notmuch

This adds the ability to search specifically on the message body.

For example:

    notmuch search tag:notmuch and body:PATCH

Signed-off-by: William Casarin <jb55@jb55.com>
---

Hey there,

I'm looking to add the ability to search specifically on the body. I
was poking around in the indexer, added these lines and reindexed a
few tags. It appears to work!

I was just wondering if there's anything I'm missing? That seemed a
bit too easy. I noticed there are some NOTMUCH_FIELD_* flags whose
purpose I'm not sure about.

If anyone with Xapian knowledge could shed some light on what the next
steps might be, if any, I'd appreciate it.

Thanks!
Will


 lib/database.cc | 3 +++
 lib/index.cc    | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 9cf8062c..0b085b21 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
     { "subject",		"XSUBJECT",	NOTMUCH_FIELD_EXTERNAL |
 						NOTMUCH_FIELD_PROBABILISTIC |
 						NOTMUCH_FIELD_PROCESSOR},
+    { "body",			"XBODY",	NOTMUCH_FIELD_EXTERNAL |
+						NOTMUCH_FIELD_PROBABILISTIC |
+						NOTMUCH_FIELD_PROCESSOR},
 };
 
 static void
diff --git a/lib/index.cc b/lib/index.cc
index 3f694387..299b8770 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
     body = (char *) g_byte_array_free (byte_array, false);
 
     if (body) {
-	_notmuch_message_gen_terms (message, NULL, body);
+	_notmuch_message_gen_terms (message, "body", body);
 
 	free (body);
     }
-- 
2.19.0


* Re: [PATCH RFC] index: add body: search query term
  2018-10-10  5:53 [PATCH RFC] index: add body: search query term William Casarin
@ 2018-10-10 10:43 ` David Bremner
  2018-10-10 16:34   ` William Casarin
  0 siblings, 1 reply; 4+ messages in thread
From: David Bremner @ 2018-10-10 10:43 UTC (permalink / raw)
  To: William Casarin, notmuch

William Casarin <jb55@jb55.com> writes:

>
>  lib/database.cc | 3 +++
>  lib/index.cc    | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 9cf8062c..0b085b21 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -297,6 +297,9 @@ prefix_t prefix_table[] = {
>      { "subject",		"XSUBJECT",	NOTMUCH_FIELD_EXTERNAL |
>  						NOTMUCH_FIELD_PROBABILISTIC |
>  						NOTMUCH_FIELD_PROCESSOR},
> +    { "body",			"XBODY",	NOTMUCH_FIELD_EXTERNAL |
> +						NOTMUCH_FIELD_PROBABILISTIC |
> +						NOTMUCH_FIELD_PROCESSOR},
>  };
>  
>  static void
> diff --git a/lib/index.cc b/lib/index.cc
> index 3f694387..299b8770 100644
> --- a/lib/index.cc
> +++ b/lib/index.cc
> @@ -506,7 +506,7 @@ _index_mime_part (notmuch_message_t *message,
>      body = (char *) g_byte_array_free (byte_array, false);
>  
>      if (body) {
> -	_notmuch_message_gen_terms (message, NULL, body);
> +	_notmuch_message_gen_terms (message, "body", body);
>  
>  	free (body);
>      }
> -- 

I think you'll find you broke non-prefixed queries. Does the test suite
still pass? If so, we need more tests. Anyway, if you add a second set
of terms I'd be interested in how much this bloats the index. Ideally
measure it with the performance corpus so we can all reproduce the
experiment.

d
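
To make the "second set of terms" concern concrete, here is a minimal toy sketch. It is not notmuch or Xapian code: a document is modelled as a plain set of strings, `toy_gen_terms` and the two counting helpers are hypothetical names, and the only thing borrowed from the patch is the "XBODY" prefix string. It only illustrates that storing every body word twice (once bare, once prefixed) doubles the distinct body terms in this model; the real index cost would still have to be measured, for example on the performance corpus.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Toy model only: a "document" is just the set of terms attached to it.
using TermSet = std::set<std::string>;

// Split text on whitespace and add each word as a term, optionally
// preceded by a prefix such as "XBODY" (the prefix string from the patch).
void toy_gen_terms (TermSet &doc, const std::string &prefix,
                    const std::string &text)
{
    std::istringstream words (text);
    std::string word;
    while (words >> word)
        doc.insert (prefix + word);
}

// Distinct terms when the body is indexed without a prefix only.
std::size_t toy_terms_unprefixed_only (const std::string &body)
{
    TermSet doc;
    toy_gen_terms (doc, "", body);
    return doc.size ();
}

// Distinct terms when the body is indexed both bare and under "XBODY".
std::size_t toy_terms_with_body_prefix (const std::string &body)
{
    TermSet doc;
    toy_gen_terms (doc, "", body);
    toy_gen_terms (doc, "XBODY", body);
    return doc.size ();
}
```

Calling both helpers on the same body text gives a rough feel for the ratio in this model; it says nothing about Xapian's on-disk compression or positional data.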


* Re: [PATCH RFC] index: add body: search query term
  2018-10-10 10:43 ` David Bremner
@ 2018-10-10 16:34   ` William Casarin
  2018-10-10 16:36     ` William Casarin
  0 siblings, 1 reply; 4+ messages in thread
From: William Casarin @ 2018-10-10 16:34 UTC (permalink / raw)
  To: David Bremner, notmuch

David Bremner <david@tethera.net> writes:

> William Casarin <jb55@jb55.com> writes:

> I think you'll find you broke non-prefixed queries. Does the test suite
> still pass? If so, we need more tests.

Yeah, they seem to pass. But you're right, something seems a bit off:

    ./notmuch count subject:github or body:github and tag:notmuch
    3271

    ./notmuch count github and tag:notmuch
    665

> of terms I'd be interested how much this bloats the index. Ideally with
> the performance corpus so we can all reproduce the experiment.

Sounds good, I was wondering about that as well.

I wonder if it's all worth the effort though, since a workaround could
be:

    notmuch search <query> and not subject:<query>

If it's too annoying to have a body prefix, due to index bloat or
performance issues, would doing something hacky such as translating
'body:<query>' to '<query> and not subject:<query>' make sense?

Will

-- 
https://jb55.com
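
A note on the two counts in the message above: they may not be directly comparable. Assuming `and` binds more tightly than `or` in the query parser (an assumption worth checking against the Xapian query syntax documentation), the first query groups as `subject:github or (body:github and tag:notmuch)`, so subject matches are counted regardless of tag. Also, only a few tags had been reindexed at that point, so `body:` terms would exist for only part of the database. The toy sketch below shows the grouping effect with plain sets of message ids; all names are hypothetical and nothing here calls notmuch or Xapian.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Toy model only: each query term is represented by the set of message
// ids it matches.
using IdSet = std::set<int>;

// Messages matching both terms.
IdSet toy_and (const IdSet &a, const IdSet &b)
{
    IdSet out;
    std::set_intersection (a.begin (), a.end (), b.begin (), b.end (),
                           std::inserter (out, out.begin ()));
    return out;
}

// Messages matching either term.
IdSet toy_or (const IdSet &a, const IdSet &b)
{
    IdSet out;
    std::set_union (a.begin (), a.end (), b.begin (), b.end (),
                    std::inserter (out, out.begin ()));
    return out;
}

// subject:X or (body:X and tag:T): the grouping if "and" binds tighter.
IdSet toy_and_binds_tighter (const IdSet &subject, const IdSet &body,
                             const IdSet &tag)
{
    return toy_or (subject, toy_and (body, tag));
}

// (subject:X or body:X) and tag:T: the grouping that was probably intended.
IdSet toy_grouped (const IdSet &subject, const IdSet &body, const IdSet &tag)
{
    return toy_and (toy_or (subject, body), tag);
}
```

If the precedence assumption holds, writing the parentheses explicitly in the query would give a count that can be compared with the unprefixed one.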


* Re: [PATCH RFC] index: add body: search query term
  2018-10-10 16:34   ` William Casarin
@ 2018-10-10 16:36     ` William Casarin
  0 siblings, 0 replies; 4+ messages in thread
From: William Casarin @ 2018-10-10 16:36 UTC (permalink / raw)
  To: David Bremner, notmuch

William Casarin <jb55@jb55.com> writes:

> I wonder if it's all worth the effort though, since a workaround could
> be:
>
>     notmuch search <query> and not subject:<query>
>
> If it's too annoying to have a body prefix, due to index bloat or
> performance issues, would doing something hacky such as translating
> 'body:<query>' to '<query> and not subject:<query>' make sense?

Thinking about this some more, this is not exactly the same, since this
would explicitly exclude subjects, whereas the body query wouldn't care
what the subject was.

-- 
https://jb55.com
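
The difference described above can be sketched with a toy model. This is not notmuch code: `ToyMessage` and the helper names are hypothetical, matching is a plain substring test, and the sketch assumes an unprefixed term matches a message when the word appears in either the subject or the body.

```cpp
#include <string>

// Toy model only: a message is just a subject and a body.
struct ToyMessage {
    std::string subject;
    std::string body;
};

// Plain substring test standing in for a term match.
bool toy_contains (const std::string &text, const std::string &word)
{
    return text.find (word) != std::string::npos;
}

// What a real body: prefix would mean: the word occurs in the body,
// whatever the subject says.
bool toy_matches_body_prefix (const ToyMessage &msg, const std::string &word)
{
    return toy_contains (msg.body, word);
}

// The proposed rewrite "<query> and not subject:<query>".
bool toy_matches_workaround (const ToyMessage &msg, const std::string &word)
{
    bool anywhere = toy_contains (msg.subject, word)
                    || toy_contains (msg.body, word);
    return anywhere && ! toy_contains (msg.subject, word);
}
```

A message with the word in both the subject and the body satisfies `toy_matches_body_prefix` but not `toy_matches_workaround`, which is exactly the case where the two forms disagree.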
