unofficial mirror of notmuch@notmuchmail.org
* Searching for phrases in the body of an email
From: Suvayu Ali @ 2015-07-17 12:11 UTC (permalink / raw)
  To: Notmuch list

Hi,

I'm trying to find those annoying emails which have useless plain text
parts.  As I recall, they had a phrase something along the lines of "not
available in plain text" or "no plain text".  So of course I searched
for "plain text".  But that returns hundreds of messages with no obvious
matches; I can't even find the phrase "plain text" in the body of most
of the results!

Here is an example:

$ notmuch search --limit=1 -- no NEAR "plain text"
thread:000000000000a2a6   Sat. 00:30 [1/1] NASA Jet Propulsion Laboratory; NASA's Curiosity Mars Rover Tracks Sunspots (2015 2015-07 inbox)
$ notmuch show --format=raw -- thread:000000000000a2a6 | grep 'plain text'
$

To make this stranger, here are more numbers:

$ notmuch show -- $(notmuch search --output=messages -- no NEAR "plain text") | \
      grep -c -e 'plain text'
7
$ notmuch count -- no NEAR "plain text"
461

I do not understand this at all!  Any thoughts?
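
A note on the comparison above: `grep -c` counts matching lines in the whole stream, while `notmuch count` reports messages by default, so the two numbers are in different units. That does not by itself explain a gap of 7 against 461, but counting per message makes the check comparable. The sketch below shows the difference on throwaway files; the directory and file names are made up for the illustration.

```shell
#!/bin/sh
# Illustration only: two fake "messages" in a temporary directory.
demo_dir=$(mktemp -d)
printf 'no plain text here\nstill no plain text\n' > "$demo_dir/msg1"
printf 'nothing relevant\n' > "$demo_dir/msg2"

# grep -c on a concatenated stream counts matching LINES.
count_lines() {
    cat "$1"/* | grep -c -e 'plain text'
}

# grep -l lists each FILE that matches at least once; wc -l counts them.
count_files() {
    grep -l -e 'plain text' "$1"/* | wc -l | tr -d ' '
}

echo "matching lines: $(count_lines "$demo_dir")"
echo "matching files: $(count_files "$demo_dir")"
```

On a real mail store, `notmuch search --output=files -- QUERY` prints the paths of the matching message files, and that list could be fed to `grep -l` in the same way.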

Thanks in advance,

-- 
Suvayu

Open source is the future. It sets us free.


* Re: Searching for phrases in the body of an email
From: J. Lewis Muir @ 2015-07-17 15:48 UTC (permalink / raw)
  To: Notmuch list

On 7/17/15 7:11 AM, Suvayu Ali wrote:
> Hi,
>
> I'm trying to find those annoying emails which have useless plain
> text parts.  As I recall, they had a phrase something along the lines
> of "not available in plain text" or "no plain text".  So of course I
> searched for "plain text".  But that returns hundreds of messages with
> no obvious matches, I can't even find the phrase "plain text" in the
> body for most of the results!

[snip]

> I do not understand this at all!  Any thoughts?

Hello, Suvayu.

I can't speak to the notmuch search results since I actually don't have
experience with it (I'm planning to switch my email setup to using
notmuch, but I actually haven't switched yet!), but I can give a few
ideas for some of your puzzlements:

1. Perhaps you are remembering the "no plain text" message incorrectly?
   For example, the message could have referred to "text/plain" or
   "plaintext" (no space).  These would be sufficiently different to not
   match your grep pattern.

2. Perhaps your email client rendered the "no plain text" message when
   it encountered an email with only a "text/html" content type?  In
   this case, the "no plain text" (or whatever) message would not be
   present in the email itself since it would be generated by the email
   client when rendering the email.

3. A really long shot, but could a line wrap have occurred after "plain"
   such that "text" appeared on the next line?  Your grep pattern would
   not match that.
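
Point 3 can be checked in isolation. grep matches one line at a time, so a phrase split by a line break is never seen as a whole, whatever character class sits between the two words. A minimal sketch follows; the sample text is invented for the illustration.

```shell
#!/bin/sh
# A phrase split across a line break, as a mail body might wrap it.
wrapped='This message is not available in plain
text format.'

# grep matches line by line, so the split phrase is not found.
# "|| true" keeps grep's non-zero "no match" status from aborting a script.
per_line=$(printf '%s\n' "$wrapped" | grep -c 'plain text' || true)

# Joining the lines with spaces first lets the same pattern match.
joined=$(printf '%s\n' "$wrapped" | tr '\n' ' ' | grep -c 'plain text' || true)

echo "per-line matches: $per_line"
echo "joined matches:   $joined"
```

Joining the lines turns the whole input into a single line, so after that step `grep -c` can only report 0 or 1. That is good enough for a yes-or-no check on one message.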

Regards,

Lewis


* Re: Searching for phrases in the body of an email
From: Suvayu Ali @ 2015-07-18  9:11 UTC (permalink / raw)
  To: notmuch

Hi Lewis,

On Fri, Jul 17, 2015 at 10:48:57AM -0500, J. Lewis Muir wrote:
> 
> 1. Perhaps you are remembering the "no plain text" message incorrectly?
>    For example, the message could have referred to "text/plain" or
>    "plaintext" (no space).  These would be sufficiently different to not
>    match your grep pattern.

True, but my puzzlement is that notmuch shouldn't return those results in
the first place, since I provided a quoted string: "plain text", unless of
course I need to escape the quotes.  Okay, I just checked: escaping them
makes no difference to the number of hits from notmuch.
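
For reference, the shell consumes unescaped quotes before notmuch ever sees them. The sketch below uses a made-up helper, `show_args`, to print the words a command actually receives under three quoting styles. It says nothing about how the query parser then treats them.

```shell
#!/bin/sh
# Print each argument on its own line, wrapped in <> so word boundaries show.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Plain double quotes: the shell removes them, keeping "plain text" as one word.
show_args no NEAR "plain text"

# Escaped quotes and space: the quote characters survive, still as one word.
show_args no NEAR \"plain\ text\"

# Single quotes around a backslash: the backslash is passed through literally.
show_args no NEAR 'plain\ text'
```

Only the second form hands literal quote characters to the program, so it is the only one of the three that can ask the query parser for a phrase.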

> 2. Perhaps your email client rendered the "no plain text" message when
>    it encountered an email with only a "text/html" content type?  In
>    this case, the "no plain text" (or whatever) message would not be
>    present in the email itself since it would be generated by the email
>    client when rendering the email.

This is possible, but I use mutt.  As far as I know, it doesn't do
"smart" things like that.  I also recall looking at the mime parts
individually as I was surprised at the behaviour, and it was indeed a
useless text/plain part with that message.

> 3. A really long shot, but could a line wrap have occurred after "plain"
>    such that "text" appeared on the next line?  Your grep pattern would
>    not match that.

Good point, I tried grepping for this instead: 'plain[[:space:]/]+text',
no luck.
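
One thing worth ruling out with this pattern: the message does not say whether grep was run with `-E`. In basic regular expressions, which plain grep uses, `+` is an ordinary character, so the pattern only means "one or more" under `grep -E`. A small sketch with an invented sample line:

```shell
#!/bin/sh
sample='plain  text'   # two spaces between the words

# Basic regular expressions (plain grep): "+" is an ordinary character.
bre=$(printf '%s\n' "$sample" | grep -c 'plain[[:space:]/]+text' || true)

# Extended regular expressions (grep -E): "+" means "one or more".
ere=$(printf '%s\n' "$sample" | grep -c -E 'plain[[:space:]/]+text' || true)

echo "without -E: $bre"
echo "with -E:    $ere"
```

Even with `-E`, `[[:space:]]` cannot match a line break here, because grep works on one line at a time.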

Thanks for your comments.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.


* Re: Searching for phrases in the body of an email
From: Xu Wang @ 2015-07-18 14:54 UTC (permalink / raw)
  To: notmuch

On Sat, Jul 18, 2015 at 5:11 AM, Suvayu Ali <fatkasuvayu+linux@gmail.com> wrote:
> [snip]

Hi Suvayu,

First note that I believe notmuch search is case insensitive by
default, so your grep should be case insensitive as well.

More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
parsed. Maybe it is parsed as {no NEAR plain} or {text}.

You would like to search for the exact phrase, correct? How about the
following?

notmuch search no adj plain adj text
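
The Xapian query parser documentation writes these operators in capitals (ADJ, NEAR), so the uppercase spelling is the safer one to type. As a sketch, a hypothetical helper (`adj_chain`, not part of notmuch) can build such a chain from a phrase:

```shell
#!/bin/sh
# Hypothetical helper: turn a phrase into a chain of uppercase ADJ operators.
adj_chain() {
    out=''
    for word in $1; do          # unquoted on purpose: split the phrase on whitespace
        if [ -z "$out" ]; then
            out=$word
        else
            out="$out ADJ $word"
        fi
    done
    printf '%s\n' "$out"
}

adj_chain 'no plain text'
# With notmuch installed, the result could be passed on as one query string:
#   notmuch count -- "$(adj_chain 'no plain text')"
```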

Best,

Xu


* Re: Searching for phrases in the body of an email
From: Suvayu Ali @ 2015-07-18 15:32 UTC (permalink / raw)
  To: notmuch

Hi Xu,

On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> 
> First note that I believe notmuch search is case insensitive by
> default, so your grep should be case insensitive as well.

Good point, I tried that, didn't change the numbers much.  The number of
matches from grep went up to 24, whereas notmuch count says 463.

> More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> 

Exactly, that's what I do not understand.

> You would like to search for the exact phrase, correct? How about the
> following?
> 
> notmuch search no adj plain adj text

Good suggestion.  I tried it, and it gives me very consistent numbers:

$ notmuch count -- no ADJ plain ADJ text
20
$ notmuch show -- $(notmuch search --output=messages -- no NEAR \"plain\ text\") | \
      grep -c -iE 'plain[[:space:]/]+text'
24

Of course this does not help me solve my original goal, but I guess now
I can try different queries based on your idea.

Thanks a lot!

-- 
Suvayu

Open source is the future. It sets us free.


* Re: Searching for phrases in the body of an email
From: Jani Nikula @ 2015-07-18 15:53 UTC (permalink / raw)
  To: Notmuch Mail

On Jul 18, 2015 6:32 PM, "Suvayu Ali" <fatkasuvayu+linux@gmail.com> wrote:
>
> Hi Xu,
>
> On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> >
> > First note that I believe notmuch search is case insensitive by
> > default, so your grep should be case insensitive as well.
>
> Good point, I tried that, didn't change the numbers much.  The number of
> matches from grep went up to 24, whereas notmuch count says 463.
>
> > More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> > parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> >
>
> Exactly, that's what I do not understand.
>

export NOTMUCH_DEBUG_QUERY=1

might help.
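
The variable can also be set for a single command instead of being exported for the whole session. A minimal sketch, where `debug_count` is a made-up wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: run "notmuch count" with query debugging enabled
# for this one invocation only.
debug_count() {
    NOTMUCH_DEBUG_QUERY=1 notmuch count -- "$@"
}

# Usage (requires notmuch to be installed):
#   debug_count no NEAR \"plain\ text\"
```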

> > You would like to search for the exact phrase, correct? How about the
> > following?
> >
> > notmuch search no adj plain adj text
>
> Good suggestion.  I tried it, and gives me very consistent numbers:
>
> $ notmuch count -- no ADJ plain ADJ text
> 20
> $ notmuch show -- $(notmuch search --output=messages -- no NEAR \"plain\ text\") | \
>       grep -c -iE 'plain[[:space:]/]+text'
> 24
>
> Of course this does not help me solve my original goal, but I guess now
> I can try different queries based on your idea.
>
> Thanks a lot!
>
> --
> Suvayu
>
> Open source is the future. It sets us free.


* Re: Searching for phrases in the body of an email
From: Xu Wang @ 2015-07-18 16:34 UTC (permalink / raw)
  To: notmuch

On Sat, Jul 18, 2015 at 11:32 AM, Suvayu Ali
<fatkasuvayu+linux@gmail.com> wrote:

> Of course this does not help me solve my original goal, but I guess now
> I can try different queries based on your idea.

Ah, I see. Your goal is to search for phrases close to "no plain text".
But if you combine a fuzzy search with an exact grep, then it is normal
that the numbers are not consistent, no? Your grep is not fuzzy.

Kind regards,

Xu


* Re: Searching for phrases in the body of an email
From: Suvayu Ali @ 2015-07-18 22:24 UTC (permalink / raw)
  To: notmuch

Hi Jani,

On Sat, Jul 18, 2015 at 06:53:53PM +0300, Jani Nikula wrote:
> On Jul 18, 2015 6:32 PM, "Suvayu Ali" <fatkasuvayu+linux@gmail.com> wrote:
> > On Sat, Jul 18, 2015 at 10:54:30AM -0400, Xu Wang wrote:
> > >
> > > First note that I believe notmuch search is case insensitive by
> > > default, so your grep should be case insensitive as well.
> >
> > Good point, I tried that, didn't change the numbers much.  The number of
> > matches from grep went up to 24, whereas notmuch count says 463.
> >
> > > More importantly, I'm not sure how 'no NEAR "plain text" ' syntax is
> > > parsed. Maybe it is parsed as {no NEAR plain} or {text}.
> > >
> >
> > Exactly, that's what I do not understand.
> >
> 
> export NOTMUCH_DEBUG_QUERY=1
> 
> might help.

That helped a lot!  This is what I get:

  $ notmuch count -- no NEAR \"plain\ text\"
  Query string is:
  no NEAR "plain text"
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND text:(pos=4)))
  465
  $ notmuch count -- \"plain\ text\"
  Query string is:
  "plain text"
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND (plain:(pos=1) PHRASE 2 text:(pos=2))))
  870

I wanted the "plain text" to be treated as a phrase, as in the second
case.  I have tried nesting the quotes.  The closest I got was this:

  $ notmuch count -- no NEAR 'plain\ text'
  Query string is:
  no NEAR plain\ text
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND (no:(pos=1) NEAR 11 plain:(pos=2)) AND Ztext:(pos=3)))
  151

I then tried this:

  $ notmuch count -- no NEAR \(plain ADJ/1 text\)
  Query string is:
  no NEAR (plain ADJ/1 text)
  Exclude query is:
  Xapian::Query()
  Final query is:
  Xapian::Query((Tmail AND Zno:(pos=1) AND near:(pos=2) AND Zplain:(pos=3) AND (adj:(pos=4) PHRASE 2 1:(pos=5)) AND Ztext:(pos=6)))
  0

Again, this is not what I was expecting.  With the last one, I was
expecting to group "plain" and "text" within a distance of 1, in the
given order, and then to require "no" to be near (within 10 words, the
default) the "plain ADJ/1 text" combination.

Is my understanding of the query language completely wrong?  Apart from
`man notmuch-search-terms', I looked here:
http://xapian.org/docs/queryparser.html

Thanks for any help.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.


* Re: Searching for phrases in the body of an email
From: Suvayu Ali @ 2015-07-18 22:33 UTC (permalink / raw)
  To: notmuch

On Sat, Jul 18, 2015 at 12:34:16PM -0400, Xu Wang wrote:
> On Sat, Jul 18, 2015 at 11:32 AM, Suvayu Ali
> <fatkasuvayu+linux@gmail.com> wrote:
> 
> > Of course this does not help me solve my original goal, but I guess now
> > I can try different queries based on your idea.
> 
> Ah I see. Your goal is to search for phrases close to "no plain text".
> But if you use fuzzy searching but an exact grep, then it is normal
> that the numbers are not consistent, no? Because your grep is not
> fuzzy.

My grep was this (case insensitive): 'plain[[:space:]/]+text'.  Since I
thought I was searching for the _phrase_ "plain text", that would be
adequately fuzzy.  However, after following Jani's advice, I realise it
wasn't always being treated as a phrase, nor was NEAR being treated
as an operator.

I wanted to combine a phrase (plain text) with the NEAR query (NEAR no),
but maybe that combination is not possible.  That's why I tried to
combine NEAR and ADJ (as per your suggestion) by grouping them, but that
does not seem to work either!
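
One possible approximation, untested against notmuch and not confirmed anywhere in this thread, is to keep NEAR between single terms and add the quoted phrase as a separate condition. The hypothetical helper below (`near_phrase_query`) only builds such a query string. The result can over-match, because the "no ... plain" pair and the "plain text" phrase may occur in different places in the same message.

```shell
#!/bin/sh
# Hypothetical helper: require the quoted phrase AND require the anchor word
# to be NEAR the first word of the phrase. This approximates, but is not the
# same as, "anchor NEAR phrase".
near_phrase_query() {
    anchor=$1
    phrase=$2
    first=${phrase%% *}        # first word of the phrase
    printf '(%s NEAR %s) AND "%s"\n' "$anchor" "$first" "$phrase"
}

near_phrase_query no 'plain text'
# Untested usage, passing the whole query as a single argument:
#   NOTMUCH_DEBUG_QUERY=1 notmuch count -- "$(near_phrase_query no 'plain text')"
```

Running it with NOTMUCH_DEBUG_QUERY set, as in the usage comment, would show whether the parser really produces a NEAR and a PHRASE subquery before the count is trusted.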

-- 
Suvayu

Open source is the future. It sets us free.
