unofficial mirror of notmuch@notmuchmail.org
* strange behavior of indexing of and searching for strings containing '[]'
@ 2010-02-05 16:44 Jameson Rollins
  2010-02-05 18:08 ` Oliver Charles
  2010-02-05 23:48 ` Olly Betts
  0 siblings, 2 replies; 6+ messages in thread
From: Jameson Rollins @ 2010-02-05 16:44 UTC (permalink / raw)
  To: Notmuch Mail


Hey, folks.  I've been noticing some strange behavior of notmuch search
results for strings containing '[]'.  Here are some searches for some
exact strings in message subjects:

servo:~ 0$ notmuch search subject:'emacs paned UI'
thread:533da424197bb6ba61a42b667d5d8d8f   Wed. 14:12 [2/2] Tad Fisher, Jameson Rollins; [notmuch] Emacs paned UI ()
servo:~ 0$ 

So that's fine and expected.  This however is not:

servo:~ 0$ notmuch search subject:'[notmuch] emacs paned UI'
thread:5f2cb4b108773a39161b33c86e54f7fd  4 mins. ago [1/1] Jameson Rollins; [notmuch] loss of duplicate messages (inbox)
servo:~ 0$ 

Not only did it not turn up the message that *does* match that exact
string in its subject line, it actually turns up a completely different
message that doesn't match the search term at all!

This search actually turns up both:

servo:~ 0$ notmuch search subject:'notmuch emacs paned UI'
thread:5f2cb4b108773a39161b33c86e54f7fd  5 mins. ago [1/1] Jameson Rollins; [notmuch] loss of duplicate messages (inbox)
thread:533da424197bb6ba61a42b667d5d8d8f   Wed. 14:12 [2/2] Tad Fisher, Jameson Rollins; [notmuch] Emacs paned UI ()
servo:~ 0$ 

Which is again strange, because the second message does not at all match
that search term.

Does anyone have any idea what's going on here?  I think I saw mention
of this issue on IRC somewhere, but I thought I should bring it up
explicitly here.  This is definitely some buggy behavior.

jamie.


* Re: strange behavior of indexing of and searching for strings containing '[]'
  2010-02-05 16:44 strange behavior of indexing of and searching for strings containing '[]' Jameson Rollins
@ 2010-02-05 18:08 ` Oliver Charles
  2010-02-05 23:48 ` Olly Betts
  1 sibling, 0 replies; 6+ messages in thread
From: Oliver Charles @ 2010-02-05 18:08 UTC (permalink / raw)
  To: Jameson Rollins; +Cc: Notmuch Mail

On Fri, Feb 5, 2010 at 4:44 PM, Jameson Rollins
<jrollins@finestructure.net> wrote:
> Does anyone have any idea what's going on here?  I think I saw mention
> of this issue on IRC somewhere, but I thought I should bring it up
> explicitly here.  This is definitely some buggy behavior.

Afaik, stuff in between [] is not indexed, but that doesn't quite
explain the other weird results.

-- 
    Oliver Charles / aCiD2


* Re: strange behavior of indexing of and searching for strings containing '[]'
  2010-02-05 16:44 strange behavior of indexing of and searching for strings containing '[]' Jameson Rollins
  2010-02-05 18:08 ` Oliver Charles
@ 2010-02-05 23:48 ` Olly Betts
  2010-02-08 17:24   ` Jameson Rollins
  1 sibling, 1 reply; 6+ messages in thread
From: Olly Betts @ 2010-02-05 23:48 UTC (permalink / raw)
  To: notmuch

On 2010-02-05, Jameson Rollins wrote:
> Hey, folks.  I've been noticing some strange behavior of notmuch search
> results for strings containing '[]'.  Here are some searches for some
> exact strings in message subjects:

The '[]' is a red herring.  Xapian's TermGenerator and QueryParser classes
treat these two characters pretty much as if they were spaces.
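
As a rough illustration of the point above, the following sketch replaces
'[' and ']' with spaces before the words are looked at. It is only an
approximation for intuition: brackets_to_spaces is a hypothetical helper
built on sed, not Xapian's actual tokenizer.

```shell
#!/bin/bash
# Approximation only: turn '[' and ']' into spaces, mimicking the
# "pretty much as if they were spaces" behavior described above.
# The bracket expression [][] matches either ']' or '['.
brackets_to_spaces() {
    printf '%s\n' "$1" | sed 's/[][]/ /g'
}

brackets_to_spaces '[notmuch] emacs paned UI'
# prints " notmuch  emacs paned UI"
```

Under that approximation, the bracketed and unbracketed forms of the
subject reduce to the same sequence of words.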

> servo:~ 0$ notmuch search subject:'emacs paned UI'

Note that the '' is quoting for the shell only here.  So Xapian sees:

    subject:emacs paned UI

Assuming you are defaulting to an AND search, that's `emacs in the subject'
AND `paned anywhere in the indexed text' AND `UI anywhere in the indexed text'.

To specify a quoted phrase you want "" anyway (not ''), so the command
matching what I think you intended to search for is:

    notmuch search 'subject:"emacs paned UI"'
 
> servo:~ 0$ notmuch search subject:'[notmuch] emacs paned UI'

    notmuch search 'subject:"[notmuch] emacs paned UI"'

Which should return identical results to:

    notmuch search 'subject:"notmuch emacs paned UI"'
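
To see which characters survive the shell's own quote removal, a small
sketch like the one below may help. show_args is a hypothetical helper
that prints each argument between angle brackets, exactly as a program
would receive it.

```shell
#!/bin/bash
# Print every argument on its own line, wrapped in <...>, so that
# argument boundaries and any surviving quote characters are visible.
show_args() {
    printf '<%s>\n' "$@"
}

show_args subject:'emacs paned UI'      # <subject:emacs paned UI>
show_args 'subject:"emacs paned UI"'    # <subject:"emacs paned UI">
```

Only in the second form do the double quotes reach the program, which is
what allows the query parser to see a phrase.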

> thread:5f2cb4b108773a39161b33c86e54f7fd  4 mins. ago [1/1] Jameson Rollins; [notmuch] loss of duplicate messages (inbox)
> servo:~ 0$
>
> Not only did it not turn up the message that *does* match that exact
> string in its subject line, it actually turns up a completely different
> message that doesn't match the search term at all!

It matches the notmuch in the subject, and presumably emacs, paned, and UI
in the body.

> [snip the rest - the same explanations apply]

Cheers,
    Olly


* Re: strange behavior of indexing of and searching for strings containing '[]'
  2010-02-05 23:48 ` Olly Betts
@ 2010-02-08 17:24   ` Jameson Rollins
  2010-02-08 17:35     ` Jed Brown
  0 siblings, 1 reply; 6+ messages in thread
From: Jameson Rollins @ 2010-02-08 17:24 UTC (permalink / raw)
  To: Olly Betts, notmuch


On Fri, 5 Feb 2010 23:48:03 +0000 (UTC), Olly Betts <olly@survex.com> wrote:
> On 2010-02-05, Jameson Rollins wrote:
> > Hey, folks.  I've been noticing some strange behavior of notmuch search
> > results for strings containing '[]'.  Here are some searches for some
> > exact strings in message subjects:
> 
> The '[]' is a red herring.  Xapian's TermGenerator and QueryParser classes
> treat these two characters pretty much as if they were spaces.

Ah.  Thanks for the response, Olly.  This clears things up a lot.

> > servo:~ 0$ notmuch search subject:'emacs paned UI'
> 
> Note that the '' is quoting for the shell only here.  So Xapian sees:
> 
>     subject:emacs paned UI
> 
> Assuming you are defaulting to an AND search, that's `emacs in the subject'
> AND `paned anywhere in the indexed text' AND `UI anywhere in the indexed text'.

I don't think that this is exactly correct.  The quoting is interpreted
by the shell in order to construct a single string that is then passed
as an argument to the program.  Notmuch should then be seeing the single
string argument as the search parameter, and not breaking it up further.

Here's an example of what I mean:

servo:~/tmp/cdtemp.AYroUf 0$ cat parse 
#!/bin/bash
for arg; do echo "$arg"; done
servo:~/tmp/cdtemp.AYroUf 0$ ./parse subject:foo bar baz
subject:foo
bar
baz
servo:~/tmp/cdtemp.AYroUf 0$ ./parse subject:'foo bar' baz
subject:foo bar
baz
servo:~/tmp/cdtemp.AYroUf 0$ ./parse subject:"foo bar" baz
subject:foo bar
baz
servo:~/tmp/cdtemp.AYroUf 0$ 

As you can see in the last two commands, the quoted argument
(subject:'foo bar' or subject:"foo bar") is passed as a single string by
the shell, and should therefore be interpreted as such by notmuch.

jamie.


* Re: strange behavior of indexing of and searching for strings containing '[]'
  2010-02-08 17:24   ` Jameson Rollins
@ 2010-02-08 17:35     ` Jed Brown
  2010-02-08 17:37       ` Jameson Rollins
  0 siblings, 1 reply; 6+ messages in thread
From: Jed Brown @ 2010-02-08 17:35 UTC (permalink / raw)
  To: Jameson Rollins, Olly Betts, notmuch

On Mon, 08 Feb 2010 12:24:06 -0500, Jameson Rollins <jrollins@finestructure.net> wrote:
> I don't think that this is exactly correct.  The quoting is interpreted
> by the shell in order to construct a single string that is then passed
> as an argument to the program.

The command line distinguishes, but the constructed query does not.
Look at query-string.c, the arguments are just concatenated.
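
A minimal sketch of what that concatenation implies, assuming the
arguments are joined with single spaces (the separator is an assumption
here, and join_args is a hypothetical stand-in, not the code in
query-string.c):

```shell
#!/bin/bash
# Hypothetical stand-in: join all arguments into one query string,
# separated by spaces. "$*" joins on the first character of IFS.
join_args() {
    local IFS=' '
    printf '%s\n' "$*"
}

join_args subject:foo bar baz       # subject:foo bar baz
join_args subject:'foo bar' baz     # subject:foo bar baz
```

Both invocations produce the same string, so the argument boundaries the
shell preserved are no longer visible once the query is built.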

Jed


* Re: strange behavior of indexing of and searching for strings containing '[]'
  2010-02-08 17:35     ` Jed Brown
@ 2010-02-08 17:37       ` Jameson Rollins
  0 siblings, 0 replies; 6+ messages in thread
From: Jameson Rollins @ 2010-02-08 17:37 UTC (permalink / raw)
  To: Jed Brown, Olly Betts, notmuch


On Mon, 08 Feb 2010 18:35:44 +0100, Jed Brown <jed@59A2.org> wrote:
> On Mon, 08 Feb 2010 12:24:06 -0500, Jameson Rollins <jrollins@finestructure.net> wrote:
> > I don't think that this is exactly correct.  The quoting is interpreted
> > by the shell in order to construct a single string that is then passed
> > as an argument to the program.
> 
> The command line distinguishes, but the constructed query does not.
> Look at query-string.c, the arguments are just concatenated.

Hi, Jed.  Yes, this is clear from the behavior, but I'm claiming it's a
bug that should be fixed.  It produces unexpected behavior with
confusing results.

jamie.
