unofficial mirror of notmuch@notmuchmail.org
* Strange match to my query
@ 2011-01-25 23:29 Mark Anderson
  2011-01-26  0:51 ` Austin Clements
  2011-02-25 18:19 ` Jameson Rollins
  0 siblings, 2 replies; 13+ messages in thread
From: Mark Anderson @ 2011-01-25 23:29 UTC (permalink / raw)
  To: notmuch

Hi guys, What's up? ("Notmuch")

Apparently matching on email addresses doesn't work the way I hoped.

While debugging why my to:x@y.com search was matching far too many
entries, I whittled it down to this:

WORD1=hello
WORD2=goodbye
MSGID=junk$(date +%s)
TESTDIR=$(notmuch config get database.path)/.tmp/new
TESTMAIL=$TESTDIR/$MSGID:2,

mkdir -p $TESTDIR

echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com

echo "From: nobody@nobody.com
To: c@${WORD1}.com, K-R@${WORD2}.com
Date: Mon, 24 Jan 2011 23:41:34 -0600
Subject: Error
Message-ID: <$MSGID@junk.com>

Not empty body.=

" > $TESTMAIL

notmuch new
notmuch search --output=files to:$WORD1@$WORD2
notmuch search --output=files to:\"$WORD1@$WORD2\"

Why does that match, but this doesn't?

notmuch search --output=files to:\'$WORD1@$WORD2\'

Apparently single quotes are the only quote for Xapian's parser?

I guess this is a strong vote for the quick integration of the custom
parser with optimization passes that turn emails into phrases that can't
match across multiple emails.

This was just an egregious example of notmuch giving me notmuch of what
I wanted, or actually, far too much of what I didn't want.

Thanks,
-Mark

* Re: Strange match to my query
  2011-01-25 23:29 Strange match to my query Mark Anderson
@ 2011-01-26  0:51 ` Austin Clements
  2011-01-26  2:19   ` Carl Worth
  2011-02-25 18:19 ` Jameson Rollins
  1 sibling, 1 reply; 13+ messages in thread
From: Austin Clements @ 2011-01-26  0:51 UTC (permalink / raw)
  To: Mark Anderson; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 2204 bytes --]

Well-constructed test message.  Xapian's query parser is actually doing the
right thing [1] and this is a bug in the way notmuch indexes address list
headers.  For each address, _notmuch_message_gen_terms resets the term
generator's term position, so your To header is indexed with positions
  c:1 hello:2 com:3 K:1 R:2 goodbye:3 com:4
Thus, the phrase query "hello goodbye" matches hello in position 2 and
goodbye in position 3.  Probably the right thing for notmuch to do is to
bump the term generator position between addresses so that phrase queries
can't span them.

[1] Your to:\'$WORD1@$WORD2\' query didn't work because Xapian doesn't
accept a single quote after a prefix.
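
To make the position arithmetic above concrete, here is a rough shell
sketch.  It is an illustration only, not notmuch's or Xapian's code: the
helper names index_addresses and phrase_matches are made up for this
example, and splitting on non-alphanumeric characters is an assumed
stand-in for the real term generator.

```shell
# Hypothetical helper: print "position term" lines for each address.
# With the first argument set to 1 the counter restarts for every
# address (the behaviour described above); with 0 it keeps counting.
index_addresses() {
    reset=$1; shift
    pos=0
    for addr in "$@"; do
        [ "$reset" = 1 ] && pos=0
        # Assumed tokenisation: every non-alphanumeric character splits terms.
        for term in $(printf '%s' "$addr" | tr -c 'A-Za-z0-9' ' '); do
            pos=$((pos + 1))
            printf '%s %s\n' "$pos" "$term"
        done
    done
}

# Hypothetical helper: succeed if word $2 sits exactly one position
# after word $1 in the "position term" lines read from stdin.
phrase_matches() {
    awk -v a="$1" -v b="$2" '
        $2 == a { first[$1] = 1 }
        $2 == b { second[$1] = 1 }
        END {
            for (p in first) if ((p + 1) in second) found = 1
            exit !found
        }'
}

# Counter reset per address: "hello goodbye" looks like a phrase.
index_addresses 1 c@hello.com K-R@goodbye.com | phrase_matches hello goodbye \
    && echo "reset per address: phrase matches"

# Counter kept running: the two words are no longer adjacent.
index_addresses 0 c@hello.com K-R@goodbye.com | phrase_matches hello goodbye \
    || echo "continuous positions: no match"
```

Note that merely keeping the counter running would still let the last
term of one address and the first term of the next look adjacent, which
is why a gap between addresses is the safer approach.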

On Tue, Jan 25, 2011 at 6:29 PM, Mark Anderson <MarkR.Anderson@amd.com> wrote:

> Hi guys, What's up? ("Notmuch")
>
> Apparently matching on email addresses doesn't work the way I hoped.
>
> While debugging why my to:x@y.com search was matching far too many
> entries, I whittled it down to this:
>
> WORD1=hello
> WORD2=goodbye
> MSGID=junk$(date +%s)
> TESTDIR=$(notmuch config get database.path)/.tmp/new
> TESTMAIL=$TESTDIR/$MSGID:2,
>
> mkdir -p $TESTDIR
>
> echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com
>
> echo "From: nobody@nobody.com
> To: c@${WORD1}.com, K-R@${WORD2}.com
> Date: Mon, 24 Jan 2011 23:41:34 -0600
> Subject: Error
> Message-ID: <$MSGID@junk.com>
>
> Not empty body.=
>
> " > $TESTMAIL
>
> notmuch new
> notmuch search --output=files to:$WORD1@$WORD2
> notmuch search --output=files to:\"$WORD1@$WORD2\"
>
> Why does that match, but this doesn't?
>
> notmuch search --output=files to:\'$WORD1@$WORD2\'
>
> Apparently single quotes are the only quote for Xapian's parser?
>
> I guess this is a strong vote for the quick integration of the custom
> parser with optimization passes that turn emails into phrases that can't
> match across multiple emails.
>
> This was just an egregious example of notmuch giving me notmuch of what
> I wanted, or actually, far too much of what I didn't want.
>
> Thanks,
> -Mark

[-- Attachment #2: Type: text/html, Size: 3023 bytes --]

* Re: Strange match to my query
  2011-01-26  0:51 ` Austin Clements
@ 2011-01-26  2:19   ` Carl Worth
  2011-01-26  5:59     ` Carl Worth
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Worth @ 2011-01-26  2:19 UTC (permalink / raw)
  To: Austin Clements, Mark Anderson; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

On Tue, 25 Jan 2011 19:51:14 -0500, Austin Clements <amdragon@gmail.com> wrote:
> Well-constructed test message.  Xapian's query parser is actually doing the
> right thing [1] and this is a bug in the way notmuch indexes address list
> headers.  For each address, _notmuch_message_gen_terms resets the term
> generator's term position, so your To header is indexed with positions
>   c:1 hello:2 com:3 K:1 R:2 goodbye:3 com:4

Thanks, Austin!

I was actually giving a demo of notmuch to someone yesterday who was
really interested in the details of how Xapian actually stores things.

I dug around a bit with delve and we were both really surprised by the
position results we were seeing. Neither of us could make any sense of
them at all.

And thanks, Mark for the bug report and the nice test case. I'll add
this to the test suite, and fix it. And that will give us yet one more
reason for all of us to rebuild our databases after the upcoming
release.

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: Strange match to my query
  2011-01-26  2:19   ` Carl Worth
@ 2011-01-26  5:59     ` Carl Worth
  2011-01-26 17:02       ` Mark Anderson
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Worth @ 2011-01-26  5:59 UTC (permalink / raw)
  To: Austin Clements, Mark Anderson; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 419 bytes --]

On Wed, 26 Jan 2011 12:19:17 +1000, Carl Worth <cworth@cworth.org> wrote:
> And thanks, Mark for the bug report and the nice test case. I'll add
> this to the test suite, and fix it. And that will give us yet one more
> reason for all of us to rebuild our databases after the upcoming
> release.

I've added a test case for this now, fixed the bug, and pushed out the
new code.

Thanks again for the bug report.

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: Strange match to my query
  2011-01-26  5:59     ` Carl Worth
@ 2011-01-26 17:02       ` Mark Anderson
  0 siblings, 0 replies; 13+ messages in thread
From: Mark Anderson @ 2011-01-26 17:02 UTC (permalink / raw)
  To: Carl Worth, Austin Clements; +Cc: notmuch@notmuchmail.org

On Tue, 25 Jan 2011 23:59:50 -0600, Carl Worth <cworth@cworth.org> wrote:
> On Wed, 26 Jan 2011 12:19:17 +1000, Carl Worth <cworth@cworth.org> wrote:
> > And thanks, Mark for the bug report and the nice test case. I'll add
> > this to the test suite, and fix it. And that will give us yet one more
> > reason for all of us to rebuild our databases after the upcoming
> > release.
> 
> I've added a test case for this now, fixed the bug, and pushed out the
> new code.
> 
> Thanks again for the bug report.

That's great.  Apparently submitting the test case was the best thing I
could do: I didn't realize that I needed a two-part name to align the
term lists, although I did start from one.  And now at least I know
that I can't construct the correct query without an updated notmuch.

It was very confusing trying to bend my head around the issue and tell
myself that I just didn't understand how notmuch worked at all on
searching through email addresses.

Glad to see such a quick response to my bug report.

> -Carl

* Re: Strange match to my query
  2011-01-25 23:29 Strange match to my query Mark Anderson
  2011-01-26  0:51 ` Austin Clements
@ 2011-02-25 18:19 ` Jameson Rollins
  2011-02-25 20:57   ` Mark Anderson
  1 sibling, 1 reply; 13+ messages in thread
From: Jameson Rollins @ 2011-02-25 18:19 UTC (permalink / raw)
  To: Mark Anderson, notmuch

[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]

On Tue, 25 Jan 2011 16:29:22 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> Apparently matching on email addresses doesn't work the way I hoped.
> 
> While debugging why my to:x@y.com search was matching far too many
> entries, I whittled it down to this:
> 
> WORD1=hello
> WORD2=goodbye
> MSGID=junk$(date +%s)
> TESTDIR=$(notmuch config get database.path)/.tmp/new
> TESTMAIL=$TESTDIR/$MSGID:2,
> 
> mkdir -p $TESTDIR
> 
> echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com
> 
> echo "From: nobody@nobody.com
> To: c@${WORD1}.com, K-R@${WORD2}.com
> Date: Mon, 24 Jan 2011 23:41:34 -0600
> Subject: Error
> Message-ID: <$MSGID@junk.com>
> 
> Not empty body.=
> 
> " > $TESTMAIL
> 
> notmuch new
> notmuch search --output=files to:$WORD1@$WORD2
> notmuch search --output=files to:\"$WORD1@$WORD2\"
> 
> Why does that match, but this doesn't?
> 
> notmuch search --output=files to:\'$WORD1@$WORD2\'

Hey, guys.  Reopening an old thread here, found while trying to track
down a similar problem.

I'm confused why any of these searches should return anything at all.
"$WORD1@$WORD2" doesn't actually match either of the addresses in the
test message, especially when quoted.  The expanded addresses should be:

  c@hello.com
  K-R@goodbye.com

Why should

  hello@goodbye

match anything?  And in fact it doesn't for me if I recreate the same
setup.  Am I missing something?

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

* Re: Strange match to my query
  2011-02-25 18:19 ` Jameson Rollins
@ 2011-02-25 20:57   ` Mark Anderson
  2011-02-25 21:29     ` Jameson Rollins
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Anderson @ 2011-02-25 20:57 UTC (permalink / raw)
  To: Jameson Rollins, notmuch@notmuchmail.org

On Fri, 25 Feb 2011 12:19:21 -0600, Jameson Rollins <jrollins@finestructure.net> wrote:
> On Tue, 25 Jan 2011 16:29:22 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> > Apparently matching on email addresses doesn't work the way I hoped.
> > 
> > While debugging why my to:x@y.com search was matching far too many
> > entries, I whittled it down to this:
> > 
> > WORD1=hello
> > WORD2=goodbye
> > MSGID=junk$(date +%s)
> > TESTDIR=$(notmuch config get database.path)/.tmp/new
> > TESTMAIL=$TESTDIR/$MSGID:2,
> > 
> > mkdir -p $TESTDIR
> > 
> > echo Testcase for $WORD1@$WORD2, msgid: $MSGID@junk.com
> > 
> > echo "From: nobody@nobody.com
> > To: c@${WORD1}.com, K-R@${WORD2}.com
> > Date: Mon, 24 Jan 2011 23:41:34 -0600
> > Subject: Error
> > Message-ID: <$MSGID@junk.com>
> > 
> > Not empty body.=
> > 
> > " > $TESTMAIL
> > 
> > notmuch new
> > notmuch search --output=files to:$WORD1@$WORD2
> > notmuch search --output=files to:\"$WORD1@$WORD2\"
> > 
> > Why does that match, but this doesn't?
> > 
> > notmuch search --output=files to:\'$WORD1@$WORD2\'
> 
> Hey, guys.  Reopening an old thread here, found while trying to track
> down a similar problem.
> 
> I'm confused why any of these searches should return anything at all.
> "$WORD1@$WORD2" doesn't actually match either of the addresses in the
> test message, especially when quoted.  The expanded addresses should be:
> 
>   c@hello.com
>   K-R@goodbye.com
> 
> Why should
> 
>   hello@goodbye
> 
> match anything?  And in fact it doesn't for me if I recreate the same
> setup.  Am I missing something?

It shouldn't match anything; that's the value of finding this bug.

What happened is the term counter was reset for each email address, so
the term list for emails in "to:" looks something like this:

pos  c@hello.com  K-R@goodbye.com
0    c            K
1    hello        R
2    com          goodbye
3                 com

So it matched a hello at 1 and a goodbye at 2.

I don't remember where the discussion on this went, but it was on the
list.

Perhaps you should search for it, it should take notmuch to
find... *duck*

-Mark

* Re: Strange match to my query
  2011-02-25 20:57   ` Mark Anderson
@ 2011-02-25 21:29     ` Jameson Rollins
  2011-03-01 23:00       ` Mark Anderson
  0 siblings, 1 reply; 13+ messages in thread
From: Jameson Rollins @ 2011-02-25 21:29 UTC (permalink / raw)
  To: notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 676 bytes --]

On Fri, 25 Feb 2011 13:57:23 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> It shouldn't match anything, that's the value of finding this bug.
> 
> What happened is the term counter was reset for each email address, so
> the term list for emails in "to:" looks something like this:
> 
> 0 c      K
> 1 hello  R
> 2 com    goodbye
> 3        com
> 
> So it matched a hello at 1 and a goodbye at 2.

I see now.  I was confused about which problem you were reporting.

So I am in fact still seeing this bug, although I am ostensibly using a
version that includes the patch to fix it (db70f3f0).  Does this fix
require rebuilding the database?

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

* Re: Strange match to my query
  2011-02-25 21:29     ` Jameson Rollins
@ 2011-03-01 23:00       ` Mark Anderson
  2011-03-01 23:15         ` Jameson Rollins
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Anderson @ 2011-03-01 23:00 UTC (permalink / raw)
  To: Jameson Rollins, notmuch@notmuchmail.org

On Fri, 25 Feb 2011 15:29:05 -0600, Jameson Rollins <jrollins@finestructure.net> wrote:
> So I am in fact still seeing this bug, although I am ostensibly using a
> version that includes the patch to fix it (db70f3f0).  Does this fix
> require rebuilding the database?

Yes.

The termlist is constructed when the message is added to the database,
so the database must be reconstructed.

Messages indexed after the fix get term positions that can't overlap
across email addresses.  However, messages already in your database
keep their old positions until they are re-indexed by hand.
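
As a rough picture of what non-overlapping positions could look like,
here is a small shell sketch.  It is only an illustration under stated
assumptions: index_with_gap is a made-up helper, the split on
non-alphanumeric characters is a simplification, and the gap of 100 is
an arbitrary value chosen for the example rather than a statement about
what the notmuch fix uses.

```shell
# Hypothetical helper: print "position term" lines for a list of
# addresses, skipping GAP positions after each address so that terms
# from different addresses can never be adjacent.
index_with_gap() {
    gap=$1; shift
    pos=0
    for addr in "$@"; do
        # Assumed tokenisation: every non-alphanumeric character splits terms.
        for term in $(printf '%s' "$addr" | tr -c 'A-Za-z0-9' ' '); do
            pos=$((pos + 1))
            printf '%s %s\n' "$pos" "$term"
        done
        pos=$((pos + gap))   # leave a hole before the next address
    done
}

index_with_gap 100 c@hello.com K-R@goodbye.com
```

With positions laid out this way, hello and goodbye are far apart, so a
phrase query for the two words has nothing adjacent to match.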

A simple rebuild when you go to bed can look like:

notmuch dump >dump.txt; notmuch new; notmuch restore dump.txt

-Mark

* Re: Strange match to my query
  2011-03-01 23:00       ` Mark Anderson
@ 2011-03-01 23:15         ` Jameson Rollins
  2011-03-01 23:40           ` Mark Anderson
  2011-04-14  7:32           ` Florian Friesdorf
  0 siblings, 2 replies; 13+ messages in thread
From: Jameson Rollins @ 2011-03-01 23:15 UTC (permalink / raw)
  To: Mark Anderson, notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

On Tue, 1 Mar 2011 16:00:51 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
> On Fri, 25 Feb 2011 15:29:05 -0600, Jameson Rollins <jrollins@finestructure.net> wrote:
> > So I am in fact still seeing this bug, although I am ostensibly using a
> > version that includes the patch to fix it (db70f3f0).  Does this fix
> > require rebuilding the database?
> 
> Yes.
>
> The termlist is constructed when the message is added to the database,
> so the database must be reconstructed.
> 
> Newer messages will index email addresses so that they can't be matched
> by overlapping term indexes.  However, the corpus of your database is
> not going to change without manual intervention.

Ok, that's what I thought.  Thanks for the feedback, Mark.
 
> A simple rebuild when you go to bed can look like:

I think you're missing an important step:

notmuch dump >dump.txt
mv $(notmuch config get database.path){,.bak}
notmuch new
notmuch restore dump.txt

but I get the idea ;)

Thanks again.

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

* Re: Strange match to my query
  2011-03-01 23:15         ` Jameson Rollins
@ 2011-03-01 23:40           ` Mark Anderson
  2011-04-14  7:32           ` Florian Friesdorf
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Anderson @ 2011-03-01 23:40 UTC (permalink / raw)
  To: Jameson Rollins, notmuch@notmuchmail.org

On Tue, 1 Mar 2011 17:15:22 -0600, Jameson Rollins <jrollins@finestructure.net> wrote:
> On Tue, 1 Mar 2011 16:00:51 -0700, Mark Anderson <MarkR.Anderson@amd.com> wrote:
>  
> > A simple rebuild when you go to bed can look like:
> 
> I think you're missing an important step:
> 
> notmuch dump >dump.txt
> mv $(notmuch config get database.path){,.bak}
> notmuch new
> notmuch restore dump.txt

True, that would be much better than my proposed flow.

-Mark

* Re: Strange match to my query
  2011-03-01 23:15         ` Jameson Rollins
  2011-03-01 23:40           ` Mark Anderson
@ 2011-04-14  7:32           ` Florian Friesdorf
  2011-04-19  6:21             ` Jameson Graef Rollins
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Friesdorf @ 2011-04-14  7:32 UTC (permalink / raw)
  To: Jameson Rollins, Mark Anderson, notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 691 bytes --]

On Tue, 01 Mar 2011 15:15:22 -0800, Jameson Rollins <jrollins@finestructure.net> wrote:
> > A simple rebuild when you go to bed can look like:
> 
> I think you're missing an important step:
> 
> notmuch dump >dump.txt
> mv $(notmuch config get database.path){,.bak}

Catching up and confused here: Shouldn't this be:

mv $(notmuch config get database.path)/.notmuch{,.bak}

Otherwise I would move away all my emails, not just notmuch's database.

> notmuch new
> notmuch restore dump.txt

-- 
Florian Friesdorf <flo@chaoflow.net>
  GPG FPR: 7A13 5EEE 1421 9FC2 108D  BAAF 38F8 99A3 0C45 F083
Jabber/XMPP: flo@chaoflow.net
IRC: chaoflow on freenode,ircnet,blafasel,OFTC

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

* Re: Strange match to my query
  2011-04-14  7:32           ` Florian Friesdorf
@ 2011-04-19  6:21             ` Jameson Graef Rollins
  0 siblings, 0 replies; 13+ messages in thread
From: Jameson Graef Rollins @ 2011-04-19  6:21 UTC (permalink / raw)
  To: Florian Friesdorf, Mark Anderson, notmuch@notmuchmail.org

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

On Thu, 14 Apr 2011 09:32:30 +0200, Florian Friesdorf <flo@chaoflow.net> wrote:
> On Tue, 01 Mar 2011 15:15:22 -0800, Jameson Rollins <jrollins@finestructure.net> wrote:
> > > A simple rebuild when you go to bed can look like:
> > 
> > I think you're missing an important step:
> > 
> > notmuch dump >dump.txt
> > mv $(notmuch config get database.path){,.bak}
> 
> Catching up and confused here: Shouldn't this be:
> 
> mv $(notmuch config get database.path)/.notmuch{,.bak}

Yes, you're right.  Nice correction correction!
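
Putting the corrected steps together, a rebuild might be sketched as
the shell function below.  This is a sketch under assumptions, not an
official procedure: rebuild_notmuch_index is a made-up name, it assumes
the classic layout where the index lives in .notmuch under
database.path, and it assumes your notmuch version can read a dump
from standard input with "notmuch restore".

```shell
# Hypothetical helper: rebuild the index while preserving tags.
# Optional argument: where to write the tag dump (default: dump.txt).
rebuild_notmuch_index() {
    dump_file=${1:-dump.txt}
    mail_root=$(notmuch config get database.path) || return 1

    # Refuse to run if the layout is not what this sketch assumes.
    [ -d "$mail_root/.notmuch" ] || { echo "no index under $mail_root" >&2; return 1; }
    [ ! -e "$mail_root/.notmuch.bak" ] || { echo "backup already exists" >&2; return 1; }

    # 1. Save the tags before touching anything.
    notmuch dump > "$dump_file" || return 1
    # 2. Move away only the index directory, never the mail itself.
    mv "$mail_root/.notmuch" "$mail_root/.notmuch.bak" || return 1
    # 3. Re-index every message from scratch.
    notmuch new || return 1
    # 4. Reapply the saved tags.
    notmuch restore < "$dump_file"
}
```

Calling rebuild_notmuch_index with no arguments writes the tags to
dump.txt in the current directory; the old index stays in .notmuch.bak
until you remove it yourself.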

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

Thread overview: 13+ messages
2011-01-25 23:29 Strange match to my query Mark Anderson
2011-01-26  0:51 ` Austin Clements
2011-01-26  2:19   ` Carl Worth
2011-01-26  5:59     ` Carl Worth
2011-01-26 17:02       ` Mark Anderson
2011-02-25 18:19 ` Jameson Rollins
2011-02-25 20:57   ` Mark Anderson
2011-02-25 21:29     ` Jameson Rollins
2011-03-01 23:00       ` Mark Anderson
2011-03-01 23:15         ` Jameson Rollins
2011-03-01 23:40           ` Mark Anderson
2011-04-14  7:32           ` Florian Friesdorf
2011-04-19  6:21             ` Jameson Graef Rollins
