* [PATCH 1/5] test: Fix from/to search test queries
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
@ 2014-06-16 2:40 ` Austin Clements
2014-06-16 2:40 ` [PATCH 2/5] test: Add search tests for combined name/address queries Austin Clements
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2014-06-16 2:40 UTC (permalink / raw)
To: notmuch
Two of the search tests for "from" and "to" queries were clearly
trying to search for prefixed phrases, but forgot to shell quote the
phrases. Fix this by quoting them correctly.
---
test/T080-search.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/test/T080-search.sh b/test/T080-search.sh
index a7a0b18..4f0e16c 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -59,7 +59,7 @@ test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] searchbyfrom@example.
test_begin_subtest "Search by from: (name)"
add_message '[subject]="search by from (name)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[from]="Search By From Name <test@example.com>"'
-output=$(notmuch search from:"Search By From Name" | notmuch_search_sanitize)
+output=$(notmuch search 'from:"Search By From Name"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
test_begin_subtest "Search by to: (address)"
@@ -69,7 +69,7 @@ test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; s
test_begin_subtest "Search by to: (name)"
add_message '[subject]="search by to (name)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' '[to]="Search By To Name <test@example.com>"'
-output=$(notmuch search to:"Search By To Name" | notmuch_search_sanitize)
+output=$(notmuch search 'to:"Search By To Name"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
test_begin_subtest "Search by subject: (phrase)"
--
2.0.0.rc2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/5] test: Add search tests for combined name/address queries
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
2014-06-16 2:40 ` [PATCH 1/5] test: Fix from/to search test queries Austin Clements
@ 2014-06-16 2:40 ` Austin Clements
2014-06-16 2:40 ` [PATCH 3/5] lib: Index name and address of from/to headers as a phrase Austin Clements
` (4 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2014-06-16 2:40 UTC (permalink / raw)
To: notmuch
Two of these are currently known-broken. We index the name and
address parts in two separate calls to _notmuch_message_gen_terms.
Currently this has the effect of placing the term positions of the
prefixed terms from the second call right after those of the first
call, but screws up the term positions of the non-prefixed terms.
---
test/T080-search.sh | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/test/T080-search.sh b/test/T080-search.sh
index 4f0e16c..8ed5701 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -62,6 +62,15 @@ add_message '[subject]="search by from (name)"' '[date]="Sat, 01 Jan 2000 12:00:
output=$(notmuch search 'from:"Search By From Name"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
+test_begin_subtest "Search by from: (name and address)"
+output=$(notmuch search 'from:"Search By From Name <test@example.com>"' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
+
+test_begin_subtest "Search by from: without prefix (name and address)"
+test_subtest_known_broken
+output=$(notmuch search '"Search By From Name <test@example.com>"' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
+
test_begin_subtest "Search by to: (address)"
add_message '[subject]="search by to (address)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' [to]=searchbyto@example.com
output=$(notmuch search to:searchbyto@example.com | notmuch_search_sanitize)
@@ -72,6 +81,15 @@ add_message '[subject]="search by to (name)"' '[date]="Sat, 01 Jan 2000 12:00:00
output=$(notmuch search 'to:"Search By To Name"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
+test_begin_subtest "Search by to: (name and adress)"
+output=$(notmuch search 'to:"Search By To Name <test@example.com>"' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
+
+test_begin_subtest "Search by to: without prefix (name and adress)"
+test_subtest_known_broken
+output=$(notmuch search '"Search By To Name <test@example.com>"' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
+
test_begin_subtest "Search by subject: (phrase)"
add_message '[subject]="subject search test (phrase)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"'
add_message '[subject]="this phrase should not match the subject search test"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"'
--
2.0.0.rc2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/5] lib: Index name and address of from/to headers as a phrase
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
2014-06-16 2:40 ` [PATCH 1/5] test: Fix from/to search test queries Austin Clements
2014-06-16 2:40 ` [PATCH 2/5] test: Add search tests for combined name/address queries Austin Clements
@ 2014-06-16 2:40 ` Austin Clements
2014-06-16 2:40 ` [PATCH 4/5] test: Known-broken test for overlapping/adjacent termpos Austin Clements
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2014-06-16 2:40 UTC (permalink / raw)
To: notmuch
Previously, we indexed the name and address parts of from/to headers
with two calls to _notmuch_message_gen_terms. In general, this
indicates that these parts are separate phrases. However, because of
an implementation quirk, the two calls to _notmuch_message_gen_terms
generated adjacent term positions for the prefixed terms, which
happens to be the right thing to do in this case, but the wrong thing
to do for all other calls. Furthermore, _notmuch_message_gen_terms
produced potentially overlapping term positions for the un-prefixed
copies of the terms, which is simply wrong.
This change indexes both the name and address in a single call to
_notmuch_message_gen_terms, indicating that they should be part of a
single phrase. This masks the problem with the un-prefixed terms
(fixing the two known-broken tests) and puts us in a position to fix
the unintentionally phrases generated by other calls to
_notmuch_message_gen_terms.
---
lib/index.cc | 24 ++++++++++--------------
test/T080-search.sh | 2 --
2 files changed, 10 insertions(+), 16 deletions(-)
diff --git a/lib/index.cc b/lib/index.cc
index e1e2a38..1a2e63d 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -231,26 +231,22 @@ _index_address_mailbox (notmuch_message_t *message,
InternetAddress *address)
{
InternetAddressMailbox *mailbox = INTERNET_ADDRESS_MAILBOX (address);
- const char *name, *addr;
+ const char *name, *addr, *combined;
void *local = talloc_new (message);
name = internet_address_get_name (address);
addr = internet_address_mailbox_get_addr (mailbox);
- /* In the absence of a name, we'll strip the part before the @
- * from the address. */
- if (! name) {
- const char *at;
+ /* Combine the name and address and index them as a phrase. */
+ if (name && addr)
+ combined = talloc_asprintf (local, "%s %s", name, addr);
+ else if (name)
+ combined = name;
+ else
+ combined = addr;
- at = strchr (addr, '@');
- if (at)
- name = talloc_strndup (local, addr, at - addr);
- }
-
- if (name)
- _notmuch_message_gen_terms (message, prefix_name, name);
- if (addr)
- _notmuch_message_gen_terms (message, prefix_name, addr);
+ if (combined)
+ _notmuch_message_gen_terms (message, prefix_name, combined);
talloc_free (local);
}
diff --git a/test/T080-search.sh b/test/T080-search.sh
index 8ed5701..b63bf02 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -67,7 +67,6 @@ output=$(notmuch search 'from:"Search By From Name <test@example.com>"' | notmuc
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
test_begin_subtest "Search by from: without prefix (name and address)"
-test_subtest_known_broken
output=$(notmuch search '"Search By From Name <test@example.com>"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
@@ -86,7 +85,6 @@ output=$(notmuch search 'to:"Search By To Name <test@example.com>"' | notmuch_se
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
test_begin_subtest "Search by to: without prefix (name and adress)"
-test_subtest_known_broken
output=$(notmuch search '"Search By To Name <test@example.com>"' | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; search by to (name) (inbox unread)"
--
2.0.0.rc2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/5] test: Known-broken test for overlapping/adjacent termpos
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
` (2 preceding siblings ...)
2014-06-16 2:40 ` [PATCH 3/5] lib: Index name and address of from/to headers as a phrase Austin Clements
@ 2014-06-16 2:40 ` Austin Clements
2014-06-16 2:40 ` [PATCH 5/5] lib: Separate all phrases indexed by _notmuch_message_gen_terms Austin Clements
` (2 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Austin Clements @ 2014-06-16 2:40 UTC (permalink / raw)
To: notmuch
This adds two known-broken tests and one working test related to the
term positions assigned to terms from different headers or MIME parts.
The first test fails because we don't create a termpos gap between
different headers. The second test fails because we don't adjust
termpos at all when indexing multiple parts.
---
test/T080-search.sh | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/test/T080-search.sh b/test/T080-search.sh
index b63bf02..3f6b565 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -145,4 +145,44 @@ add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12
output=$(notmuch search "bödý" | notmuch_search_sanitize)
test_expect_equal "$output" "thread:XXX 2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)"
+
+cat <<EOF > ${MAIL_DIR}/termpos
+From: Source <source@example.com>
+To: Dest <dest@example.com>
+Subject: part overlap test
+Date: Sat, 01 January 2000 00:00:00 +0000
+Message-ID: <termpos>
+MIME-Version: 1.0
+Content-Type: multipart/mixed; boundary="==-=="
+
+--==-==
+Content-Type: text/plain
+
+a b c
+
+--==-==
+Content-Type: text/plain
+
+x y z
+
+--==-==--
+EOF
+notmuch new > /dev/null
+
+test_begin_subtest "headers do not have adjacent term positions"
+test_subtest_known_broken
+# Regression test for a bug where term positions for non-prefixed
+# terms weren't updated
+output=$(notmuch search id:termpos and '"com dest"')
+test_expect_equal "$output" ""
+
+test_begin_subtest "parts have non-overlapping term positions"
+test_subtest_known_broken
+output=$(notmuch search id:termpos and '"a y c"')
+test_expect_equal "$output" ""
+
+test_begin_subtest "parts do not have adjacent term positions"
+output=$(notmuch search id:termpos and '"c x"')
+test_expect_equal "$output" ""
+
test_done
--
2.0.0.rc2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 5/5] lib: Separate all phrases indexed by _notmuch_message_gen_terms
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
` (3 preceding siblings ...)
2014-06-16 2:40 ` [PATCH 4/5] test: Known-broken test for overlapping/adjacent termpos Austin Clements
@ 2014-06-16 2:40 ` Austin Clements
2014-06-18 21:11 ` David Bremner
2014-06-21 19:20 ` [PATCH 0/5] Fix adjacent and overlapping termpos Tomi Ollila
2014-06-22 10:02 ` David Bremner
6 siblings, 1 reply; 9+ messages in thread
From: Austin Clements @ 2014-06-16 2:40 UTC (permalink / raw)
To: notmuch
This adds a 100 termpos gap between all phrases indexed by
_notmuch_message_gen_terms. This fixes a bug where terms from the end
of one header and the beginning of another header could match together
in a single phrase and a separate bug where term positions of
un-prefixed terms overlapped.
This fix only affects newly indexed messages. Messages that are
already indexed won't benefit from this fix without re-indexing, but
the fix won't make things any worse for existing messages.
---
lib/message.cc | 9 +++++++--
test/T080-search.sh | 2 --
2 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/lib/message.cc b/lib/message.cc
index 9243b76..d0b7351 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1023,16 +1023,21 @@ _notmuch_message_gen_terms (notmuch_message_t *message,
return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
term_gen->set_document (message->doc);
- term_gen->set_termpos (message->termpos);
if (prefix_name) {
const char *prefix = _find_prefix (prefix_name);
+ term_gen->set_termpos (message->termpos);
term_gen->index_text (text, 1, prefix);
- message->termpos = term_gen->get_termpos ();
+ /* Create a gap between this an the next terms so they don't
+ * appear to be a phrase. */
+ message->termpos = term_gen->get_termpos () + 100;
}
+ term_gen->set_termpos (message->termpos);
term_gen->index_text (text);
+ /* Create a term gap, as above. */
+ message->termpos = term_gen->get_termpos () + 100;
return NOTMUCH_PRIVATE_STATUS_SUCCESS;
}
diff --git a/test/T080-search.sh b/test/T080-search.sh
index 3f6b565..05027fb 100755
--- a/test/T080-search.sh
+++ b/test/T080-search.sh
@@ -170,14 +170,12 @@ EOF
notmuch new > /dev/null
test_begin_subtest "headers do not have adjacent term positions"
-test_subtest_known_broken
# Regression test for a bug where term positions for non-prefixed
# terms weren't updated
output=$(notmuch search id:termpos and '"com dest"')
test_expect_equal "$output" ""
test_begin_subtest "parts have non-overlapping term positions"
-test_subtest_known_broken
output=$(notmuch search id:termpos and '"a y c"')
test_expect_equal "$output" ""
--
2.0.0.rc2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 5/5] lib: Separate all phrases indexed by _notmuch_message_gen_terms
2014-06-16 2:40 ` [PATCH 5/5] lib: Separate all phrases indexed by _notmuch_message_gen_terms Austin Clements
@ 2014-06-18 21:11 ` David Bremner
0 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2014-06-18 21:11 UTC (permalink / raw)
To: Austin Clements, notmuch
Austin Clements <amdragon@MIT.EDU> writes:
> This adds a 100 termpos gap between all phrases indexed by
> _notmuch_message_gen_terms. This fixes a bug where terms from the end
> of one header and the beginning of another header could match together
> in a single phrase and a separate bug where term positions of
> un-prefixed terms overlapped.
>
> This fix only affects newly indexed messages. Messages that are
> already indexed won't benefit from this fix without re-indexing, but
> the fix won't make things any worse for existing messages.
The series looks OK to me. It took me a little while to understand the
problem with multiple parts was that the term positions currently start
from zero for each part. If you happen to be re-rolling the series for
some other reason, maybe you could be more explicit about that. I
wouldn't bother just for that though.
d
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5] Fix adjacent and overlapping termpos
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
` (4 preceding siblings ...)
2014-06-16 2:40 ` [PATCH 5/5] lib: Separate all phrases indexed by _notmuch_message_gen_terms Austin Clements
@ 2014-06-21 19:20 ` Tomi Ollila
2014-06-22 10:02 ` David Bremner
6 siblings, 0 replies; 9+ messages in thread
From: Tomi Ollila @ 2014-06-21 19:20 UTC (permalink / raw)
To: Austin Clements, notmuch
On Mon, Jun 16 2014, Austin Clements <amdragon@MIT.EDU> wrote:
> This series fixes several related bugs that caused indexing to
> generate adjacent and overlapping term positions, which caused
> messages to incorrectly match phrase queries. This obsoletes (and
> fixes) the known-broken tests from
> id:1400359552-10928-1-git-send-email-amdragon@mit.edu.
The code looks good to me and tests pass. +1
Tomi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/5] Fix adjacent and overlapping termpos
2014-06-16 2:40 [PATCH 0/5] Fix adjacent and overlapping termpos Austin Clements
` (5 preceding siblings ...)
2014-06-21 19:20 ` [PATCH 0/5] Fix adjacent and overlapping termpos Tomi Ollila
@ 2014-06-22 10:02 ` David Bremner
6 siblings, 0 replies; 9+ messages in thread
From: David Bremner @ 2014-06-22 10:02 UTC (permalink / raw)
To: Austin Clements, notmuch
Austin Clements <amdragon@MIT.EDU> writes:
> This series fixes several related bugs that caused indexing to
> generate adjacent and overlapping term positions, which caused
> messages to incorrectly match phrase queries. This obsoletes (and
> fixes) the known-broken tests from
> id:1400359552-10928-1-git-send-email-amdragon@mit.edu.
>
pushed to release and master. Please have a quick look at
id:1403381426-19813-1-git-send-email-david@tethera.net
to check the corresponding NEWS
d
^ permalink raw reply [flat|nested] 9+ messages in thread