From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: [PATCH 15/15] test: add known broken test for multiple thread terms per message
Date: Tue, 31 Jul 2018 06:45:55 +0800
Message-ID: <20180730224555.26047-16-david@tethera.net>
In-Reply-To: <20180730224555.26047-1-david@tethera.net>

Having multiple thread terms on a message document seems to be the
underlying cause of some confusing results from notmuch search.

The presence of these multiple thread terms is presumably an indexing
bug, related to multiple files with the same message-id.

The files here are synthesized from a reproducer for the problems in
id:1523007700.l8xm6nm6af.naveen@linux.ibm.com. It isn't quite clear
that this is the same issue (the symptoms using notmuch search are a
bit different).
---
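For readers who do not want to decode the perl one-liner in the new
test, here is a minimal C++ sketch of the same check. It assumes input
in the format printed by term-report (a document id followed by
space-separated terms, one document per line) and that thread terms are
the ones starting with "G", as the test's m/^G/ pattern implies. The
function names and the sample terms in the usage note are hypothetical
and not part of this patch. Unlike the one-liner, the sketch skips the
leading document id rather than popping the last field.

```cpp
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Count the terms on one report line that start with `prefix`.
// The first whitespace-separated field is the document id and is skipped.
std::size_t count_prefixed_terms(const std::string &line, char prefix) {
    std::istringstream fields(line);
    std::string field;
    std::size_t count = 0;

    if (!(fields >> field))	// empty line: no document id, no terms
	return 0;

    while (fields >> field) {
	if (field[0] == prefix)	// operator>> never yields an empty field
	    ++count;
    }
    return count;
}

// Collect the distinct per-document counts of "G" (thread) terms.
// The test in this patch expects the result to be a subset of {0, 1}.
std::set<std::size_t> distinct_thread_term_counts(std::istream &report) {
    std::set<std::size_t> counts;
    std::string line;

    while (std::getline(report, line)) {
	if (line.empty())
	    continue;
	counts.insert(count_prefixed_terms(line, 'G'));
    }
    return counts;
}
```

Feeding it a line such as "2 Gaaaa Gbbbb Qid2" adds 2 to the result
set, which is the situation the known broken test is meant to flag.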
 test/.gitignore                         |  1 +
 test/Makefile.local                     |  7 ++++++-
 test/T720-database-schema.sh            | 16 ++++++++++++++++
 test/corpora/threading/mutant-ref/file1 |  9 +++++++++
 test/corpora/threading/mutant-ref/file2 |  9 +++++++++
 test/corpora/threading/mutant-ref/file3 |  9 +++++++++
 test/corpora/threading/mutant-ref/file4 |  7 +++++++
 test/corpora/threading/mutant-ref/file5 |  7 +++++++
 test/term-report.cc                     | 22 ++++++++++++++++++++++
 9 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100755 test/T720-database-schema.sh
 create mode 100644 test/corpora/threading/mutant-ref/file1
 create mode 100644 test/corpora/threading/mutant-ref/file2
 create mode 100644 test/corpora/threading/mutant-ref/file3
 create mode 100644 test/corpora/threading/mutant-ref/file4
 create mode 100644 test/corpora/threading/mutant-ref/file5
 create mode 100644 test/term-report.cc

diff --git a/test/.gitignore b/test/.gitignore
index 73fe7e24..71bbd7ed 100644
--- a/test/.gitignore
+++ b/test/.gitignore
@@ -8,4 +8,5 @@
 /make-db-version
 /test-results
 /ghost-report
+/term-report
 /tmp.*
diff --git a/test/Makefile.local b/test/Makefile.local
index 1cf09778..c39feace 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -44,6 +44,9 @@ $(dir)/make-db-version: $(dir)/make-db-version.o
 $(dir)/ghost-report: $(dir)/ghost-report.o
 	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/term-report: $(dir)/term-report.o
+	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -54,7 +57,9 @@ test_main_srcs=$(dir)/arg-test.c \
 	      $(dir)/symbol-test.cc \
 	      $(dir)/make-db-version.cc \
 	      $(dir)/ghost-report.cc \
-	      $(dir)/message-id-parse.c
+	      $(dir)/message-id-parse.c \
+	      $(dir)/term-report.cc
+
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/T720-database-schema.sh b/test/T720-database-schema.sh
new file mode 100755
index 00000000..a6a99239
--- /dev/null
+++ b/test/T720-database-schema.sh
@@ -0,0 +1,16 @@
+#!/usr/bin/env bash
+test_description="database schema in lib/database.cc"
+
+. $(dirname "$0")/test-lib.sh || exit 1
+
+add_email_corpus threading
+
+test_begin_subtest "every document has at most one thread term"
+test_subtest_known_broken
+${TEST_DIRECTORY}/term-report ${MAIL_DIR}/.notmuch/xapian | perl -ane 'pop(@F); printf "%d\n",scalar(grep { m/^G/ } @F);' | sort -u > OUTPUT
+cat <<EOF > EXPECTED
+0
+1
+EOF
+test_expect_equal_file EXPECTED OUTPUT
+test_done
diff --git a/test/corpora/threading/mutant-ref/file1 b/test/corpora/threading/mutant-ref/file1
new file mode 100644
index 00000000..97f8db58
--- /dev/null
+++ b/test/corpora/threading/mutant-ref/file1
@@ -0,0 +1,9 @@
+From: Alice <alice@example.org>
+To: Daniel <daniel@example.org>
+Subject: leaf message
+In-Reply-To: <mutant-ref-parent1@example.org>
+References: <mutant-ref-parent1@example.org>
+Message-ID: <mutant-ref-leaf@example.org>
+Date: Thu, 16 Jun 2016 22:14:41 -0400
+
+body
diff --git a/test/corpora/threading/mutant-ref/file2 b/test/corpora/threading/mutant-ref/file2
new file mode 100644
index 00000000..2b2ccd1d
--- /dev/null
+++ b/test/corpora/threading/mutant-ref/file2
@@ -0,0 +1,9 @@
+From: Alice <alice@example.org>
+To: Daniel <daniel@example.org>
+Subject: leaf message
+In-Reply-To: <mutant-ref-parent2@example.org>
+References: <mutant-ref-parent2@example.org>
+Message-ID: <mutant-ref-leaf@example.org>
+Date: Thu, 16 Jun 2016 22:14:41 -0400
+
+body
diff --git a/test/corpora/threading/mutant-ref/file3 b/test/corpora/threading/mutant-ref/file3
new file mode 100644
index 00000000..a8e705bc
--- /dev/null
+++ b/test/corpora/threading/mutant-ref/file3
@@ -0,0 +1,9 @@
+From: Alice <alice@example.org>
+To: Daniel <daniel@example.org>
+Subject: leaf message
+In-Reply-To: <mutant-ref-parent3@example.org>
+References: <mutant-ref-parent3@example.org>
+Message-ID: <mutant-ref-leaf@example.org>
+Date: Thu, 16 Jun 2016 22:14:41 -0400
+
+body
diff --git a/test/corpora/threading/mutant-ref/file4 b/test/corpora/threading/mutant-ref/file4
new file mode 100644
index 00000000..3a0a5a13
--- /dev/null
+++ b/test/corpora/threading/mutant-ref/file4
@@ -0,0 +1,7 @@
+From: Daniel <daniel@example.org>
+To: Alice <alice@example.org>
+Subject: existing parent
+Message-ID: <mutant-ref-parent2@example.org>
+Date: Fri, 17 Jun 2016 22:14:41 -0400
+
+body
diff --git a/test/corpora/threading/mutant-ref/file5 b/test/corpora/threading/mutant-ref/file5
new file mode 100644
index 00000000..8f525d63
--- /dev/null
+++ b/test/corpora/threading/mutant-ref/file5
@@ -0,0 +1,7 @@
+From: Daniel <daniel@example.org>
+To: Alice <alice@example.org>
+Subject: existing parent
+Message-ID: <mutant-ref-parent3@example.org>
+Date: Fri, 17 Jun 2016 22:14:41 -0400
+
+body
diff --git a/test/term-report.cc b/test/term-report.cc
new file mode 100644
index 00000000..88cd1bf5
--- /dev/null
+++ b/test/term-report.cc
@@ -0,0 +1,22 @@
+#include <iostream>
+#include <cstdlib>
+#include <xapian.h>
+
+int main(int argc, char **argv) {
+
+    if (argc < 2) {
+	std::cerr << "usage: term-report xapian-dir" << std::endl;
+	exit(1);
+    }
+
+    Xapian::Database db(argv[1]);
+    for (Xapian::docid id(1); id <= db.get_lastdocid(); id++) {
+	std::cout << id;
+	for (Xapian::TermIterator iter = db.termlist_begin(id);
+	     iter != db.termlist_end(id);
+	     iter++) {
+	    std::cout << " " << *iter;
+	}
+	std::cout << std::endl;
+    }
+}
-- 
2.18.0
