unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] test thread breakage when messages are removed and re-added
@ 2016-03-31 17:34 Daniel Kahn Gillmor
  2016-04-01 22:27 ` Daniel Kahn Gillmor
                   ` (4 more replies)
  0 siblings, 5 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-03-31 17:34 UTC (permalink / raw)
  To: Notmuch Mail

This test (T590-thread-breakage.sh) currently fails!

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

I think this happens because if you insert "B" initially (before
anything is known about "A"), then a "ghost message" gets added to the
database in reference to "A" that is in the same thread, which "A"
takes over when it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I don't know how to easily fix this, but i see a few options:

ghost-on-removal
----------------

We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
------------------------------------------

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
------------------

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-----------------

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.
---
 test/T590-thread-breakage.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 0000000..704f504
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage by reindexing (currently broken)'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/a <<EOF
+Subject: First message
+Message-ID: <a@example.net>
+From: Alice <alice@example.net>
+To: Bob <bob@example.net>
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+EOF
+}
+
+message_b() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/b <<EOF
+Subject: Second message
+Message-ID: <b@example.net>
+In-Reply-To: <a@example.net>
+References: <a@example.net>
+From: Bob <bob@example.net>
+To: Alice <alice@example.net>
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+EOF
+}
+
+
+test_thread_count() {
+    notmuch new >/dev/null
+    test_begin_subtest "${2:-Expecting $1 thread(s)}"
+    count=$(notmuch count --output=threads)
+    test_expect_equal "$count" "$1"
+}
+
+test_thread_count 0 'There should be no threads initially'
+
+message_a
+test_thread_count 1 'One message in: one thread'
+
+message_b
+test_thread_count 1 'Second message in the same thread: one thread'
+
+rm -f ${MAIL_DIR}/cur/a
+test_thread_count 1 'First message removed: still only one thread'
+
+message_a
+# this is known to fail (it shows 2 threads) because no "ghost
+# message" was created for message A when it was removed from the
+# index, despite message B still pointing to it.
+test_thread_count 1 'First message reappears: should return to the same thread'
+
+test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] test thread breakage when messages are removed and re-added
  2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-01 22:27 ` Daniel Kahn Gillmor
  2016-04-01 23:31 ` [PATCH 1/2] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-01 22:27 UTC (permalink / raw)
  To: Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

On Thu 2016-03-31 14:34:53 -0300, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> ghost-on-removal
> ----------------
>
> We could unilaterally add a ghost upon message removal.  This has a
> few disadvantages: the message index would leak information about what
> messages the user has ever been exposed to, and we also create a
> perpetually-growing dataset -- the ghosts can never be removed.
>
> ghost-on-removal-when-shared-thread-exists
> ------------------------------------------
>
> We could add a ghost upon message removal iff there are other
> non-ghost messages with the same thread ID.
>
> We'd also need to remove all ghost messages that share a thread when
> the last non-ghost message in that thread is removed.
>
> This still has a bit of information leakage, though: the message index
> would reveal that i've seen a newer message in a thread, even if i had
> deleted it from my message store

One more proposal along these lines:

track-non-ghost-count-per-thread
--------------------------------

If each thread had a count of all the non-ghost messages associated with
it, and that count was properly maintained by the database, then upon
message deletion you would decrement the count.  when that count reached
zero, you could destroy the thread.

This has the same metadata leakage as
ghost-on-removal-when-shared-thread-exists, but i think it might be more
efficient, if we can cope with the denormalized database.

This does have the downside of needing a database transition, though:
we'd have to add this count to all threads in a database upgrade.



What do people think of the different options?  what do you prefer?  is
there some better approach that i've missed?

     --dkg

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 948 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 1/2] verify during thread-breakage that messages are removed as well
  2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-01 22:27 ` Daniel Kahn Gillmor
@ 2016-04-01 23:31 ` Daniel Kahn Gillmor
  2016-04-01 23:31   ` [PATCH 2/2] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-01 23:31 UTC (permalink / raw)
  To: Notmuch Mail

One risk of fixes to the thread-breakage problem is that we could fail
to remove the search term indexes entirely.  These additional subtests
should guard against that.
---
 test/T590-thread-breakage.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 704f504..f3e725c 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -17,6 +17,7 @@ To: Bob <bob@example.net>
 Date: Thu, 31 Mar 2016 20:10:00 -0400
 
 This is the first message in the thread.
+Apple
 EOF
 }
 
@@ -32,10 +33,18 @@ To: Alice <alice@example.net>
 Date: Thu, 31 Mar 2016 20:15:00 -0400
 
 This is the second message in the thread.
+Banana
 EOF
 }
 
 
+test_subject_count() {
+    notmuch new >/dev/null
+    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+    count=$(notmuch count --output=threads "$1")
+    test_expect_equal "$count" "$2"
+}
+
 test_thread_count() {
     notmuch new >/dev/null
     test_begin_subtest "${2:-Expecting $1 thread(s)}"
@@ -47,17 +56,25 @@ test_thread_count 0 'There should be no threads initially'
 
 message_a
 test_thread_count 1 'One message in: one thread'
+test_subject_count apple 1
+test_subject_count banana 0
 
 message_b
 test_thread_count 1 'Second message in the same thread: one thread'
+test_subject_count apple 1
+test_subject_count banana 1
 
 rm -f ${MAIL_DIR}/cur/a
 test_thread_count 1 'First message removed: still only one thread'
+test_subject_count apple 0
+test_subject_count banana 1
 
 message_a
 # this is known to fail (it shows 2 threads) because no "ghost
 # message" was created for message A when it was removed from the
 # index, despite message B still pointing to it.
 test_thread_count 1 'First message reappears: should return to the same thread'
+test_subject_count apple 1
+test_subject_count banana 1
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2] fix thread breakage via ghost-on-removal
  2016-04-01 23:31 ` [PATCH 1/2] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
@ 2016-04-01 23:31   ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-01 23:31 UTC (permalink / raw)
  To: Notmuch Mail

This is the solution to T590-thread-breakage.sh that just adds a ghost
message after removing each message.

It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.

Note that _resolve_message_id_to_thread_id also introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
---
 lib/database.cc | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 3b342f1..6c9e34a 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2557,9 +2557,33 @@ notmuch_database_remove_message (notmuch_database_t *notmuch,
 
     if (status == NOTMUCH_STATUS_SUCCESS && message) {
 	    status = _notmuch_message_remove_filename (message, filename);
-	    if (status == NOTMUCH_STATUS_SUCCESS)
+	    if (status == NOTMUCH_STATUS_SUCCESS) {
+		const char *mid, *tid;
+		notmuch_message_t *ghost;
+		notmuch_private_status_t private_status;
+	    
+		mid = notmuch_message_get_message_id (message);
+		tid = notmuch_message_get_thread_id (message);
+		/* remove the message */
 		_notmuch_message_delete (message);
-	    else if (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
+	    
+		/* and reintroduce a ghost in its place */
+		ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+		if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+		    /* this is deeply weird, and we should not have
+		       gotten into this state.  is there a better
+		       error message to return here? */
+		    status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+		} else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+		    private_status = _notmuch_message_initialize_ghost (ghost, tid);
+		    if (! private_status) {
+			_notmuch_message_sync (ghost);
+			notmuch_message_destroy (ghost);
+		    }
+		}
+	    
+		status = COERCE_STATUS (private_status, "Error converting to ghost message");
+	    } else if (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
 		_notmuch_message_sync (message);
 
 	    notmuch_message_destroy (message);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 1/7] test thread breakage when messages are removed and re-added
  2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-01 22:27 ` Daniel Kahn Gillmor
  2016-04-01 23:31 ` [PATCH 1/2] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
@ 2016-04-02 14:15 ` Daniel Kahn Gillmor
  2016-04-02 14:15   ` [PATCH v2 2/7] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
                     ` (6 more replies)
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
  4 siblings, 7 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

This test (T590-thread-breakage.sh) currently fails!

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

I think this happens because if you insert "B" initially (before
anything is known about "A"), then a "ghost message" gets added to the
database in reference to "A" that is in the same thread, which "A"
takes over when it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I don't know how to easily fix this, but i see a few options:

ghost-on-removal
----------------

We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
------------------------------------------

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
------------------

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-----------------

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.
---
 test/T590-thread-breakage.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 0000000..704f504
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,63 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage by reindexing (currently broken)'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/a <<EOF
+Subject: First message
+Message-ID: <a@example.net>
+From: Alice <alice@example.net>
+To: Bob <bob@example.net>
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+EOF
+}
+
+message_b() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/b <<EOF
+Subject: Second message
+Message-ID: <b@example.net>
+In-Reply-To: <a@example.net>
+References: <a@example.net>
+From: Bob <bob@example.net>
+To: Alice <alice@example.net>
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+EOF
+}
+
+
+test_thread_count() {
+    notmuch new >/dev/null
+    test_begin_subtest "${2:-Expecting $1 thread(s)}"
+    count=$(notmuch count --output=threads)
+    test_expect_equal "$count" "$1"
+}
+
+test_thread_count 0 'There should be no threads initially'
+
+message_a
+test_thread_count 1 'One message in: one thread'
+
+message_b
+test_thread_count 1 'Second message in the same thread: one thread'
+
+rm -f ${MAIL_DIR}/cur/a
+test_thread_count 1 'First message removed: still only one thread'
+
+message_a
+# this is known to fail (it shows 2 threads) because no "ghost
+# message" was created for message A when it was removed from the
+# index, despite message B still pointing to it.
+test_thread_count 1 'First message reappears: should return to the same thread'
+
+test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 2/7] verify during thread-breakage that messages are removed as well
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-06  1:20     ` David Bremner
  2016-04-02 14:15   ` [PATCH v2 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

One risk of fixes to the thread-breakage problem is that we could fail
to remove the search term indexes entirely.  These additional subtests
should guard against that.
---
 test/T590-thread-breakage.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 704f504..f3e725c 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -17,6 +17,7 @@ To: Bob <bob@example.net>
 Date: Thu, 31 Mar 2016 20:10:00 -0400
 
 This is the first message in the thread.
+Apple
 EOF
 }
 
@@ -32,10 +33,18 @@ To: Alice <alice@example.net>
 Date: Thu, 31 Mar 2016 20:15:00 -0400
 
 This is the second message in the thread.
+Banana
 EOF
 }
 
 
+test_subject_count() {
+    notmuch new >/dev/null
+    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+    count=$(notmuch count --output=threads "$1")
+    test_expect_equal "$count" "$2"
+}
+
 test_thread_count() {
     notmuch new >/dev/null
     test_begin_subtest "${2:-Expecting $1 thread(s)}"
@@ -47,17 +56,25 @@ test_thread_count 0 'There should be no threads initially'
 
 message_a
 test_thread_count 1 'One message in: one thread'
+test_subject_count apple 1
+test_subject_count banana 0
 
 message_b
 test_thread_count 1 'Second message in the same thread: one thread'
+test_subject_count apple 1
+test_subject_count banana 1
 
 rm -f ${MAIL_DIR}/cur/a
 test_thread_count 1 'First message removed: still only one thread'
+test_subject_count apple 0
+test_subject_count banana 1
 
 message_a
 # this is known to fail (it shows 2 threads) because no "ghost
 # message" was created for message A when it was removed from the
 # index, despite message B still pointing to it.
 test_thread_count 1 'First message reappears: should return to the same thread'
+test_subject_count apple 1
+test_subject_count banana 1
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 3/7] fix thread breakage via ghost-on-removal
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-02 14:15   ` [PATCH v2 2/7] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-05  6:53     ` Tomi Ollila
  2016-04-06  1:39     ` David Bremner
  2016-04-02 14:15   ` [PATCH v2 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

ghost-on-removal the solution to T590-thread-breakage.sh that just
adds a ghost message after removing each message.

It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.

Note that _resolve_message_id_to_thread_id also introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.

This differs from v1 of this changeset in that we implement the change
in _notmuch_message_delete, a more "internal" function.
---
 lib/database.cc |  2 +-
 lib/message.cc  | 29 ++++++++++++++++++++++++++---
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 3b342f1..d5733c9 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2557,7 +2557,7 @@ notmuch_database_remove_message (notmuch_database_t *notmuch,
 
     if (status == NOTMUCH_STATUS_SUCCESS && message) {
 	    status = _notmuch_message_remove_filename (message, filename);
-	    if (status == NOTMUCH_STATUS_SUCCESS)
+	    if (status == NOTMUCH_STATUS_SUCCESS) 
 		_notmuch_message_delete (message);
 	    else if (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
 		_notmuch_message_sync (message);
diff --git a/lib/message.cc b/lib/message.cc
index 8d72ea2..e414e9c 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1037,20 +1037,43 @@ _notmuch_message_sync (notmuch_message_t *message)
     message->modified = FALSE;
 }
 
-/* Delete a message document from the database. */
+/* Delete a message document from the database, leaving a ghost
+ * message in its place */
 notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
+    const char *mid, *tid;
+    notmuch_message_t *ghost;
+    notmuch_private_status_t private_status;
+    notmuch_database_t *notmuch;
+	    
+    mid = notmuch_message_get_message_id (message);
+    tid = notmuch_message_get_thread_id (message);
+    notmuch = message->notmuch;
 
     status = _notmuch_database_ensure_writable (message->notmuch);
     if (status)
 	return status;
 
-    db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
+    db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-    return NOTMUCH_STATUS_SUCCESS;
+	    
+    /* and reintroduce a ghost in its place */
+    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+    if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	/* this is deeply weird, and we should not have gotten into
+	   this state.  is there a better error message to return
+	   here? */
+	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+    } else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	if (! private_status)
+	    _notmuch_message_sync (ghost);
+    }
+    notmuch_message_destroy (ghost);
+    return COERCE_STATUS (private_status, "Error converting to ghost message");
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 4/7] Add internal functions to search for alternate doc types
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-02 14:15   ` [PATCH v2 2/7] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
  2016-04-02 14:15   ` [PATCH v2 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-06  1:52     ` David Bremner
  2016-04-02 14:15   ` [PATCH v2 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

Publicly we are only exposing the non-ghost documents (of "type"
"mail").  But internally we might want to inspect the ghost messages
as well.

This changeset adds two new private interfaces to queries to recover
information about alternate document types.
---
 lib/notmuch-private.h | 10 ++++++++++
 lib/query.cc          | 18 ++++++++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 5dd4770..cbfc144 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -477,6 +477,16 @@ void
 _notmuch_doc_id_set_remove (notmuch_doc_id_set_t *doc_ids,
                             unsigned int doc_id);
 
+notmuch_status_t
+_notmuch_query_search_messages_type_st (notmuch_query_t *query,
+					const char *type,
+					notmuch_messages_t **out);
+
+notmuch_status_t
+_notmuch_query_count_messages_type_st (notmuch_query_t *query,
+				       const char *type,
+				       unsigned *count_out);
+
 /* message.cc */
 
 void
diff --git a/lib/query.cc b/lib/query.cc
index e627bfc..ba6ee49 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -187,6 +187,14 @@ notmuch_status_t
 notmuch_query_search_messages_st (notmuch_query_t *query,
 				  notmuch_messages_t **out)
 {
+    return _notmuch_query_search_messages_type_st (query, "mail", out);
+}
+
+notmuch_status_t
+_notmuch_query_search_messages_type_st (notmuch_query_t *query,
+					const char *type,
+					notmuch_messages_t **out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     notmuch_mset_messages_t *messages;
@@ -208,7 +216,7 @@ notmuch_query_search_messages_st (notmuch_query_t *query,
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	Xapian::MSetIterator iterator;
@@ -554,6 +562,12 @@ notmuch_query_count_messages (notmuch_query_t *query)
 notmuch_status_t
 notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 {
+    return _notmuch_query_count_messages_type_st (query, "mail", count_out);
+}
+
+notmuch_status_t
+_notmuch_query_count_messages_type_st (notmuch_query_t *query, const char *type, unsigned *count_out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     Xapian::doccount count = 0;
@@ -562,7 +576,7 @@ notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 5/7] Introduce _notmuch_message_has_term()
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (2 preceding siblings ...)
  2016-04-02 14:15   ` [PATCH v2 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-06  2:04     ` David Bremner
  2016-04-02 14:15   ` [PATCH v2 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

It can be useful to easily tell if a given message has a given term
associated with it.
---
 lib/message.cc        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/notmuch-private.h | 13 +++++++++++++
 2 files changed, 62 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index e414e9c..fab70fd 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1216,6 +1216,55 @@ _notmuch_message_remove_term (notmuch_message_t *message,
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+notmuch_bool_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value)
+{
+    notmuch_bool_t out;
+    notmuch_private_status_t st =
+	_notmuch_message_has_term_st (message, prefix_name, value, &out);
+    if (st)
+	return FALSE;
+    return out;
+}
+
+
+notmuch_private_status_t
+_notmuch_message_has_term_st (notmuch_message_t *message,
+			      const char *prefix_name,
+			      const char *value,
+			      notmuch_bool_t *result)
+{
+    char *term;
+    notmuch_bool_t out = FALSE;
+    notmuch_private_status_t status = NOTMUCH_PRIVATE_STATUS_SUCCESS;
+
+    if (value == NULL)
+	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
+
+    term = talloc_asprintf (message, "%s%s",
+			    _find_prefix (prefix_name), value);
+
+    if (strlen (term) > NOTMUCH_TERM_MAX)
+	return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
+
+    try {
+	/* Look for the exact term */
+	Xapian::TermIterator i = message->doc.termlist_begin ();
+	i.skip_to (term);
+	if (i != message->doc.termlist_end () &&
+	    !strcmp ((*i).c_str (), term))
+	    out = TRUE;
+    } catch (Xapian::Error &error) {
+	status = NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+    }
+    talloc_free (term);
+
+    *result = out;
+    return status;
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index cbfc144..00391a9 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -279,6 +279,19 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 			      const char *prefix_name,
 			      const char *value);
 
+/* _notmuch_message_has_term designed to be simple: if there is an
+ * error, it will return false */
+notmuch_bool_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value);
+
+notmuch_private_status_t
+_notmuch_message_has_term_st (notmuch_message_t *message,
+			      const char *prefix_name,
+			      const char *value,
+			      notmuch_bool_t *result);
+
 notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 6/7] On deletion, replace with ghost when other active messages in thread
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (3 preceding siblings ...)
  2016-04-02 14:15   ` [PATCH v2 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-02 14:15   ` [PATCH v2 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  2016-04-02 16:19   ` [PATCH 1/2] test thread breakage when messages are removed and re-added David Bremner
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.

Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
---
 lib/message.cc | 58 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index fab70fd..f715e39 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1044,11 +1044,14 @@ _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
-    const char *mid, *tid;
+    const char *mid, *tid, *query_string;
     notmuch_message_t *ghost;
     notmuch_private_status_t private_status;
     notmuch_database_t *notmuch;
-	    
+    notmuch_query_t *query;
+    unsigned int count = 0;
+    notmuch_bool_t is_ghost;
+
     mid = notmuch_message_get_message_id (message);
     tid = notmuch_message_get_thread_id (message);
     notmuch = message->notmuch;
@@ -1059,21 +1062,44 @@ _notmuch_message_delete (notmuch_message_t *message)
 
     db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-	    
-    /* and reintroduce a ghost in its place */
-    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
-    if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
-	/* this is deeply weird, and we should not have gotten into
-	   this state.  is there a better error message to return
-	   here? */
-	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
-    } else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
-	private_status = _notmuch_message_initialize_ghost (ghost, tid);
-	if (! private_status)
-	    _notmuch_message_sync (ghost);
+
+    /* if this was a ghost to begin with, we are done */
+    private_status = _notmuch_message_has_term_st (message, "type", "ghost", &is_ghost);
+    if (private_status)
+	return COERCE_STATUS (private_status,
+			      "Error trying to determine whether message was a ghost");
+    if (is_ghost)
+	return NOTMUCH_STATUS_SUCCESS;
+    
+    query_string = talloc_asprintf (message, "thread:%s", tid);
+    query = notmuch_query_create (notmuch, query_string);
+    if (query == NULL)
+	return NOTMUCH_STATUS_OUT_OF_MEMORY;
+    status = notmuch_query_count_messages_st (query, &count);
+    if (status) {
+	notmuch_query_destroy (query);
+	return status;
     }
-    notmuch_message_destroy (ghost);
-    return COERCE_STATUS (private_status, "Error converting to ghost message");
+
+    if (count > 0) {
+	/* reintroduce a ghost in its place because there are still
+	 * other active messages in this thread: */
+	ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+	if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	    /* this is deeply weird, and we should not have gotten
+	       into this state.  is there a better error message to
+	       return here? */
+	    status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+	} else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	    private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	    if (! private_status)
+		_notmuch_message_sync (ghost);
+	}
+	notmuch_message_destroy (ghost);
+	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    }
+    notmuch_query_destroy (query);
+    return status;
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v2 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (4 preceding siblings ...)
  2016-04-02 14:15   ` [PATCH v2 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
@ 2016-04-02 14:15   ` Daniel Kahn Gillmor
  2016-04-02 16:19   ` [PATCH 1/2] test thread breakage when messages are removed and re-added David Bremner
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-02 14:15 UTC (permalink / raw)
  To: Notmuch Mail

To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
---
 lib/message.cc | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index f715e39..01daa56 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1097,6 +1097,26 @@ _notmuch_message_delete (notmuch_message_t *message)
 	}
 	notmuch_message_destroy (ghost);
 	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    } else {
+	/* the thread is empty; drop all ghost messages from it */
+	notmuch_messages_t *messages;
+	status = _notmuch_query_search_messages_type_st (query,
+							 "ghost",
+							 &messages);
+	if (status == NOTMUCH_STATUS_SUCCESS) {
+	    notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
+	    while (notmuch_messages_valid (messages)) {
+		message = notmuch_messages_get (messages);
+		status = _notmuch_message_delete (message);
+		if (status) /* we'll report the last failure we see;
+			     * if there is more than one failure, we
+			     * forget about previous ones */
+		    last_error = status;
+		notmuch_message_destroy (message);
+		notmuch_messages_move_to_next (messages);
+	    }
+	    status = last_error;
+	}
     }
     notmuch_query_destroy (query);
     return status;
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 1/2] test thread breakage when messages are removed and re-added
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (5 preceding siblings ...)
  2016-04-02 14:15   ` [PATCH v2 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
@ 2016-04-02 16:19   ` David Bremner
  2016-04-02 16:19     ` [PATCH 2/2] test: add test-binary to print the number of ghost messages David Bremner
  6 siblings, 1 reply; 46+ messages in thread
From: David Bremner @ 2016-04-02 16:19 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>

This test (T590-thread-breakage.sh) currently fails!

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

I think this happens because if you insert "B" initially (before
anything is known about "A"), then a "ghost message" gets added to the
database in reference to "A" that is in the same thread, which "A"
takes over when it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I don't know how to easily fix this, but i see a few options:

ghost-on-removal
----------------

We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
------------------------------------------

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
------------------

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-----------------

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.
---

Here I just changed the FAILing test to BROKEN. This reflects our
usual distinction between known bug and regression.

 test/T590-thread-breakage.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 0000000..8c8e63b
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage by reindexing (currently broken)'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/a <<EOF
+Subject: First message
+Message-ID: <a@example.net>
+From: Alice <alice@example.net>
+To: Bob <bob@example.net>
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+EOF
+}
+
+message_b() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/b <<EOF
+Subject: Second message
+Message-ID: <b@example.net>
+In-Reply-To: <a@example.net>
+References: <a@example.net>
+From: Bob <bob@example.net>
+To: Alice <alice@example.net>
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+EOF
+}
+
+
+test_thread_count() {
+    notmuch new >/dev/null
+    test_begin_subtest "${2:-Expecting $1 thread(s)}"
+    count=$(notmuch count --output=threads)
+    test_expect_equal "$count" "$1"
+}
+
+test_thread_count 0 'There should be no threads initially'
+
+message_a
+test_thread_count 1 'One message in: one thread'
+
+message_b
+test_thread_count 1 'Second message in the same thread: one thread'
+
+rm -f ${MAIL_DIR}/cur/a
+test_thread_count 1 'First message removed: still only one thread'
+
+message_a
+# this is known to fail (it shows 2 threads) because no "ghost
+# message" was created for message A when it was removed from the
+# index, despite message B still pointing to it.
+test_begin_subtest 'First message reappears: should return to the same thread'
+test_subtest_known_broken
+notmuch new >/dev/null
+count=$(notmuch count --output=threads)
+test_expect_equal "$count" "1"
+
+test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/2] test: add test-binary to print the number of ghost messages
  2016-04-02 16:19   ` [PATCH 1/2] test thread breakage when messages are removed and re-added David Bremner
@ 2016-04-02 16:19     ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-02 16:19 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

This one-liner seems preferable to the complications of depending on
delve, getting the binary name right and parsing the output.
---

I wasn't sure where exactly you wanted to count ghosts, so here is
some example code to do the counting. Feel free to modify to suit your
purposes.

 test/Makefile.local  |  4 ++++
 test/ghost-report.cc | 12 ++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 test/ghost-report.cc

diff --git a/test/Makefile.local b/test/Makefile.local
index 30d420e..022f2cf 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -38,6 +38,9 @@ $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
 $(dir)/make-db-version: $(dir)/make-db-version.o
 	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/ghost-report: $(dir)/ghost-report.o
+	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -47,6 +50,7 @@ test_main_srcs=$(dir)/arg-test.c \
 	      $(dir)/smtp-dummy.c \
 	      $(dir)/symbol-test.cc \
 	      $(dir)/make-db-version.cc \
+	      $(dir)/ghost-report.cc
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/ghost-report.cc b/test/ghost-report.cc
new file mode 100644
index 0000000..1739be4
--- /dev/null
+++ b/test/ghost-report.cc
@@ -0,0 +1,12 @@
+#include <iostream>
+#include <xapian.h>
+
+int main(int argc, char **argv) {
+
+    if (argc < 2) {
+	std::cerr << "usage: ghost-report xapian-dir" << std::endl;
+    }
+
+    Xapian::Database db(argv[1]);
+    std::cout << db.get_termfreq("Tghost") << std::endl;
+}
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/7] fix thread breakage via ghost-on-removal
  2016-04-02 14:15   ` [PATCH v2 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
@ 2016-04-05  6:53     ` Tomi Ollila
  2016-04-05 20:05       ` Daniel Kahn Gillmor
  2016-04-06  1:39     ` David Bremner
  1 sibling, 1 reply; 46+ messages in thread
From: Tomi Ollila @ 2016-04-05  6:53 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

On Sat, Apr 02 2016, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:

> ghost-on-removal the solution to T590-thread-breakage.sh that just
> adds a ghost message after removing each message.
>
> It leaks information about whether we've ever seen a given message id,
> but it's a fairly simple implementation.
>
> Note that _resolve_message_id_to_thread_id also introduces new
> message_ids to the database, so i think just searching for a given
> message ID may introduce the same metadata leakage.
>
> This differs from v1 of this changeset in that we implement the change
> in _notmuch_message_delete, a more "internal" function.

I fetched your changes from lair.fifthhorseman.net yesterday and diffed
against master; looks pretty good, some quick comments (this email has
most relevant changes related to those changes):

> ---
>  lib/database.cc |  2 +-
>  lib/message.cc  | 29 ++++++++++++++++++++++++++---
>  2 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index 3b342f1..d5733c9 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -2557,7 +2557,7 @@ notmuch_database_remove_message (notmuch_database_t *notmuch,
>  
>      if (status == NOTMUCH_STATUS_SUCCESS && message) {
>  	    status = _notmuch_message_remove_filename (message, filename);
> -	    if (status == NOTMUCH_STATUS_SUCCESS)
> +	    if (status == NOTMUCH_STATUS_SUCCESS) 

It looks to be that this change is insignificant enough so it could be
dropped ;)

>  		_notmuch_message_delete (message);
>  	    else if (status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
>  		_notmuch_message_sync (message);
> diff --git a/lib/message.cc b/lib/message.cc
> index 8d72ea2..e414e9c 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -1037,20 +1037,43 @@ _notmuch_message_sync (notmuch_message_t *message)
>      message->modified = FALSE;
>  }
>  
> -/* Delete a message document from the database. */
> +/* Delete a message document from the database, leaving a ghost
> + * message in its place */

This comment could tell the condition when ghost message is left --
versus the case all ghost messages are dropped (thread becomes empty of
mail messages).

>  notmuch_status_t
>  _notmuch_message_delete (notmuch_message_t *message)
>  {
>      notmuch_status_t status;
>      Xapian::WritableDatabase *db;
> +    const char *mid, *tid;
> +    notmuch_message_t *ghost;
> +    notmuch_private_status_t private_status;
> +    notmuch_database_t *notmuch;
> +	    
> +    mid = notmuch_message_get_message_id (message);
> +    tid = notmuch_message_get_thread_id (message);
> +    notmuch = message->notmuch;
>  
>      status = _notmuch_database_ensure_writable (message->notmuch);
>      if (status)
>  	return status;
>  
> -    db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
> +    db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
>      db->delete_document (message->doc_id);
> -    return NOTMUCH_STATUS_SUCCESS;
> +	    
> +    /* and reintroduce a ghost in its place */
> +    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);

In next lines the expected case NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND
could be first. Although the performance difference is negligible to me
it looks silly do this first check and basically always fail there and
then do 'else if' which is likely to succeeed...
(your latest version in lair does not return in this first case but sets
status to NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID. Perhaps later messages in this
thread does the same but those are somewhat out-of-context for this reply).

> +    if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
> +	/* this is deeply weird, and we should not have gotten into
> +	   this state.  is there a better error message to return
> +	   here? */
> +	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
> +    } else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
> +	private_status = _notmuch_message_initialize_ghost (ghost, tid);
> +	if (! private_status)
> +	    _notmuch_message_sync (ghost);
> +    }
> +    notmuch_message_destroy (ghost);
> +    return COERCE_STATUS (private_status, "Error converting to ghost message");
>  }

Outside of this patch, but in some of the next messages, adds functions
_notmuch_message_has_term() and _notmuch_message_has_term_st(). Perhaps
the _notmuch_message_has_term() could be left unimplemented?

>  
>  /* Transform a blank message into a ghost message.  The caller must
> -- 
> 2.8.0.rc3
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> https://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/7] fix thread breakage via ghost-on-removal
  2016-04-05  6:53     ` Tomi Ollila
@ 2016-04-05 20:05       ` Daniel Kahn Gillmor
  2016-04-05 23:33         ` David Bremner
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-05 20:05 UTC (permalink / raw)
  To: Tomi Ollila, Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

Hi Tomi--

On Tue 2016-04-05 03:53:34 -0300, Tomi Ollila wrote:
> I fetched your changes from lair.fifthhorseman.net yesterday and diffed
> against master; looks pretty good, some quick comments (this email has
> most relevant changes related to those changes):

thanks for the review!

>> -/* Delete a message document from the database. */
>> +/* Delete a message document from the database, leaving a ghost
>> + * message in its place */
>
> This comment could tell the condition when ghost message is left --
> versus the case all ghost messages are dropped (thread becomes empty of
> mail messages).

right, i can improve these comments.

> In next lines the expected case NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND
> could be first. Although the performance difference is negligible to me
> it looks silly do this first check and basically always fail there and
> then do 'else if' which is likely to succeeed...
> (your latest version in lair does not return in this first case but sets
> status to NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID. Perhaps later messages in this
> thread does the same but those are somewhat out-of-context for this reply).

sure, i can do the positive check first :)

>> +    if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
>> +	/* this is deeply weird, and we should not have gotten into
>> +	   this state.  is there a better error message to return
>> +	   here? */
>> +	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
>> +    } else if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
>> +	private_status = _notmuch_message_initialize_ghost (ghost, tid);
>> +	if (! private_status)
>> +	    _notmuch_message_sync (ghost);
>> +    }
>> +    notmuch_message_destroy (ghost);
>> +    return COERCE_STATUS (private_status, "Error converting to ghost message");
>>  }
>
> Outside of this patch, but in some of the next messages, adds functions
> _notmuch_message_has_term() and _notmuch_message_has_term_st(). Perhaps
> the _notmuch_message_has_term() could be left unimplemented?

yeah, i can do that, though i have to say it's programmatically
convenient to have a simple boolean test that defaults to some value if
there was an error.

I'll post one more round of patches to the mailing list to address these
cleanup bits in the next day or two.

I'm happy to read more reviews if anyone has them,

      --dkg

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 948 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/7] fix thread breakage via ghost-on-removal
  2016-04-05 20:05       ` Daniel Kahn Gillmor
@ 2016-04-05 23:33         ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-05 23:33 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Tomi Ollila, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

>>
>> Outside of this patch, but in some of the next messages, adds functions
>> _notmuch_message_has_term() and _notmuch_message_has_term_st(). Perhaps
>> the _notmuch_message_has_term() could be left unimplemented?
>
> yeah, i can do that, though i have to say it's programmatically
> convenient to have a simple boolean test that defaults to some value if
> there was an error.

Maybe this is obvious, but we rely heavily in the notmuch code base on
NOTMUCH_STATUS_SUCCESS==0, so the following idiom is pretty common,

      status = notmuch_status_returning_thing (... &out);
      if (status) {
         /* error path */
      }

      /* otherwise, deal with out */

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/7] verify during thread-breakage that messages are removed as well
  2016-04-02 14:15   ` [PATCH v2 2/7] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
@ 2016-04-06  1:20     ` David Bremner
  2016-04-09  1:54       ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 46+ messages in thread
From: David Bremner @ 2016-04-06  1:20 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

>  
> +test_subject_count() {
> +    notmuch new >/dev/null
> +    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
> +    count=$(notmuch count --output=threads "$1")
> +    test_expect_equal "$count" "$2"
> +}

It's confusing that this doesn't have anything to do with the subject:
prefix or the corresponding header.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 3/7] fix thread breakage via ghost-on-removal
  2016-04-02 14:15   ` [PATCH v2 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
  2016-04-05  6:53     ` Tomi Ollila
@ 2016-04-06  1:39     ` David Bremner
  1 sibling, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-06  1:39 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> ghost-on-removal the solution to T590-thread-breakage.sh that just
> adds a ghost message after removing each message.

- this patch should remove the test_subtest_known_broken added by (my
  revised version of T590. Also, "curently broken" probably doesn't need
  to be in the description, since the individual test is marked.
- there's a few lines with whitespace on the end of line.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 4/7] Add internal functions to search for alternate doc types
  2016-04-02 14:15   ` [PATCH v2 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
@ 2016-04-06  1:52     ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-06  1:52 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> Publicly we are only exposing the non-ghost documents (of "type"
> "mail").  But internally we might want to inspect the ghost messages
> as well.
>
> This changeset adds two new private interfaces to queries to recover
> information about alternate document types.
> ---
>  lib/notmuch-private.h | 10 ++++++++++
>  lib/query.cc          | 18 ++++++++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index 5dd4770..cbfc144 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -477,6 +477,16 @@ void
>  _notmuch_doc_id_set_remove (notmuch_doc_id_set_t *doc_ids,
>                              unsigned int doc_id);
>  
> +notmuch_status_t
> +_notmuch_query_search_messages_type_st (notmuch_query_t *query,
> +					const char *type,
> +					notmuch_messages_t **out);
> +
> +notmuch_status_t
> +_notmuch_query_count_messages_type_st (notmuch_query_t *query,
> +				       const char *type,
> +				       unsigned *count_out);

I was wondering if we should follow Xapian nomenclature and call these functions

_notmuch_query_{search, count}_documents

This assumes only going with the status returning versions

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 5/7] Introduce _notmuch_message_has_term()
  2016-04-02 14:15   ` [PATCH v2 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
@ 2016-04-06  2:04     ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-06  2:04 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> It can be useful to easily tell if a given message has a given term
> associated with it.
> ---
>  lib/message.cc        | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/notmuch-private.h | 13 +++++++++++++
>  2 files changed, 62 insertions(+)
>
> diff --git a/lib/message.cc b/lib/message.cc
> index e414e9c..fab70fd 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -1216,6 +1216,55 @@ _notmuch_message_remove_term (notmuch_message_t *message,
>      return NOTMUCH_PRIVATE_STATUS_SUCCESS;
>  }
>  
> +notmuch_bool_t
> +_notmuch_message_has_term (notmuch_message_t *message,
> +			   const char *prefix_name,
> +			   const char *value)
> +{
> +    notmuch_bool_t out;
> +    notmuch_private_status_t st =
> +	_notmuch_message_has_term_st (message, prefix_name, value, &out);
> +    if (st)
> +	return FALSE;
> +    return out;
> +}

I second Tomi's unease with providing this interface; not having it
would also allow dropping the _st from the other function.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 1/7] test: add test-binary to print the number of ghost messages
  2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                   ` (2 preceding siblings ...)
  2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-09  1:02 ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (5 more replies)
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
  4 siblings, 6 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

From: David Bremner <david@tethera.net>

This one-liner seems preferable to the complications of depending on
delve, getting the binary name right and parsing the output.
---
 test/Makefile.local  |  4 ++++
 test/ghost-report.cc | 12 ++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 test/ghost-report.cc

diff --git a/test/Makefile.local b/test/Makefile.local
index 30d420e..022f2cf 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -38,6 +38,9 @@ $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
 $(dir)/make-db-version: $(dir)/make-db-version.o
 	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/ghost-report: $(dir)/ghost-report.o
+	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -47,6 +50,7 @@ test_main_srcs=$(dir)/arg-test.c \
 	      $(dir)/smtp-dummy.c \
 	      $(dir)/symbol-test.cc \
 	      $(dir)/make-db-version.cc \
+	      $(dir)/ghost-report.cc
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/ghost-report.cc b/test/ghost-report.cc
new file mode 100644
index 0000000..1739be4
--- /dev/null
+++ b/test/ghost-report.cc
@@ -0,0 +1,12 @@
+#include <iostream>
+#include <xapian.h>
+
+int main(int argc, char **argv) {
+
+    if (argc < 2) {
+	std::cerr << "usage: ghost-report xapian-dir" << std::endl;
+    }
+
+    Xapian::Database db(argv[1]);
+    std::cout << db.get_termfreq("Tghost") << std::endl;
+}
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 2/7] test thread breakage when messages are removed and re-added
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

This test (T590-thread-breakage.sh) has known-broken subtests.

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

This happens because if you insert "B" initially (before anything is
known about "A"), then a "ghost message" gets added to the database in
reference to "A" that is in the same thread, which "A" takes over when
it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I see a few options to fix this:

ghost-on-removal
----------------

We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
------------------------------------------

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
------------------

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-----------------

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.

---------------------

One risk of attempted fixes to this problem is that we could fail to
remove the search term indexes entirely.  This test contains
additional subtests to guard against that.

This test also ensures that the right number of ghost messages exist
in each situation; this will help us ensure we don't accumulate ghosts
indefinitely or leak too much information about what messages we've
seen or not seen, while still making it easy to reassemble threads
when messages come in out-of-order.
---
 test/T590-thread-breakage.sh | 131 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 0000000..24ffb42
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,131 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage during reindexing
+
+notmuch uses ghost documents to track messages we have seen references
+to but have never seen.  Regardless of the order of delivery, message
+deletion, and reindexing, the list of ghost messages for a given
+stored corpus should not vary, so that threads can be reassmebled
+cleanly.
+
+In practice, we accept a small amount of variation (and therefore
+traffic pattern metadata leakage to be stored in the index) for the
+sake of efficiency.
+
+This test also embeds some subtests to ensure that indexing actually
+works properly and attempted fixes to threading issues do not break
+the expected contents of the index.'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/a <<EOF
+Subject: First message
+Message-ID: <a@example.net>
+From: Alice <alice@example.net>
+To: Bob <bob@example.net>
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+Apple
+EOF
+}
+
+message_b() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/b <<EOF
+Subject: Second message
+Message-ID: <b@example.net>
+In-Reply-To: <a@example.net>
+References: <a@example.net>
+From: Bob <bob@example.net>
+To: Alice <alice@example.net>
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+Banana
+EOF
+}
+
+
+test_subject_count() {
+    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+    count=$(notmuch count --output=threads "$1")
+    test_expect_equal "$count" "$2"
+}
+
+test_thread_count() {
+    test_begin_subtest "${2:-Expecting $1 thread(s)}"
+    count=$(notmuch count --output=threads)
+    test_expect_equal "$count" "$1"
+}
+
+test_ghost_count() {
+    test_begin_subtest "${2:-Expecting $1 ghosts(s)}"
+    ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+    test_expect_equal "$ghosts" "$1"
+}
+
+notmuch new >/dev/null
+pwd
+ls -la ${MAIL_DIR}/.notmuch/xapian
+
+test_thread_count 0 'There should be no threads initially'
+test_ghost_count 0 'There should be no ghosts initially'
+
+message_a
+notmuch new >/dev/null
+test_thread_count 1 'One message in: one thread'
+test_subject_count apple 1
+test_subject_count banana 0
+test_ghost_count 0
+
+message_b
+notmuch new >/dev/null
+test_thread_count 1 'Second message in the same thread: one thread'
+test_subject_count apple 1
+test_subject_count banana 1
+test_ghost_count 0
+
+rm -f ${MAIL_DIR}/cur/a
+notmuch new >/dev/null
+test_thread_count 1 'First message removed: still only one thread'
+test_subject_count apple 0
+test_subject_count banana 1
+test_begin_subtest 'should be one ghost after first message removed'
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "1"
+
+message_a
+notmuch new >/dev/null
+# this is known to fail (it shows 2 threads) because no "ghost
+# message" was created for message A when it was removed from the
+# index, despite message B still pointing to it.
+test_begin_subtest 'First message reappears: should return to the same thread'
+test_subtest_known_broken
+count=$(notmuch count --output=threads)
+test_expect_equal "$count" "1"
+test_subject_count apple 1
+test_subject_count banana 1
+test_ghost_count 0
+
+rm -f ${MAIL_DIR}/cur/b
+notmuch new >/dev/null
+test_thread_count 1 'Removing second message: still only one thread'
+test_subject_count apple 1
+test_subject_count banana 0
+test_ghost_count 0 'No ghosts should remain after deletion of second message'
+
+rm -f ${MAIL_DIR}/cur/a
+notmuch new >/dev/null
+test_thread_count 0 'All messages gone: no threads'
+test_subject_count apple 0
+test_subject_count banana 0
+test_ghost_count 0
+
+test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 3/7] fix thread breakage via ghost-on-removal
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

implement ghost-on-removal, the solution to T590-thread-breakage.sh
that just adds a ghost message after removing each message.

It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.

Note that _resolve_message_id_to_thread_id already introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
---
 lib/message.cc               | 30 +++++++++++++++++++++++++++---
 test/T590-thread-breakage.sh | 25 ++++++++++++-------------
 2 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 8d72ea2..435b78a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1037,20 +1037,44 @@ _notmuch_message_sync (notmuch_message_t *message)
     message->modified = FALSE;
 }
 
-/* Delete a message document from the database. */
+/* Delete a message document from the database, leaving a ghost
+ * message in its place */
 notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
+    const char *mid, *tid;
+    notmuch_message_t *ghost;
+    notmuch_private_status_t private_status;
+    notmuch_database_t *notmuch;
+	    
+    mid = notmuch_message_get_message_id (message);
+    tid = notmuch_message_get_thread_id (message);
+    notmuch = message->notmuch;
 
     status = _notmuch_database_ensure_writable (message->notmuch);
     if (status)
 	return status;
 
-    db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
+    db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-    return NOTMUCH_STATUS_SUCCESS;
+	    
+    /* and reintroduce a ghost in its place */
+    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	if (! private_status)
+	    _notmuch_message_sync (ghost);
+    } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	/* this is deeply weird, and we should not have gotten into
+	   this state.  is there a better error message to return
+	   here? */
+	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+    }
+
+    notmuch_message_destroy (ghost);
+    return COERCE_STATUS (private_status, "Error converting to ghost message");
 }
 
 /* Transform a blank message into a ghost message.  The caller must
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 24ffb42..22e5cbc 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -96,20 +96,11 @@ notmuch new >/dev/null
 test_thread_count 1 'First message removed: still only one thread'
 test_subject_count apple 0
 test_subject_count banana 1
-test_begin_subtest 'should be one ghost after first message removed'
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "1"
+test_ghost_count 1 'should be one ghost after first message removed'
 
 message_a
 notmuch new >/dev/null
-# this is known to fail (it shows 2 threads) because no "ghost
-# message" was created for message A when it was removed from the
-# index, despite message B still pointing to it.
-test_begin_subtest 'First message reappears: should return to the same thread'
-test_subtest_known_broken
-count=$(notmuch count --output=threads)
-test_expect_equal "$count" "1"
+test_thread_count 1 'First message reappears: should return to the same thread'
 test_subject_count apple 1
 test_subject_count banana 1
 test_ghost_count 0
@@ -119,13 +110,21 @@ notmuch new >/dev/null
 test_thread_count 1 'Removing second message: still only one thread'
 test_subject_count apple 1
 test_subject_count banana 0
-test_ghost_count 0 'No ghosts should remain after deletion of second message'
+test_begin_subtest 'No ghosts should remain after deletion of second message'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 rm -f ${MAIL_DIR}/cur/a
 notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_subject_count apple 0
 test_subject_count banana 0
-test_ghost_count 0
+test_begin_subtest 'No ghosts should remain after full thread deletion'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 4/7] Add internal functions to search for alternate doc types
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

Publicly we are only exposing the non-ghost documents (of "type"
"mail").  But internally we might want to inspect the ghost messages
as well.

This changeset adds two new private interfaces to queries to recover
information about alternate document types.
---
 lib/notmuch-private.h | 11 +++++++++++
 lib/query.cc          | 18 ++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 5dd4770..d95bf31 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -477,6 +477,17 @@ void
 _notmuch_doc_id_set_remove (notmuch_doc_id_set_t *doc_ids,
                             unsigned int doc_id);
 
+/* querying xapian documents by type (e.g. "mail" or "ghost"): */
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+				 const char *type,
+				 notmuch_messages_t **out);
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query,
+				const char *type,
+				unsigned *count_out);
+
 /* message.cc */
 
 void
diff --git a/lib/query.cc b/lib/query.cc
index e627bfc..77a7926 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -187,6 +187,14 @@ notmuch_status_t
 notmuch_query_search_messages_st (notmuch_query_t *query,
 				  notmuch_messages_t **out)
 {
+    return _notmuch_query_search_documents (query, "mail", out);
+}
+
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+				 const char *type,
+				 notmuch_messages_t **out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     notmuch_mset_messages_t *messages;
@@ -208,7 +216,7 @@ notmuch_query_search_messages_st (notmuch_query_t *query,
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	Xapian::MSetIterator iterator;
@@ -554,6 +562,12 @@ notmuch_query_count_messages (notmuch_query_t *query)
 notmuch_status_t
 notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 {
+    return _notmuch_query_count_documents (query, "mail", count_out);
+}
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query, const char *type, unsigned *count_out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     Xapian::doccount count = 0;
@@ -562,7 +576,7 @@ notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 5/7] Introduce _notmuch_message_has_term()
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
                     ` (2 preceding siblings ...)
  2016-04-09  1:02   ` [PATCH v3 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

It can be useful to easily tell if a given message has a given term
associated with it.
---
 lib/message.cc        | 35 +++++++++++++++++++++++++++++++++++
 lib/notmuch-private.h |  6 ++++++
 2 files changed, 41 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index 435b78a..2399ab3 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1217,6 +1217,41 @@ _notmuch_message_remove_term (notmuch_message_t *message,
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value,
+			   notmuch_bool_t *result)
+{
+    char *term;
+    notmuch_bool_t out = FALSE;
+    notmuch_private_status_t status = NOTMUCH_PRIVATE_STATUS_SUCCESS;
+
+    if (value == NULL)
+	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
+
+    term = talloc_asprintf (message, "%s%s",
+			    _find_prefix (prefix_name), value);
+
+    if (strlen (term) > NOTMUCH_TERM_MAX)
+	return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
+
+    try {
+	/* Look for the exact term */
+	Xapian::TermIterator i = message->doc.termlist_begin ();
+	i.skip_to (term);
+	if (i != message->doc.termlist_end () &&
+	    !strcmp ((*i).c_str (), term))
+	    out = TRUE;
+    } catch (Xapian::Error &error) {
+	status = NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+    }
+    talloc_free (term);
+
+    *result = out;
+    return status;
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index d95bf31..9280797 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -280,6 +280,12 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 			      const char *value);
 
 notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value,
+			   notmuch_bool_t *result);
+
+notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
 			    const char *text);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 6/7] On deletion, replace with ghost when other active messages in thread
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
                     ` (3 preceding siblings ...)
  2016-04-09  1:02   ` [PATCH v3 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  2016-04-09  1:02   ` [PATCH v3 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.

Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
---
 lib/message.cc | 58 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 2399ab3..1b423b0 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1044,11 +1044,14 @@ _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
-    const char *mid, *tid;
+    const char *mid, *tid, *query_string;
     notmuch_message_t *ghost;
     notmuch_private_status_t private_status;
     notmuch_database_t *notmuch;
-	    
+    notmuch_query_t *query;
+    unsigned int count = 0;
+    notmuch_bool_t is_ghost;
+
     mid = notmuch_message_get_message_id (message);
     tid = notmuch_message_get_thread_id (message);
     notmuch = message->notmuch;
@@ -1059,22 +1062,45 @@ _notmuch_message_delete (notmuch_message_t *message)
 
     db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-	    
-    /* and reintroduce a ghost in its place */
-    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
-    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
-	private_status = _notmuch_message_initialize_ghost (ghost, tid);
-	if (! private_status)
-	    _notmuch_message_sync (ghost);
-    } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
-	/* this is deeply weird, and we should not have gotten into
-	   this state.  is there a better error message to return
-	   here? */
-	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+
+    /* if this was a ghost to begin with, we are done */
+    private_status = _notmuch_message_has_term (message, "type", "ghost", &is_ghost);
+    if (private_status)
+	return COERCE_STATUS (private_status,
+			      "Error trying to determine whether message was a ghost");
+    if (is_ghost)
+	return NOTMUCH_STATUS_SUCCESS;
+    
+    query_string = talloc_asprintf (message, "thread:%s", tid);
+    query = notmuch_query_create (notmuch, query_string);
+    if (query == NULL)
+	return NOTMUCH_STATUS_OUT_OF_MEMORY;
+    status = notmuch_query_count_messages_st (query, &count);
+    if (status) {
+	notmuch_query_destroy (query);
+	return status;
     }
 
-    notmuch_message_destroy (ghost);
-    return COERCE_STATUS (private_status, "Error converting to ghost message");
+    if (count > 0) {
+	/* reintroduce a ghost in its place because there are still
+	 * other active messages in this thread: */
+	ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+	if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	    private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	    if (! private_status)
+		_notmuch_message_sync (ghost);
+	} else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	    /* this is deeply weird, and we should not have gotten
+	       into this state.  is there a better error message to
+	       return here? */
+	    status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+	}
+
+	notmuch_message_destroy (ghost);
+	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    }
+    notmuch_query_destroy (query);
+    return status;
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
                     ` (4 preceding siblings ...)
  2016-04-09  1:02   ` [PATCH v3 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
@ 2016-04-09  1:02   ` Daniel Kahn Gillmor
  5 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:02 UTC (permalink / raw)
  To: Notmuch Mail

To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
---
 lib/message.cc               | 20 ++++++++++++++++++++
 test/T590-thread-breakage.sh |  6 +-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 1b423b0..39dbe53 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1098,6 +1098,26 @@ _notmuch_message_delete (notmuch_message_t *message)
 
 	notmuch_message_destroy (ghost);
 	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    } else {
+	/* the thread is empty; drop all ghost messages from it */
+	notmuch_messages_t *messages;
+	status = _notmuch_query_search_documents (query,
+						  "ghost",
+						  &messages);
+	if (status == NOTMUCH_STATUS_SUCCESS) {
+	    notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
+	    while (notmuch_messages_valid (messages)) {
+		message = notmuch_messages_get (messages);
+		status = _notmuch_message_delete (message);
+		if (status) /* we'll report the last failure we see;
+			     * if there is more than one failure, we
+			     * forget about previous ones */
+		    last_error = status;
+		notmuch_message_destroy (message);
+		notmuch_messages_move_to_next (messages);
+	    }
+	    status = last_error;
+	}
     }
     notmuch_query_destroy (query);
     return status;
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 22e5cbc..969243e 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -121,10 +121,6 @@ notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_subject_count apple 0
 test_subject_count banana 0
-test_begin_subtest 'No ghosts should remain after full thread deletion'
-# this is known to fail; we are leaking ghost messages deliberately
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "0"
+test_ghost_count 0 'No ghosts should remain after full thread deletion'
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 2/7] verify during thread-breakage that messages are removed as well
  2016-04-06  1:20     ` David Bremner
@ 2016-04-09  1:54       ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: David Bremner, Notmuch Mail

On Tue 2016-04-05 22:20:45 -0300, David Bremner wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>>  
>> +test_subject_count() {
>> +    notmuch new >/dev/null
>> +    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
>> +    count=$(notmuch count --output=threads "$1")
>> +    test_expect_equal "$count" "$2"
>> +}
>
> It's confusing that this doesn't have anything to do with the subject:
> prefix or the corresponding header.

Sigh.  sorry, i just realized i missed this absolutely correct comment
when preparing the v3 series.  v4 is on its way, and hopefully it
addresses all the excellent feedback i got here.

apologies for the noise.

       --dkg

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 1/7] test: add test-binary to print the number of ghost messages
  2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                   ` (3 preceding siblings ...)
  2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
@ 2016-04-09  1:54 ` Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
                     ` (6 more replies)
  4 siblings, 7 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

From: David Bremner <david@tethera.net>

This one-liner seems preferable to the complications of depending on
delve, getting the binary name right and parsing the output.
---
 test/Makefile.local  |  4 ++++
 test/ghost-report.cc | 12 ++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 test/ghost-report.cc

diff --git a/test/Makefile.local b/test/Makefile.local
index 30d420e..022f2cf 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -38,6 +38,9 @@ $(dir)/parse-time: $(dir)/parse-time.o parse-time-string/parse-time-string.o
 $(dir)/make-db-version: $(dir)/make-db-version.o
 	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/ghost-report: $(dir)/ghost-report.o
+	$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -47,6 +50,7 @@ test_main_srcs=$(dir)/arg-test.c \
 	      $(dir)/smtp-dummy.c \
 	      $(dir)/symbol-test.cc \
 	      $(dir)/make-db-version.cc \
+	      $(dir)/ghost-report.cc
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/ghost-report.cc b/test/ghost-report.cc
new file mode 100644
index 0000000..1739be4
--- /dev/null
+++ b/test/ghost-report.cc
@@ -0,0 +1,12 @@
+#include <iostream>
+#include <xapian.h>
+
+int main(int argc, char **argv) {
+
+    if (argc < 2) {
+	std::cerr << "usage: ghost-report xapian-dir" << std::endl;
+    }
+
+    Xapian::Database db(argv[1]);
+    std::cout << db.get_termfreq("Tghost") << std::endl;
+}
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 2/7] test thread breakage when messages are removed and re-added
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-11 13:59     ` [PATCH] remove debugging spew from T590 Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

This test (T590-thread-breakage.sh) has known-broken subtests.

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

This happens because if you insert "B" initially (before anything is
known about "A"), then a "ghost message" gets added to the database in
reference to "A" that is in the same thread, which "A" takes over when
it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I see a few options to fix this:

ghost-on-removal
----------------

We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
------------------------------------------

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
------------------

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-----------------

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.

---------------------

One risk of attempted fixes to this problem is that we could fail to
remove the search term indexes entirely.  This test contains
additional subtests to guard against that.

This test also ensures that the right number of ghost messages exist
in each situation; this will help us ensure we don't accumulate ghosts
indefinitely or leak too much information about what messages we've
seen or not seen, while still making it easy to reassemble threads
when messages come in out-of-order.
---
 test/T590-thread-breakage.sh | 131 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 0000000..2b933f6
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,131 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage during reindexing
+
+notmuch uses ghost documents to track messages we have seen references
+to but have never seen.  Regardless of the order of delivery, message
+deletion, and reindexing, the list of ghost messages for a given
+stored corpus should not vary, so that threads can be reassmebled
+cleanly.
+
+In practice, we accept a small amount of variation (and therefore
+traffic pattern metadata leakage to be stored in the index) for the
+sake of efficiency.
+
+This test also embeds some subtests to ensure that indexing actually
+works properly and attempted fixes to threading issues do not break
+the expected contents of the index.'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/a <<EOF
+Subject: First message
+Message-ID: <a@example.net>
+From: Alice <alice@example.net>
+To: Bob <bob@example.net>
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+Apple
+EOF
+}
+
+message_b() {
+    mkdir -p ${MAIL_DIR}/cur
+    cat > ${MAIL_DIR}/cur/b <<EOF
+Subject: Second message
+Message-ID: <b@example.net>
+In-Reply-To: <a@example.net>
+References: <a@example.net>
+From: Bob <bob@example.net>
+To: Alice <alice@example.net>
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+Banana
+EOF
+}
+
+
+test_content_count() {
+    test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+    count=$(notmuch count --output=threads "$1")
+    test_expect_equal "$count" "$2"
+}
+
+test_thread_count() {
+    test_begin_subtest "${2:-Expecting $1 thread(s)}"
+    count=$(notmuch count --output=threads)
+    test_expect_equal "$count" "$1"
+}
+
+test_ghost_count() {
+    test_begin_subtest "${2:-Expecting $1 ghosts(s)}"
+    ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+    test_expect_equal "$ghosts" "$1"
+}
+
+notmuch new >/dev/null
+pwd
+ls -la ${MAIL_DIR}/.notmuch/xapian
+
+test_thread_count 0 'There should be no threads initially'
+test_ghost_count 0 'There should be no ghosts initially'
+
+message_a
+notmuch new >/dev/null
+test_thread_count 1 'One message in: one thread'
+test_content_count apple 1
+test_content_count banana 0
+test_ghost_count 0
+
+message_b
+notmuch new >/dev/null
+test_thread_count 1 'Second message in the same thread: one thread'
+test_content_count apple 1
+test_content_count banana 1
+test_ghost_count 0
+
+rm -f ${MAIL_DIR}/cur/a
+notmuch new >/dev/null
+test_thread_count 1 'First message removed: still only one thread'
+test_content_count apple 0
+test_content_count banana 1
+test_begin_subtest 'should be one ghost after first message removed'
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "1"
+
+message_a
+notmuch new >/dev/null
+# this is known to fail (it shows 2 threads) because no "ghost
+# message" was created for message A when it was removed from the
+# index, despite message B still pointing to it.
+test_begin_subtest 'First message reappears: should return to the same thread'
+test_subtest_known_broken
+count=$(notmuch count --output=threads)
+test_expect_equal "$count" "1"
+test_content_count apple 1
+test_content_count banana 1
+test_ghost_count 0
+
+rm -f ${MAIL_DIR}/cur/b
+notmuch new >/dev/null
+test_thread_count 1 'Removing second message: still only one thread'
+test_content_count apple 1
+test_content_count banana 0
+test_ghost_count 0 'No ghosts should remain after deletion of second message'
+
+rm -f ${MAIL_DIR}/cur/a
+notmuch new >/dev/null
+test_thread_count 0 'All messages gone: no threads'
+test_content_count apple 0
+test_content_count banana 0
+test_ghost_count 0
+
+test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 3/7] fix thread breakage via ghost-on-removal
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

implement ghost-on-removal, the solution to T590-thread-breakage.sh
that just adds a ghost message after removing each message.

It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.

Note that _resolve_message_id_to_thread_id already introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
---
 lib/message.cc               | 30 +++++++++++++++++++++++++++---
 test/T590-thread-breakage.sh | 25 ++++++++++++-------------
 2 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 8d72ea2..435b78a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1037,20 +1037,44 @@ _notmuch_message_sync (notmuch_message_t *message)
     message->modified = FALSE;
 }
 
-/* Delete a message document from the database. */
+/* Delete a message document from the database, leaving a ghost
+ * message in its place */
 notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
+    const char *mid, *tid;
+    notmuch_message_t *ghost;
+    notmuch_private_status_t private_status;
+    notmuch_database_t *notmuch;
+	    
+    mid = notmuch_message_get_message_id (message);
+    tid = notmuch_message_get_thread_id (message);
+    notmuch = message->notmuch;
 
     status = _notmuch_database_ensure_writable (message->notmuch);
     if (status)
 	return status;
 
-    db = static_cast <Xapian::WritableDatabase *> (message->notmuch->xapian_db);
+    db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-    return NOTMUCH_STATUS_SUCCESS;
+	    
+    /* and reintroduce a ghost in its place */
+    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	if (! private_status)
+	    _notmuch_message_sync (ghost);
+    } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	/* this is deeply weird, and we should not have gotten into
+	   this state.  is there a better error message to return
+	   here? */
+	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+    }
+
+    notmuch_message_destroy (ghost);
+    return COERCE_STATUS (private_status, "Error converting to ghost message");
 }
 
 /* Transform a blank message into a ghost message.  The caller must
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 2b933f6..81f27db 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -96,20 +96,11 @@ notmuch new >/dev/null
 test_thread_count 1 'First message removed: still only one thread'
 test_content_count apple 0
 test_content_count banana 1
-test_begin_subtest 'should be one ghost after first message removed'
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "1"
+test_ghost_count 1 'should be one ghost after first message removed'
 
 message_a
 notmuch new >/dev/null
-# this is known to fail (it shows 2 threads) because no "ghost
-# message" was created for message A when it was removed from the
-# index, despite message B still pointing to it.
-test_begin_subtest 'First message reappears: should return to the same thread'
-test_subtest_known_broken
-count=$(notmuch count --output=threads)
-test_expect_equal "$count" "1"
+test_thread_count 1 'First message reappears: should return to the same thread'
 test_content_count apple 1
 test_content_count banana 1
 test_ghost_count 0
@@ -119,13 +110,21 @@ notmuch new >/dev/null
 test_thread_count 1 'Removing second message: still only one thread'
 test_content_count apple 1
 test_content_count banana 0
-test_ghost_count 0 'No ghosts should remain after deletion of second message'
+test_begin_subtest 'No ghosts should remain after deletion of second message'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 rm -f ${MAIL_DIR}/cur/a
 notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_content_count apple 0
 test_content_count banana 0
-test_ghost_count 0
+test_begin_subtest 'No ghosts should remain after full thread deletion'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 4/7] Add internal functions to search for alternate doc types
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

Publicly we are only exposing the non-ghost documents (of "type"
"mail").  But internally we might want to inspect the ghost messages
as well.

This changeset adds two new private interfaces to queries to recover
information about alternate document types.
---
 lib/notmuch-private.h | 11 +++++++++++
 lib/query.cc          | 18 ++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 5dd4770..d95bf31 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -477,6 +477,17 @@ void
 _notmuch_doc_id_set_remove (notmuch_doc_id_set_t *doc_ids,
                             unsigned int doc_id);
 
+/* querying xapian documents by type (e.g. "mail" or "ghost"): */
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+				 const char *type,
+				 notmuch_messages_t **out);
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query,
+				const char *type,
+				unsigned *count_out);
+
 /* message.cc */
 
 void
diff --git a/lib/query.cc b/lib/query.cc
index e627bfc..77a7926 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -187,6 +187,14 @@ notmuch_status_t
 notmuch_query_search_messages_st (notmuch_query_t *query,
 				  notmuch_messages_t **out)
 {
+    return _notmuch_query_search_documents (query, "mail", out);
+}
+
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+				 const char *type,
+				 notmuch_messages_t **out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     notmuch_mset_messages_t *messages;
@@ -208,7 +216,7 @@ notmuch_query_search_messages_st (notmuch_query_t *query,
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	Xapian::MSetIterator iterator;
@@ -554,6 +562,12 @@ notmuch_query_count_messages (notmuch_query_t *query)
 notmuch_status_t
 notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 {
+    return _notmuch_query_count_documents (query, "mail", count_out);
+}
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query, const char *type, unsigned *count_out)
+{
     notmuch_database_t *notmuch = query->notmuch;
     const char *query_string = query->query_string;
     Xapian::doccount count = 0;
@@ -562,7 +576,7 @@ notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 	Xapian::Enquire enquire (*notmuch->xapian_db);
 	Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
 						   _find_prefix ("type"),
-						   "mail"));
+						   type));
 	Xapian::Query string_query, final_query, exclude_query;
 	Xapian::MSet mset;
 	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 5/7] Introduce _notmuch_message_has_term()
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
                     ` (2 preceding siblings ...)
  2016-04-09  1:54   ` [PATCH v4 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

It can be useful to easily tell if a given message has a given term
associated with it.
---
 lib/message.cc        | 35 +++++++++++++++++++++++++++++++++++
 lib/notmuch-private.h |  6 ++++++
 2 files changed, 41 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index 435b78a..2399ab3 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1217,6 +1217,41 @@ _notmuch_message_remove_term (notmuch_message_t *message,
     return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value,
+			   notmuch_bool_t *result)
+{
+    char *term;
+    notmuch_bool_t out = FALSE;
+    notmuch_private_status_t status = NOTMUCH_PRIVATE_STATUS_SUCCESS;
+
+    if (value == NULL)
+	return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
+
+    term = talloc_asprintf (message, "%s%s",
+			    _find_prefix (prefix_name), value);
+
+    if (strlen (term) > NOTMUCH_TERM_MAX)
+	return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
+
+    try {
+	/* Look for the exact term */
+	Xapian::TermIterator i = message->doc.termlist_begin ();
+	i.skip_to (term);
+	if (i != message->doc.termlist_end () &&
+	    !strcmp ((*i).c_str (), term))
+	    out = TRUE;
+    } catch (Xapian::Error &error) {
+	status = NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+    }
+    talloc_free (term);
+
+    *result = out;
+    return status;
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index d95bf31..9280797 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -280,6 +280,12 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 			      const char *value);
 
 notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+			   const char *prefix_name,
+			   const char *value,
+			   notmuch_bool_t *result);
+
+notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
 			    const char *prefix_name,
 			    const char *text);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 6/7] On deletion, replace with ghost when other active messages in thread
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
                     ` (3 preceding siblings ...)
  2016-04-09  1:54   ` [PATCH v4 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  2016-04-09 11:02   ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages David Bremner
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.

Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
---
 lib/message.cc | 58 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 2399ab3..1b423b0 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1044,11 +1044,14 @@ _notmuch_message_delete (notmuch_message_t *message)
 {
     notmuch_status_t status;
     Xapian::WritableDatabase *db;
-    const char *mid, *tid;
+    const char *mid, *tid, *query_string;
     notmuch_message_t *ghost;
     notmuch_private_status_t private_status;
     notmuch_database_t *notmuch;
-	    
+    notmuch_query_t *query;
+    unsigned int count = 0;
+    notmuch_bool_t is_ghost;
+
     mid = notmuch_message_get_message_id (message);
     tid = notmuch_message_get_thread_id (message);
     notmuch = message->notmuch;
@@ -1059,22 +1062,45 @@ _notmuch_message_delete (notmuch_message_t *message)
 
     db = static_cast <Xapian::WritableDatabase *> (notmuch->xapian_db);
     db->delete_document (message->doc_id);
-	    
-    /* and reintroduce a ghost in its place */
-    ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
-    if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
-	private_status = _notmuch_message_initialize_ghost (ghost, tid);
-	if (! private_status)
-	    _notmuch_message_sync (ghost);
-    } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
-	/* this is deeply weird, and we should not have gotten into
-	   this state.  is there a better error message to return
-	   here? */
-	return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+
+    /* if this was a ghost to begin with, we are done */
+    private_status = _notmuch_message_has_term (message, "type", "ghost", &is_ghost);
+    if (private_status)
+	return COERCE_STATUS (private_status,
+			      "Error trying to determine whether message was a ghost");
+    if (is_ghost)
+	return NOTMUCH_STATUS_SUCCESS;
+    
+    query_string = talloc_asprintf (message, "thread:%s", tid);
+    query = notmuch_query_create (notmuch, query_string);
+    if (query == NULL)
+	return NOTMUCH_STATUS_OUT_OF_MEMORY;
+    status = notmuch_query_count_messages_st (query, &count);
+    if (status) {
+	notmuch_query_destroy (query);
+	return status;
     }
 
-    notmuch_message_destroy (ghost);
-    return COERCE_STATUS (private_status, "Error converting to ghost message");
+    if (count > 0) {
+	/* reintroduce a ghost in its place because there are still
+	 * other active messages in this thread: */
+	ghost = _notmuch_message_create_for_message_id (notmuch, mid, &private_status);
+	if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+	    private_status = _notmuch_message_initialize_ghost (ghost, tid);
+	    if (! private_status)
+		_notmuch_message_sync (ghost);
+	} else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+	    /* this is deeply weird, and we should not have gotten
+	       into this state.  is there a better error message to
+	       return here? */
+	    status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+	}
+
+	notmuch_message_destroy (ghost);
+	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    }
+    notmuch_query_destroy (query);
+    return status;
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
                     ` (4 preceding siblings ...)
  2016-04-09  1:54   ` [PATCH v4 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
@ 2016-04-09  1:54   ` Daniel Kahn Gillmor
  2016-04-09 11:31     ` David Bremner
                       ` (2 more replies)
  2016-04-09 11:02   ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages David Bremner
  6 siblings, 3 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09  1:54 UTC (permalink / raw)
  To: Notmuch Mail

To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
---
 lib/message.cc               | 20 ++++++++++++++++++++
 test/T590-thread-breakage.sh |  6 +-----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 1b423b0..39dbe53 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1098,6 +1098,26 @@ _notmuch_message_delete (notmuch_message_t *message)
 
 	notmuch_message_destroy (ghost);
 	status = COERCE_STATUS (private_status, "Error converting to ghost message");
+    } else {
+	/* the thread is empty; drop all ghost messages from it */
+	notmuch_messages_t *messages;
+	status = _notmuch_query_search_documents (query,
+						  "ghost",
+						  &messages);
+	if (status == NOTMUCH_STATUS_SUCCESS) {
+	    notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
+	    while (notmuch_messages_valid (messages)) {
+		message = notmuch_messages_get (messages);
+		status = _notmuch_message_delete (message);
+		if (status) /* we'll report the last failure we see;
+			     * if there is more than one failure, we
+			     * forget about previous ones */
+		    last_error = status;
+		notmuch_message_destroy (message);
+		notmuch_messages_move_to_next (messages);
+	    }
+	    status = last_error;
+	}
     }
     notmuch_query_destroy (query);
     return status;
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 81f27db..45446b9 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -121,10 +121,6 @@ notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_content_count apple 0
 test_content_count banana 0
-test_begin_subtest 'No ghosts should remain after full thread deletion'
-# this is known to fail; we are leaking ghost messages deliberately
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "0"
+test_ghost_count 0 'No ghosts should remain after full thread deletion'
 
 test_done
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 1/7] test: add test-binary to print the number of ghost messages
  2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
                     ` (5 preceding siblings ...)
  2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
@ 2016-04-09 11:02   ` David Bremner
  6 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-09 11:02 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:


> +    if (argc < 2) {
> +	std::cerr << "usage: ghost-report xapian-dir" << std::endl;
> +    }
> +
> +    Xapian::Database db(argv[1]);
> +    std::cout << db.get_termfreq("Tghost") << std::endl;
> +}

It's completely my fault, but this should have an exit(1) after printing
the error message.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
@ 2016-04-09 11:31     ` David Bremner
  2016-04-09 18:55       ` Daniel Kahn Gillmor
  2016-04-10  8:35     ` Tomi Ollila
  2016-04-11  0:33     ` David Bremner
  2 siblings, 1 reply; 46+ messages in thread
From: David Bremner @ 2016-04-09 11:31 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:


> +		status = _notmuch_message_delete (message);
> +		if (status) /* we'll report the last failure we see;
> +			     * if there is more than one failure, we
> +			     * forget about previous ones */
> +		    last_error = status;

I was initially worried/paranoid that there might be some risk of data
loss by continuing deleting after the first bad status; that doesn't
seem to be the case, but there doesn't seem to be much advantage in
continuing either, since the only error currently returned from
_notmuch_message_delete is from _notmuch_database_ensure_writable, which
seems likely to persist.  So perhaps exiting the loop on the first error
might be less confusing.

Other than that, and my bug in ghost-report, the series looks good to
me.

d

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09 11:31     ` David Bremner
@ 2016-04-09 18:55       ` Daniel Kahn Gillmor
  2016-04-09 19:15         ` David Bremner
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-09 18:55 UTC (permalink / raw)
  To: David Bremner, Notmuch Mail

On Sat 2016-04-09 07:31:47 -0400, David Bremner <david@tethera.net> wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>
>> +		status = _notmuch_message_delete (message);
>> +		if (status) /* we'll report the last failure we see;
>> +			     * if there is more than one failure, we
>> +			     * forget about previous ones */
>> +		    last_error = status;
>
> I was initially worried/paranoid that there might be some risk of data
> loss by continuing deleting after the first bad status; that doesn't
> seem to be the case, but there doesn't seem to be much advantage in
> continuing either, since the only error currently returned from
> _notmuch_message_delete is from _notmuch_database_ensure_writable,
> which seems likely to persist.  So perhaps exiting the loop on the
> first error might be less confusing.

At the moment, that's the only possible error, but maybe there will be
more errors as notmuch grows/changes in the future?  I figure we should
try once to delete each message we know we want to delete, regardless of
the success in deleting other message. So i'm inclined to keep it as-is,
but if someone wants to prepare a patch for the other direction i
wouldn't object strongly.

> Other than that, and my bug in ghost-report, the series looks good to
> me.

cool!  do you need another rev of the series from me, or is that
something you're up for applying directly?

       --dkg

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09 18:55       ` Daniel Kahn Gillmor
@ 2016-04-09 19:15         ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-09 19:15 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

>> Other than that, and my bug in ghost-report, the series looks good to
>> me.
>
> cool!  do you need another rev of the series from me, or is that
> something you're up for applying directly?
>
>        --dkg

I'm ok to apply directly if no objects in the next day or so.

d

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  2016-04-09 11:31     ` David Bremner
@ 2016-04-10  8:35     ` Tomi Ollila
  2016-04-11  0:33     ` David Bremner
  2 siblings, 0 replies; 46+ messages in thread
From: Tomi Ollila @ 2016-04-10  8:35 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail


;; This buffer is for notes you don't want to save, and for Lisp evaluation.
;; If you want to create a file, visit that file with C-x C-f,
;; then enter the text in that file's own buffer.

I applied these:

 id:1460166892-29721-1-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-2-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-3-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-4-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-5-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-6-git-send-email-dkg@fifthhorseman.net
 id:1460166892-29721-7-git-send-email-dkg@fifthhorseman.net

 1 Daniel Kahn Gillmor   Sat Apr  9 01:54  93/5426  "[PATCH v4 1/7] test: "
 2 Daniel Kahn Gillmor   Sat Apr  9 01:54 192/13050 "[PATCH v4 2/7] test t"
 3 Daniel Kahn Gillmor   Sat Apr  9 01:54 149/9768  "[PATCH v4 3/7] fix th"
 4 Daniel Kahn Gillmor   Sat Apr  9 01:54 120/7443  "[PATCH v4 4/7] Add in"
 5 Daniel Kahn Gillmor   Sat Apr  9 01:54 104/6284  "[PATCH v4 5/7] Introd"
 6 Daniel Kahn Gillmor   Sat Apr  9 01:54 130/8261  "[PATCH v4 6/7] On del"
 7 Daniel Kahn Gillmor   Sat Apr  9 01:54 104/6326  "[PATCH v4 7/7] comple"

And got

Applying: fix thread breakage via ghost-on-removal
.git/rebase-apply/patch:26: trailing whitespace.
.git/rebase-apply/patch:39: trailing whitespace.
warning: 2 lines add whitespace errors.

(adds trailing space to 2 empty lines)

Applying: On deletion, replace with ghost when other active messages in thread
.git/rebase-apply/patch:50: trailing whitespace.
warning: 1 line adds whitespace errors.

(removes the above 2 trailing spaces but adds one more)

Relevant tests passed (with one expected broken).

Tomi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
  2016-04-09 11:31     ` David Bremner
  2016-04-10  8:35     ` Tomi Ollila
@ 2016-04-11  0:33     ` David Bremner
  2016-04-11 19:18       ` Daniel Kahn Gillmor
  2 siblings, 1 reply; 46+ messages in thread
From: David Bremner @ 2016-04-11  0:33 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> To fully complete the ghost-on-removal-when-shared-thread-exists
> proposal, we need to clear all ghost messages when the last active
> message is removed from a thread.

For me this patch breaks T530 as follows

T530-upgrade: Testing database upgrade
 FAIL   ghost message retains thread ID
	--- T530-upgrade.13.expected	2016-04-11 00:25:19.482196274 +0000
	+++ T530-upgrade.13.output	2016-04-11 00:25:19.482196274 +0000
	@@ -1 +1 @@
	-thread:000000000000001d
	+thread:0000000000000011
No new mail.
No new mail. Removed 1 message.

I pushed my rebased version of the patches to

  https://git.notmuchmail.org/git?p=notmuch;a=shortlog;h=refs/heads/thread-fix

in case the problem is some mistake on my part.

d

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] remove debugging spew from T590
  2016-04-09  1:54   ` [PATCH v4 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
@ 2016-04-11 13:59     ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-11 13:59 UTC (permalink / raw)
  To: Notmuch Mail

This debugging aide somehow made it into the actual commits centered
around fix-thread, and it should not have.  you could squash this
change with 76955e82d1e4615faf49301804727aa4e3a76fd6

 ("test thread breakage when messages are removed and re-added")

apologies for the noisy process here.
---
 test/T590-thread-breakage.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 45446b9..6e4031a 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -71,8 +71,6 @@ test_ghost_count() {
 }
 
 notmuch new >/dev/null
-pwd
-ls -la ${MAIL_DIR}/.notmuch/xapian
 
 test_thread_count 0 'There should be no threads initially'
 test_ghost_count 0 'There should be no ghosts initially'
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-11  0:33     ` David Bremner
@ 2016-04-11 19:18       ` Daniel Kahn Gillmor
  2016-04-12  1:28         ` David Bremner
  2016-04-20  3:36         ` Austin Clements
  0 siblings, 2 replies; 46+ messages in thread
From: Daniel Kahn Gillmor @ 2016-04-11 19:18 UTC (permalink / raw)
  To: David Bremner, Notmuch Mail

[-- Attachment #1: Type: text/plain, Size: 2454 bytes --]

On Sun 2016-04-10 20:33:02 -0400, David Bremner <david@tethera.net> wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> To fully complete the ghost-on-removal-when-shared-thread-exists
>> proposal, we need to clear all ghost messages when the last active
>> message is removed from a thread.
>
> For me this patch breaks T530 as follows
>
> T530-upgrade: Testing database upgrade
>  FAIL   ghost message retains thread ID
> 	--- T530-upgrade.13.expected	2016-04-11 00:25:19.482196274 +0000
> 	+++ T530-upgrade.13.output	2016-04-11 00:25:19.482196274 +0000
> 	@@ -1 +1 @@
> 	-thread:000000000000001d
> 	+thread:0000000000000011
> No new mail.
> No new mail. Removed 1 message.
>
> I pushed my rebased version of the patches to
>
>   https://git.notmuchmail.org/git?p=notmuch;a=shortlog;h=refs/heads/thread-fix
>
> in case the problem is some mistake on my part.

After having done "make download-test-databases", I can confirm that
this is happening for me as well: it's not an issue with bremner's
config.

however, i think this particular test is wrong, and should probably be
removed.

For review, here's the final test:

----------
# Ghost messages are difficult to test since they're nearly invisible.
# However, if the upgrade works correctly, the ghost message should
# retain the right thread ID even if all of the original messages in
# the thread are deleted.  That's what we test.  This won't detect if
# the upgrade just plain didn't happen, but it should detect if
# something went wrong.
test_begin_subtest "ghost message retains thread ID"
# Upgrade database
notmuch new
# Get thread ID of real message
thread=$(notmuch search --output=threads id:4EFC743A.3060609@april.org)
# Delete all real messages in that thread
rm $(notmuch search --output=files $thread)
notmuch new
# "Deliver" ghost message
add_message '[subject]=Ghost' '[id]=4EFC3931.6030007@april.org'
# If the ghost upgrade worked, the new message should be attached to
# the existing thread ID.
nthread=$(notmuch search --output=threads id:4EFC3931.6030007@april.org)
test_expect_equal "$thread" "$nthread"
-------------

I don't think this reasoning is sensible.  If the entire thread is
deleted, and a new message comes in, it should *not* get the same mesage
ID.  ghosts should only exist in the database when other messages point
to them.

So i'd be fine with killing this entire last test, unless someone can
propose a good reason to keep it.

        --dkg

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 948 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-11 19:18       ` Daniel Kahn Gillmor
@ 2016-04-12  1:28         ` David Bremner
  2016-04-15 10:29           ` David Bremner
  2016-04-20  3:36         ` Austin Clements
  1 sibling, 1 reply; 46+ messages in thread
From: David Bremner @ 2016-04-12  1:28 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail; +Cc: Austin Clements

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:


> I don't think this reasoning is sensible.  If the entire thread is
> deleted, and a new message comes in, it should *not* get the same mesage
> ID.  ghosts should only exist in the database when other messages point
> to them.
>
> So i'd be fine with killing this entire last test, unless someone can
> propose a good reason to keep it.

I think I buy your reasoning, but I'd be happy if Austin (who introduced
ghost messages, and wrote that test, could comment.

d

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-12  1:28         ` David Bremner
@ 2016-04-15 10:29           ` David Bremner
  0 siblings, 0 replies; 46+ messages in thread
From: David Bremner @ 2016-04-15 10:29 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, Notmuch Mail; +Cc: Austin Clements

David Bremner <david@tethera.net> writes:

> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>
>> I don't think this reasoning is sensible.  If the entire thread is
>> deleted, and a new message comes in, it should *not* get the same mesage
>> ID.  ghosts should only exist in the database when other messages point
>> to them.
>>
>> So i'd be fine with killing this entire last test, unless someone can
>> propose a good reason to keep it.
>
> I think I buy your reasoning, but I'd be happy if Austin (who introduced
> ghost messages, and wrote that test, could comment.

I have pushed the series, amended to remove that one test from T530.

d

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists
  2016-04-11 19:18       ` Daniel Kahn Gillmor
  2016-04-12  1:28         ` David Bremner
@ 2016-04-20  3:36         ` Austin Clements
  1 sibling, 0 replies; 46+ messages in thread
From: Austin Clements @ 2016-04-20  3:36 UTC (permalink / raw)
  To: Daniel Kahn Gillmor, David Bremner, Notmuch Mail

On Mon, 11 Apr 2016, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> On Sun 2016-04-10 20:33:02 -0400, David Bremner <david@tethera.net> wrote:
>> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>>
>>> To fully complete the ghost-on-removal-when-shared-thread-exists
>>> proposal, we need to clear all ghost messages when the last active
>>> message is removed from a thread.
>>
>> For me this patch breaks T530 as follows
>>
>> T530-upgrade: Testing database upgrade
>>  FAIL   ghost message retains thread ID
>> 	--- T530-upgrade.13.expected	2016-04-11 00:25:19.482196274 +0000
>> 	+++ T530-upgrade.13.output	2016-04-11 00:25:19.482196274 +0000
>> 	@@ -1 +1 @@
>> 	-thread:000000000000001d
>> 	+thread:0000000000000011
>> No new mail.
>> No new mail. Removed 1 message.
>>
>> I pushed my rebased version of the patches to
>>
>>   https://git.notmuchmail.org/git?p=notmuch;a=shortlog;h=refs/heads/thread-fix
>>
>> in case the problem is some mistake on my part.
>
> After having done "make download-test-databases", I can confirm that
> this is happening for me as well: it's not an issue with bremner's
> config.
>
> however, i think this particular test is wrong, and should probably be
> removed.
>
> For review, here's the final test:
>
> ----------
> # Ghost messages are difficult to test since they're nearly invisible.
> # However, if the upgrade works correctly, the ghost message should
> # retain the right thread ID even if all of the original messages in
> # the thread are deleted.  That's what we test.  This won't detect if
> # the upgrade just plain didn't happen, but it should detect if
> # something went wrong.
> test_begin_subtest "ghost message retains thread ID"
> # Upgrade database
> notmuch new
> # Get thread ID of real message
> thread=$(notmuch search --output=threads id:4EFC743A.3060609@april.org)
> # Delete all real messages in that thread
> rm $(notmuch search --output=files $thread)
> notmuch new
> # "Deliver" ghost message
> add_message '[subject]=Ghost' '[id]=4EFC3931.6030007@april.org'
> # If the ghost upgrade worked, the new message should be attached to
> # the existing thread ID.
> nthread=$(notmuch search --output=threads id:4EFC3931.6030007@april.org)
> test_expect_equal "$thread" "$nthread"
> -------------
>
> I don't think this reasoning is sensible.  If the entire thread is
> deleted, and a new message comes in, it should *not* get the same mesage
> ID.  ghosts should only exist in the database when other messages point
> to them.
>
> So i'd be fine with killing this entire last test, unless someone can
> propose a good reason to keep it.

I agree that it's fine to remove this test. From what I recall, it
wasn't intended to test that ghost messages retained their thread IDs;
this was just the implementation property it exploited to test that
ghosts were working.

I haven't looked through this series, but it would be nice if there was
still some tests that ghosts were working across upgrade.

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2016-04-20  4:03 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31 17:34 [PATCH] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
2016-04-01 22:27 ` Daniel Kahn Gillmor
2016-04-01 23:31 ` [PATCH 1/2] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
2016-04-01 23:31   ` [PATCH 2/2] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
2016-04-02 14:15 ` [PATCH v2 1/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
2016-04-02 14:15   ` [PATCH v2 2/7] verify during thread-breakage that messages are removed as well Daniel Kahn Gillmor
2016-04-06  1:20     ` David Bremner
2016-04-09  1:54       ` Daniel Kahn Gillmor
2016-04-02 14:15   ` [PATCH v2 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
2016-04-05  6:53     ` Tomi Ollila
2016-04-05 20:05       ` Daniel Kahn Gillmor
2016-04-05 23:33         ` David Bremner
2016-04-06  1:39     ` David Bremner
2016-04-02 14:15   ` [PATCH v2 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
2016-04-06  1:52     ` David Bremner
2016-04-02 14:15   ` [PATCH v2 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
2016-04-06  2:04     ` David Bremner
2016-04-02 14:15   ` [PATCH v2 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
2016-04-02 14:15   ` [PATCH v2 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
2016-04-02 16:19   ` [PATCH 1/2] test thread breakage when messages are removed and re-added David Bremner
2016-04-02 16:19     ` [PATCH 2/2] test: add test-binary to print the number of ghost messages David Bremner
2016-04-09  1:02 ` [PATCH v3 1/7] " Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
2016-04-09  1:02   ` [PATCH v3 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
2016-04-09  1:54 ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 2/7] test thread breakage when messages are removed and re-added Daniel Kahn Gillmor
2016-04-11 13:59     ` [PATCH] remove debugging spew from T590 Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 3/7] fix thread breakage via ghost-on-removal Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 4/7] Add internal functions to search for alternate doc types Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 5/7] Introduce _notmuch_message_has_term() Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 6/7] On deletion, replace with ghost when other active messages in thread Daniel Kahn Gillmor
2016-04-09  1:54   ` [PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists Daniel Kahn Gillmor
2016-04-09 11:31     ` David Bremner
2016-04-09 18:55       ` Daniel Kahn Gillmor
2016-04-09 19:15         ` David Bremner
2016-04-10  8:35     ` Tomi Ollila
2016-04-11  0:33     ` David Bremner
2016-04-11 19:18       ` Daniel Kahn Gillmor
2016-04-12  1:28         ` David Bremner
2016-04-15 10:29           ` David Bremner
2016-04-20  3:36         ` Austin Clements
2016-04-09 11:02   ` [PATCH v4 1/7] test: add test-binary to print the number of ghost messages David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).