From: David Bremner <david@tethera.net>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>,
	Notmuch Mail <notmuch@notmuchmail.org>
Subject: [PATCH] WIP: remove all non-prefixed-terms (and stemmed versions)
Date: Sun, 14 Aug 2016 21:43:18 +0900
Message-ID: <1471178598-9639-1-git-send-email-david@tethera.net>
In-Reply-To: <1467970047-8013-16-git-send-email-dkg@fifthhorseman.net>

The testing here is not really suitable for production, since it exports
a function from the public API purely for testing.  The test framework
could be modified to test functions declared in notmuch-private.h, but
this was the quick and dirty solution.
---

dkg wrote:

> I could find no way to distinguish terms which were added during
>  indexing of the message body from other terms associated with the
>  document.

I think this does the trick. If it makes sense, I can polish it
up. I'd appreciate any ideas about the right way to manage the
testing.  We could either modify the test framework to test internal
functions, or continue testing only exported functions and the CLI.

 lib/message.cc             | 33 ++++++++++++++++++++++
 lib/notmuch-private.h      |  2 ++
 lib/notmuch.h              |  4 +++
 test/T650-message-terms.sh | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 109 insertions(+)
 create mode 100755 test/T650-message-terms.sh

diff --git a/lib/message.cc b/lib/message.cc
index 9d3e807..9a9845a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -577,6 +577,39 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
     }
 }
 
+void notmuch_test_clear_terms(notmuch_message_t *message) {
+    _notmuch_message_remove_unprefixed_terms (message);
+    _notmuch_message_sync (message);
+}
+void
+_notmuch_message_remove_unprefixed_terms (notmuch_message_t *message)
+{
+    Xapian::TermIterator i;
+
+    for (i = message->doc.termlist_begin ();
+	 i != message->doc.termlist_end () &&
+	     ((*i).c_str ()[0] < 'A');
+	     i++) {
+	try {
+	    message->doc.remove_term ((*i));
+	    message->modified = TRUE;
+	} catch (const Xapian::InvalidArgumentError &) {
+	    /* Ignore failure to remove non-existent term. */
+	}
+    }
+
+    /* We want to remove stemmed terms, but only those not from a
+       prefixed term */
+    for (i.skip_to ("Z["); i != message->doc.termlist_end (); i++) {
+	try {
+	    message->doc.remove_term ((*i));
+	    message->modified = TRUE;
+	} catch (const Xapian::InvalidArgumentError &) {
+	    /* Ignore failure to remove non-existent term. */
+	}
+    }
+}
+
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 65f7ead..646fc78 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -502,6 +502,8 @@ _notmuch_message_add_reply (notmuch_message_t *message,
 notmuch_database_t *
 _notmuch_message_database (notmuch_message_t *message);
 
+void
+_notmuch_message_remove_unprefixed_terms (notmuch_message_t *message);
 /* sha1.c */
 
 char *
diff --git a/lib/notmuch.h b/lib/notmuch.h
index e03a05d..e964b1a 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1658,6 +1658,10 @@ notmuch_message_thaw (notmuch_message_t *message);
 void
 notmuch_message_destroy (notmuch_message_t *message);
 
+/* for testing */
+
+void
+notmuch_test_clear_terms(notmuch_message_t *message);
 /**
  * @name Message Properties
  *
diff --git a/test/T650-message-terms.sh b/test/T650-message-terms.sh
new file mode 100755
index 0000000..553e95b
--- /dev/null
+++ b/test/T650-message-terms.sh
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+test_description="message API"
+
+. ./test-lib.sh || exit 1
+
+add_email_corpus
+
+cat <<EOF > c_head
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <talloc.h>
+#include <notmuch-test.h>
+
+int main (int argc, char** argv)
+{
+   notmuch_database_t *db;
+   notmuch_message_t *message = NULL;
+   const char *val;
+   notmuch_status_t stat;
+
+   EXPECT0(notmuch_database_open (argv[1], NOTMUCH_DATABASE_MODE_READ_WRITE, &db));
+   EXPECT0(notmuch_database_find_message(db, "4EFC743A.3060609@april.org", &message));
+   if (message == NULL) {
+	fprintf (stderr, "unable to find message\n");
+	exit (1);
+   }
+EOF
+
+cat <<EOF > c_tail
+   EXPECT0(notmuch_database_destroy(db));
+}
+EOF
+
+add_email_corpus
+
+test_begin_subtest "check unique term"
+byid=$(notmuch count id:4EFC743A.3060609@april.org)
+byterm=$(notmuch count Boulogne)
+test_expect_equal "$byid" "$byterm"
+
+xapian-delve -1 -a ${MAIL_DIR}/.notmuch/xapian > BEFORE
+
+test_begin_subtest "clear non-prefixed terms from message"
+cat c_head - c_tail <<'EOF' | test_C ${MAIL_DIR}
+{
+notmuch_test_clear_terms(message);
+}
+EOF
+byterm=$(notmuch count Boulogne)
+test_expect_equal 0 "$byterm"
+
+test_begin_subtest "check removed terms"
+xapian-delve -1 -a ${MAIL_DIR}/.notmuch/xapian > AFTER
+comm -2 -3 BEFORE AFTER | egrep '^Z?a' > REMOVED
+cat <<EOF > EXPECTED
+Zallan
+Zarch
+Zarch_packaging_standard
+Zarchlinux
+Zaur
+allan
+arch
+arch_packaging_standards
+archlinux
+aur
+EOF
+test_expect_equal_file EXPECTED REMOVED
+
+test_done
-- 
2.8.1
