unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH v2 0/9] cli: alternative address deduplication
@ 2015-09-03 19:39 Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 1/9] cli: g_hash_table_lookup_extended is overkill Jani Nikula
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:39 UTC (permalink / raw)
  To: notmuch

This is v2 of id:cover.1440859765.git.jani@nikula.org addressing review
from the first (or rfc) version. Test are still missing, but everything
else should be in place. (At least patches 1, 2, and 5 could be pushed
without additional tests.)

There's also one bug fix in patch 7. It is necessary to collect all
results before printing for --deduplicate=address, similar to
--output=count.

BR,
Jani.


Jani Nikula (9):
  cli: g_hash_table_lookup_extended is overkill
  cli: abstract new mailbox creation
  cli: add support for not deduplicating notmuch address results
  man: document notmuch address --deduplicate=(no|mailbox) option
  util: move strcase_equal and strcase_hash to util
  cli: change the data structure for notmuch address deduplication
  cli: add support for deduplicating based on case insensitive address
  man: document notmuch address --deduplicate=address option
  cli: do not sort addresses on --output=count or --deduplicate=address

 doc/man1/notmuch-address.rst |  26 ++++++-
 lib/message-file.c           |  21 -----
 lib/notmuch-private.h        |   1 +
 notmuch-client.h             |   1 +
 notmuch-search.c             | 178 +++++++++++++++++++++++++++++++++++++------
 util/string-util.c           |  21 +++++
 util/string-util.h           |   6 ++
 7 files changed, 207 insertions(+), 47 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/9] cli: g_hash_table_lookup_extended is overkill
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
@ 2015-09-03 19:39 ` Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 2/9] cli: abstract new mailbox creation Jani Nikula
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:39 UTC (permalink / raw)
  To: notmuch

Switch to normal glib hash table lookup. The extended version is only
required if the values may contain NULL.
---
 notmuch-search.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 3076c3f637b1..7fdc6acaa2fe 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -248,7 +248,6 @@ do_search_threads (search_context_t *ctx)
 static notmuch_bool_t
 is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
 {
-    notmuch_bool_t duplicate;
     char *key;
     mailbox_t *mailbox;
 
@@ -256,20 +255,20 @@ is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
     if (! key)
 	return FALSE;
 
-    duplicate = g_hash_table_lookup_extended (ctx->addresses, key, NULL, (gpointer)&mailbox);
-
-    if (! duplicate) {
-	mailbox = talloc (ctx->format, mailbox_t);
-	mailbox->name = talloc_strdup (mailbox, name);
-	mailbox->addr = talloc_strdup (mailbox, addr);
-	mailbox->count = 1;
-	g_hash_table_insert (ctx->addresses, key, mailbox);
-    } else {
+    mailbox = g_hash_table_lookup (ctx->addresses, key);
+    if (mailbox) {
 	mailbox->count++;
 	talloc_free (key);
+	return TRUE;
     }
 
-    return duplicate;
+    mailbox = talloc (ctx->format, mailbox_t);
+    mailbox->name = talloc_strdup (mailbox, name);
+    mailbox->addr = talloc_strdup (mailbox, addr);
+    mailbox->count = 1;
+    g_hash_table_insert (ctx->addresses, key, mailbox);
+
+    return FALSE;
 }
 
 static void
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 2/9] cli: abstract new mailbox creation
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 1/9] cli: g_hash_table_lookup_extended is overkill Jani Nikula
@ 2015-09-03 19:39 ` Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 3/9] cli: add support for not deduplicating notmuch address results Jani Nikula
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:39 UTC (permalink / raw)
  To: notmuch

We'll be needing more mailbox creation soon, so abstract it
away. While at it, check for allocation failures. No other functional
changes.
---
 notmuch-search.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 7fdc6acaa2fe..36f58eb8d54c 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -243,6 +243,21 @@ do_search_threads (search_context_t *ctx)
     return 0;
 }
 
+static mailbox_t *new_mailbox (void *ctx, const char *name, const char *addr)
+{
+    mailbox_t *mailbox;
+
+    mailbox = talloc (ctx, mailbox_t);
+    if (! mailbox)
+	return NULL;
+
+    mailbox->name = talloc_strdup (mailbox, name);
+    mailbox->addr = talloc_strdup (mailbox, addr);
+    mailbox->count = 1;
+
+    return mailbox;
+}
+
 /* Returns TRUE iff name and addr is duplicate. If not, stores the
  * name/addr pair in order to detect subsequent duplicates. */
 static notmuch_bool_t
@@ -262,10 +277,10 @@ is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
 	return TRUE;
     }
 
-    mailbox = talloc (ctx->format, mailbox_t);
-    mailbox->name = talloc_strdup (mailbox, name);
-    mailbox->addr = talloc_strdup (mailbox, addr);
-    mailbox->count = 1;
+    mailbox = new_mailbox (ctx->format, name, addr);
+    if (! mailbox)
+	return FALSE;
+
     g_hash_table_insert (ctx->addresses, key, mailbox);
 
     return FALSE;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 3/9] cli: add support for not deduplicating notmuch address results
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 1/9] cli: g_hash_table_lookup_extended is overkill Jani Nikula
  2015-09-03 19:39 ` [PATCH v2 2/9] cli: abstract new mailbox creation Jani Nikula
@ 2015-09-03 19:39 ` Jani Nikula
  2015-09-04 18:35   ` [PATCH 3½/9] test: notmuch address --deduplicate=no tests Jani Nikula
  2015-09-03 19:40 ` [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option Jani Nikula
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:39 UTC (permalink / raw)
  To: notmuch

Make it possible to use notmuch address as part of a | sort | uniq -c
pipe instead of forcing --output=count. This is useful for combining
results from multiple notmuch address queries.
---
 notmuch-search.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 36f58eb8d54c..66404b561679 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -37,12 +37,18 @@ typedef enum {
 } output_t;
 
 typedef enum {
+    DEDUP_NONE,
+    DEDUP_MAILBOX,
+} dedup_t;
+
+typedef enum {
     NOTMUCH_FORMAT_JSON,
     NOTMUCH_FORMAT_TEXT,
     NOTMUCH_FORMAT_TEXT0,
     NOTMUCH_FORMAT_SEXP
 } format_sel_t;
 
+
 typedef struct {
     notmuch_database_t *notmuch;
     format_sel_t format_sel;
@@ -55,6 +61,7 @@ typedef struct {
     int limit;
     int dupe;
     GHashTable *addresses;
+    dedup_t dedup;
 } search_context_t;
 
 typedef struct {
@@ -355,7 +362,9 @@ process_address_list (const search_context_t *ctx,
 		.count = 0,
 	    };
 
-	    if (is_duplicate (ctx, mbx.name, mbx.addr))
+	    /* OUTPUT_COUNT only works with deduplication */
+	    if (ctx->dedup != DEDUP_NONE &&
+		is_duplicate (ctx, mbx.name, mbx.addr))
 		continue;
 
 	    if (ctx->output & OUTPUT_COUNT)
@@ -656,6 +665,7 @@ static search_context_t search_context = {
     .offset = 0,
     .limit = -1, /* unlimited */
     .dupe = -1,
+    .dedup = DEDUP_MAILBOX,
 };
 
 static const notmuch_opt_desc_t common_options[] = {
@@ -755,6 +765,10 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
 	  (notmuch_keyword_t []){ { "true", NOTMUCH_EXCLUDE_TRUE },
 				  { "false", NOTMUCH_EXCLUDE_FALSE },
 				  { 0, 0 } } },
+	{ NOTMUCH_OPT_KEYWORD, &ctx->dedup, "deduplicate", 'D',
+	  (notmuch_keyword_t []){ { "no", DEDUP_NONE },
+				  { "mailbox", DEDUP_MAILBOX },
+				  { 0, 0 } } },
 	{ NOTMUCH_OPT_INHERIT, (void *) &common_options, NULL, 0, 0 },
 	{ NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 },
 	{ 0, 0, 0, 0, 0 }
@@ -769,6 +783,11 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
     if (! (ctx->output & (OUTPUT_SENDER | OUTPUT_RECIPIENTS)))
 	ctx->output |= OUTPUT_SENDER;
 
+    if (ctx->output & OUTPUT_COUNT && ctx->dedup == DEDUP_NONE) {
+	fprintf (stderr, "--output=count is not applicable with --deduplicate=no\n");
+	return EXIT_FAILURE;
+    }
+
     if (_notmuch_search_prepare (ctx, config,
 				 argc - opt_index, argv + opt_index))
 	return EXIT_FAILURE;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (2 preceding siblings ...)
  2015-09-03 19:39 ` [PATCH v2 3/9] cli: add support for not deduplicating notmuch address results Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-20 12:45   ` David Bremner
  2015-09-03 19:40 ` [PATCH v2 5/9] util: move strcase_equal and strcase_hash to util Jani Nikula
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

Document the deduplication of results.
---
 doc/man1/notmuch-address.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/doc/man1/notmuch-address.rst b/doc/man1/notmuch-address.rst
index 9570095b23c8..4ae7395bce4d 100644
--- a/doc/man1/notmuch-address.rst
+++ b/doc/man1/notmuch-address.rst
@@ -55,6 +55,21 @@ Supported options for **address** include
             Note: With this option, addresses are printed only after
             the whole search is finished. This may take long time.
 
+    ``--deduplicate=(no|mailbox)``
+
+        Control the deduplication of results.
+
+        **none**
+            Output all occurences of addresses in the matching
+            messages. This is not applicable with --output=count.
+
+        **mailbox**
+	    Deduplicate addresses based on the full, case sensitive
+	    name and email address, or mailbox. This is effectively
+	    the same as piping the --deduplicate=no output to **sort |
+	    uniq**, except for the order of results. This is the
+	    default.
+
     ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
         This option can be used to present results in either
         chronological order (**oldest-first**) or reverse chronological
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 5/9] util: move strcase_equal and strcase_hash to util
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (3 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-03 19:40 ` [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication Jani Nikula
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

For future use in both cli and lib.
---
 lib/message-file.c    | 21 ---------------------
 lib/notmuch-private.h |  1 +
 util/string-util.c    | 21 +++++++++++++++++++++
 util/string-util.h    |  6 ++++++
 4 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/lib/message-file.c b/lib/message-file.c
index 8ac96e8e06a5..ee305202fffa 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -38,27 +38,6 @@ struct _notmuch_message_file {
 };
 
 static int
-strcase_equal (const void *a, const void *b)
-{
-    return strcasecmp (a, b) == 0;
-}
-
-static unsigned int
-strcase_hash (const void *ptr)
-{
-    const char *s = ptr;
-
-    /* This is the djb2 hash. */
-    unsigned int hash = 5381;
-    while (s && *s) {
-	hash = ((hash << 5) + hash) + tolower (*s);
-	s++;
-    }
-
-    return hash;
-}
-
-static int
 _notmuch_message_file_destructor (notmuch_message_file_t *message)
 {
     if (message->headers)
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index f52b4e4776f9..5dd4770e9619 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -50,6 +50,7 @@ NOTMUCH_BEGIN_DECLS
 
 #include "xutil.h"
 #include "error_util.h"
+#include "string-util.h"
 
 #pragma GCC visibility push(hidden)
 
diff --git a/util/string-util.c b/util/string-util.c
index a90501ee3e70..76c0b9025d0f 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -221,3 +221,24 @@ parse_boolean_term (void *ctx, const char *str,
     errno = err;
     return -1;
 }
+
+int
+strcase_equal (const void *a, const void *b)
+{
+    return strcasecmp (a, b) == 0;
+}
+
+unsigned int
+strcase_hash (const void *ptr)
+{
+    const char *s = ptr;
+
+    /* This is the djb2 hash. */
+    unsigned int hash = 5381;
+    while (s && *s) {
+	hash = ((hash << 5) + hash) + tolower (*s);
+	s++;
+    }
+
+    return hash;
+}
diff --git a/util/string-util.h b/util/string-util.h
index e409cb3d2ab1..80d24d1c1053 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -64,6 +64,12 @@ int
 parse_boolean_term (void *ctx, const char *str,
 		    char **prefix_out, char **term_out);
 
+/* GLib GEqualFunc compatible strcasecmp wrapper */
+int strcase_equal (const void *a, const void *b);
+
+/* GLib GHashFunc compatible case insensitive hash function */
+unsigned int strcase_hash (const void *ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (4 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 5/9] util: move strcase_equal and strcase_hash to util Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-24 12:32   ` David Bremner
  2015-09-03 19:40 ` [PATCH v2 7/9] cli: add support for deduplicating based on case insensitive address Jani Nikula
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

Currently we key the address hash table with the case sensitive "name
<address>". Switch to case insensitive keying with just address, and
store the case sensitive name and address in linked lists. This will
be helpful in adding support for different deduplication schemes in
the future.

There will be a slight performance penalty for the current full case
sensitive name + address deduplication, but this is simpler as a whole
when other deduplication schemes are added, and I expect the schemes
to be added to become more popular than the current default.

Aparet from the possible performance penalty, the only user visible
change should be the change in the output ordering for
--output=count. The order is not guaranteed (and is based on hash
table traversal) currently anyway, so this should be of no
consequence.
---
 notmuch-client.h |  1 +
 notmuch-search.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/notmuch-client.h b/notmuch-client.h
index 882aa30563df..97d68d1158ac 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -48,6 +48,7 @@ typedef GMimeCryptoContext notmuch_crypto_context_t;
 #include <dirent.h>
 #include <errno.h>
 #include <signal.h>
+#include <ctype.h>
 
 #include "talloc-extra.h"
 
diff --git a/notmuch-search.c b/notmuch-search.c
index 66404b561679..7c51d5df6bd4 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -265,30 +265,74 @@ static mailbox_t *new_mailbox (void *ctx, const char *name, const char *addr)
     return mailbox;
 }
 
+static int mailbox_compare (const void *v1, const void *v2)
+{
+    const mailbox_t *m1 = v1, *m2 = v2;
+    int v;
+
+    if (m1->name && m2->name)
+	v = strcmp (m1->name, m2->name);
+    else
+	v = !!m1->name - !!m2->name;
+
+    if (! v)
+	v = strcmp (m1->addr, m2->addr);
+
+    return v;
+}
+
 /* Returns TRUE iff name and addr is duplicate. If not, stores the
  * name/addr pair in order to detect subsequent duplicates. */
 static notmuch_bool_t
 is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
 {
     char *key;
+    GList *list, *l;
     mailbox_t *mailbox;
 
-    key = talloc_asprintf (ctx->format, "%s <%s>", name, addr);
-    if (! key)
-	return FALSE;
+    list = g_hash_table_lookup (ctx->addresses, addr);
+    if (list) {
+	mailbox_t find = {
+	    .name = name,
+	    .addr = addr,
+	};
+
+	l = g_list_find_custom (list, &find, mailbox_compare);
+	if (l) {
+	    mailbox = l->data;
+	    mailbox->count++;
+	    return TRUE;
+	}
 
-    mailbox = g_hash_table_lookup (ctx->addresses, key);
-    if (mailbox) {
-	mailbox->count++;
-	talloc_free (key);
-	return TRUE;
+	mailbox = new_mailbox (ctx->format, name, addr);
+	if (! mailbox)
+	    return FALSE;
+
+	/*
+	 * XXX: It would be more efficient to prepend to the list, but
+	 * then we'd have to store the changed list head back to the
+	 * hash table. This check is here just to avoid the compiler
+	 * warning for unused result.
+	 */
+	if (list != g_list_append (list, mailbox))
+	    INTERNAL_ERROR ("appending to list changed list head\n");
+
+	return FALSE;
     }
 
+    key = talloc_strdup (ctx->format, addr);
+    if (! key)
+	return FALSE;
+
     mailbox = new_mailbox (ctx->format, name, addr);
     if (! mailbox)
 	return FALSE;
 
-    g_hash_table_insert (ctx->addresses, key, mailbox);
+    list = g_list_append (NULL, mailbox);
+    if (! list)
+	return FALSE;
+
+    g_hash_table_insert (ctx->addresses, key, list);
 
     return FALSE;
 }
@@ -401,12 +445,21 @@ _talloc_free_for_g_hash (void *ptr)
 }
 
 static void
-print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
+_list_free_for_g_hash (void *ptr)
+{
+    g_list_free_full (ptr, _talloc_free_for_g_hash);
+}
+
+static void
+print_list_value (void *mailbox, void *context)
 {
-    const mailbox_t *mailbox = value;
-    search_context_t *ctx = user_data;
+    print_mailbox (context, mailbox);
+}
 
-    print_mailbox (ctx, mailbox);
+static void
+print_hash_value (unused (void *key), void *list, void *context)
+{
+    g_list_foreach (list, print_list_value, context);
 }
 
 static int
@@ -792,8 +845,9 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
 				 argc - opt_index, argv + opt_index))
 	return EXIT_FAILURE;
 
-    ctx->addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
-					    _talloc_free_for_g_hash, _talloc_free_for_g_hash);
+    ctx->addresses = g_hash_table_new_full (strcase_hash, strcase_equal,
+					    _talloc_free_for_g_hash,
+					    _list_free_for_g_hash);
 
     ret = do_search_messages (ctx);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 7/9] cli: add support for deduplicating based on case insensitive address
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (5 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-04 18:38   ` [PATCH 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests Jani Nikula
  2015-09-03 19:40 ` [PATCH v2 8/9] man: document notmuch address --deduplicate=address option Jani Nikula
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

Consider all variants of an email address as one, and print the most
common variant.
---
 notmuch-search.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 7c51d5df6bd4..deb9e58a747c 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -39,6 +39,7 @@ typedef enum {
 typedef enum {
     DEDUP_NONE,
     DEDUP_MAILBOX,
+    DEDUP_ADDRESS,
 } dedup_t;
 
 typedef enum {
@@ -352,7 +353,7 @@ print_mailbox (const search_context_t *ctx, const mailbox_t *mailbox)
     name_addr = internet_address_to_string (ia, FALSE);
 
     if (format->is_text_printer) {
-	if (count > 0) {
+	if (ctx->output & OUTPUT_COUNT) {
 	    format->integer (format, count);
 	    format->string (format, "\t");
 	}
@@ -366,7 +367,7 @@ print_mailbox (const search_context_t *ctx, const mailbox_t *mailbox)
 	format->string (format, addr);
 	format->map_key (format, "name-addr");
 	format->string (format, name_addr);
-	if (count > 0) {
+	if (ctx->output & OUTPUT_COUNT) {
 	    format->map_key (format, "count");
 	    format->integer (format, count);
 	}
@@ -403,7 +404,6 @@ process_address_list (const search_context_t *ctx,
 	    mailbox_t mbx = {
 		.name = internet_address_get_name (address),
 		.addr = internet_address_mailbox_get_addr (mailbox),
-		.count = 0,
 	    };
 
 	    /* OUTPUT_COUNT only works with deduplication */
@@ -411,7 +411,8 @@ process_address_list (const search_context_t *ctx,
 		is_duplicate (ctx, mbx.name, mbx.addr))
 		continue;
 
-	    if (ctx->output & OUTPUT_COUNT)
+	    /* OUTPUT_COUNT and DEDUP_ADDRESS require a full pass. */
+	    if (ctx->output & OUTPUT_COUNT || ctx->dedup == DEDUP_ADDRESS)
 		continue;
 
 	    print_mailbox (ctx, &mbx);
@@ -450,6 +451,34 @@ _list_free_for_g_hash (void *ptr)
     g_list_free_full (ptr, _talloc_free_for_g_hash);
 }
 
+/* Print the most common variant of a list of unique mailboxes, and
+ * conflate the counts. */
+static void
+print_popular (const search_context_t *ctx, GList *list)
+{
+    GList *l;
+    mailbox_t *mailbox = NULL, *m;
+    int max = 0;
+    int total = 0;
+
+    for (l = list; l; l = l->next) {
+	m = l->data;
+	total += m->count;
+	if (m->count > max) {
+	    mailbox = m;
+	    max = m->count;
+	}
+    }
+
+    if (! mailbox)
+	INTERNAL_ERROR("Empty list in address hash table\n");
+
+    /* The original count is no longer needed, so overwrite. */
+    mailbox->count = total;
+
+    print_mailbox (ctx, mailbox);
+}
+
 static void
 print_list_value (void *mailbox, void *context)
 {
@@ -459,7 +488,12 @@ print_list_value (void *mailbox, void *context)
 static void
 print_hash_value (unused (void *key), void *list, void *context)
 {
-    g_list_foreach (list, print_list_value, context);
+    const search_context_t *ctx = context;
+
+    if (ctx->dedup == DEDUP_ADDRESS)
+	print_popular (ctx, list);
+    else
+	g_list_foreach (list, print_list_value, context);
 }
 
 static int
@@ -557,7 +591,8 @@ do_search_messages (search_context_t *ctx)
 	notmuch_message_destroy (message);
     }
 
-    if (ctx->addresses && ctx->output & OUTPUT_COUNT)
+    if (ctx->addresses &&
+	(ctx->output & OUTPUT_COUNT || ctx->dedup == DEDUP_ADDRESS))
 	g_hash_table_foreach (ctx->addresses, print_hash_value, ctx);
 
     notmuch_messages_destroy (messages);
@@ -821,6 +856,7 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
 	{ NOTMUCH_OPT_KEYWORD, &ctx->dedup, "deduplicate", 'D',
 	  (notmuch_keyword_t []){ { "no", DEDUP_NONE },
 				  { "mailbox", DEDUP_MAILBOX },
+				  { "address", DEDUP_ADDRESS },
 				  { 0, 0 } } },
 	{ NOTMUCH_OPT_INHERIT, (void *) &common_options, NULL, 0, 0 },
 	{ NOTMUCH_OPT_INHERIT, (void *) &notmuch_shared_options, NULL, 0, 0 },
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 8/9] man: document notmuch address --deduplicate=address option
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (6 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 7/9] cli: add support for deduplicating based on case insensitive address Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-03 19:40 ` [PATCH v2 9/9] cli: do not sort addresses on --output=count or --deduplicate=address Jani Nikula
  2015-09-07 12:52 ` [PATCH v2 0/9] cli: alternative address deduplication David Bremner
  9 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

Document the deduplication based on case insensitive address.
---
 doc/man1/notmuch-address.rst | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/man1/notmuch-address.rst b/doc/man1/notmuch-address.rst
index 4ae7395bce4d..b85cc37213e5 100644
--- a/doc/man1/notmuch-address.rst
+++ b/doc/man1/notmuch-address.rst
@@ -55,7 +55,7 @@ Supported options for **address** include
             Note: With this option, addresses are printed only after
             the whole search is finished. This may take long time.
 
-    ``--deduplicate=(no|mailbox)``
+    ``--deduplicate=(no|mailbox|address)``
 
         Control the deduplication of results.
 
@@ -70,6 +70,13 @@ Supported options for **address** include
 	    uniq**, except for the order of results. This is the
 	    default.
 
+        **address**
+            Deduplicate addresses based on the case insensitive
+            address part of the mailbox. Of all the variants (with
+            different name or case), print the one occurring most
+            frequently among the matching messages. If --output=count
+            is specified, include all variants in the count.
+
     ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
         This option can be used to present results in either
         chronological order (**oldest-first**) or reverse chronological
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 9/9] cli: do not sort addresses on --output=count or --deduplicate=address
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (7 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 8/9] man: document notmuch address --deduplicate=address option Jani Nikula
@ 2015-09-03 19:40 ` Jani Nikula
  2015-09-07 12:52 ` [PATCH v2 0/9] cli: alternative address deduplication David Bremner
  9 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-03 19:40 UTC (permalink / raw)
  To: notmuch

The order of the results with --output=count and --deduplicate=address
are unspecified as they're based on a hash table traversal. This being
the case, optimize the query by explicitly requesting unsorted
results. Clarify the documentation accordingly.
---
 doc/man1/notmuch-address.rst | 4 +++-
 notmuch-search.c             | 5 +++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/man1/notmuch-address.rst b/doc/man1/notmuch-address.rst
index b85cc37213e5..cbdb851980a8 100644
--- a/doc/man1/notmuch-address.rst
+++ b/doc/man1/notmuch-address.rst
@@ -85,7 +85,9 @@ Supported options for **address** include
         By default, results will be displayed in reverse chronological
         order, (that is, the newest results will be displayed first).
 
-        This option is not supported with --output=count.
+        However, if either --output=count or --deduplicate=address is
+        specified, this option is ignored and the order of the results
+        is unspecified.
 
     ``--exclude=(true|false)``
         A message is called "excluded" if it matches at least one tag in
diff --git a/notmuch-search.c b/notmuch-search.c
index deb9e58a747c..b58496bf09d5 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -885,6 +885,11 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
 					    _talloc_free_for_g_hash,
 					    _list_free_for_g_hash);
 
+    /* The order is not guaranteed if a full pass is required, so go
+     * for fastest. */
+    if (ctx->output & OUTPUT_COUNT || ctx->dedup == DEDUP_ADDRESS)
+	notmuch_query_set_sort (ctx->query, NOTMUCH_SORT_UNSORTED);
+
     ret = do_search_messages (ctx);
 
     g_hash_table_unref (ctx->addresses);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3½/9] test: notmuch address --deduplicate=no tests
  2015-09-03 19:39 ` [PATCH v2 3/9] cli: add support for not deduplicating notmuch address results Jani Nikula
@ 2015-09-04 18:35   ` Jani Nikula
  2015-09-20 12:43     ` David Bremner
  0 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-04 18:35 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Test not using address deduplication. Incorporate some trivial output
sorting tests here, as they seem to lack tests.
---
 test/T095-address.sh | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/test/T095-address.sh b/test/T095-address.sh
index 8eecb2a6a7ba..f8d902cb3a5e 100755
--- a/test/T095-address.sh
+++ b/test/T095-address.sh
@@ -145,4 +145,74 @@ cat <<EOF >EXPECTED
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "--deduplicate=no --sort=oldest-first --output=sender"
+notmuch address --deduplicate=no --sort=oldest-first --output=sender '*' >OUTPUT
+cat <<EOF >EXPECTED
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Carl Worth <cworth@cworth.org>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Alex Botero-Lowry <alex.boterolowry@gmail.com>
+Carl Worth <cworth@cworth.org>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Keith Packard <keithp@keithp.com>
+Keith Packard <keithp@keithp.com>
+Keith Packard <keithp@keithp.com>
+Jan Janak <jan@ryngle.com>
+Jan Janak <jan@ryngle.com>
+Jan Janak <jan@ryngle.com>
+Israel Herraiz <isra@herraiz.org>
+Adrian Perez de Castro <aperez@igalia.com>
+Aron Griffis <agriffis@n01se.net>
+Ingmar Vanhassel <ingmar@exherbo.org>
+Alex Botero-Lowry <alex.boterolowry@gmail.com>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Stewart Smith <stewart@flamingspork.com>
+Stewart Smith <stewart@flamingspork.com>
+Keith Packard <keithp@keithp.com>
+Keith Packard <keithp@keithp.com>
+Keith Packard <keithp@keithp.com>
+Stewart Smith <stewart@flamingspork.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Jan Janak <jan@ryngle.com>
+Rolland Santimano <rollandsantimano@yahoo.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Keith Packard <keithp@keithp.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Carl Worth <cworth@cworth.org>
+Chris Wilson <chris@chris-wilson.co.uk>
+Olivier Berger <olivier.berger@it-sudparis.eu>
+François Boulogne <boulogne.f@gmail.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--deduplicate=no --sort=newest-first --output=sender --output=recipients"
+notmuch address --deduplicate=no --sort=newest-first --output=sender --output=recipients path:foo/new >OUTPUT
+cat <<EOF >EXPECTED
+Mikhail Gusarov <dottedmag@dottedmag.net>
+notmuch@notmuchmail.org
+Mikhail Gusarov <dottedmag@dottedmag.net>
+notmuch@notmuchmail.org
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+notmuch@notmuchmail.org
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 test_done
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests
  2015-09-03 19:40 ` [PATCH v2 7/9] cli: add support for deduplicating based on case insensitive address Jani Nikula
@ 2015-09-04 18:38   ` Jani Nikula
  2015-09-25  0:02     ` David Bremner
  0 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-04 18:38 UTC (permalink / raw)
  To: Jani Nikula, notmuch

First a simple smoke test first, next generate messages with multiple
email address variants and check the behaviour of deduplication
schemes with these.
---
 test/T095-address.sh | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/test/T095-address.sh b/test/T095-address.sh
index f8d902cb3a5e..54ac713a5227 100755
--- a/test/T095-address.sh
+++ b/test/T095-address.sh
@@ -215,4 +215,67 @@ notmuch@notmuchmail.org
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "--deduplicate=address --output=sender --output=recipients"
+notmuch address --deduplicate=address --output=sender --output=recipients '*' | sort >OUTPUT
+cat <<EOF >EXPECTED
+"Discussion about the Arch User Repository (AUR)" <aur-general@archlinux.org>
+Adrian Perez de Castro <aperez@igalia.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Allan McRae <allan@archlinux.org>
+Aron Griffis <agriffis@n01se.net>
+Carl Worth <cworth@cworth.org>
+Chris Wilson <chris@chris-wilson.co.uk>
+François Boulogne <boulogne.f@gmail.com>
+Ingmar Vanhassel <ingmar@exherbo.org>
+Israel Herraiz <isra@herraiz.org>
+Jan Janak <jan@ryngle.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Keith Packard <keithp@keithp.com>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Olivier Berger <olivier.berger@it-sudparis.eu>
+Rolland Santimano <rollandsantimano@yahoo.com>
+Stewart Smith <stewart@flamingspork.com>
+notmuch@notmuchmail.org
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+generate_message '[from]="Foo Bar <foo.bar@example.com>"'
+generate_message '[from]="Foo Bar <Foo.Bar@Example.Com>"'
+generate_message '[from]="Foo Bar <foo.bar@example.com>"'
+generate_message '[from]="<foo.bar@example.com>"'
+generate_message '[from]="foo.bar@example.com"'
+generate_message '[from]="Foo Bar <foo.bar+baz@example.com>"'
+notmuch new > /dev/null
+
+test_begin_subtest "--deduplicate=no --output=sender"
+notmuch address --deduplicate=no --output=sender from:example.com | sort >OUTPUT
+cat <<EOF >EXPECTED
+Foo Bar <Foo.Bar@Example.Com>
+Foo Bar <foo.bar+baz@example.com>
+Foo Bar <foo.bar@example.com>
+Foo Bar <foo.bar@example.com>
+foo.bar@example.com
+foo.bar@example.com
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--deduplicate=mailbox --output=sender --output=count"
+notmuch address --deduplicate=mailbox --output=sender --output=count from:example.com | sort -n >OUTPUT
+cat <<EOF >EXPECTED
+1	Foo Bar <Foo.Bar@Example.Com>
+1	Foo Bar <foo.bar+baz@example.com>
+2	Foo Bar <foo.bar@example.com>
+2	foo.bar@example.com
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--deduplicate=address --output=sender --output=count"
+notmuch address --deduplicate=address --output=sender --output=count from:example.com | sort -n >OUTPUT
+cat <<EOF >EXPECTED
+1	Foo Bar <foo.bar+baz@example.com>
+5	Foo Bar <foo.bar@example.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 test_done
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/9] cli: alternative address deduplication
  2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
                   ` (8 preceding siblings ...)
  2015-09-03 19:40 ` [PATCH v2 9/9] cli: do not sort addresses on --output=count or --deduplicate=address Jani Nikula
@ 2015-09-07 12:52 ` David Bremner
  2015-09-26 10:48   ` David Bremner
  9 siblings, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-07 12:52 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> This is v2 of id:cover.1440859765.git.jani@nikula.org addressing review
> from the first (or rfc) version. Test are still missing, but everything
> else should be in place. (At least patches 1, 2, and 5 could be pushed
> without additional tests.)

I pushed 1, 2, and 5.

d

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3½/9] test: notmuch address --deduplicate=no tests
  2015-09-04 18:35   ` [PATCH 3½/9] test: notmuch address --deduplicate=no tests Jani Nikula
@ 2015-09-20 12:43     ` David Bremner
  2015-09-23 18:56       ` Jani Nikula
  0 siblings, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-20 12:43 UTC (permalink / raw)
  To: Jani Nikula, Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> Test not using address deduplication. Incorporate some trivial output
> sorting tests here, as they seem to lack tests.
> ---
>  test/T095-address.sh | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>
> diff --git a/test/T095-address.sh b/test/T095-address.sh
> index 8eecb2a6a7ba..f8d902cb3a5e 100755
> --- a/test/T095-address.sh
> +++ b/test/T095-address.sh
> @@ -145,4 +145,74 @@ cat <<EOF >EXPECTED
>  EOF
>  test_expect_equal_file OUTPUT EXPECTED
>  
> +test_begin_subtest "--deduplicate=no --sort=oldest-first --output=sender"
> +notmuch address --deduplicate=no --sort=oldest-first --output=sender '*' >OUTPUT
> +cat <<EOF >EXPECTED
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +Carl Worth <cworth@cworth.org>

> +
> +test_begin_subtest "--deduplicate=no --sort=newest-first --output=sender --output=recipients"
> +notmuch address --deduplicate=no --sort=newest-first --output=sender --output=recipients path:foo/new >OUTPUT
> +cat <<EOF >EXPECTED
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +notmuch@notmuchmail.org
> +Mikhail Gusarov <dottedmag@dottedmag.net>

Two comments:

1) It's suprising that Mikail is first in both the newest-first and
oldest-first list. Is it easy to explain why?

2) the use of path: term is not mentioned in the description. Should it
be mentioned somehow?

d

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option
  2015-09-03 19:40 ` [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option Jani Nikula
@ 2015-09-20 12:45   ` David Bremner
  2015-09-23 19:31     ` [PATCH] " Jani Nikula
  0 siblings, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-20 12:45 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> Document the deduplication of results.
> ---
>  doc/man1/notmuch-address.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/doc/man1/notmuch-address.rst b/doc/man1/notmuch-address.rst
> index 9570095b23c8..4ae7395bce4d 100644
> --- a/doc/man1/notmuch-address.rst
> +++ b/doc/man1/notmuch-address.rst
> @@ -55,6 +55,21 @@ Supported options for **address** include
>              Note: With this option, addresses are printed only after
>              the whole search is finished. This may take long time.
>  
> +    ``--deduplicate=(no|mailbox)``
> +
> +        Control the deduplication of results.
> +
> +        **none**
> +            Output all occurences of addresses in the matching
> +            messages. This is not applicable with --output=count.
should be be **no**, I guess.
> +
> +        **mailbox**
> +	    Deduplicate addresses based on the full, case sensitive
> +	    name and email address, or mailbox. This is effectively
> +	    the same as piping the --deduplicate=no output to **sort |
> +	    uniq**, except for the order of results. This is the

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3½/9] test: notmuch address --deduplicate=no tests
  2015-09-20 12:43     ` David Bremner
@ 2015-09-23 18:56       ` Jani Nikula
  0 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-23 18:56 UTC (permalink / raw)
  To: David Bremner, notmuch

On Sun, 20 Sep 2015, David Bremner <david@tethera.net> wrote:
> Jani Nikula <jani@nikula.org> writes:
>
>> Test not using address deduplication. Incorporate some trivial output
>> sorting tests here, as they seem to lack tests.
>> ---
>>  test/T095-address.sh | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 70 insertions(+)
>>
>> diff --git a/test/T095-address.sh b/test/T095-address.sh
>> index 8eecb2a6a7ba..f8d902cb3a5e 100755
>> --- a/test/T095-address.sh
>> +++ b/test/T095-address.sh
>> @@ -145,4 +145,74 @@ cat <<EOF >EXPECTED
>>  EOF
>>  test_expect_equal_file OUTPUT EXPECTED
>>  
>> +test_begin_subtest "--deduplicate=no --sort=oldest-first --output=sender"
>> +notmuch address --deduplicate=no --sort=oldest-first --output=sender '*' >OUTPUT
>> +cat <<EOF >EXPECTED
>> +Mikhail Gusarov <dottedmag@dottedmag.net>
>> +Mikhail Gusarov <dottedmag@dottedmag.net>
>> +Carl Worth <cworth@cworth.org>
>
>> +
>> +test_begin_subtest "--deduplicate=no --sort=newest-first --output=sender --output=recipients"
>> +notmuch address --deduplicate=no --sort=newest-first --output=sender --output=recipients path:foo/new >OUTPUT
>> +cat <<EOF >EXPECTED
>> +Mikhail Gusarov <dottedmag@dottedmag.net>
>> +notmuch@notmuchmail.org
>> +Mikhail Gusarov <dottedmag@dottedmag.net>
>
> Two comments:
>
> 1) It's suprising that Mikail is first in both the newest-first and
> oldest-first list. Is it easy to explain why?
>
> 2) the use of path: term is not mentioned in the description. Should it
> be mentioned somehow?

I wanted to have a smaller set of results in the second test, and path:
was an easy filter so I could look at the files there to construct the
expected results. That also explains the first part; it just so happened
that the folder I arbitrarily picked happened to have Mikail there.

BR,
Jani.


>
> d

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] man: document notmuch address --deduplicate=(no|mailbox) option
  2015-09-20 12:45   ` David Bremner
@ 2015-09-23 19:31     ` Jani Nikula
  2015-09-24 10:37       ` David Bremner
  0 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-23 19:31 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

Document the deduplication of results.
---
 doc/man1/notmuch-address.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/doc/man1/notmuch-address.rst b/doc/man1/notmuch-address.rst
index 9570095b23c8..6003ed29cf32 100644
--- a/doc/man1/notmuch-address.rst
+++ b/doc/man1/notmuch-address.rst
@@ -55,6 +55,21 @@ Supported options for **address** include
             Note: With this option, addresses are printed only after
             the whole search is finished. This may take long time.
 
+    ``--deduplicate=(no|mailbox)``
+
+        Control the deduplication of results.
+
+        **no**
+            Output all occurences of addresses in the matching
+            messages. This is not applicable with --output=count.
+
+        **mailbox**
+	    Deduplicate addresses based on the full, case sensitive
+	    name and email address, or mailbox. This is effectively
+	    the same as piping the --deduplicate=no output to **sort |
+	    uniq**, except for the order of results. This is the
+	    default.
+
     ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
         This option can be used to present results in either
         chronological order (**oldest-first**) or reverse chronological
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] man: document notmuch address --deduplicate=(no|mailbox) option
  2015-09-23 19:31     ` [PATCH] " Jani Nikula
@ 2015-09-24 10:37       ` David Bremner
  0 siblings, 0 replies; 28+ messages in thread
From: David Bremner @ 2015-09-24 10:37 UTC (permalink / raw)
  To: Jani Nikula, Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> Document the deduplication of results.

pushed the series up to this patch.

d

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-03 19:40 ` [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication Jani Nikula
@ 2015-09-24 12:32   ` David Bremner
  2015-09-24 12:40     ` David Bremner
  2015-09-24 18:34     ` Jani Nikula
  0 siblings, 2 replies; 28+ messages in thread
From: David Bremner @ 2015-09-24 12:32 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:


> +    else
> +	v = !!m1->name - !!m2->name;

Is this really idiomatic? It seems a little difficult to follow to me.

>  /* Returns TRUE iff name and addr is duplicate. If not, stores the
>   * name/addr pair in order to detect subsequent duplicates. */
>  static notmuch_bool_t
>  is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
>  {
>      char *key;
> +    GList *list, *l;
>      mailbox_t *mailbox;
>  
> -    key = talloc_asprintf (ctx->format, "%s <%s>", name, addr);
> -    if (! key)
> -	return FALSE;
> +    list = g_hash_table_lookup (ctx->addresses, addr);
> +    if (list) {
> +	mailbox_t find = {
> +	    .name = name,
> +	    .addr = addr,
> +	};
> +
> +	l = g_list_find_custom (list, &find, mailbox_compare);
> +	if (l) {
> +	    mailbox = l->data;
> +	    mailbox->count++;
> +	    return TRUE;
> +	}
>  
> -    mailbox = g_hash_table_lookup (ctx->addresses, key);
> -    if (mailbox) {
> -	mailbox->count++;
> -	talloc_free (key);
> -	return TRUE;
> +	mailbox = new_mailbox (ctx->format, name, addr);
> +	if (! mailbox)
> +	    return FALSE;
> +
> +	/*
> +	 * XXX: It would be more efficient to prepend to the list, but
> +	 * then we'd have to store the changed list head back to the
> +	 * hash table. This check is here just to avoid the compiler
> +	 * warning for unused result.
> +	 */
> +	if (list != g_list_append (list, mailbox))
> +	    INTERNAL_ERROR ("appending to list changed list head\n");
> +
> +	return FALSE;
>      }
>  
> +    key = talloc_strdup (ctx->format, addr);
> +    if (! key)
> +	return FALSE;
> +
>      mailbox = new_mailbox (ctx->format, name, addr);
>      if (! mailbox)
>  	return FALSE;
>  
> -    g_hash_table_insert (ctx->addresses, key, mailbox);
> +    list = g_list_append (NULL, mailbox);
> +    if (! list)
> +	return FALSE;
> +
> +    g_hash_table_insert (ctx->addresses, key, list);
>  
>      return FALSE;
>  }
> @@ -401,12 +445,21 @@ _talloc_free_for_g_hash (void *ptr)
>  }
>  
>  static void
> -print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
> +_list_free_for_g_hash (void *ptr)
> +{
> +    g_list_free_full (ptr, _talloc_free_for_g_hash);
> +}
> +
> +static void
> +print_list_value (void *mailbox, void *context)
>  {
> -    const mailbox_t *mailbox = value;
> -    search_context_t *ctx = user_data;
> +    print_mailbox (context, mailbox);
> +}
>  
> -    print_mailbox (ctx, mailbox);
> +static void
> +print_hash_value (unused (void *key), void *list, void *context)
> +{
> +    g_list_foreach (list, print_list_value, context);
>  }
>  
>  static int
> @@ -792,8 +845,9 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
>  				 argc - opt_index, argv + opt_index))
>  	return EXIT_FAILURE;
>  
> -    ctx->addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
> -					    _talloc_free_for_g_hash, _talloc_free_for_g_hash);
> +    ctx->addresses = g_hash_table_new_full (strcase_hash, strcase_equal,
> +					    _talloc_free_for_g_hash,
> +					    _list_free_for_g_hash);
>  
>      ret = do_search_messages (ctx);
>  
> -- 
> 2.1.4
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-24 12:32   ` David Bremner
@ 2015-09-24 12:40     ` David Bremner
  2015-09-24 19:55       ` Tomi Ollila
  2015-09-24 18:34     ` Jani Nikula
  1 sibling, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-24 12:40 UTC (permalink / raw)
  To: Jani Nikula, notmuch

David Bremner <david@tethera.net> writes:

> Jani Nikula <jani@nikula.org> writes:
>
>
>> +    else
>> +	v = !!m1->name - !!m2->name;
>
> Is this really idiomatic? It seems a little difficult to follow to me.
>

Sorry, this stupid MUA I'm using confused me and I sent the reply
halfway through reading the patch. Anyway, the rest looks OK.

d

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-24 12:32   ` David Bremner
  2015-09-24 12:40     ` David Bremner
@ 2015-09-24 18:34     ` Jani Nikula
  2015-09-24 23:31       ` David Bremner
  1 sibling, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-24 18:34 UTC (permalink / raw)
  To: David Bremner, notmuch

On Thu, 24 Sep 2015, David Bremner <david@tethera.net> wrote:
> Jani Nikula <jani@nikula.org> writes:
>
>
>> +    else
>> +	v = !!m1->name - !!m2->name;
>
> Is this really idiomatic? It seems a little difficult to follow to me.

Probably depends on whether you're accustomed to using !! for
"normalizing" zero and non-zero to 0 and 1, respectively.

The alternative seemed a bit too verbose for my liking:

    if (m1->name && m2->name)
        v = strcmp (m1->name, m2->name);
    else if (!m1->name && !m2->name)
        v = 0;
    else if (m1->name)
        v = 1;
    else
        v = -1;

I can live with that if you think the verbosity is in order here. (Or am
I missing an obvious better alternative?)

BR,
Jani.


>
>>  /* Returns TRUE iff name and addr is duplicate. If not, stores the
>>   * name/addr pair in order to detect subsequent duplicates. */
>>  static notmuch_bool_t
>>  is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
>>  {
>>      char *key;
>> +    GList *list, *l;
>>      mailbox_t *mailbox;
>>  
>> -    key = talloc_asprintf (ctx->format, "%s <%s>", name, addr);
>> -    if (! key)
>> -	return FALSE;
>> +    list = g_hash_table_lookup (ctx->addresses, addr);
>> +    if (list) {
>> +	mailbox_t find = {
>> +	    .name = name,
>> +	    .addr = addr,
>> +	};
>> +
>> +	l = g_list_find_custom (list, &find, mailbox_compare);
>> +	if (l) {
>> +	    mailbox = l->data;
>> +	    mailbox->count++;
>> +	    return TRUE;
>> +	}
>>  
>> -    mailbox = g_hash_table_lookup (ctx->addresses, key);
>> -    if (mailbox) {
>> -	mailbox->count++;
>> -	talloc_free (key);
>> -	return TRUE;
>> +	mailbox = new_mailbox (ctx->format, name, addr);
>> +	if (! mailbox)
>> +	    return FALSE;
>> +
>> +	/*
>> +	 * XXX: It would be more efficient to prepend to the list, but
>> +	 * then we'd have to store the changed list head back to the
>> +	 * hash table. This check is here just to avoid the compiler
>> +	 * warning for unused result.
>> +	 */
>> +	if (list != g_list_append (list, mailbox))
>> +	    INTERNAL_ERROR ("appending to list changed list head\n");
>> +
>> +	return FALSE;
>>      }
>>  
>> +    key = talloc_strdup (ctx->format, addr);
>> +    if (! key)
>> +	return FALSE;
>> +
>>      mailbox = new_mailbox (ctx->format, name, addr);
>>      if (! mailbox)
>>  	return FALSE;
>>  
>> -    g_hash_table_insert (ctx->addresses, key, mailbox);
>> +    list = g_list_append (NULL, mailbox);
>> +    if (! list)
>> +	return FALSE;
>> +
>> +    g_hash_table_insert (ctx->addresses, key, list);
>>  
>>      return FALSE;
>>  }
>> @@ -401,12 +445,21 @@ _talloc_free_for_g_hash (void *ptr)
>>  }
>>  
>>  static void
>> -print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
>> +_list_free_for_g_hash (void *ptr)
>> +{
>> +    g_list_free_full (ptr, _talloc_free_for_g_hash);
>> +}
>> +
>> +static void
>> +print_list_value (void *mailbox, void *context)
>>  {
>> -    const mailbox_t *mailbox = value;
>> -    search_context_t *ctx = user_data;
>> +    print_mailbox (context, mailbox);
>> +}
>>  
>> -    print_mailbox (ctx, mailbox);
>> +static void
>> +print_hash_value (unused (void *key), void *list, void *context)
>> +{
>> +    g_list_foreach (list, print_list_value, context);
>>  }
>>  
>>  static int
>> @@ -792,8 +845,9 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
>>  				 argc - opt_index, argv + opt_index))
>>  	return EXIT_FAILURE;
>>  
>> -    ctx->addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
>> -					    _talloc_free_for_g_hash, _talloc_free_for_g_hash);
>> +    ctx->addresses = g_hash_table_new_full (strcase_hash, strcase_equal,
>> +					    _talloc_free_for_g_hash,
>> +					    _list_free_for_g_hash);
>>  
>>      ret = do_search_messages (ctx);
>>  
>> -- 
>> 2.1.4
>>
>> _______________________________________________
>> notmuch mailing list
>> notmuch@notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-24 12:40     ` David Bremner
@ 2015-09-24 19:55       ` Tomi Ollila
  0 siblings, 0 replies; 28+ messages in thread
From: Tomi Ollila @ 2015-09-24 19:55 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

On Thu, Sep 24 2015, David Bremner <david@tethera.net> wrote:

> David Bremner <david@tethera.net> writes:
>
>> Jani Nikula <jani@nikula.org> writes:
>>
>>
>>> +    else
>>> +	v = !!m1->name - !!m2->name;
>>
>> Is this really idiomatic? It seems a little difficult to follow to me.
>>

For me it took a while to grasp the meaning of that but after I spent
a minute looking at it it become clear to me. this part +1


> Sorry, this stupid MUA I'm using confused me and I sent the reply
> halfway through reading the patch. Anyway, the rest looks OK.
>
> d

Tomi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication
  2015-09-24 18:34     ` Jani Nikula
@ 2015-09-24 23:31       ` David Bremner
  2015-09-25 16:48         ` [PATCH 6/9 v3 part 1/2] util: add strcmp_null, a strcmp that handles NULL parameters Jani Nikula
  0 siblings, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-24 23:31 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> On Thu, 24 Sep 2015, David Bremner <david@tethera.net> wrote:
>> Jani Nikula <jani@nikula.org> writes:
>>
>>
>>> +    else
>>> +	v = !!m1->name - !!m2->name;
>>
>> Is this really idiomatic? It seems a little difficult to follow to me.
>
> Probably depends on whether you're accustomed to using !! for
> "normalizing" zero and non-zero to 0 and 1, respectively.
>
> The alternative seemed a bit too verbose for my liking:
>
>     if (m1->name && m2->name)
>         v = strcmp (m1->name, m2->name);
>     else if (!m1->name && !m2->name)
>         v = 0;
>     else if (m1->name)
>         v = 1;
>     else
>         v = -1;
>

What about adding the verbose version to string-util.c as
e.g. strcmp_null. Theres apparently similar functions in the linux
kernel, gnome-vfs, subversion...

https://codesearch.debian.net/results/strcmp_null/page_0

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests
  2015-09-04 18:38   ` [PATCH 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests Jani Nikula
@ 2015-09-25  0:02     ` David Bremner
  2015-09-25 17:08       ` [PATCH v2 " Jani Nikula
  0 siblings, 1 reply; 28+ messages in thread
From: David Bremner @ 2015-09-25  0:02 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula <jani@nikula.org> writes:

> +generate_message '[from]="Foo Bar <foo.bar@example.com>"'
> +generate_message '[from]="Foo Bar <Foo.Bar@Example.Com>"'
> +generate_message '[from]="Foo Bar <foo.bar@example.com>"'
> +generate_message '[from]="<foo.bar@example.com>"'
> +generate_message '[from]="foo.bar@example.com"'
> +generate_message '[from]="Foo Bar <foo.bar+baz@example.com>"'

It might be good, especially for the deduplicate=mailbox case below to
have some more variation in the names other than "Foo Bar" and ""

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 6/9 v3 part 1/2] util: add strcmp_null, a strcmp that handles NULL parameters
  2015-09-24 23:31       ` David Bremner
@ 2015-09-25 16:48         ` Jani Nikula
  2015-09-25 16:48           ` [PATCH 6/9 v3 part 2/2] cli: change the data structure for notmuch address deduplication Jani Nikula
  0 siblings, 1 reply; 28+ messages in thread
From: Jani Nikula @ 2015-09-25 16:48 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

Add strcmp_null, a strcmp that handles NULL strings; in strcmp terms a
NULL string is considered to be less than a non-NULL string.
---
 util/string-util.c | 13 +++++++++++++
 util/string-util.h |  5 +++++
 2 files changed, 18 insertions(+)

diff --git a/util/string-util.c b/util/string-util.c
index 76c0b9025d0f..92af937f45ec 100644
--- a/util/string-util.c
+++ b/util/string-util.c
@@ -223,6 +223,19 @@ parse_boolean_term (void *ctx, const char *str,
 }
 
 int
+strcmp_null (const char *s1, const char *s2)
+{
+    if (s1 && s2)
+	return strcmp (s1, s2);
+    else if (! s1 && ! s2)
+	return 0;
+    else if (s1)
+	return 1;	/* s1 (non-NULL) is greater than s2 (NULL) */
+    else
+	return -1;	/* s1 (NULL) is less than s2 (non-NULL) */
+}
+
+int
 strcase_equal (const void *a, const void *b)
 {
     return strcasecmp (a, b) == 0;
diff --git a/util/string-util.h b/util/string-util.h
index 80d24d1c1053..87917b8fd279 100644
--- a/util/string-util.h
+++ b/util/string-util.h
@@ -64,6 +64,11 @@ int
 parse_boolean_term (void *ctx, const char *str,
 		    char **prefix_out, char **term_out);
 
+/* strcmp that handles NULL strings; in strcmp terms a NULL string is
+ * considered to be less than a non-NULL string.
+ */
+int strcmp_null (const char *s1, const char *s2);
+
 /* GLib GEqualFunc compatible strcasecmp wrapper */
 int strcase_equal (const void *a, const void *b);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/9 v3 part 2/2] cli: change the data structure for notmuch address deduplication
  2015-09-25 16:48         ` [PATCH 6/9 v3 part 1/2] util: add strcmp_null, a strcmp that handles NULL parameters Jani Nikula
@ 2015-09-25 16:48           ` Jani Nikula
  0 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-25 16:48 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

Currently we key the address hash table with the case sensitive "name
<address>". Switch to case insensitive keying with just address, and
store the case sensitive name and address in linked lists. This will
be helpful in adding support for different deduplication schemes in
the future.

There will be a slight performance penalty for the current full case
sensitive name + address deduplication, but this is simpler as a whole
when other deduplication schemes are added, and I expect the schemes
to be added to become more popular than the current default.

Aparet from the possible performance penalty, the only user visible
change should be the change in the output ordering for
--output=count. The order is not guaranteed (and is based on hash
table traversal) currently anyway, so this should be of no
consequence.

---

v3: abstract strcmp_null
---
 notmuch-client.h |  1 +
 notmuch-search.c | 80 +++++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 66 insertions(+), 15 deletions(-)

diff --git a/notmuch-client.h b/notmuch-client.h
index de8a3b15f865..3bd2903ec54a 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -48,6 +48,7 @@ typedef GMimeCryptoContext notmuch_crypto_context_t;
 #include <dirent.h>
 #include <errno.h>
 #include <signal.h>
+#include <ctype.h>
 
 #include "talloc-extra.h"
 
diff --git a/notmuch-search.c b/notmuch-search.c
index 966c310f8f18..6cac0fcdc1df 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -265,30 +265,70 @@ static mailbox_t *new_mailbox (void *ctx, const char *name, const char *addr)
     return mailbox;
 }
 
+static int mailbox_compare (const void *v1, const void *v2)
+{
+    const mailbox_t *m1 = v1, *m2 = v2;
+    int ret;
+
+    ret = strcmp_null (m1->name, m2->name);
+    if (! ret)
+	ret = strcmp (m1->addr, m2->addr);
+
+    return ret;
+}
+
 /* Returns TRUE iff name and addr is duplicate. If not, stores the
  * name/addr pair in order to detect subsequent duplicates. */
 static notmuch_bool_t
 is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
 {
     char *key;
+    GList *list, *l;
     mailbox_t *mailbox;
 
-    key = talloc_asprintf (ctx->format, "%s <%s>", name, addr);
-    if (! key)
-	return FALSE;
+    list = g_hash_table_lookup (ctx->addresses, addr);
+    if (list) {
+	mailbox_t find = {
+	    .name = name,
+	    .addr = addr,
+	};
+
+	l = g_list_find_custom (list, &find, mailbox_compare);
+	if (l) {
+	    mailbox = l->data;
+	    mailbox->count++;
+	    return TRUE;
+	}
+
+	mailbox = new_mailbox (ctx->format, name, addr);
+	if (! mailbox)
+	    return FALSE;
 
-    mailbox = g_hash_table_lookup (ctx->addresses, key);
-    if (mailbox) {
-	mailbox->count++;
-	talloc_free (key);
-	return TRUE;
+	/*
+	 * XXX: It would be more efficient to prepend to the list, but
+	 * then we'd have to store the changed list head back to the
+	 * hash table. This check is here just to avoid the compiler
+	 * warning for unused result.
+	 */
+	if (list != g_list_append (list, mailbox))
+	    INTERNAL_ERROR ("appending to list changed list head\n");
+
+	return FALSE;
     }
 
+    key = talloc_strdup (ctx->format, addr);
+    if (! key)
+	return FALSE;
+
     mailbox = new_mailbox (ctx->format, name, addr);
     if (! mailbox)
 	return FALSE;
 
-    g_hash_table_insert (ctx->addresses, key, mailbox);
+    list = g_list_append (NULL, mailbox);
+    if (! list)
+	return FALSE;
+
+    g_hash_table_insert (ctx->addresses, key, list);
 
     return FALSE;
 }
@@ -401,12 +441,21 @@ _talloc_free_for_g_hash (void *ptr)
 }
 
 static void
-print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
+_list_free_for_g_hash (void *ptr)
 {
-    const mailbox_t *mailbox = value;
-    search_context_t *ctx = user_data;
+    g_list_free_full (ptr, _talloc_free_for_g_hash);
+}
 
-    print_mailbox (ctx, mailbox);
+static void
+print_list_value (void *mailbox, void *context)
+{
+    print_mailbox (context, mailbox);
+}
+
+static void
+print_hash_value (unused (void *key), void *list, void *context)
+{
+    g_list_foreach (list, print_list_value, context);
 }
 
 static int
@@ -794,8 +843,9 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
 				 argc - opt_index, argv + opt_index))
 	return EXIT_FAILURE;
 
-    ctx->addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
-					    _talloc_free_for_g_hash, _talloc_free_for_g_hash);
+    ctx->addresses = g_hash_table_new_full (strcase_hash, strcase_equal,
+					    _talloc_free_for_g_hash,
+					    _list_free_for_g_hash);
 
     ret = do_search_messages (ctx);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests
  2015-09-25  0:02     ` David Bremner
@ 2015-09-25 17:08       ` Jani Nikula
  0 siblings, 0 replies; 28+ messages in thread
From: Jani Nikula @ 2015-09-25 17:08 UTC (permalink / raw)
  To: David Bremner, Jani Nikula, notmuch

First a simple smoke test first, next generate messages with multiple
email address variants and check the behaviour of deduplication
schemes with these.

---

v2: more variation in name parts of email addresses
---
 test/T095-address.sh | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/test/T095-address.sh b/test/T095-address.sh
index f8d902cb3a5e..a194faf3b842 100755
--- a/test/T095-address.sh
+++ b/test/T095-address.sh
@@ -215,4 +215,78 @@ notmuch@notmuchmail.org
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "--deduplicate=address --output=sender --output=recipients"
+notmuch address --deduplicate=address --output=sender --output=recipients '*' | sort >OUTPUT
+cat <<EOF >EXPECTED
+"Discussion about the Arch User Repository (AUR)" <aur-general@archlinux.org>
+Adrian Perez de Castro <aperez@igalia.com>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Allan McRae <allan@archlinux.org>
+Aron Griffis <agriffis@n01se.net>
+Carl Worth <cworth@cworth.org>
+Chris Wilson <chris@chris-wilson.co.uk>
+François Boulogne <boulogne.f@gmail.com>
+Ingmar Vanhassel <ingmar@exherbo.org>
+Israel Herraiz <isra@herraiz.org>
+Jan Janak <jan@ryngle.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Keith Packard <keithp@keithp.com>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+Olivier Berger <olivier.berger@it-sudparis.eu>
+Rolland Santimano <rollandsantimano@yahoo.com>
+Stewart Smith <stewart@flamingspork.com>
+notmuch@notmuchmail.org
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+generate_message '[from]="Foo Bar <foo.bar@example.com>"'
+generate_message '[from]="Foo Bar <Foo.Bar@Example.Com>"'
+generate_message '[from]="Foo Bar <foo.bar@example.com>"'
+generate_message '[from]="Bar <Foo.Bar@Example.Com>"'
+generate_message '[from]="Foo <foo.bar@example.com>"'
+generate_message '[from]="<foo.bar@example.com>"'
+generate_message '[from]="foo.bar@example.com"'
+generate_message '[from]="Baz <foo.bar+baz@example.com>"'
+generate_message '[from]="Foo Bar <foo.bar+baz@example.com>"'
+generate_message '[from]="Baz <foo.bar+baz@example.com>"'
+notmuch new > /dev/null
+
+test_begin_subtest "--deduplicate=no --output=sender"
+notmuch address --deduplicate=no --output=sender from:example.com | sort >OUTPUT
+cat <<EOF >EXPECTED
+Bar <Foo.Bar@Example.Com>
+Baz <foo.bar+baz@example.com>
+Baz <foo.bar+baz@example.com>
+Foo <foo.bar@example.com>
+Foo Bar <Foo.Bar@Example.Com>
+Foo Bar <foo.bar+baz@example.com>
+Foo Bar <foo.bar@example.com>
+Foo Bar <foo.bar@example.com>
+foo.bar@example.com
+foo.bar@example.com
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--deduplicate=mailbox --output=sender --output=count"
+notmuch address --deduplicate=mailbox --output=sender --output=count from:example.com | sort -n >OUTPUT
+cat <<EOF >EXPECTED
+1	Bar <Foo.Bar@Example.Com>
+1	Foo <foo.bar@example.com>
+1	Foo Bar <Foo.Bar@Example.Com>
+1	Foo Bar <foo.bar+baz@example.com>
+2	Baz <foo.bar+baz@example.com>
+2	Foo Bar <foo.bar@example.com>
+2	foo.bar@example.com
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--deduplicate=address --output=sender --output=count"
+notmuch address --deduplicate=address --output=sender --output=count from:example.com | sort -n >OUTPUT
+cat <<EOF >EXPECTED
+3	Baz <foo.bar+baz@example.com>
+7	Foo Bar <foo.bar@example.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 test_done
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 0/9] cli: alternative address deduplication
  2015-09-07 12:52 ` [PATCH v2 0/9] cli: alternative address deduplication David Bremner
@ 2015-09-26 10:48   ` David Bremner
  0 siblings, 0 replies; 28+ messages in thread
From: David Bremner @ 2015-09-26 10:48 UTC (permalink / raw)
  To: Jani Nikula, notmuch

David Bremner <david@tethera.net> writes:

> Jani Nikula <jani@nikula.org> writes:
>
>> This is v2 of id:cover.1440859765.git.jani@nikula.org addressing review
>> from the first (or rfc) version. Test are still missing, but everything
>> else should be in place. (At least patches 1, 2, and 5 could be pushed
>> without additional tests.)
>
> I pushed 1, 2, and 5.

I pushed the remaining patches in the series.

d

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-09-26 10:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-03 19:39 [PATCH v2 0/9] cli: alternative address deduplication Jani Nikula
2015-09-03 19:39 ` [PATCH v2 1/9] cli: g_hash_table_lookup_extended is overkill Jani Nikula
2015-09-03 19:39 ` [PATCH v2 2/9] cli: abstract new mailbox creation Jani Nikula
2015-09-03 19:39 ` [PATCH v2 3/9] cli: add support for not deduplicating notmuch address results Jani Nikula
2015-09-04 18:35   ` [PATCH 3½/9] test: notmuch address --deduplicate=no tests Jani Nikula
2015-09-20 12:43     ` David Bremner
2015-09-23 18:56       ` Jani Nikula
2015-09-03 19:40 ` [PATCH v2 4/9] man: document notmuch address --deduplicate=(no|mailbox) option Jani Nikula
2015-09-20 12:45   ` David Bremner
2015-09-23 19:31     ` [PATCH] " Jani Nikula
2015-09-24 10:37       ` David Bremner
2015-09-03 19:40 ` [PATCH v2 5/9] util: move strcase_equal and strcase_hash to util Jani Nikula
2015-09-03 19:40 ` [PATCH v2 6/9] cli: change the data structure for notmuch address deduplication Jani Nikula
2015-09-24 12:32   ` David Bremner
2015-09-24 12:40     ` David Bremner
2015-09-24 19:55       ` Tomi Ollila
2015-09-24 18:34     ` Jani Nikula
2015-09-24 23:31       ` David Bremner
2015-09-25 16:48         ` [PATCH 6/9 v3 part 1/2] util: add strcmp_null, a strcmp that handles NULL parameters Jani Nikula
2015-09-25 16:48           ` [PATCH 6/9 v3 part 2/2] cli: change the data structure for notmuch address deduplication Jani Nikula
2015-09-03 19:40 ` [PATCH v2 7/9] cli: add support for deduplicating based on case insensitive address Jani Nikula
2015-09-04 18:38   ` [PATCH 7½/9] test: add notmuch address --deduplicate=(no|mailbox|address) tests Jani Nikula
2015-09-25  0:02     ` David Bremner
2015-09-25 17:08       ` [PATCH v2 " Jani Nikula
2015-09-03 19:40 ` [PATCH v2 8/9] man: document notmuch address --deduplicate=address option Jani Nikula
2015-09-03 19:40 ` [PATCH v2 9/9] cli: do not sort addresses on --output=count or --deduplicate=address Jani Nikula
2015-09-07 12:52 ` [PATCH v2 0/9] cli: alternative address deduplication David Bremner
2015-09-26 10:48   ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).