From: Jani Nikula <jani@nikula.org>
To: David Bremner <david@tethera.net>, Jani Nikula <jani@nikula.org>,
notmuch@notmuchmail.org
Subject: [PATCH 6/9 v3 part 2/2] cli: change the data structure for notmuch address deduplication
Date: Fri, 25 Sep 2015 19:48:20 +0300
Message-ID: <1443199700-16654-2-git-send-email-jani@nikula.org>
In-Reply-To: <1443199700-16654-1-git-send-email-jani@nikula.org>

Currently we key the address hash table with the case-sensitive "name
<address>" string. Switch to case-insensitive keying with just the
address, and store the case-sensitive name and address pairs in linked
lists, one list per hash table entry. This will be helpful in adding
support for different deduplication schemes in the future.

There will be a slight performance penalty for the current full
case-sensitive name + address deduplication, since each hash lookup is
now followed by a list search. This is simpler as a whole when other
deduplication schemes are added, and I expect the schemes to be added
to become more popular than the current default.

Apart from the possible performance penalty, the only user-visible
change should be the output ordering for --output=count. The order is
currently not guaranteed anyway (it is based on hash table traversal),
so this should be of no consequence.
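
For illustration only (not part of the patch): a minimal sketch of the
two-level lookup described above, written in plain ISO C so it does not
need GLib or talloc. All names here (variant_t, entry_t,
is_duplicate_sketch and the helpers) are hypothetical. A singly linked
list of entries stands in for the hash table, the initial count of 1 is
an assumption, and nothing is ever freed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One case-sensitive name/addr pair seen for an address. */
typedef struct variant {
    char *name;                 /* may be NULL */
    char *addr;
    int count;
    struct variant *next;
} variant_t;

/* All variants whose addresses are equal when case is ignored.
 * Stands in for one hash table entry; the first variant's addr
 * doubles as the key. */
typedef struct entry {
    variant_t *variants;
    struct entry *next;
} entry_t;

static char *
dup_string (const char *s)
{
    char *copy = malloc (strlen (s) + 1);

    if (copy)
        strcpy (copy, s);
    return copy;
}

/* Case-insensitive string equality (the role of strcase_equal). */
static int
equal_nocase (const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
            return 0;
    return *a == *b;
}

/* Case-sensitive equality that tolerates NULL names. */
static int
same_name (const char *a, const char *b)
{
    if (! a || ! b)
        return a == b;
    return strcmp (a, b) == 0;
}

/* Returns 1 if the exact name/addr pair was seen before. Otherwise
 * records it and returns 0. Allocation failure also returns 0. */
static int
is_duplicate_sketch (entry_t **table, const char *name, const char *addr)
{
    entry_t *e;
    variant_t *v, *last = NULL, *nv;

    assert (addr != NULL);

    /* Level 1: find the entry by case-insensitive address. */
    for (e = *table; e; e = e->next)
        if (equal_nocase (e->variants->addr, addr))
            break;

    /* Level 2: search that entry's list case-sensitively. */
    for (v = e ? e->variants : NULL; v; last = v, v = v->next) {
        if (same_name (v->name, name) && strcmp (v->addr, addr) == 0) {
            v->count++;
            return 1;
        }
    }

    nv = calloc (1, sizeof (*nv));
    if (! nv)
        return 0;
    nv->name = name ? dup_string (name) : NULL;
    nv->addr = dup_string (addr);
    nv->count = 1;
    if (! nv->addr || (name && ! nv->name))
        goto fail;

    if (e) {
        last->next = nv;        /* append, so the list head never changes */
        return 0;
    }

    e = calloc (1, sizeof (*e));
    if (! e)
        goto fail;
    e->variants = nv;
    e->next = *table;
    *table = e;
    return 0;

fail:
    free (nv->name);
    free (nv->addr);
    free (nv);
    return 0;
}
```

With this layout, "Alice <alice@example.com>" and
"Alice <ALICE@example.com>" land in the same entry as two separate
variants, which is what a later address-only deduplication scheme could
build on.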
---
v3: abstract strcmp_null
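
For readers without part 1/2 at hand: strcmp_null is added there and is
not shown in this message. The following is only a sketch of what such
a helper could look like, under the assumption that two NULLs compare
equal and that NULL sorts before any non-NULL string. The name
strcmp_null_sketch is hypothetical, chosen to avoid suggesting that
this is the actual util code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like strcmp, but either argument may be NULL.
 * Assumed ordering: NULL == NULL, and NULL < any non-NULL string. */
static int
strcmp_null_sketch (const char *s1, const char *s2)
{
    if (s1 && s2)
        return strcmp (s1, s2);
    else if (! s1 && ! s2)
        return 0;
    else if (s1)
        return 1;       /* s1 is a string, s2 is NULL */
    else
        return -1;      /* s1 is NULL, s2 is a string */
}

/* A few usage checks for the assumed ordering. */
static void
strcmp_null_sketch_examples (void)
{
    assert (strcmp_null_sketch (NULL, NULL) == 0);
    assert (strcmp_null_sketch (NULL, "a") < 0);
    assert (strcmp_null_sketch ("a", NULL) > 0);
    assert (strcmp_null_sketch ("a", "a") == 0);
    assert (strcmp_null_sketch ("a", "b") < 0);
}
```

This matters for mailbox_compare below because the name of a mailbox
may be NULL, while the address is compared with plain strcmp.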
---
notmuch-client.h | 1 +
notmuch-search.c | 80 +++++++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/notmuch-client.h b/notmuch-client.h
index de8a3b15f865..3bd2903ec54a 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -48,6 +48,7 @@ typedef GMimeCryptoContext notmuch_crypto_context_t;
#include <dirent.h>
#include <errno.h>
#include <signal.h>
+#include <ctype.h>
#include "talloc-extra.h"
diff --git a/notmuch-search.c b/notmuch-search.c
index 966c310f8f18..6cac0fcdc1df 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -265,30 +265,70 @@ static mailbox_t *new_mailbox (void *ctx, const char *name, const char *addr)
return mailbox;
}
+static int mailbox_compare (const void *v1, const void *v2)
+{
+ const mailbox_t *m1 = v1, *m2 = v2;
+ int ret;
+
+ ret = strcmp_null (m1->name, m2->name);
+ if (! ret)
+ ret = strcmp (m1->addr, m2->addr);
+
+ return ret;
+}
+
/* Returns TRUE iff name and addr is duplicate. If not, stores the
* name/addr pair in order to detect subsequent duplicates. */
static notmuch_bool_t
is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
{
char *key;
+ GList *list, *l;
mailbox_t *mailbox;
- key = talloc_asprintf (ctx->format, "%s <%s>", name, addr);
- if (! key)
- return FALSE;
+ list = g_hash_table_lookup (ctx->addresses, addr);
+ if (list) {
+ mailbox_t find = {
+ .name = name,
+ .addr = addr,
+ };
+
+ l = g_list_find_custom (list, &find, mailbox_compare);
+ if (l) {
+ mailbox = l->data;
+ mailbox->count++;
+ return TRUE;
+ }
+
+ mailbox = new_mailbox (ctx->format, name, addr);
+ if (! mailbox)
+ return FALSE;
- mailbox = g_hash_table_lookup (ctx->addresses, key);
- if (mailbox) {
- mailbox->count++;
- talloc_free (key);
- return TRUE;
+ /*
+ * XXX: It would be more efficient to prepend to the list, but
+ * then we'd have to store the changed list head back to the
+ * hash table. This check is here just to avoid the compiler
+ * warning for unused result.
+ */
+ if (list != g_list_append (list, mailbox))
+ INTERNAL_ERROR ("appending to list changed list head\n");
+
+ return FALSE;
}
+ key = talloc_strdup (ctx->format, addr);
+ if (! key)
+ return FALSE;
+
mailbox = new_mailbox (ctx->format, name, addr);
if (! mailbox)
return FALSE;
- g_hash_table_insert (ctx->addresses, key, mailbox);
+ list = g_list_append (NULL, mailbox);
+ if (! list)
+ return FALSE;
+
+ g_hash_table_insert (ctx->addresses, key, list);
return FALSE;
}
@@ -401,12 +441,21 @@ _talloc_free_for_g_hash (void *ptr)
}
static void
-print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
+_list_free_for_g_hash (void *ptr)
{
- const mailbox_t *mailbox = value;
- search_context_t *ctx = user_data;
+ g_list_free_full (ptr, _talloc_free_for_g_hash);
+}
- print_mailbox (ctx, mailbox);
+static void
+print_list_value (void *mailbox, void *context)
+{
+ print_mailbox (context, mailbox);
+}
+
+static void
+print_hash_value (unused (void *key), void *list, void *context)
+{
+ g_list_foreach (list, print_list_value, context);
}
static int
@@ -794,8 +843,9 @@ notmuch_address_command (notmuch_config_t *config, int argc, char *argv[])
argc - opt_index, argv + opt_index))
return EXIT_FAILURE;
- ctx->addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
- _talloc_free_for_g_hash, _talloc_free_for_g_hash);
+ ctx->addresses = g_hash_table_new_full (strcase_hash, strcase_equal,
+ _talloc_free_for_g_hash,
+ _list_free_for_g_hash);
ret = do_search_messages (ctx);
--
2.1.4