unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH v4 0/6] notmuch search --output=sender/recipients
@ 2014-10-27 14:50 Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 1/6] cli: search: Refactor passing of command line options Michal Sojka
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

Hi all,

this is v4 of the search --output=address series. It obsoletes v3 that
starts at id:1413150093-8383-1-git-send-email-sojkam1@fel.cvut.cz.

It addresses most comments made by Mark Walters and others. In
addition to v3, it also implements new --output=count (also suggested
by Mark). This required changes even in the initial patches of the
series but the result is cleaner code.

Detailed changes from v3:

- `o' renamed to `opt'.
- Conversion of --output from keyword to keyword-flags is now a
  separate patch.
- Structured output formats print name and address separately.
- Added test for --format=json.
- Changed --filter-by default to nameaddr. In v2, the default was
  addrfold, in v3 the default was no filtering at all. I believe that
  Mark's suggestion to make nameaddr the default is good trade off.
- Added new --output=count
- Minor style fixes
- Few typos fixed
- There is no way to output unfiltered (duplicite) addresses.
  Hopefully, the introduction of --output=count is sufficient
  replacement for this "feature".

Cheers,
-Michal


Jani Nikula (1):
  cli: Add support for parsing keyword-flag arguments

Michal Sojka (5):
  cli: search: Refactor passing of command line options
  cli: search: Convert --output to keyword-flag argument
  cli: search: Add --output={sender,recipients}
  cli: search: Add configurable way to filter out duplicate addresses
  cli: search: Add --output=count

 command-line-arguments.c           |   6 +-
 command-line-arguments.h           |   1 +
 completion/notmuch-completion.bash |   8 +-
 completion/notmuch-completion.zsh  |   4 +-
 doc/man1/notmuch-search.rst        |  65 ++++++-
 notmuch-search.c                   | 385 +++++++++++++++++++++++++++++--------
 test/T090-search-output.sh         | 137 +++++++++++++
 test/T095-search-filter-by.sh      |  64 ++++++
 test/T410-argument-parsing.sh      |   3 +-
 test/arg-test.c                    |   9 +
 10 files changed, 600 insertions(+), 82 deletions(-)
 create mode 100755 test/T095-search-filter-by.sh

-- 
2.1.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/6] cli: search: Refactor passing of command line options
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-30  7:57   ` Mark Walters
  2014-10-27 14:50 ` [PATCH v4 2/6] cli: Add support for parsing keyword-flag arguments Michal Sojka
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

Many functions that implement the search command need to access command
line options. Instead of passing each option in a separate variable, put
them in a structure and pass only this structure.

This will become handy in the following patches.
---
 notmuch-search.c | 125 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 64 insertions(+), 61 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index bc9be45..0c3e972 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -30,6 +30,16 @@ typedef enum {
     OUTPUT_TAGS
 } output_t;
 
+typedef struct {
+    sprinter_t *format;
+    notmuch_query_t *query;
+    notmuch_sort_t sort;
+    output_t output;
+    int offset;
+    int limit;
+    int dupe;
+} search_options_t;
+
 /* Return two stable query strings that identify exactly the matched
  * and unmatched messages currently in thread.  If there are no
  * matched or unmatched messages, the returned buffers will be
@@ -70,43 +80,39 @@ get_thread_query (notmuch_thread_t *thread,
 }
 
 static int
-do_search_threads (sprinter_t *format,
-		   notmuch_query_t *query,
-		   notmuch_sort_t sort,
-		   output_t output,
-		   int offset,
-		   int limit)
+do_search_threads (search_options_t *opt)
 {
     notmuch_thread_t *thread;
     notmuch_threads_t *threads;
     notmuch_tags_t *tags;
+    sprinter_t *format = opt->format;
     time_t date;
     int i;
 
-    if (offset < 0) {
-	offset += notmuch_query_count_threads (query);
-	if (offset < 0)
-	    offset = 0;
+    if (opt->offset < 0) {
+	opt->offset += notmuch_query_count_threads (opt->query);
+	if (opt->offset < 0)
+	    opt->offset = 0;
     }
 
-    threads = notmuch_query_search_threads (query);
+    threads = notmuch_query_search_threads (opt->query);
     if (threads == NULL)
 	return 1;
 
     format->begin_list (format);
 
     for (i = 0;
-	 notmuch_threads_valid (threads) && (limit < 0 || i < offset + limit);
+	 notmuch_threads_valid (threads) && (opt->limit < 0 || i < opt->offset + opt->limit);
 	 notmuch_threads_move_to_next (threads), i++)
     {
 	thread = notmuch_threads_get (threads);
 
-	if (i < offset) {
+	if (i < opt->offset) {
 	    notmuch_thread_destroy (thread);
 	    continue;
 	}
 
-	if (output == OUTPUT_THREADS) {
+	if (opt->output == OUTPUT_THREADS) {
 	    format->set_prefix (format, "thread");
 	    format->string (format,
 			    notmuch_thread_get_thread_id (thread));
@@ -123,7 +129,7 @@ do_search_threads (sprinter_t *format,
 
 	    format->begin_map (format);
 
-	    if (sort == NOTMUCH_SORT_OLDEST_FIRST)
+	    if (opt->sort == NOTMUCH_SORT_OLDEST_FIRST)
 		date = notmuch_thread_get_oldest_date (thread);
 	    else
 		date = notmuch_thread_get_newest_date (thread);
@@ -215,40 +221,36 @@ do_search_threads (sprinter_t *format,
 }
 
 static int
-do_search_messages (sprinter_t *format,
-		    notmuch_query_t *query,
-		    output_t output,
-		    int offset,
-		    int limit,
-		    int dupe)
+do_search_messages (search_options_t *opt)
 {
     notmuch_message_t *message;
     notmuch_messages_t *messages;
     notmuch_filenames_t *filenames;
+    sprinter_t *format = opt->format;
     int i;
 
-    if (offset < 0) {
-	offset += notmuch_query_count_messages (query);
-	if (offset < 0)
-	    offset = 0;
+    if (opt->offset < 0) {
+	opt->offset += notmuch_query_count_messages (opt->query);
+	if (opt->offset < 0)
+	    opt->offset = 0;
     }
 
-    messages = notmuch_query_search_messages (query);
+    messages = notmuch_query_search_messages (opt->query);
     if (messages == NULL)
 	return 1;
 
     format->begin_list (format);
 
     for (i = 0;
-	 notmuch_messages_valid (messages) && (limit < 0 || i < offset + limit);
+	 notmuch_messages_valid (messages) && (opt->limit < 0 || i < opt->offset + opt->limit);
 	 notmuch_messages_move_to_next (messages), i++)
     {
-	if (i < offset)
+	if (i < opt->offset)
 	    continue;
 
 	message = notmuch_messages_get (messages);
 
-	if (output == OUTPUT_FILES) {
+	if (opt->output == OUTPUT_FILES) {
 	    int j;
 	    filenames = notmuch_message_get_filenames (message);
 
@@ -256,7 +258,7 @@ do_search_messages (sprinter_t *format,
 		 notmuch_filenames_valid (filenames);
 		 notmuch_filenames_move_to_next (filenames), j++)
 	    {
-		if (dupe < 0 || dupe == j) {
+		if (opt->dupe < 0 || opt->dupe == j) {
 		    format->string (format, notmuch_filenames_get (filenames));
 		    format->separator (format);
 		}
@@ -283,12 +285,13 @@ do_search_messages (sprinter_t *format,
 
 static int
 do_search_tags (notmuch_database_t *notmuch,
-		sprinter_t *format,
-		notmuch_query_t *query)
+		const search_options_t *opt)
 {
     notmuch_messages_t *messages = NULL;
     notmuch_tags_t *tags;
     const char *tag;
+    sprinter_t *format = opt->format;
+    notmuch_query_t *query = opt->query;
 
     /* should the following only special case if no excluded terms
      * specified? */
@@ -333,16 +336,16 @@ int
 notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 {
     notmuch_database_t *notmuch;
-    notmuch_query_t *query;
+    search_options_t opt = {
+	.sort = NOTMUCH_SORT_NEWEST_FIRST,
+	.output = OUTPUT_SUMMARY,
+	.offset = 0,
+	.limit = -1, /* unlimited */
+	.dupe = -1,
+    };
     char *query_str;
-    notmuch_sort_t sort = NOTMUCH_SORT_NEWEST_FIRST;
-    sprinter_t *format = NULL;
     int opt_index, ret;
-    output_t output = OUTPUT_SUMMARY;
-    int offset = 0;
-    int limit = -1; /* unlimited */
     notmuch_exclude_t exclude = NOTMUCH_EXCLUDE_TRUE;
-    int dupe = -1;
     unsigned int i;
 
     enum {
@@ -353,7 +356,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
     } format_sel = NOTMUCH_FORMAT_TEXT;
 
     notmuch_opt_desc_t options[] = {
-	{ NOTMUCH_OPT_KEYWORD, &sort, "sort", 's',
+	{ NOTMUCH_OPT_KEYWORD, &opt.sort, "sort", 's',
 	  (notmuch_keyword_t []){ { "oldest-first", NOTMUCH_SORT_OLDEST_FIRST },
 				  { "newest-first", NOTMUCH_SORT_NEWEST_FIRST },
 				  { 0, 0 } } },
@@ -364,7 +367,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 				  { "text0", NOTMUCH_FORMAT_TEXT0 },
 				  { 0, 0 } } },
 	{ NOTMUCH_OPT_INT, &notmuch_format_version, "format-version", 0, 0 },
-	{ NOTMUCH_OPT_KEYWORD, &output, "output", 'o',
+	{ NOTMUCH_OPT_KEYWORD, &opt.output, "output", 'o',
 	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
 				  { "threads", OUTPUT_THREADS },
 				  { "messages", OUTPUT_MESSAGES },
@@ -377,9 +380,9 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
                                   { "flag", NOTMUCH_EXCLUDE_FLAG },
                                   { "all", NOTMUCH_EXCLUDE_ALL },
                                   { 0, 0 } } },
-	{ NOTMUCH_OPT_INT, &offset, "offset", 'O', 0 },
-	{ NOTMUCH_OPT_INT, &limit, "limit", 'L', 0  },
-	{ NOTMUCH_OPT_INT, &dupe, "duplicate", 'D', 0  },
+	{ NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },
+	{ NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0  },
+	{ NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0  },
 	{ 0, 0, 0, 0, 0 }
     };
 
@@ -389,20 +392,20 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 
     switch (format_sel) {
     case NOTMUCH_FORMAT_TEXT:
-	format = sprinter_text_create (config, stdout);
+	opt.format = sprinter_text_create (config, stdout);
 	break;
     case NOTMUCH_FORMAT_TEXT0:
-	if (output == OUTPUT_SUMMARY) {
+	if (opt.output == OUTPUT_SUMMARY) {
 	    fprintf (stderr, "Error: --format=text0 is not compatible with --output=summary.\n");
 	    return EXIT_FAILURE;
 	}
-	format = sprinter_text0_create (config, stdout);
+	opt.format = sprinter_text0_create (config, stdout);
 	break;
     case NOTMUCH_FORMAT_JSON:
-	format = sprinter_json_create (config, stdout);
+	opt.format = sprinter_json_create (config, stdout);
 	break;
     case NOTMUCH_FORMAT_SEXP:
-	format = sprinter_sexp_create (config, stdout);
+	opt.format = sprinter_sexp_create (config, stdout);
 	break;
     default:
 	/* this should never happen */
@@ -425,15 +428,15 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	return EXIT_FAILURE;
     }
 
-    query = notmuch_query_create (notmuch, query_str);
-    if (query == NULL) {
+    opt.query = notmuch_query_create (notmuch, query_str);
+    if (opt.query == NULL) {
 	fprintf (stderr, "Out of memory\n");
 	return EXIT_FAILURE;
     }
 
-    notmuch_query_set_sort (query, sort);
+    notmuch_query_set_sort (opt.query, opt.sort);
 
-    if (exclude == NOTMUCH_EXCLUDE_FLAG && output != OUTPUT_SUMMARY) {
+    if (exclude == NOTMUCH_EXCLUDE_FLAG && opt.output != OUTPUT_SUMMARY) {
 	/* If we are not doing summary output there is nowhere to
 	 * print the excluded flag so fall back on including the
 	 * excluded messages. */
@@ -448,29 +451,29 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	search_exclude_tags = notmuch_config_get_search_exclude_tags
 	    (config, &search_exclude_tags_length);
 	for (i = 0; i < search_exclude_tags_length; i++)
-	    notmuch_query_add_tag_exclude (query, search_exclude_tags[i]);
-	notmuch_query_set_omit_excluded (query, exclude);
+	    notmuch_query_add_tag_exclude (opt.query, search_exclude_tags[i]);
+	notmuch_query_set_omit_excluded (opt.query, exclude);
     }
 
-    switch (output) {
+    switch (opt.output) {
     default:
     case OUTPUT_SUMMARY:
     case OUTPUT_THREADS:
-	ret = do_search_threads (format, query, sort, output, offset, limit);
+	ret = do_search_threads (&opt);
 	break;
     case OUTPUT_MESSAGES:
     case OUTPUT_FILES:
-	ret = do_search_messages (format, query, output, offset, limit, dupe);
+	ret = do_search_messages (&opt);
 	break;
     case OUTPUT_TAGS:
-	ret = do_search_tags (notmuch, format, query);
+	ret = do_search_tags (notmuch, &opt);
 	break;
     }
 
-    notmuch_query_destroy (query);
+    notmuch_query_destroy (opt.query);
     notmuch_database_destroy (notmuch);
 
-    talloc_free (format);
+    talloc_free (opt.format);
 
     return ret ? EXIT_FAILURE : EXIT_SUCCESS;
 }
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/6] cli: Add support for parsing keyword-flag arguments
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 1/6] cli: search: Refactor passing of command line options Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 3/6] cli: search: Convert --output to keyword-flag argument Michal Sojka
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

From: Jani Nikula <jani@nikula.org>

This allows having multiple --foo=bar --foo=baz options on the command
line, with the corresponding values OR'd together.

[Test added by Michal Sojka]
---
 command-line-arguments.c      | 6 +++++-
 command-line-arguments.h      | 1 +
 test/T410-argument-parsing.sh | 3 ++-
 test/arg-test.c               | 9 +++++++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/command-line-arguments.c b/command-line-arguments.c
index 844d6c3..c6f7269 100644
--- a/command-line-arguments.c
+++ b/command-line-arguments.c
@@ -23,7 +23,10 @@ _process_keyword_arg (const notmuch_opt_desc_t *arg_desc, char next, const char
     while (keywords->name) {
 	if (strcmp (arg_str, keywords->name) == 0) {
 	    if (arg_desc->output_var) {
-		*((int *)arg_desc->output_var) = keywords->value;
+		if (arg_desc->opt_type == NOTMUCH_OPT_KEYWORD_FLAGS)
+		    *((int *)arg_desc->output_var) |= keywords->value;
+		else
+		    *((int *)arg_desc->output_var) = keywords->value;
 	    }
 	    return TRUE;
 	}
@@ -152,6 +155,7 @@ parse_option (const char *arg,
 
 	switch (try->opt_type) {
 	case NOTMUCH_OPT_KEYWORD:
+	case NOTMUCH_OPT_KEYWORD_FLAGS:
 	    return _process_keyword_arg (try, next, value);
 	case NOTMUCH_OPT_BOOLEAN:
 	    return _process_boolean_arg (try, next, value);
diff --git a/command-line-arguments.h b/command-line-arguments.h
index de1734a..6444129 100644
--- a/command-line-arguments.h
+++ b/command-line-arguments.h
@@ -8,6 +8,7 @@ enum notmuch_opt_type {
     NOTMUCH_OPT_BOOLEAN,	/* --verbose              */
     NOTMUCH_OPT_INT,		/* --frob=8               */
     NOTMUCH_OPT_KEYWORD,	/* --format=raw|json|text */
+    NOTMUCH_OPT_KEYWORD_FLAGS,	/* the above with values OR'd together */
     NOTMUCH_OPT_STRING,		/* --file=/tmp/gnarf.txt  */
     NOTMUCH_OPT_POSITION	/* notmuch dump pos_arg   */
 };
diff --git a/test/T410-argument-parsing.sh b/test/T410-argument-parsing.sh
index 94e9087..2e5d7ae 100755
--- a/test/T410-argument-parsing.sh
+++ b/test/T410-argument-parsing.sh
@@ -3,9 +3,10 @@ test_description="argument parsing"
 . ./test-lib.sh
 
 test_begin_subtest "sanity check"
-$TEST_DIRECTORY/arg-test  pos1  --keyword=one --string=foo pos2 --int=7 > OUTPUT
+$TEST_DIRECTORY/arg-test  pos1  --keyword=one --string=foo pos2 --int=7 --flag=one --flag=three > OUTPUT
 cat <<EOF > EXPECTED
 keyword 1
+flags 5
 int 7
 string foo
 positional arg 1 pos1
diff --git a/test/arg-test.c b/test/arg-test.c
index 6c49eac..736686d 100644
--- a/test/arg-test.c
+++ b/test/arg-test.c
@@ -7,6 +7,7 @@ int main(int argc, char **argv){
     int opt_index=1;
 
     int kw_val=0;
+    int fl_val=0;
     int int_val=0;
     char *pos_arg1=NULL;
     char *pos_arg2=NULL;
@@ -17,6 +18,11 @@ int main(int argc, char **argv){
 	  (notmuch_keyword_t []){ { "one", 1 },
 				  { "two", 2 },
 				  { 0, 0 } } },
+	{ NOTMUCH_OPT_KEYWORD_FLAGS, &fl_val, "flag", 'f',
+	  (notmuch_keyword_t []){ { "one",   1 << 0},
+				  { "two",   1 << 1 },
+				  { "three", 1 << 2 },
+				  { 0, 0 } } },
 	{ NOTMUCH_OPT_INT, &int_val, "int", 'i', 0},
 	{ NOTMUCH_OPT_STRING, &string_val, "string", 's', 0},
 	{ NOTMUCH_OPT_POSITION, &pos_arg1, 0,0, 0},
@@ -31,6 +37,9 @@ int main(int argc, char **argv){
     if (kw_val)
 	printf("keyword %d\n", kw_val);
 
+    if (fl_val)
+	printf("flags %d\n", fl_val);
+
     if (int_val)
 	printf("int %d\n", int_val);
 
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/6] cli: search: Convert --output to keyword-flag argument
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 1/6] cli: search: Refactor passing of command line options Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 2/6] cli: Add support for parsing keyword-flag arguments Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-27 14:50 ` [PATCH v4 4/6] cli: search: Add --output={sender,recipients} Michal Sojka
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

This converts "notmuch search" to use the recently introduced
keyword-flag argument parser. At this point, it only makes the code
slightly less readable but following patches that add new --output
keywords will profit from this.
---
 notmuch-search.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/notmuch-search.c b/notmuch-search.c
index 0c3e972..ce46877 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -23,11 +23,11 @@
 #include "string-util.h"
 
 typedef enum {
-    OUTPUT_SUMMARY,
-    OUTPUT_THREADS,
-    OUTPUT_MESSAGES,
-    OUTPUT_FILES,
-    OUTPUT_TAGS
+    OUTPUT_SUMMARY	= 1 << 0,
+    OUTPUT_THREADS	= 1 << 1,
+    OUTPUT_MESSAGES	= 1 << 2,
+    OUTPUT_FILES	= 1 << 3,
+    OUTPUT_TAGS		= 1 << 4,
 } output_t;
 
 typedef struct {
@@ -338,7 +338,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
     notmuch_database_t *notmuch;
     search_options_t opt = {
 	.sort = NOTMUCH_SORT_NEWEST_FIRST,
-	.output = OUTPUT_SUMMARY,
+	.output = 0,
 	.offset = 0,
 	.limit = -1, /* unlimited */
 	.dupe = -1,
@@ -367,7 +367,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 				  { "text0", NOTMUCH_FORMAT_TEXT0 },
 				  { 0, 0 } } },
 	{ NOTMUCH_OPT_INT, &notmuch_format_version, "format-version", 0, 0 },
-	{ NOTMUCH_OPT_KEYWORD, &opt.output, "output", 'o',
+	{ NOTMUCH_OPT_KEYWORD_FLAGS, &opt.output, "output", 'o',
 	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
 				  { "threads", OUTPUT_THREADS },
 				  { "messages", OUTPUT_MESSAGES },
@@ -390,6 +390,9 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
     if (opt_index < 0)
 	return EXIT_FAILURE;
 
+    if (! opt.output)
+	opt.output = OUTPUT_SUMMARY;
+
     switch (format_sel) {
     case NOTMUCH_FORMAT_TEXT:
 	opt.format = sprinter_text_create (config, stdout);
@@ -455,19 +458,17 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	notmuch_query_set_omit_excluded (opt.query, exclude);
     }
 
-    switch (opt.output) {
-    default:
-    case OUTPUT_SUMMARY:
-    case OUTPUT_THREADS:
+    if (opt.output == OUTPUT_SUMMARY ||
+	opt.output == OUTPUT_THREADS)
 	ret = do_search_threads (&opt);
-	break;
-    case OUTPUT_MESSAGES:
-    case OUTPUT_FILES:
+    else if (opt.output == OUTPUT_MESSAGES ||
+	     opt.output == OUTPUT_FILES)
 	ret = do_search_messages (&opt);
-	break;
-    case OUTPUT_TAGS:
+    else if (opt.output == OUTPUT_TAGS)
 	ret = do_search_tags (notmuch, &opt);
-	break;
+    else {
+	fprintf (stderr, "Error: the combination of outputs is not supported.\n");
+	ret = 1;
     }
 
     notmuch_query_destroy (opt.query);
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/6] cli: search: Add --output={sender,recipients}
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
                   ` (2 preceding siblings ...)
  2014-10-27 14:50 ` [PATCH v4 3/6] cli: search: Convert --output to keyword-flag argument Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-30  8:10   ` Mark Walters
  2014-10-27 14:50 ` [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses Michal Sojka
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

The new outputs allow printing senders, recipients or both of matching
messages. To print both, the user can use --output=sender and
--output=recipients simultaneously.

Currently, the same address can appear multiple times in the output. The
next patch will change this. For this reason, the test are introduced in
the next patch as well.

We use mailbox_t rather than InternetAddressMailbox because we will need
to extend it in a following patch.

This code based on a patch from Jani Nikula.
---
 completion/notmuch-completion.bash |   2 +-
 completion/notmuch-completion.zsh  |   3 +-
 doc/man1/notmuch-search.rst        |  22 +++++++-
 notmuch-search.c                   | 112 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 134 insertions(+), 5 deletions(-)

diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
index 0571dc9..cfbd389 100644
--- a/completion/notmuch-completion.bash
+++ b/completion/notmuch-completion.bash
@@ -294,7 +294,7 @@ _notmuch_search()
 	    return
 	    ;;
 	--output)
-	    COMPREPLY=( $( compgen -W "summary threads messages files tags" -- "${cur}" ) )
+	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients" -- "${cur}" ) )
 	    return
 	    ;;
 	--sort)
diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
index 67a9aba..3e52a00 100644
--- a/completion/notmuch-completion.zsh
+++ b/completion/notmuch-completion.zsh
@@ -52,7 +52,8 @@ _notmuch_search()
   _arguments -s : \
     '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
     '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
-    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))'
+    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
+    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
 }
 
 _notmuch()
diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
index 90160f2..b6607c9 100644
--- a/doc/man1/notmuch-search.rst
+++ b/doc/man1/notmuch-search.rst
@@ -35,7 +35,7 @@ Supported options for **search** include
         intended for programs that invoke **notmuch(1)** internally. If
         omitted, the latest supported version will be used.
 
-    ``--output=(summary|threads|messages|files|tags)``
+    ``--output=(summary|threads|messages|files|tags|sender|recipients)``
 
         **summary**
             Output a summary of each thread with any message matching
@@ -78,6 +78,26 @@ Supported options for **search** include
             by null characters (--format=text0), as a JSON array
             (--format=json), or as an S-Expression list (--format=sexp).
 
+	**sender**
+            Output all addresses from the *From* header that appear on
+            any message matching the search terms, either one per line
+            (--format=text), separated by null characters
+            (--format=text0), as a JSON array (--format=json), or as
+            an S-Expression list (--format=sexp).
+
+	    Note: Searching for **sender** should be much faster than
+	    searching for **recipients**, because sender addresses are
+	    cached directly in the database whereas other addresses
+	    need to be fetched from message files.
+
+	**recipients**
+            Like **sender** but for addresses from *To*, *Cc* and
+	    *Bcc* headers.
+
+	This option can be given multiple times to combine different
+	outputs. Currently, this is only supported for **sender** and
+	**recipients** outputs.
+
     ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
         This option can be used to present results in either
         chronological order (**oldest-first**) or reverse chronological
diff --git a/notmuch-search.c b/notmuch-search.c
index ce46877..ce3bfb2 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -28,8 +28,12 @@ typedef enum {
     OUTPUT_MESSAGES	= 1 << 2,
     OUTPUT_FILES	= 1 << 3,
     OUTPUT_TAGS		= 1 << 4,
+    OUTPUT_SENDER	= 1 << 5,
+    OUTPUT_RECIPIENTS	= 1 << 6,
 } output_t;
 
+#define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
+
 typedef struct {
     sprinter_t *format;
     notmuch_query_t *query;
@@ -40,6 +44,11 @@ typedef struct {
     int dupe;
 } search_options_t;
 
+typedef struct {
+    const char *name;
+    const char *addr;
+} mailbox_t;
+
 /* Return two stable query strings that identify exactly the matched
  * and unmatched messages currently in thread.  If there are no
  * matched or unmatched messages, the returned buffers will be
@@ -220,6 +229,84 @@ do_search_threads (search_options_t *opt)
     return 0;
 }
 
+static void
+print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
+{
+    const char *name = mailbox->name;
+    const char *addr = mailbox->addr;
+
+    if (opt->format->is_text_printer) {
+	char *mailbox_str;
+
+	if (name && *name)
+	    mailbox_str = talloc_asprintf (opt->format, "%s <%s>", name, addr);
+	else
+	    mailbox_str = talloc_strdup (opt->format, addr);
+
+	if (! mailbox_str) {
+	    fprintf (stderr, "Error: out of memory\n");
+	    return;
+	}
+	opt->format->string (opt->format, mailbox_str);
+	opt->format->separator (opt->format);
+
+	talloc_free (mailbox_str);
+    } else {
+	opt->format->begin_map (opt->format);
+	opt->format->map_key (opt->format, "name");
+	opt->format->string (opt->format, name);
+	opt->format->map_key (opt->format, "address");
+	opt->format->string (opt->format, addr);
+	opt->format->end (opt->format);
+	opt->format->separator (opt->format);
+    }
+}
+
+static void
+process_address_list (const search_options_t *opt, InternetAddressList *list)
+{
+    InternetAddress *address;
+    int i;
+
+    for (i = 0; i < internet_address_list_length (list); i++) {
+	address = internet_address_list_get_address (list, i);
+	if (INTERNET_ADDRESS_IS_GROUP (address)) {
+	    InternetAddressGroup *group;
+	    InternetAddressList *group_list;
+
+	    group = INTERNET_ADDRESS_GROUP (address);
+	    group_list = internet_address_group_get_members (group);
+	    if (group_list == NULL)
+		continue;
+
+	    process_address_list (opt, group_list);
+	} else {
+	    InternetAddressMailbox *mailbox = INTERNET_ADDRESS_MAILBOX (address);
+	    mailbox_t mbx = {
+		.name = internet_address_get_name (address),
+		.addr = internet_address_mailbox_get_addr (mailbox),
+	    };
+
+	    print_mailbox (opt, &mbx);
+	}
+    }
+}
+
+static void
+process_address_header (const search_options_t *opt, const char *value)
+{
+    InternetAddressList *list;
+
+    if (value == NULL)
+	return;
+
+    list = internet_address_list_parse_string (value);
+    if (list == NULL)
+	return;
+
+    process_address_list (opt, list);
+}
+
 static int
 do_search_messages (search_options_t *opt)
 {
@@ -266,11 +353,29 @@ do_search_messages (search_options_t *opt)
 	    
 	    notmuch_filenames_destroy( filenames );
 
-	} else { /* output == OUTPUT_MESSAGES */
+	} else if (opt->output == OUTPUT_MESSAGES) {
 	    format->set_prefix (format, "id");
 	    format->string (format,
 			    notmuch_message_get_message_id (message));
 	    format->separator (format);
+	} else {
+	    if (opt->output & OUTPUT_SENDER) {
+		const char *addrs;
+
+		addrs = notmuch_message_get_header (message, "from");
+		process_address_header (opt, addrs);
+	    }
+
+	    if (opt->output & OUTPUT_RECIPIENTS) {
+		const char *hdrs[] = { "to", "cc", "bcc" };
+		const char *addrs;
+		size_t j;
+
+		for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
+		    addrs = notmuch_message_get_header (message, hdrs[j]);
+		    process_address_header (opt, addrs);
+		}
+	    }
 	}
 
 	notmuch_message_destroy (message);
@@ -371,6 +476,8 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
 				  { "threads", OUTPUT_THREADS },
 				  { "messages", OUTPUT_MESSAGES },
+				  { "sender", OUTPUT_SENDER },
+				  { "recipients", OUTPUT_RECIPIENTS },
 				  { "files", OUTPUT_FILES },
 				  { "tags", OUTPUT_TAGS },
 				  { 0, 0 } } },
@@ -462,7 +569,8 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	opt.output == OUTPUT_THREADS)
 	ret = do_search_threads (&opt);
     else if (opt.output == OUTPUT_MESSAGES ||
-	     opt.output == OUTPUT_FILES)
+	     opt.output == OUTPUT_FILES ||
+	     (opt.output & OUTPUT_ADDRESS_FLAGS && !(opt.output & ~OUTPUT_ADDRESS_FLAGS)))
 	ret = do_search_messages (&opt);
     else if (opt.output == OUTPUT_TAGS)
 	ret = do_search_tags (notmuch, &opt);
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
                   ` (3 preceding siblings ...)
  2014-10-27 14:50 ` [PATCH v4 4/6] cli: search: Add --output={sender,recipients} Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-30  8:16   ` Mark Walters
  2014-10-27 14:50 ` [PATCH v4 6/6] cli: search: Add --output=count Michal Sojka
  2014-10-28 21:19 ` [PATCH v4 0/6] notmuch search --output=sender/recipients Tomi Ollila
  6 siblings, 1 reply; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

This adds an algorithm to filter out duplicate addresses from address
outputs (sender, receivers). The algorithm can be configured with
--filter-by command line option.

The code here is an extended version of a patch from Jani Nikula.
---
 completion/notmuch-completion.bash |  6 ++-
 completion/notmuch-completion.zsh  |  3 +-
 doc/man1/notmuch-search.rst        | 38 +++++++++++++++
 notmuch-search.c                   | 98 +++++++++++++++++++++++++++++++++++---
 test/T090-search-output.sh         | 87 +++++++++++++++++++++++++++++++++
 test/T095-search-filter-by.sh      | 64 +++++++++++++++++++++++++
 6 files changed, 288 insertions(+), 8 deletions(-)
 create mode 100755 test/T095-search-filter-by.sh

diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
index cfbd389..6b6d43a 100644
--- a/completion/notmuch-completion.bash
+++ b/completion/notmuch-completion.bash
@@ -305,12 +305,16 @@ _notmuch_search()
 	    COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )
 	    return
 	    ;;
+	--filter-by)
+	    COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )
+	    return
+	    ;;
     esac
 
     ! $split &&
     case "${cur}" in
 	-*)
-	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="
+	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="
 	    compopt -o nospace
 	    COMPREPLY=( $(compgen -W "$options" -- ${cur}) )
 	    ;;
diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
index 3e52a00..3e535df 100644
--- a/completion/notmuch-completion.zsh
+++ b/completion/notmuch-completion.zsh
@@ -53,7 +53,8 @@ _notmuch_search()
     '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
     '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
     '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
-    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
+    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))' \
+    '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'
 }
 
 _notmuch()
diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
index b6607c9..84af2da 100644
--- a/doc/man1/notmuch-search.rst
+++ b/doc/man1/notmuch-search.rst
@@ -85,6 +85,9 @@ Supported options for **search** include
             (--format=text0), as a JSON array (--format=json), or as
             an S-Expression list (--format=sexp).
 
+            Duplicate addresses are filtered out. Filtering can be
+            configured with the --filter-by option.
+
 	    Note: Searching for **sender** should be much faster than
 	    searching for **recipients**, because sender addresses are
 	    cached directly in the database whereas other addresses
@@ -151,6 +154,41 @@ Supported options for **search** include
         prefix. The prefix matches messages based on filenames. This
         option filters filenames of the matching messages.
 
+    ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)
+
+	Can be used with ``--output=sender`` or
+	``--output=recipients`` to filter out duplicate addresses. The
+	filtering algorithm receives a sequence of email addresses and
+	outputs the same sequence without the addresses that are
+	considered a duplicate of a previously output address. What is
+	considered a duplicate depends on how the two addresses are
+	compared and this can be controlled with the follwing flags:
+
+	**nameaddr** means that both name and address parts are
+	compared in case-sensitive manner. Therefore, all same looking
+	addresses strings are considered duplicate. This is the
+	default.
+
+	**name** means that only the name part is compared (in
+	case-sensitive manner). For example, the addresses "John Doe
+	<me@example.com>" and "John Doe <john@doe.name>" will be
+	considered duplicate.
+
+	**addr** means that only the address part is compared (in
+	case-sensitive manner). For example, the addresses "John Doe
+	<john@example.com>" and "Dr. John Doe <john@example.com>" will
+	be considered duplicate.
+
+	**addrfold** is like **addr**, but comparison is done in
+	canse-insensitive manner. For example, the addresses "John Doe
+	<john@example.com>" and "Dr. John Doe <JOHN@EXAMPLE.COM>" will
+	be considered duplicate.
+
+	**nameaddrfold** is like **nameaddr**, but address comparison
+	is done in canse-insensitive manner. For example, the
+	addresses "John Doe <john@example.com>" and "John Doe
+	<JOHN@EXAMPLE.COM>" will be considered duplicate.
+
 EXIT STATUS
 ===========
 
diff --git a/notmuch-search.c b/notmuch-search.c
index ce3bfb2..47aa979 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -34,6 +34,14 @@ typedef enum {
 
 #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
 
+typedef enum {
+    FILTER_BY_NAMEADDR = 0,
+    FILTER_BY_NAME,
+    FILTER_BY_ADDR,
+    FILTER_BY_ADDRFOLD,
+    FILTER_BY_NAMEADDRFOLD,
+} filter_by_t;
+
 typedef struct {
     sprinter_t *format;
     notmuch_query_t *query;
@@ -42,6 +50,7 @@ typedef struct {
     int offset;
     int limit;
     int dupe;
+    filter_by_t filter_by;
 } search_options_t;
 
 typedef struct {
@@ -229,6 +238,52 @@ do_search_threads (search_options_t *opt)
     return 0;
 }
 
+/* Returns TRUE iff name and/or addr is considered duplicite. */
+static notmuch_bool_t
+check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)
+{
+    notmuch_bool_t duplicite;
+    char *key;
+
+    if (opt->filter_by == FILTER_BY_ADDRFOLD ||
+	opt->filter_by == FILTER_BY_NAMEADDRFOLD) {
+	gchar *folded = g_utf8_casefold (addr, -1);
+	addr = talloc_strdup (opt->format, folded);
+	g_free (folded);
+    }
+    switch (opt->filter_by) {
+    case FILTER_BY_NAMEADDR:
+    case FILTER_BY_NAMEADDRFOLD:
+	key = talloc_asprintf (opt->format, "%s <%s>", name, addr);
+	break;
+    case FILTER_BY_NAME:
+	key = talloc_strdup (opt->format, name); /* !name results in !key */
+	break;
+    case FILTER_BY_ADDR:
+    case FILTER_BY_ADDRFOLD:
+	key = talloc_strdup (opt->format, addr);
+	break;
+    default:
+	INTERNAL_ERROR("invalid --filter-by flags");
+    }
+
+    if (opt->filter_by == FILTER_BY_ADDRFOLD ||
+	opt->filter_by == FILTER_BY_NAMEADDRFOLD)
+	talloc_free ((char*)addr);
+
+    if (! key)
+	return FALSE;
+
+    duplicite = g_hash_table_lookup_extended (addrs, key, NULL, NULL);
+
+    if (! duplicite)
+	g_hash_table_insert (addrs, key, NULL);
+    else
+	talloc_free (key);
+
+    return duplicite;
+}
+
 static void
 print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
 {
@@ -263,7 +318,8 @@ print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
 }
 
 static void
-process_address_list (const search_options_t *opt, InternetAddressList *list)
+process_address_list (const search_options_t *opt, GHashTable *addrs,
+		      InternetAddressList *list)
 {
     InternetAddress *address;
     int i;
@@ -279,7 +335,7 @@ process_address_list (const search_options_t *opt, InternetAddressList *list)
 	    if (group_list == NULL)
 		continue;
 
-	    process_address_list (opt, group_list);
+	    process_address_list (opt, addrs, group_list);
 	} else {
 	    InternetAddressMailbox *mailbox = INTERNET_ADDRESS_MAILBOX (address);
 	    mailbox_t mbx = {
@@ -287,13 +343,16 @@ process_address_list (const search_options_t *opt, InternetAddressList *list)
 		.addr = internet_address_mailbox_get_addr (mailbox),
 	    };
 
+	    if (check_duplicite (opt, addrs, mbx.name, mbx.addr))
+		continue;
+
 	    print_mailbox (opt, &mbx);
 	}
     }
 }
 
 static void
-process_address_header (const search_options_t *opt, const char *value)
+process_address_header (const search_options_t *opt, GHashTable *addrs, const char *value)
 {
     InternetAddressList *list;
 
@@ -304,7 +363,13 @@ process_address_header (const search_options_t *opt, const char *value)
     if (list == NULL)
 	return;
 
-    process_address_list (opt, list);
+    process_address_list (opt, addrs, list);
+}
+
+static void
+_my_talloc_free_for_g_hash (void *ptr)
+{
+    talloc_free (ptr);
 }
 
 static int
@@ -314,8 +379,13 @@ do_search_messages (search_options_t *opt)
     notmuch_messages_t *messages;
     notmuch_filenames_t *filenames;
     sprinter_t *format = opt->format;
+    GHashTable *addresses = NULL;
     int i;
 
+    if (opt->output & OUTPUT_ADDRESS_FLAGS)
+	addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
+					   _my_talloc_free_for_g_hash, NULL);
+
     if (opt->offset < 0) {
 	opt->offset += notmuch_query_count_messages (opt->query);
 	if (opt->offset < 0)
@@ -363,7 +433,7 @@ do_search_messages (search_options_t *opt)
 		const char *addrs;
 
 		addrs = notmuch_message_get_header (message, "from");
-		process_address_header (opt, addrs);
+		process_address_header (opt, addresses, addrs);
 	    }
 
 	    if (opt->output & OUTPUT_RECIPIENTS) {
@@ -373,7 +443,7 @@ do_search_messages (search_options_t *opt)
 
 		for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
 		    addrs = notmuch_message_get_header (message, hdrs[j]);
-		    process_address_header (opt, addrs);
+		    process_address_header (opt, addresses, addrs);
 		}
 	    }
 	}
@@ -381,6 +451,9 @@ do_search_messages (search_options_t *opt)
 	notmuch_message_destroy (message);
     }
 
+    if (addresses)
+	g_hash_table_unref (addresses);
+
     notmuch_messages_destroy (messages);
 
     format->end (format);
@@ -447,6 +520,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	.offset = 0,
 	.limit = -1, /* unlimited */
 	.dupe = -1,
+	.filter_by = FILTER_BY_NAMEADDR,
     };
     char *query_str;
     int opt_index, ret;
@@ -490,6 +564,13 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 	{ NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },
 	{ NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0  },
 	{ NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0  },
+	{ NOTMUCH_OPT_KEYWORD, &opt.filter_by, "filter-by", 'b',
+	  (notmuch_keyword_t []){ { "nameaddr", FILTER_BY_NAMEADDR },
+				  { "name", FILTER_BY_NAME },
+				  { "addr", FILTER_BY_ADDR },
+				  { "addrfold", FILTER_BY_ADDRFOLD },
+				  { "nameaddrfold", FILTER_BY_NAMEADDRFOLD },
+				  { 0, 0 } } },
 	{ 0, 0, 0, 0, 0 }
     };
 
@@ -500,6 +581,11 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
     if (! opt.output)
 	opt.output = OUTPUT_SUMMARY;
 
+    if (opt.filter_by && !(opt.output & OUTPUT_ADDRESS_FLAGS)) {
+	fprintf (stderr, "Error: --filter-by can only be used with address output.\n");
+	return EXIT_FAILURE;
+    }
+
     switch (format_sel) {
     case NOTMUCH_FORMAT_TEXT:
 	opt.format = sprinter_text_create (config, stdout);
diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh
index 947d572..841a721 100755
--- a/test/T090-search-output.sh
+++ b/test/T090-search-output.sh
@@ -387,6 +387,93 @@ cat <<EOF >EXPECTED
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "--output=sender"
+notmuch search --output=sender '*' >OUTPUT
+cat <<EOF >EXPECTED
+François Boulogne <boulogne.f@gmail.com>
+Olivier Berger <olivier.berger@it-sudparis.eu>
+Chris Wilson <chris@chris-wilson.co.uk>
+Carl Worth <cworth@cworth.org>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Keith Packard <keithp@keithp.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Rolland Santimano <rollandsantimano@yahoo.com>
+Jan Janak <jan@ryngle.com>
+Stewart Smith <stewart@flamingspork.com>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+Alex Botero-Lowry <alex.boterolowry@gmail.com>
+Ingmar Vanhassel <ingmar@exherbo.org>
+Aron Griffis <agriffis@n01se.net>
+Adrian Perez de Castro <aperez@igalia.com>
+Israel Herraiz <isra@herraiz.org>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=sender --format=json"
+notmuch search --output=sender --format=json '*' >OUTPUT
+cat <<EOF >EXPECTED
+[{"name": "François Boulogne", "address": "boulogne.f@gmail.com"},
+{"name": "Olivier Berger", "address": "olivier.berger@it-sudparis.eu"},
+{"name": "Chris Wilson", "address": "chris@chris-wilson.co.uk"},
+{"name": "Carl Worth", "address": "cworth@cworth.org"},
+{"name": "Alexander Botero-Lowry", "address": "alex.boterolowry@gmail.com"},
+{"name": "Keith Packard", "address": "keithp@keithp.com"},
+{"name": "Jjgod Jiang", "address": "gzjjgod@gmail.com"},
+{"name": "Rolland Santimano", "address": "rollandsantimano@yahoo.com"},
+{"name": "Jan Janak", "address": "jan@ryngle.com"},
+{"name": "Stewart Smith", "address": "stewart@flamingspork.com"},
+{"name": "Lars Kellogg-Stedman", "address": "lars@seas.harvard.edu"},
+{"name": "Alex Botero-Lowry", "address": "alex.boterolowry@gmail.com"},
+{"name": "Ingmar Vanhassel", "address": "ingmar@exherbo.org"},
+{"name": "Aron Griffis", "address": "agriffis@n01se.net"},
+{"name": "Adrian Perez de Castro", "address": "aperez@igalia.com"},
+{"name": "Israel Herraiz", "address": "isra@herraiz.org"},
+{"name": "Mikhail Gusarov", "address": "dottedmag@dottedmag.net"}]
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients"
+notmuch search --output=recipients '*' >OUTPUT
+cat <<EOF >EXPECTED
+Allan McRae <allan@archlinux.org>
+Discussion about the Arch User Repository (AUR) <aur-general@archlinux.org>
+olivier.berger@it-sudparis.eu
+notmuch@notmuchmail.org
+notmuch <notmuch@notmuchmail.org>
+Keith Packard <keithp@keithp.com>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=sender --output=recipients"
+notmuch search --output=sender --output=recipients '*' >OUTPUT
+cat <<EOF >EXPECTED
+François Boulogne <boulogne.f@gmail.com>
+Allan McRae <allan@archlinux.org>
+Discussion about the Arch User Repository (AUR) <aur-general@archlinux.org>
+Olivier Berger <olivier.berger@it-sudparis.eu>
+olivier.berger@it-sudparis.eu
+Chris Wilson <chris@chris-wilson.co.uk>
+notmuch@notmuchmail.org
+Carl Worth <cworth@cworth.org>
+Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+Keith Packard <keithp@keithp.com>
+Jjgod Jiang <gzjjgod@gmail.com>
+Rolland Santimano <rollandsantimano@yahoo.com>
+Jan Janak <jan@ryngle.com>
+Stewart Smith <stewart@flamingspork.com>
+Lars Kellogg-Stedman <lars@seas.harvard.edu>
+notmuch <notmuch@notmuchmail.org>
+Alex Botero-Lowry <alex.boterolowry@gmail.com>
+Ingmar Vanhassel <ingmar@exherbo.org>
+Aron Griffis <agriffis@n01se.net>
+Adrian Perez de Castro <aperez@igalia.com>
+Israel Herraiz <isra@herraiz.org>
+Mikhail Gusarov <dottedmag@dottedmag.net>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 test_begin_subtest "sanitize output for quoted-printable line-breaks in author and subject"
 add_message "[subject]='two =?ISO-8859-1?Q?line=0A_subject?=
 	headers'"
diff --git a/test/T095-search-filter-by.sh b/test/T095-search-filter-by.sh
new file mode 100755
index 0000000..97d9a9b
--- /dev/null
+++ b/test/T095-search-filter-by.sh
@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+test_description='duplicite address filtering in "notmuch search --output=recipients"'
+. ./test-lib.sh
+
+add_message '[to]="Real Name <foo@example.com>, Real Name <bar@example.com>"'
+add_message '[to]="Nickname <foo@example.com>"' '[cc]="Real Name <Bar@Example.COM>"'
+add_message '[to]="Nickname <foo@example.com>"' '[bcc]="Real Name <Bar@Example.COM>"'
+
+test_begin_subtest "--output=recipients"
+notmuch search --output=recipients "*" >OUTPUT
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Real Name <bar@example.com>
+Nickname <foo@example.com>
+Real Name <Bar@Example.COM>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients --filter-by=nameaddr"
+notmuch search --output=recipients --filter-by=nameaddr "*" >OUTPUT
+# The same as above
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Real Name <bar@example.com>
+Nickname <foo@example.com>
+Real Name <Bar@Example.COM>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients --filter-by=name"
+notmuch search --output=recipients --filter-by=name "*" >OUTPUT
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Nickname <foo@example.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients --filter-by=addr"
+notmuch search --output=recipients --filter-by=addr "*" >OUTPUT
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Real Name <bar@example.com>
+Real Name <Bar@Example.COM>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients --filter-by=addrfold"
+notmuch search --output=recipients --filter-by=addrfold "*" >OUTPUT
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Real Name <bar@example.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=recipients --filter-by=nameaddrfold"
+notmuch search --output=recipients --filter-by=nameaddrfold "*" >OUTPUT
+cat <<EOF >EXPECTED
+Real Name <foo@example.com>
+Real Name <bar@example.com>
+Nickname <foo@example.com>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_done
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 6/6] cli: search: Add --output=count
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
                   ` (4 preceding siblings ...)
  2014-10-27 14:50 ` [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses Michal Sojka
@ 2014-10-27 14:50 ` Michal Sojka
  2014-10-28 21:19 ` [PATCH v4 0/6] notmuch search --output=sender/recipients Tomi Ollila
  6 siblings, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-27 14:50 UTC (permalink / raw)
  To: notmuch

This output can be used with --output=recipients or --output=sender and
in addition to the mailboxes, it prints how many times was each mailbox
encountered during search.
---
 completion/notmuch-completion.bash |  2 +-
 completion/notmuch-completion.zsh  |  2 +-
 doc/man1/notmuch-search.rst        |  9 +++++--
 notmuch-search.c                   | 47 ++++++++++++++++++++++++++++++-----
 test/T090-search-output.sh         | 50 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
index 6b6d43a..b625b02 100644
--- a/completion/notmuch-completion.bash
+++ b/completion/notmuch-completion.bash
@@ -294,7 +294,7 @@ _notmuch_search()
 	    return
 	    ;;
 	--output)
-	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients" -- "${cur}" ) )
+	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients count" -- "${cur}" ) )
 	    return
 	    ;;
 	--sort)
diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
index 3e535df..c1ccc32 100644
--- a/completion/notmuch-completion.zsh
+++ b/completion/notmuch-completion.zsh
@@ -53,7 +53,7 @@ _notmuch_search()
     '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
     '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
     '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
-    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))' \
+    '--output=[select what to output]:output:((summary threads messages files tags sender recipients count))' \
     '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'
 }
 
diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
index 84af2da..4b408f6 100644
--- a/doc/man1/notmuch-search.rst
+++ b/doc/man1/notmuch-search.rst
@@ -97,9 +97,14 @@ Supported options for **search** include
             Like **sender** but for addresses from *To*, *Cc* and
 	    *Bcc* headers.
 
+	**count**
+	    Can be used in combination with **sender** or
+	    **recipients** to print the count of how many times was
+	    the address encountered during search.
+
 	This option can be given multiple times to combine different
-	outputs. Currently, this is only supported for **sender** and
-	**recipients** outputs.
+	outputs. Currently, this is only supported for **sender**,
+	**recipients** and **count** outputs.
 
     ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
         This option can be used to present results in either
diff --git a/notmuch-search.c b/notmuch-search.c
index 47aa979..41f4107 100644
--- a/notmuch-search.c
+++ b/notmuch-search.c
@@ -30,9 +30,10 @@ typedef enum {
     OUTPUT_TAGS		= 1 << 4,
     OUTPUT_SENDER	= 1 << 5,
     OUTPUT_RECIPIENTS	= 1 << 6,
+    OUTPUT_COUNT	= 1 << 7,
 } output_t;
 
-#define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
+#define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS | OUTPUT_COUNT)
 
 typedef enum {
     FILTER_BY_NAMEADDR = 0,
@@ -56,6 +57,7 @@ typedef struct {
 typedef struct {
     const char *name;
     const char *addr;
+    int count;
 } mailbox_t;
 
 /* Return two stable query strings that identify exactly the matched
@@ -244,6 +246,7 @@ check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *nam
 {
     notmuch_bool_t duplicite;
     char *key;
+    mailbox_t *mailbox;
 
     if (opt->filter_by == FILTER_BY_ADDRFOLD ||
 	opt->filter_by == FILTER_BY_NAMEADDRFOLD) {
@@ -274,12 +277,18 @@ check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *nam
     if (! key)
 	return FALSE;
 
-    duplicite = g_hash_table_lookup_extended (addrs, key, NULL, NULL);
+    duplicite = g_hash_table_lookup_extended (addrs, key, NULL, (gpointer)&mailbox);
 
-    if (! duplicite)
-	g_hash_table_insert (addrs, key, NULL);
-    else
+    if (! duplicite) {
+	mailbox = talloc (opt->format, mailbox_t);
+	mailbox->name = talloc_strdup (mailbox, name);
+	mailbox->addr = talloc_strdup (mailbox, addr);
+	mailbox->count = 1;
+	g_hash_table_insert (addrs, key, mailbox);
+    } else {
+	mailbox->count++;
 	talloc_free (key);
+    }
 
     return duplicite;
 }
@@ -289,6 +298,7 @@ print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
 {
     const char *name = mailbox->name;
     const char *addr = mailbox->addr;
+    int count = mailbox->count;
 
     if (opt->format->is_text_printer) {
 	char *mailbox_str;
@@ -302,6 +312,10 @@ print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
 	    fprintf (stderr, "Error: out of memory\n");
 	    return;
 	}
+	if (count > 0) {
+	    opt->format->integer (opt->format, count);
+	    opt->format->string (opt->format, "\t");
+	}
 	opt->format->string (opt->format, mailbox_str);
 	opt->format->separator (opt->format);
 
@@ -312,6 +326,10 @@ print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
 	opt->format->string (opt->format, name);
 	opt->format->map_key (opt->format, "address");
 	opt->format->string (opt->format, addr);
+	if (count > 0) {
+	    opt->format->map_key (opt->format, "count");
+	    opt->format->integer (opt->format, count);
+	}
 	opt->format->end (opt->format);
 	opt->format->separator (opt->format);
     }
@@ -341,11 +359,15 @@ process_address_list (const search_options_t *opt, GHashTable *addrs,
 	    mailbox_t mbx = {
 		.name = internet_address_get_name (address),
 		.addr = internet_address_mailbox_get_addr (mailbox),
+		.count = 0,
 	    };
 
 	    if (check_duplicite (opt, addrs, mbx.name, mbx.addr))
 		continue;
 
+	    if (opt->output & OUTPUT_COUNT)
+		continue;
+
 	    print_mailbox (opt, &mbx);
 	}
     }
@@ -372,6 +394,15 @@ _my_talloc_free_for_g_hash (void *ptr)
     talloc_free (ptr);
 }
 
+static void
+print_hash_value (unused (gpointer key), gpointer value, gpointer user_data)
+{
+    const mailbox_t *mailbox = value;
+    search_options_t *opt = user_data;
+
+    print_mailbox (opt, mailbox);
+}
+
 static int
 do_search_messages (search_options_t *opt)
 {
@@ -384,7 +415,7 @@ do_search_messages (search_options_t *opt)
 
     if (opt->output & OUTPUT_ADDRESS_FLAGS)
 	addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
-					   _my_talloc_free_for_g_hash, NULL);
+					   _my_talloc_free_for_g_hash, _my_talloc_free_for_g_hash);
 
     if (opt->offset < 0) {
 	opt->offset += notmuch_query_count_messages (opt->query);
@@ -451,6 +482,9 @@ do_search_messages (search_options_t *opt)
 	notmuch_message_destroy (message);
     }
 
+    if (addresses && opt->output & OUTPUT_COUNT)
+	g_hash_table_foreach (addresses, print_hash_value, opt);
+
     if (addresses)
 	g_hash_table_unref (addresses);
 
@@ -554,6 +588,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
 				  { "recipients", OUTPUT_RECIPIENTS },
 				  { "files", OUTPUT_FILES },
 				  { "tags", OUTPUT_TAGS },
+				  { "count", OUTPUT_COUNT },
 				  { 0, 0 } } },
         { NOTMUCH_OPT_KEYWORD, &exclude, "exclude", 'x',
           (notmuch_keyword_t []){ { "true", NOTMUCH_EXCLUDE_TRUE },
diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh
index 841a721..5a9bbc9 100755
--- a/test/T090-search-output.sh
+++ b/test/T090-search-output.sh
@@ -433,6 +433,56 @@ cat <<EOF >EXPECTED
 EOF
 test_expect_equal_file OUTPUT EXPECTED
 
+test_begin_subtest "--output=sender --output=count"
+notmuch search --output=sender --output=count '*' | sort -n >OUTPUT
+cat <<EOF >EXPECTED
+1	Adrian Perez de Castro <aperez@igalia.com>
+1	Aron Griffis <agriffis@n01se.net>
+1	Chris Wilson <chris@chris-wilson.co.uk>
+1	François Boulogne <boulogne.f@gmail.com>
+1	Ingmar Vanhassel <ingmar@exherbo.org>
+1	Israel Herraiz <isra@herraiz.org>
+1	Olivier Berger <olivier.berger@it-sudparis.eu>
+1	Rolland Santimano <rollandsantimano@yahoo.com>
+2	Alex Botero-Lowry <alex.boterolowry@gmail.com>
+2	Jjgod Jiang <gzjjgod@gmail.com>
+3	Stewart Smith <stewart@flamingspork.com>
+4	Alexander Botero-Lowry <alex.boterolowry@gmail.com>
+4	Jan Janak <jan@ryngle.com>
+5	Lars Kellogg-Stedman <lars@seas.harvard.edu>
+5	Mikhail Gusarov <dottedmag@dottedmag.net>
+7	Keith Packard <keithp@keithp.com>
+12	Carl Worth <cworth@cworth.org>
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "--output=sender --output=count --format=json"
+# Since the iteration order of GHashTable is not specified, we
+# preprocess and sort the results to keep the order stable here.
+notmuch search --output=sender --output=count --format=json '*' | \
+    sed -e 's/^\[//' -e 's/]$//' -e 's/,$//' | \
+    sort --field-separator=":" --key=4n --key=2 >OUTPUT
+cat <<EOF >EXPECTED
+{"name": "Adrian Perez de Castro", "address": "aperez@igalia.com", "count": 1}
+{"name": "Aron Griffis", "address": "agriffis@n01se.net", "count": 1}
+{"name": "Chris Wilson", "address": "chris@chris-wilson.co.uk", "count": 1}
+{"name": "François Boulogne", "address": "boulogne.f@gmail.com", "count": 1}
+{"name": "Ingmar Vanhassel", "address": "ingmar@exherbo.org", "count": 1}
+{"name": "Israel Herraiz", "address": "isra@herraiz.org", "count": 1}
+{"name": "Olivier Berger", "address": "olivier.berger@it-sudparis.eu", "count": 1}
+{"name": "Rolland Santimano", "address": "rollandsantimano@yahoo.com", "count": 1}
+{"name": "Alex Botero-Lowry", "address": "alex.boterolowry@gmail.com", "count": 2}
+{"name": "Jjgod Jiang", "address": "gzjjgod@gmail.com", "count": 2}
+{"name": "Stewart Smith", "address": "stewart@flamingspork.com", "count": 3}
+{"name": "Alexander Botero-Lowry", "address": "alex.boterolowry@gmail.com", "count": 4}
+{"name": "Jan Janak", "address": "jan@ryngle.com", "count": 4}
+{"name": "Lars Kellogg-Stedman", "address": "lars@seas.harvard.edu", "count": 5}
+{"name": "Mikhail Gusarov", "address": "dottedmag@dottedmag.net", "count": 5}
+{"name": "Keith Packard", "address": "keithp@keithp.com", "count": 7}
+{"name": "Carl Worth", "address": "cworth@cworth.org", "count": 12}
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
 test_begin_subtest "--output=recipients"
 notmuch search --output=recipients '*' >OUTPUT
 cat <<EOF >EXPECTED
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/6] notmuch search --output=sender/recipients
  2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
                   ` (5 preceding siblings ...)
  2014-10-27 14:50 ` [PATCH v4 6/6] cli: search: Add --output=count Michal Sojka
@ 2014-10-28 21:19 ` Tomi Ollila
  6 siblings, 0 replies; 16+ messages in thread
From: Tomi Ollila @ 2014-10-28 21:19 UTC (permalink / raw)
  To: Michal Sojka, notmuch

On Mon, Oct 27 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:

> Hi all,
>
> this is v4 of the search --output=address series. It obsoletes v3 that
> starts at id:1413150093-8383-1-git-send-email-sojkam1@fel.cvut.cz.
>
> It addresses most comments made by Mark Walters and others. In
> addition to v3, it also implements new --output=count (also suggested
> by Mark). This required changes even in the initial patches of the
> series but the result is cleaner code.

I looked through the messages

 id:1414421455-3037-2-git-send-email-sojkam1@fel.cvut.cz
 id:1414421455-3037-3-git-send-email-sojkam1@fel.cvut.cz
 id:1414421455-3037-4-git-send-email-sojkam1@fel.cvut.cz
 id:1414421455-3037-5-git-send-email-sojkam1@fel.cvut.cz

First I read the changes in this email thread and then changes in each of
these messages with xxdiff(1) -- and applied the patches and ran tests
(which passed).

These 4 patches LGTM.

The filtering/uniqueness whatnot handling still ¡IMO! has things to hope
for, which I will write later...

Finally, a promise I made a in reply to another patch series: Although
these emails are patches when sent to this mailing list, when these
are applied and pushed to notmuch repository the commit message should
not refer these as patches -- and therefore the email part that will
eventually be in commit message should be written differently. But that
is not a reason enough to require patch email resent.


Tomi


>
> Detailed changes from v3:
>
> - `o' renamed to `opt'.
> - Conversion of --output from keyword to keyword-flags is now a
>   separate patch.
> - Structured output formats print name and address separately.
> - Added test for --format=json.
> - Changed --filter-by default to nameaddr. In v2, the default was
>   addrfold, in v3 the default was no filtering at all. I believe that
>   Mark's suggestion to make nameaddr the default is good trade off.
> - Added new --output=count
> - Minor style fixes
> - Few typos fixed
> - There is no way to output unfiltered (duplicite) addresses.
>   Hopefully, the introduction of --output=count is sufficient
>   replacement for this "feature".
>
> Cheers,
> -Michal
>
>
> Jani Nikula (1):
>   cli: Add support for parsing keyword-flag arguments
>
> Michal Sojka (5):
>   cli: search: Refactor passing of command line options
>   cli: search: Convert --output to keyword-flag argument
>   cli: search: Add --output={sender,recipients}
>   cli: search: Add configurable way to filter out duplicate addresses
>   cli: search: Add --output=count
>
>  command-line-arguments.c           |   6 +-
>  command-line-arguments.h           |   1 +
>  completion/notmuch-completion.bash |   8 +-
>  completion/notmuch-completion.zsh  |   4 +-
>  doc/man1/notmuch-search.rst        |  65 ++++++-
>  notmuch-search.c                   | 385 +++++++++++++++++++++++++++++--------
>  test/T090-search-output.sh         | 137 +++++++++++++
>  test/T095-search-filter-by.sh      |  64 ++++++
>  test/T410-argument-parsing.sh      |   3 +-
>  test/arg-test.c                    |   9 +
>  10 files changed, 600 insertions(+), 82 deletions(-)
>  create mode 100755 test/T095-search-filter-by.sh
>
> -- 
> 2.1.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/6] cli: search: Refactor passing of command line options
  2014-10-27 14:50 ` [PATCH v4 1/6] cli: search: Refactor passing of command line options Michal Sojka
@ 2014-10-30  7:57   ` Mark Walters
  2014-10-30 21:14     ` Michal Sojka
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Walters @ 2014-10-30  7:57 UTC (permalink / raw)
  To: Michal Sojka, notmuch


Hi

On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> Many functions that implement the search command need to access command
> line options. Instead of passing each option in a separate variable, put
> them in a structure and pass only this structure.
>
> This will become handy in the following patches.
> ---
>  notmuch-search.c | 125 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 64 insertions(+), 61 deletions(-)
>
> diff --git a/notmuch-search.c b/notmuch-search.c
> index bc9be45..0c3e972 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -30,6 +30,16 @@ typedef enum {
>      OUTPUT_TAGS
>  } output_t;
>  
> +typedef struct {
> +    sprinter_t *format;
> +    notmuch_query_t *query;
> +    notmuch_sort_t sort;
> +    output_t output;
> +    int offset;
> +    int limit;
> +    int dupe;
> +} search_options_t;
> +
>  /* Return two stable query strings that identify exactly the matched
>   * and unmatched messages currently in thread.  If there are no
>   * matched or unmatched messages, the returned buffers will be
> @@ -70,43 +80,39 @@ get_thread_query (notmuch_thread_t *thread,
>  }
>  
>  static int
> -do_search_threads (sprinter_t *format,
> -		   notmuch_query_t *query,
> -		   notmuch_sort_t sort,
> -		   output_t output,
> -		   int offset,
> -		   int limit)
> +do_search_threads (search_options_t *opt)
>  {
>      notmuch_thread_t *thread;
>      notmuch_threads_t *threads;
>      notmuch_tags_t *tags;
> +    sprinter_t *format = opt->format;
>      time_t date;
>      int i;
>  
> -    if (offset < 0) {
> -	offset += notmuch_query_count_threads (query);
> -	if (offset < 0)
> -	    offset = 0;
> +    if (opt->offset < 0) {
> +	opt->offset += notmuch_query_count_threads (opt->query);
> +	if (opt->offset < 0)
> +	    opt->offset = 0;
>      }
>  
> -    threads = notmuch_query_search_threads (query);
> +    threads = notmuch_query_search_threads (opt->query);
>      if (threads == NULL)
>  	return 1;
>  
>      format->begin_list (format);
>  
>      for (i = 0;
> -	 notmuch_threads_valid (threads) && (limit < 0 || i < offset + limit);
> +	 notmuch_threads_valid (threads) && (opt->limit < 0 || i < opt->offset + opt->limit);
>  	 notmuch_threads_move_to_next (threads), i++)
>      {
>  	thread = notmuch_threads_get (threads);
>  
> -	if (i < offset) {
> +	if (i < opt->offset) {
>  	    notmuch_thread_destroy (thread);
>  	    continue;
>  	}
>  
> -	if (output == OUTPUT_THREADS) {
> +	if (opt->output == OUTPUT_THREADS) {
>  	    format->set_prefix (format, "thread");
>  	    format->string (format,
>  			    notmuch_thread_get_thread_id (thread));
> @@ -123,7 +129,7 @@ do_search_threads (sprinter_t *format,
>  
>  	    format->begin_map (format);
>  
> -	    if (sort == NOTMUCH_SORT_OLDEST_FIRST)
> +	    if (opt->sort == NOTMUCH_SORT_OLDEST_FIRST)
>  		date = notmuch_thread_get_oldest_date (thread);
>  	    else
>  		date = notmuch_thread_get_newest_date (thread);
> @@ -215,40 +221,36 @@ do_search_threads (sprinter_t *format,
>  }
>  
>  static int
> -do_search_messages (sprinter_t *format,
> -		    notmuch_query_t *query,
> -		    output_t output,
> -		    int offset,
> -		    int limit,
> -		    int dupe)
> +do_search_messages (search_options_t *opt)
>  {
>      notmuch_message_t *message;
>      notmuch_messages_t *messages;
>      notmuch_filenames_t *filenames;
> +    sprinter_t *format = opt->format;
>      int i;
>  
> -    if (offset < 0) {
> -	offset += notmuch_query_count_messages (query);
> -	if (offset < 0)
> -	    offset = 0;
> +    if (opt->offset < 0) {
> +	opt->offset += notmuch_query_count_messages (opt->query);
> +	if (opt->offset < 0)
> +	    opt->offset = 0;
>      }
>  
> -    messages = notmuch_query_search_messages (query);
> +    messages = notmuch_query_search_messages (opt->query);
>      if (messages == NULL)
>  	return 1;
>  
>      format->begin_list (format);
>  
>      for (i = 0;
> -	 notmuch_messages_valid (messages) && (limit < 0 || i < offset + limit);
> +	 notmuch_messages_valid (messages) && (opt->limit < 0 || i < opt->offset + opt->limit);
>  	 notmuch_messages_move_to_next (messages), i++)
>      {
> -	if (i < offset)
> +	if (i < opt->offset)
>  	    continue;
>  
>  	message = notmuch_messages_get (messages);
>  
> -	if (output == OUTPUT_FILES) {
> +	if (opt->output == OUTPUT_FILES) {
>  	    int j;
>  	    filenames = notmuch_message_get_filenames (message);
>  
> @@ -256,7 +258,7 @@ do_search_messages (sprinter_t *format,
>  		 notmuch_filenames_valid (filenames);
>  		 notmuch_filenames_move_to_next (filenames), j++)
>  	    {
> -		if (dupe < 0 || dupe == j) {
> +		if (opt->dupe < 0 || opt->dupe == j) {
>  		    format->string (format, notmuch_filenames_get (filenames));
>  		    format->separator (format);
>  		}
> @@ -283,12 +285,13 @@ do_search_messages (sprinter_t *format,
>  
>  static int
>  do_search_tags (notmuch_database_t *notmuch,
> -		sprinter_t *format,
> -		notmuch_query_t *query)
> +		const search_options_t *opt)
>  {

What about leaving out the *notmuch argument and getting that from the
query (query->notmuch I think)? 

Otherwise this looks good to me.

Best wishes

Mark

>      notmuch_messages_t *messages = NULL;
>      notmuch_tags_t *tags;
>      const char *tag;
> +    sprinter_t *format = opt->format;
> +    notmuch_query_t *query = opt->query;
>  
>      /* should the following only special case if no excluded terms
>       * specified? */
> @@ -333,16 +336,16 @@ int
>  notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  {
>      notmuch_database_t *notmuch;
> -    notmuch_query_t *query;
> +    search_options_t opt = {
> +	.sort = NOTMUCH_SORT_NEWEST_FIRST,
> +	.output = OUTPUT_SUMMARY,
> +	.offset = 0,
> +	.limit = -1, /* unlimited */
> +	.dupe = -1,
> +    };
>      char *query_str;
> -    notmuch_sort_t sort = NOTMUCH_SORT_NEWEST_FIRST;
> -    sprinter_t *format = NULL;
>      int opt_index, ret;
> -    output_t output = OUTPUT_SUMMARY;
> -    int offset = 0;
> -    int limit = -1; /* unlimited */
>      notmuch_exclude_t exclude = NOTMUCH_EXCLUDE_TRUE;
> -    int dupe = -1;
>      unsigned int i;
>  
>      enum {
> @@ -353,7 +356,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>      } format_sel = NOTMUCH_FORMAT_TEXT;
>  
>      notmuch_opt_desc_t options[] = {
> -	{ NOTMUCH_OPT_KEYWORD, &sort, "sort", 's',
> +	{ NOTMUCH_OPT_KEYWORD, &opt.sort, "sort", 's',
>  	  (notmuch_keyword_t []){ { "oldest-first", NOTMUCH_SORT_OLDEST_FIRST },
>  				  { "newest-first", NOTMUCH_SORT_NEWEST_FIRST },
>  				  { 0, 0 } } },
> @@ -364,7 +367,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  				  { "text0", NOTMUCH_FORMAT_TEXT0 },
>  				  { 0, 0 } } },
>  	{ NOTMUCH_OPT_INT, &notmuch_format_version, "format-version", 0, 0 },
> -	{ NOTMUCH_OPT_KEYWORD, &output, "output", 'o',
> +	{ NOTMUCH_OPT_KEYWORD, &opt.output, "output", 'o',
>  	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
>  				  { "threads", OUTPUT_THREADS },
>  				  { "messages", OUTPUT_MESSAGES },
> @@ -377,9 +380,9 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>                                    { "flag", NOTMUCH_EXCLUDE_FLAG },
>                                    { "all", NOTMUCH_EXCLUDE_ALL },
>                                    { 0, 0 } } },
> -	{ NOTMUCH_OPT_INT, &offset, "offset", 'O', 0 },
> -	{ NOTMUCH_OPT_INT, &limit, "limit", 'L', 0  },
> -	{ NOTMUCH_OPT_INT, &dupe, "duplicate", 'D', 0  },
> +	{ NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },
> +	{ NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0  },
> +	{ NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0  },
>  	{ 0, 0, 0, 0, 0 }
>      };
>  
> @@ -389,20 +392,20 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  
>      switch (format_sel) {
>      case NOTMUCH_FORMAT_TEXT:
> -	format = sprinter_text_create (config, stdout);
> +	opt.format = sprinter_text_create (config, stdout);
>  	break;
>      case NOTMUCH_FORMAT_TEXT0:
> -	if (output == OUTPUT_SUMMARY) {
> +	if (opt.output == OUTPUT_SUMMARY) {
>  	    fprintf (stderr, "Error: --format=text0 is not compatible with --output=summary.\n");
>  	    return EXIT_FAILURE;
>  	}
> -	format = sprinter_text0_create (config, stdout);
> +	opt.format = sprinter_text0_create (config, stdout);
>  	break;
>      case NOTMUCH_FORMAT_JSON:
> -	format = sprinter_json_create (config, stdout);
> +	opt.format = sprinter_json_create (config, stdout);
>  	break;
>      case NOTMUCH_FORMAT_SEXP:
> -	format = sprinter_sexp_create (config, stdout);
> +	opt.format = sprinter_sexp_create (config, stdout);
>  	break;
>      default:
>  	/* this should never happen */
> @@ -425,15 +428,15 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	return EXIT_FAILURE;
>      }
>  
> -    query = notmuch_query_create (notmuch, query_str);
> -    if (query == NULL) {
> +    opt.query = notmuch_query_create (notmuch, query_str);
> +    if (opt.query == NULL) {
>  	fprintf (stderr, "Out of memory\n");
>  	return EXIT_FAILURE;
>      }
>  
> -    notmuch_query_set_sort (query, sort);
> +    notmuch_query_set_sort (opt.query, opt.sort);
>  
> -    if (exclude == NOTMUCH_EXCLUDE_FLAG && output != OUTPUT_SUMMARY) {
> +    if (exclude == NOTMUCH_EXCLUDE_FLAG && opt.output != OUTPUT_SUMMARY) {
>  	/* If we are not doing summary output there is nowhere to
>  	 * print the excluded flag so fall back on including the
>  	 * excluded messages. */
> @@ -448,29 +451,29 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	search_exclude_tags = notmuch_config_get_search_exclude_tags
>  	    (config, &search_exclude_tags_length);
>  	for (i = 0; i < search_exclude_tags_length; i++)
> -	    notmuch_query_add_tag_exclude (query, search_exclude_tags[i]);
> -	notmuch_query_set_omit_excluded (query, exclude);
> +	    notmuch_query_add_tag_exclude (opt.query, search_exclude_tags[i]);
> +	notmuch_query_set_omit_excluded (opt.query, exclude);
>      }
>  
> -    switch (output) {
> +    switch (opt.output) {
>      default:
>      case OUTPUT_SUMMARY:
>      case OUTPUT_THREADS:
> -	ret = do_search_threads (format, query, sort, output, offset, limit);
> +	ret = do_search_threads (&opt);
>  	break;
>      case OUTPUT_MESSAGES:
>      case OUTPUT_FILES:
> -	ret = do_search_messages (format, query, output, offset, limit, dupe);
> +	ret = do_search_messages (&opt);
>  	break;
>      case OUTPUT_TAGS:
> -	ret = do_search_tags (notmuch, format, query);
> +	ret = do_search_tags (notmuch, &opt);
>  	break;
>      }
>  
> -    notmuch_query_destroy (query);
> +    notmuch_query_destroy (opt.query);
>      notmuch_database_destroy (notmuch);
>  
> -    talloc_free (format);
> +    talloc_free (opt.format);
>  
>      return ret ? EXIT_FAILURE : EXIT_SUCCESS;
>  }
> -- 
> 2.1.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/6] cli: search: Add --output={sender,recipients}
  2014-10-27 14:50 ` [PATCH v4 4/6] cli: search: Add --output={sender,recipients} Michal Sojka
@ 2014-10-30  8:10   ` Mark Walters
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Walters @ 2014-10-30  8:10 UTC (permalink / raw)
  To: Michal Sojka, notmuch

On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> The new outputs allow printing senders, recipients or both of matching
> messages. To print both, the user can use --output=sender and
> --output=recipients simultaneously.
>
> Currently, the same address can appear multiple times in the output. The
> next patch will change this. For this reason, the test are introduced in
> the next patch as well.
>
> We use mailbox_t rather than InternetAddressMailbox because we will need
> to extend it in a following patch.
>
> This code based on a patch from Jani Nikula.
> ---
>  completion/notmuch-completion.bash |   2 +-
>  completion/notmuch-completion.zsh  |   3 +-
>  doc/man1/notmuch-search.rst        |  22 +++++++-
>  notmuch-search.c                   | 112 ++++++++++++++++++++++++++++++++++++-
>  4 files changed, 134 insertions(+), 5 deletions(-)
>
> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
> index 0571dc9..cfbd389 100644
> --- a/completion/notmuch-completion.bash
> +++ b/completion/notmuch-completion.bash
> @@ -294,7 +294,7 @@ _notmuch_search()
>  	    return
>  	    ;;
>  	--output)
> -	    COMPREPLY=( $( compgen -W "summary threads messages files tags" -- "${cur}" ) )
> +	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients" -- "${cur}" ) )
>  	    return
>  	    ;;
>  	--sort)
> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
> index 67a9aba..3e52a00 100644
> --- a/completion/notmuch-completion.zsh
> +++ b/completion/notmuch-completion.zsh
> @@ -52,7 +52,8 @@ _notmuch_search()
>    _arguments -s : \
>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
> -    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))'
> +    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
>  }
>  
>  _notmuch()
> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
> index 90160f2..b6607c9 100644
> --- a/doc/man1/notmuch-search.rst
> +++ b/doc/man1/notmuch-search.rst
> @@ -35,7 +35,7 @@ Supported options for **search** include
>          intended for programs that invoke **notmuch(1)** internally. If
>          omitted, the latest supported version will be used.
>  
> -    ``--output=(summary|threads|messages|files|tags)``
> +    ``--output=(summary|threads|messages|files|tags|sender|recipients)``
>  
>          **summary**
>              Output a summary of each thread with any message matching
> @@ -78,6 +78,26 @@ Supported options for **search** include
>              by null characters (--format=text0), as a JSON array
>              (--format=json), or as an S-Expression list (--format=sexp).
>  
> +	**sender**
> +            Output all addresses from the *From* header that appear on
> +            any message matching the search terms, either one per line
> +            (--format=text), separated by null characters
> +            (--format=text0), as a JSON array (--format=json), or as
> +            an S-Expression list (--format=sexp).
> +
> +	    Note: Searching for **sender** should be much faster than
> +	    searching for **recipients**, because sender addresses are
> +	    cached directly in the database whereas other addresses
> +	    need to be fetched from message files.
> +
> +	**recipients**
> +            Like **sender** but for addresses from *To*, *Cc* and
> +	    *Bcc* headers.
> +
> +	This option can be given multiple times to combine different
> +	outputs. Currently, this is only supported for **sender** and
> +	**recipients** outputs.
> +
>      ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
>          This option can be used to present results in either
>          chronological order (**oldest-first**) or reverse chronological
> diff --git a/notmuch-search.c b/notmuch-search.c
> index ce46877..ce3bfb2 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -28,8 +28,12 @@ typedef enum {
>      OUTPUT_MESSAGES	= 1 << 2,
>      OUTPUT_FILES	= 1 << 3,
>      OUTPUT_TAGS		= 1 << 4,
> +    OUTPUT_SENDER	= 1 << 5,
> +    OUTPUT_RECIPIENTS	= 1 << 6,
>  } output_t;
>  
> +#define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
> +
>  typedef struct {
>      sprinter_t *format;
>      notmuch_query_t *query;
> @@ -40,6 +44,11 @@ typedef struct {
>      int dupe;
>  } search_options_t;
>  
> +typedef struct {
> +    const char *name;
> +    const char *addr;
> +} mailbox_t;
> +
>  /* Return two stable query strings that identify exactly the matched
>   * and unmatched messages currently in thread.  If there are no
>   * matched or unmatched messages, the returned buffers will be
> @@ -220,6 +229,84 @@ do_search_threads (search_options_t *opt)
>      return 0;
>  }
>  
> +static void
> +print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
> +{
> +    const char *name = mailbox->name;
> +    const char *addr = mailbox->addr;
> +
> +    if (opt->format->is_text_printer) {
> +	char *mailbox_str;
> +
> +	if (name && *name)
> +	    mailbox_str = talloc_asprintf (opt->format, "%s <%s>", name, addr);
> +	else
> +	    mailbox_str = talloc_strdup (opt->format, addr);
> +
> +	if (! mailbox_str) {
> +	    fprintf (stderr, "Error: out of memory\n");
> +	    return;
> +	}
> +	opt->format->string (opt->format, mailbox_str);
> +	opt->format->separator (opt->format);
> +
> +	talloc_free (mailbox_str);
> +    } else {
> +	opt->format->begin_map (opt->format);
> +	opt->format->map_key (opt->format, "name");
> +	opt->format->string (opt->format, name);
> +	opt->format->map_key (opt->format, "address");
> +	opt->format->string (opt->format, addr);
> +	opt->format->end (opt->format);
> +	opt->format->separator (opt->format);
> +    }
> +}

How about letting format = opt->format in all the above. I think it
would improve readability and look more similar to the other cases.

> +
> +static void
> +process_address_list (const search_options_t *opt, InternetAddressList *list)
> +{

It might be worth documenting these functions: just something  brief
saying what arguments are expected and what is returned.

Otherwise LGTM.

Best wishes

Mark


> +    InternetAddress *address;
> +    int i;
> +
> +    for (i = 0; i < internet_address_list_length (list); i++) {
> +	address = internet_address_list_get_address (list, i);
> +	if (INTERNET_ADDRESS_IS_GROUP (address)) {
> +	    InternetAddressGroup *group;
> +	    InternetAddressList *group_list;
> +
> +	    group = INTERNET_ADDRESS_GROUP (address);
> +	    group_list = internet_address_group_get_members (group);
> +	    if (group_list == NULL)
> +		continue;
> +
> +	    process_address_list (opt, group_list);
> +	} else {
> +	    InternetAddressMailbox *mailbox = INTERNET_ADDRESS_MAILBOX (address);
> +	    mailbox_t mbx = {
> +		.name = internet_address_get_name (address),
> +		.addr = internet_address_mailbox_get_addr (mailbox),
> +	    };
> +
> +	    print_mailbox (opt, &mbx);
> +	}
> +    }
> +}
> +
> +static void
> +process_address_header (const search_options_t *opt, const char *value)
> +{
> +    InternetAddressList *list;
> +
> +    if (value == NULL)
> +	return;
> +
> +    list = internet_address_list_parse_string (value);
> +    if (list == NULL)
> +	return;
> +
> +    process_address_list (opt, list);
> +}
> +
>  static int
>  do_search_messages (search_options_t *opt)
>  {
> @@ -266,11 +353,29 @@ do_search_messages (search_options_t *opt)
>  	    
>  	    notmuch_filenames_destroy( filenames );
>  
> -	} else { /* output == OUTPUT_MESSAGES */
> +	} else if (opt->output == OUTPUT_MESSAGES) {
>  	    format->set_prefix (format, "id");
>  	    format->string (format,
>  			    notmuch_message_get_message_id (message));
>  	    format->separator (format);
> +	} else {
> +	    if (opt->output & OUTPUT_SENDER) {
> +		const char *addrs;
> +
> +		addrs = notmuch_message_get_header (message, "from");
> +		process_address_header (opt, addrs);
> +	    }
> +
> +	    if (opt->output & OUTPUT_RECIPIENTS) {
> +		const char *hdrs[] = { "to", "cc", "bcc" };
> +		const char *addrs;
> +		size_t j;
> +
> +		for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
> +		    addrs = notmuch_message_get_header (message, hdrs[j]);
> +		    process_address_header (opt, addrs);
> +		}
> +	    }
>  	}
>  
>  	notmuch_message_destroy (message);
> @@ -371,6 +476,8 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
>  				  { "threads", OUTPUT_THREADS },
>  				  { "messages", OUTPUT_MESSAGES },
> +				  { "sender", OUTPUT_SENDER },
> +				  { "recipients", OUTPUT_RECIPIENTS },
>  				  { "files", OUTPUT_FILES },
>  				  { "tags", OUTPUT_TAGS },
>  				  { 0, 0 } } },
> @@ -462,7 +569,8 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	opt.output == OUTPUT_THREADS)
>  	ret = do_search_threads (&opt);
>      else if (opt.output == OUTPUT_MESSAGES ||
> -	     opt.output == OUTPUT_FILES)
> +	     opt.output == OUTPUT_FILES ||
> +	     (opt.output & OUTPUT_ADDRESS_FLAGS && !(opt.output & ~OUTPUT_ADDRESS_FLAGS)))
>  	ret = do_search_messages (&opt);
>      else if (opt.output == OUTPUT_TAGS)
>  	ret = do_search_tags (notmuch, &opt);
> -- 
> 2.1.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-27 14:50 ` [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses Michal Sojka
@ 2014-10-30  8:16   ` Mark Walters
  2014-10-30  8:52     ` Tomi Ollila
  2014-10-30 21:34     ` Michal Sojka
  0 siblings, 2 replies; 16+ messages in thread
From: Mark Walters @ 2014-10-30  8:16 UTC (permalink / raw)
  To: Michal Sojka, notmuch


On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
> This adds an algorithm to filter out duplicate addresses from address
> outputs (sender, receivers). The algorithm can be configured with
> --filter-by command line option.
>
> The code here is an extended version of a patch from Jani Nikula.

Hi

As this is getting into the more controversial bike shedding region I
wonder if it would be worth splitting this into 2 patches: the first
could do the default dedupe based on name/address and the second could
do add the filter-by options. 

I think the default deduping is obviously worth doing but I am not sure
about the rest. In any case I think the default deduping could go in
pre-freeze but I would recommend the rest is left until after.

> ---
>  completion/notmuch-completion.bash |  6 ++-
>  completion/notmuch-completion.zsh  |  3 +-
>  doc/man1/notmuch-search.rst        | 38 +++++++++++++++
>  notmuch-search.c                   | 98 +++++++++++++++++++++++++++++++++++---
>  test/T090-search-output.sh         | 87 +++++++++++++++++++++++++++++++++
>  test/T095-search-filter-by.sh      | 64 +++++++++++++++++++++++++
>  6 files changed, 288 insertions(+), 8 deletions(-)
>  create mode 100755 test/T095-search-filter-by.sh
>
> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
> index cfbd389..6b6d43a 100644
> --- a/completion/notmuch-completion.bash
> +++ b/completion/notmuch-completion.bash
> @@ -305,12 +305,16 @@ _notmuch_search()
>  	    COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )
>  	    return
>  	    ;;
> +	--filter-by)
> +	    COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )
> +	    return
> +	    ;;
>      esac
>  
>      ! $split &&
>      case "${cur}" in
>  	-*)
> -	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="
> +	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="
>  	    compopt -o nospace
>  	    COMPREPLY=( $(compgen -W "$options" -- ${cur}) )
>  	    ;;
> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
> index 3e52a00..3e535df 100644
> --- a/completion/notmuch-completion.zsh
> +++ b/completion/notmuch-completion.zsh
> @@ -53,7 +53,8 @@ _notmuch_search()
>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
>      '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
> -    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))' \
> +    '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'
>  }
>  
>  _notmuch()
> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
> index b6607c9..84af2da 100644
> --- a/doc/man1/notmuch-search.rst
> +++ b/doc/man1/notmuch-search.rst
> @@ -85,6 +85,9 @@ Supported options for **search** include
>              (--format=text0), as a JSON array (--format=json), or as
>              an S-Expression list (--format=sexp).
>  
> +            Duplicate addresses are filtered out. Filtering can be
> +            configured with the --filter-by option.
> +
>  	    Note: Searching for **sender** should be much faster than
>  	    searching for **recipients**, because sender addresses are
>  	    cached directly in the database whereas other addresses
> @@ -151,6 +154,41 @@ Supported options for **search** include
>          prefix. The prefix matches messages based on filenames. This
>          option filters filenames of the matching messages.
>  
> +    ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)
> +
> +	Can be used with ``--output=sender`` or
> +	``--output=recipients`` to filter out duplicate addresses. The
> +	filtering algorithm receives a sequence of email addresses and
> +	outputs the same sequence without the addresses that are
> +	considered a duplicate of a previously output address. What is
> +	considered a duplicate depends on how the two addresses are
> +	compared and this can be controlled with the follwing flags:
> +
> +	**nameaddr** means that both name and address parts are
> +	compared in case-sensitive manner. Therefore, all same looking
> +	addresses strings are considered duplicate. This is the
> +	default.
> +
> +	**name** means that only the name part is compared (in
> +	case-sensitive manner). For example, the addresses "John Doe
> +	<me@example.com>" and "John Doe <john@doe.name>" will be
> +	considered duplicate.
> +
> +	**addr** means that only the address part is compared (in
> +	case-sensitive manner). For example, the addresses "John Doe
> +	<john@example.com>" and "Dr. John Doe <john@example.com>" will
> +	be considered duplicate.
> +
> +	**addrfold** is like **addr**, but comparison is done in
> +	canse-insensitive manner. For example, the addresses "John Doe
> +	<john@example.com>" and "Dr. John Doe <JOHN@EXAMPLE.COM>" will
> +	be considered duplicate.
> +
> +	**nameaddrfold** is like **nameaddr**, but address comparison
> +	is done in canse-insensitive manner. For example, the
> +	addresses "John Doe <john@example.com>" and "John Doe
> +	<JOHN@EXAMPLE.COM>" will be considered duplicate.
> +
>  EXIT STATUS
>  ===========
>  
> diff --git a/notmuch-search.c b/notmuch-search.c
> index ce3bfb2..47aa979 100644
> --- a/notmuch-search.c
> +++ b/notmuch-search.c
> @@ -34,6 +34,14 @@ typedef enum {
>  
>  #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
>  
> +typedef enum {
> +    FILTER_BY_NAMEADDR = 0,
> +    FILTER_BY_NAME,
> +    FILTER_BY_ADDR,
> +    FILTER_BY_ADDRFOLD,
> +    FILTER_BY_NAMEADDRFOLD,
> +} filter_by_t;
> +
>  typedef struct {
>      sprinter_t *format;
>      notmuch_query_t *query;
> @@ -42,6 +50,7 @@ typedef struct {
>      int offset;
>      int limit;
>      int dupe;
> +    filter_by_t filter_by;
>  } search_options_t;
>  
>  typedef struct {
> @@ -229,6 +238,52 @@ do_search_threads (search_options_t *opt)
>      return 0;
>  }
>  
> +/* Returns TRUE iff name and/or addr is considered duplicite. */

A triviality; duplicite should be duplicate

> +static notmuch_bool_t
> +check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)

I am not sure on style but maybe is_duplicate would be clearer?

Best wishes

Mark

> +{
> +    notmuch_bool_t duplicite;
> +    char *key;
> +
> +    if (opt->filter_by == FILTER_BY_ADDRFOLD ||
> +	opt->filter_by == FILTER_BY_NAMEADDRFOLD) {
> +	gchar *folded = g_utf8_casefold (addr, -1);
> +	addr = talloc_strdup (opt->format, folded);
> +	g_free (folded);
> +    }
> +    switch (opt->filter_by) {
> +    case FILTER_BY_NAMEADDR:
> +    case FILTER_BY_NAMEADDRFOLD:
> +	key = talloc_asprintf (opt->format, "%s <%s>", name, addr);
> +	break;
> +    case FILTER_BY_NAME:
> +	key = talloc_strdup (opt->format, name); /* !name results in !key */
> +	break;
> +    case FILTER_BY_ADDR:
> +    case FILTER_BY_ADDRFOLD:
> +	key = talloc_strdup (opt->format, addr);
> +	break;
> +    default:
> +	INTERNAL_ERROR("invalid --filter-by flags");
> +    }
> +
> +    if (opt->filter_by == FILTER_BY_ADDRFOLD ||
> +	opt->filter_by == FILTER_BY_NAMEADDRFOLD)
> +	talloc_free ((char*)addr);
> +
> +    if (! key)
> +	return FALSE;
> +
> +    duplicite = g_hash_table_lookup_extended (addrs, key, NULL, NULL);
> +
> +    if (! duplicite)
> +	g_hash_table_insert (addrs, key, NULL);
> +    else
> +	talloc_free (key);
> +
> +    return duplicite;
> +}
> +
>  static void
>  print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
>  {
> @@ -263,7 +318,8 @@ print_mailbox (const search_options_t *opt, const mailbox_t *mailbox)
>  }
>  
>  static void
> -process_address_list (const search_options_t *opt, InternetAddressList *list)
> +process_address_list (const search_options_t *opt, GHashTable *addrs,
> +		      InternetAddressList *list)
>  {
>      InternetAddress *address;
>      int i;
> @@ -279,7 +335,7 @@ process_address_list (const search_options_t *opt, InternetAddressList *list)
>  	    if (group_list == NULL)
>  		continue;
>  
> -	    process_address_list (opt, group_list);
> +	    process_address_list (opt, addrs, group_list);
>  	} else {
>  	    InternetAddressMailbox *mailbox = INTERNET_ADDRESS_MAILBOX (address);
>  	    mailbox_t mbx = {
> @@ -287,13 +343,16 @@ process_address_list (const search_options_t *opt, InternetAddressList *list)
>  		.addr = internet_address_mailbox_get_addr (mailbox),
>  	    };
>  
> +	    if (check_duplicite (opt, addrs, mbx.name, mbx.addr))
> +		continue;
> +
>  	    print_mailbox (opt, &mbx);
>  	}
>      }
>  }
>  
>  static void
> -process_address_header (const search_options_t *opt, const char *value)
> +process_address_header (const search_options_t *opt, GHashTable *addrs, const char *value)
>  {
>      InternetAddressList *list;
>  
> @@ -304,7 +363,13 @@ process_address_header (const search_options_t *opt, const char *value)
>      if (list == NULL)
>  	return;
>  
> -    process_address_list (opt, list);
> +    process_address_list (opt, addrs, list);
> +}
> +
> +static void
> +_my_talloc_free_for_g_hash (void *ptr)
> +{
> +    talloc_free (ptr);
>  }
>  
>  static int
> @@ -314,8 +379,13 @@ do_search_messages (search_options_t *opt)
>      notmuch_messages_t *messages;
>      notmuch_filenames_t *filenames;
>      sprinter_t *format = opt->format;
> +    GHashTable *addresses = NULL;
>      int i;
>  
> +    if (opt->output & OUTPUT_ADDRESS_FLAGS)
> +	addresses = g_hash_table_new_full (g_str_hash, g_str_equal,
> +					   _my_talloc_free_for_g_hash, NULL);
> +
>      if (opt->offset < 0) {
>  	opt->offset += notmuch_query_count_messages (opt->query);
>  	if (opt->offset < 0)
> @@ -363,7 +433,7 @@ do_search_messages (search_options_t *opt)
>  		const char *addrs;
>  
>  		addrs = notmuch_message_get_header (message, "from");
> -		process_address_header (opt, addrs);
> +		process_address_header (opt, addresses, addrs);
>  	    }
>  
>  	    if (opt->output & OUTPUT_RECIPIENTS) {
> @@ -373,7 +443,7 @@ do_search_messages (search_options_t *opt)
>  
>  		for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
>  		    addrs = notmuch_message_get_header (message, hdrs[j]);
> -		    process_address_header (opt, addrs);
> +		    process_address_header (opt, addresses, addrs);
>  		}
>  	    }
>  	}
> @@ -381,6 +451,9 @@ do_search_messages (search_options_t *opt)
>  	notmuch_message_destroy (message);
>      }
>  
> +    if (addresses)
> +	g_hash_table_unref (addresses);
> +
>      notmuch_messages_destroy (messages);
>  
>      format->end (format);
> @@ -447,6 +520,7 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	.offset = 0,
>  	.limit = -1, /* unlimited */
>  	.dupe = -1,
> +	.filter_by = FILTER_BY_NAMEADDR,
>      };
>      char *query_str;
>      int opt_index, ret;
> @@ -490,6 +564,13 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>  	{ NOTMUCH_OPT_INT, &opt.offset, "offset", 'O', 0 },
>  	{ NOTMUCH_OPT_INT, &opt.limit, "limit", 'L', 0  },
>  	{ NOTMUCH_OPT_INT, &opt.dupe, "duplicate", 'D', 0  },
> +	{ NOTMUCH_OPT_KEYWORD, &opt.filter_by, "filter-by", 'b',
> +	  (notmuch_keyword_t []){ { "nameaddr", FILTER_BY_NAMEADDR },
> +				  { "name", FILTER_BY_NAME },
> +				  { "addr", FILTER_BY_ADDR },
> +				  { "addrfold", FILTER_BY_ADDRFOLD },
> +				  { "nameaddrfold", FILTER_BY_NAMEADDRFOLD },
> +				  { 0, 0 } } },
>  	{ 0, 0, 0, 0, 0 }
>      };
>  
> @@ -500,6 +581,11 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>      if (! opt.output)
>  	opt.output = OUTPUT_SUMMARY;
>  
> +    if (opt.filter_by && !(opt.output & OUTPUT_ADDRESS_FLAGS)) {
> +	fprintf (stderr, "Error: --filter-by can only be used with address output.\n");
> +	return EXIT_FAILURE;
> +    }
> +
>      switch (format_sel) {
>      case NOTMUCH_FORMAT_TEXT:
>  	opt.format = sprinter_text_create (config, stdout);
> diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh
> index 947d572..841a721 100755
> --- a/test/T090-search-output.sh
> +++ b/test/T090-search-output.sh
> @@ -387,6 +387,93 @@ cat <<EOF >EXPECTED
>  EOF
>  test_expect_equal_file OUTPUT EXPECTED
>  
> +test_begin_subtest "--output=sender"
> +notmuch search --output=sender '*' >OUTPUT
> +cat <<EOF >EXPECTED
> +François Boulogne <boulogne.f@gmail.com>
> +Olivier Berger <olivier.berger@it-sudparis.eu>
> +Chris Wilson <chris@chris-wilson.co.uk>
> +Carl Worth <cworth@cworth.org>
> +Alexander Botero-Lowry <alex.boterolowry@gmail.com>
> +Keith Packard <keithp@keithp.com>
> +Jjgod Jiang <gzjjgod@gmail.com>
> +Rolland Santimano <rollandsantimano@yahoo.com>
> +Jan Janak <jan@ryngle.com>
> +Stewart Smith <stewart@flamingspork.com>
> +Lars Kellogg-Stedman <lars@seas.harvard.edu>
> +Alex Botero-Lowry <alex.boterolowry@gmail.com>
> +Ingmar Vanhassel <ingmar@exherbo.org>
> +Aron Griffis <agriffis@n01se.net>
> +Adrian Perez de Castro <aperez@igalia.com>
> +Israel Herraiz <isra@herraiz.org>
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=sender --format=json"
> +notmuch search --output=sender --format=json '*' >OUTPUT
> +cat <<EOF >EXPECTED
> +[{"name": "François Boulogne", "address": "boulogne.f@gmail.com"},
> +{"name": "Olivier Berger", "address": "olivier.berger@it-sudparis.eu"},
> +{"name": "Chris Wilson", "address": "chris@chris-wilson.co.uk"},
> +{"name": "Carl Worth", "address": "cworth@cworth.org"},
> +{"name": "Alexander Botero-Lowry", "address": "alex.boterolowry@gmail.com"},
> +{"name": "Keith Packard", "address": "keithp@keithp.com"},
> +{"name": "Jjgod Jiang", "address": "gzjjgod@gmail.com"},
> +{"name": "Rolland Santimano", "address": "rollandsantimano@yahoo.com"},
> +{"name": "Jan Janak", "address": "jan@ryngle.com"},
> +{"name": "Stewart Smith", "address": "stewart@flamingspork.com"},
> +{"name": "Lars Kellogg-Stedman", "address": "lars@seas.harvard.edu"},
> +{"name": "Alex Botero-Lowry", "address": "alex.boterolowry@gmail.com"},
> +{"name": "Ingmar Vanhassel", "address": "ingmar@exherbo.org"},
> +{"name": "Aron Griffis", "address": "agriffis@n01se.net"},
> +{"name": "Adrian Perez de Castro", "address": "aperez@igalia.com"},
> +{"name": "Israel Herraiz", "address": "isra@herraiz.org"},
> +{"name": "Mikhail Gusarov", "address": "dottedmag@dottedmag.net"}]
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients"
> +notmuch search --output=recipients '*' >OUTPUT
> +cat <<EOF >EXPECTED
> +Allan McRae <allan@archlinux.org>
> +Discussion about the Arch User Repository (AUR) <aur-general@archlinux.org>
> +olivier.berger@it-sudparis.eu
> +notmuch@notmuchmail.org
> +notmuch <notmuch@notmuchmail.org>
> +Keith Packard <keithp@keithp.com>
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=sender --output=recipients"
> +notmuch search --output=sender --output=recipients '*' >OUTPUT
> +cat <<EOF >EXPECTED
> +François Boulogne <boulogne.f@gmail.com>
> +Allan McRae <allan@archlinux.org>
> +Discussion about the Arch User Repository (AUR) <aur-general@archlinux.org>
> +Olivier Berger <olivier.berger@it-sudparis.eu>
> +olivier.berger@it-sudparis.eu
> +Chris Wilson <chris@chris-wilson.co.uk>
> +notmuch@notmuchmail.org
> +Carl Worth <cworth@cworth.org>
> +Alexander Botero-Lowry <alex.boterolowry@gmail.com>
> +Keith Packard <keithp@keithp.com>
> +Jjgod Jiang <gzjjgod@gmail.com>
> +Rolland Santimano <rollandsantimano@yahoo.com>
> +Jan Janak <jan@ryngle.com>
> +Stewart Smith <stewart@flamingspork.com>
> +Lars Kellogg-Stedman <lars@seas.harvard.edu>
> +notmuch <notmuch@notmuchmail.org>
> +Alex Botero-Lowry <alex.boterolowry@gmail.com>
> +Ingmar Vanhassel <ingmar@exherbo.org>
> +Aron Griffis <agriffis@n01se.net>
> +Adrian Perez de Castro <aperez@igalia.com>
> +Israel Herraiz <isra@herraiz.org>
> +Mikhail Gusarov <dottedmag@dottedmag.net>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
>  test_begin_subtest "sanitize output for quoted-printable line-breaks in author and subject"
>  add_message "[subject]='two =?ISO-8859-1?Q?line=0A_subject?=
>  	headers'"
> diff --git a/test/T095-search-filter-by.sh b/test/T095-search-filter-by.sh
> new file mode 100755
> index 0000000..97d9a9b
> --- /dev/null
> +++ b/test/T095-search-filter-by.sh
> @@ -0,0 +1,64 @@
> +#!/usr/bin/env bash
> +test_description='duplicite address filtering in "notmuch search --output=recipients"'
> +. ./test-lib.sh
> +
> +add_message '[to]="Real Name <foo@example.com>, Real Name <bar@example.com>"'
> +add_message '[to]="Nickname <foo@example.com>"' '[cc]="Real Name <Bar@Example.COM>"'
> +add_message '[to]="Nickname <foo@example.com>"' '[bcc]="Real Name <Bar@Example.COM>"'
> +
> +test_begin_subtest "--output=recipients"
> +notmuch search --output=recipients "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Real Name <bar@example.com>
> +Nickname <foo@example.com>
> +Real Name <Bar@Example.COM>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=nameaddr"
> +notmuch search --output=recipients --filter-by=nameaddr "*" >OUTPUT
> +# The same as above
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Real Name <bar@example.com>
> +Nickname <foo@example.com>
> +Real Name <Bar@Example.COM>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=name"
> +notmuch search --output=recipients --filter-by=name "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Nickname <foo@example.com>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=addr"
> +notmuch search --output=recipients --filter-by=addr "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Real Name <bar@example.com>
> +Real Name <Bar@Example.COM>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=addrfold"
> +notmuch search --output=recipients --filter-by=addrfold "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Real Name <bar@example.com>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "--output=recipients --filter-by=nameaddrfold"
> +notmuch search --output=recipients --filter-by=nameaddrfold "*" >OUTPUT
> +cat <<EOF >EXPECTED
> +Real Name <foo@example.com>
> +Real Name <bar@example.com>
> +Nickname <foo@example.com>
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_done
> -- 
> 2.1.1
>
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-30  8:16   ` Mark Walters
@ 2014-10-30  8:52     ` Tomi Ollila
  2014-10-30  9:00       ` Mark Walters
  2014-10-30 21:42       ` Michal Sojka
  2014-10-30 21:34     ` Michal Sojka
  1 sibling, 2 replies; 16+ messages in thread
From: Tomi Ollila @ 2014-10-30  8:52 UTC (permalink / raw)
  To: Mark Walters, Michal Sojka, notmuch

On Thu, Oct 30 2014, Mark Walters <markwalters1009@gmail.com> wrote:

> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>> This adds an algorithm to filter out duplicate addresses from address
>> outputs (sender, receivers). The algorithm can be configured with
>> --filter-by command line option.
>>
>> The code here is an extended version of a patch from Jani Nikula.
>
> Hi
>
> As this is getting into the more controversial bike shedding region I
> wonder if it would be worth splitting this into 2 patches: the first
> could do the default dedupe based on name/address and the second could
> do add the filter-by options. 
>
> I think the default deduping is obviously worth doing but I am not sure
> about the rest. In any case I think the default deduping could go in
> pre-freeze but I would recommend the rest is left until after.

I can agree with that, but there is one hard thing to resolve: 
"naming things"(*)

(*) http://martinfowler.com/bliki/TwoHardThings.html

With all rest ignored (sorry no time to work on this in more detail now),
this default deduping could be done with single argument '--unique'...

Tomi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-30  8:52     ` Tomi Ollila
@ 2014-10-30  9:00       ` Mark Walters
  2014-10-30 21:42       ` Michal Sojka
  1 sibling, 0 replies; 16+ messages in thread
From: Mark Walters @ 2014-10-30  9:00 UTC (permalink / raw)
  To: Tomi Ollila, Michal Sojka, notmuch

On Thu, 30 Oct 2014, Tomi Ollila <tomi.ollila@iki.fi> wrote:
> On Thu, Oct 30 2014, Mark Walters <markwalters1009@gmail.com> wrote:
>
>> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>>> This adds an algorithm to filter out duplicate addresses from address
>>> outputs (sender, receivers). The algorithm can be configured with
>>> --filter-by command line option.
>>>
>>> The code here is an extended version of a patch from Jani Nikula.
>>
>> Hi
>>
>> As this is getting into the more controversial bike shedding region I
>> wonder if it would be worth splitting this into 2 patches: the first
>> could do the default dedupe based on name/address and the second could
>> do add the filter-by options. 
>>
>> I think the default deduping is obviously worth doing but I am not sure
>> about the rest. In any case I think the default deduping could go in
>> pre-freeze but I would recommend the rest is left until after.
>
> I can agree with that, but there is one hard thing to resolve: 
> "naming things"(*)
>
> (*) http://martinfowler.com/bliki/TwoHardThings.html
>
> With all rest ignored (sorry no time to work on this in more detail now),
> this default deduping could be done with single argument '--unique'...

In this case I am suggesting that to start with the default deduping is
unconditionally done and that there is no command line argument. We can
decide on other filter options, possibly including a completely
unfiltered list (*), later.

Best wishes

Mark

(*) Personally I don't really see a use case for the unfiltered list but
others may disagree.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/6] cli: search: Refactor passing of command line options
  2014-10-30  7:57   ` Mark Walters
@ 2014-10-30 21:14     ` Michal Sojka
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-30 21:14 UTC (permalink / raw)
  To: Mark Walters, notmuch

Hi Mark,

On Thu, Oct 30 2014, Mark Walters wrote:
> Hi
>
> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>> Many functions that implement the search command need to access command
>> line options. Instead of passing each option in a separate variable, put
>> them in a structure and pass only this structure.
>>
>> This will become handy in the following patches.
>> ---
>>  notmuch-search.c | 125 ++++++++++++++++++++++++++++---------------------------
>>  1 file changed, 64 insertions(+), 61 deletions(-)
>>
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index bc9be45..0c3e972 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -30,6 +30,16 @@ typedef enum {
>>      OUTPUT_TAGS
>>  } output_t;
>>  
>> +typedef struct {
>> +    sprinter_t *format;
>> +    notmuch_query_t *query;
>> +    notmuch_sort_t sort;
>> +    output_t output;
>> +    int offset;
>> +    int limit;
>> +    int dupe;
>> +} search_options_t;
>> +
>>  /* Return two stable query strings that identify exactly the matched
>>   * and unmatched messages currently in thread.  If there are no
>>   * matched or unmatched messages, the returned buffers will be
>> @@ -70,43 +80,39 @@ get_thread_query (notmuch_thread_t *thread,
>>  }
>>  
>>  static int
>> -do_search_threads (sprinter_t *format,
>> -		   notmuch_query_t *query,
>> -		   notmuch_sort_t sort,
>> -		   output_t output,
>> -		   int offset,
>> -		   int limit)
>> +do_search_threads (search_options_t *opt)
>>  {
>>      notmuch_thread_t *thread;
>>      notmuch_threads_t *threads;
>>      notmuch_tags_t *tags;
>> +    sprinter_t *format = opt->format;
>>      time_t date;
>>      int i;
>>  
>> -    if (offset < 0) {
>> -	offset += notmuch_query_count_threads (query);
>> -	if (offset < 0)
>> -	    offset = 0;
>> +    if (opt->offset < 0) {
>> +	opt->offset += notmuch_query_count_threads (opt->query);
>> +	if (opt->offset < 0)
>> +	    opt->offset = 0;
>>      }
>>  
>> -    threads = notmuch_query_search_threads (query);
>> +    threads = notmuch_query_search_threads (opt->query);
>>      if (threads == NULL)
>>  	return 1;
>>  
>>      format->begin_list (format);
>>  
>>      for (i = 0;
>> -	 notmuch_threads_valid (threads) && (limit < 0 || i < offset + limit);
>> +	 notmuch_threads_valid (threads) && (opt->limit < 0 || i < opt->offset + opt->limit);
>>  	 notmuch_threads_move_to_next (threads), i++)
>>      {
>>  	thread = notmuch_threads_get (threads);
>>  
>> -	if (i < offset) {
>> +	if (i < opt->offset) {
>>  	    notmuch_thread_destroy (thread);
>>  	    continue;
>>  	}
>>  
>> -	if (output == OUTPUT_THREADS) {
>> +	if (opt->output == OUTPUT_THREADS) {
>>  	    format->set_prefix (format, "thread");
>>  	    format->string (format,
>>  			    notmuch_thread_get_thread_id (thread));
>> @@ -123,7 +129,7 @@ do_search_threads (sprinter_t *format,
>>  
>>  	    format->begin_map (format);
>>  
>> -	    if (sort == NOTMUCH_SORT_OLDEST_FIRST)
>> +	    if (opt->sort == NOTMUCH_SORT_OLDEST_FIRST)
>>  		date = notmuch_thread_get_oldest_date (thread);
>>  	    else
>>  		date = notmuch_thread_get_newest_date (thread);
>> @@ -215,40 +221,36 @@ do_search_threads (sprinter_t *format,
>>  }
>>  
>>  static int
>> -do_search_messages (sprinter_t *format,
>> -		    notmuch_query_t *query,
>> -		    output_t output,
>> -		    int offset,
>> -		    int limit,
>> -		    int dupe)
>> +do_search_messages (search_options_t *opt)
>>  {
>>      notmuch_message_t *message;
>>      notmuch_messages_t *messages;
>>      notmuch_filenames_t *filenames;
>> +    sprinter_t *format = opt->format;
>>      int i;
>>  
>> -    if (offset < 0) {
>> -	offset += notmuch_query_count_messages (query);
>> -	if (offset < 0)
>> -	    offset = 0;
>> +    if (opt->offset < 0) {
>> +	opt->offset += notmuch_query_count_messages (opt->query);
>> +	if (opt->offset < 0)
>> +	    opt->offset = 0;
>>      }
>>  
>> -    messages = notmuch_query_search_messages (query);
>> +    messages = notmuch_query_search_messages (opt->query);
>>      if (messages == NULL)
>>  	return 1;
>>  
>>      format->begin_list (format);
>>  
>>      for (i = 0;
>> -	 notmuch_messages_valid (messages) && (limit < 0 || i < offset + limit);
>> +	 notmuch_messages_valid (messages) && (opt->limit < 0 || i < opt->offset + opt->limit);
>>  	 notmuch_messages_move_to_next (messages), i++)
>>      {
>> -	if (i < offset)
>> +	if (i < opt->offset)
>>  	    continue;
>>  
>>  	message = notmuch_messages_get (messages);
>>  
>> -	if (output == OUTPUT_FILES) {
>> +	if (opt->output == OUTPUT_FILES) {
>>  	    int j;
>>  	    filenames = notmuch_message_get_filenames (message);
>>  
>> @@ -256,7 +258,7 @@ do_search_messages (sprinter_t *format,
>>  		 notmuch_filenames_valid (filenames);
>>  		 notmuch_filenames_move_to_next (filenames), j++)
>>  	    {
>> -		if (dupe < 0 || dupe == j) {
>> +		if (opt->dupe < 0 || opt->dupe == j) {
>>  		    format->string (format, notmuch_filenames_get (filenames));
>>  		    format->separator (format);
>>  		}
>> @@ -283,12 +285,13 @@ do_search_messages (sprinter_t *format,
>>  
>>  static int
>>  do_search_tags (notmuch_database_t *notmuch,
>> -		sprinter_t *format,
>> -		notmuch_query_t *query)
>> +		const search_options_t *opt)
>>  {
>
> What about leaving out the *notmuch argument and getting that from the
> query (query->notmuch I think)?

I was also thinking about this, but notmuch_query_t is an opaque
structure which means that we would need a getter function in the
library. I think it is simpler to have it this way.

-Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-30  8:16   ` Mark Walters
  2014-10-30  8:52     ` Tomi Ollila
@ 2014-10-30 21:34     ` Michal Sojka
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-30 21:34 UTC (permalink / raw)
  To: Mark Walters, notmuch

On Thu, Oct 30 2014, Mark Walters wrote:
> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>> This adds an algorithm to filter out duplicate addresses from address
>> outputs (sender, receivers). The algorithm can be configured with
>> --filter-by command line option.
>>
>> The code here is an extended version of a patch from Jani Nikula.
>
> Hi
>
> As this is getting into the more controversial bike shedding region I
> wonder if it would be worth splitting this into 2 patches: the first
> could do the default dedupe based on name/address and the second could
> do add the filter-by options. 
>
> I think the default deduping is obviously worth doing but I am not sure
> about the rest. In any case I think the default deduping could go in
> pre-freeze but I would recommend the rest is left until after.

Yes, this makes sense. I'll send v5 in a while.

>
>> ---
>>  completion/notmuch-completion.bash |  6 ++-
>>  completion/notmuch-completion.zsh  |  3 +-
>>  doc/man1/notmuch-search.rst        | 38 +++++++++++++++
>>  notmuch-search.c                   | 98 +++++++++++++++++++++++++++++++++++---
>>  test/T090-search-output.sh         | 87 +++++++++++++++++++++++++++++++++
>>  test/T095-search-filter-by.sh      | 64 +++++++++++++++++++++++++
>>  6 files changed, 288 insertions(+), 8 deletions(-)
>>  create mode 100755 test/T095-search-filter-by.sh
>>
>> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
>> index cfbd389..6b6d43a 100644
>> --- a/completion/notmuch-completion.bash
>> +++ b/completion/notmuch-completion.bash
>> @@ -305,12 +305,16 @@ _notmuch_search()
>>  	    COMPREPLY=( $( compgen -W "true false flag all" -- "${cur}" ) )
>>  	    return
>>  	    ;;
>> +	--filter-by)
>> +	    COMPREPLY=( $( compgen -W "nameaddr name addr addrfold nameaddrfold" -- "${cur}" ) )
>> +	    return
>> +	    ;;
>>      esac
>>  
>>      ! $split &&
>>      case "${cur}" in
>>  	-*)
>> -	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate="
>> +	    local options="--format= --output= --sort= --offset= --limit= --exclude= --duplicate= --filter-by="
>>  	    compopt -o nospace
>>  	    COMPREPLY=( $(compgen -W "$options" -- ${cur}) )
>>  	    ;;
>> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
>> index 3e52a00..3e535df 100644
>> --- a/completion/notmuch-completion.zsh
>> +++ b/completion/notmuch-completion.zsh
>> @@ -53,7 +53,8 @@ _notmuch_search()
>>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
>>      '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
>> -    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))'
>> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients))' \
>> +    '--filter-by=[filter out duplicate addresses]:filter-by:((nameaddr\:"both name and address part" name\:"name part" addr\:"address part" addrfold\:"case-insensitive address part" nameaddrfold\:"name and case-insensitive address part"))'
>>  }
>>  
>>  _notmuch()
>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
>> index b6607c9..84af2da 100644
>> --- a/doc/man1/notmuch-search.rst
>> +++ b/doc/man1/notmuch-search.rst
>> @@ -85,6 +85,9 @@ Supported options for **search** include
>>              (--format=text0), as a JSON array (--format=json), or as
>>              an S-Expression list (--format=sexp).
>>  
>> +            Duplicate addresses are filtered out. Filtering can be
>> +            configured with the --filter-by option.
>> +
>>  	    Note: Searching for **sender** should be much faster than
>>  	    searching for **recipients**, because sender addresses are
>>  	    cached directly in the database whereas other addresses
>> @@ -151,6 +154,41 @@ Supported options for **search** include
>>          prefix. The prefix matches messages based on filenames. This
>>          option filters filenames of the matching messages.
>>  
>> +    ``--filter-by=``\ (**nameaddr**\ \|\ **name** \|\ **addr**\ \|\ **addrfold**\ \|\ **nameaddrfold**\)
>> +
>> +	Can be used with ``--output=sender`` or
>> +	``--output=recipients`` to filter out duplicate addresses. The
>> +	filtering algorithm receives a sequence of email addresses and
>> +	outputs the same sequence without the addresses that are
>> +	considered a duplicate of a previously output address. What is
>> +	considered a duplicate depends on how the two addresses are
>> +	compared and this can be controlled with the follwing flags:
>> +
>> +	**nameaddr** means that both name and address parts are
>> +	compared in case-sensitive manner. Therefore, all same looking
>> +	addresses strings are considered duplicate. This is the
>> +	default.
>> +
>> +	**name** means that only the name part is compared (in
>> +	case-sensitive manner). For example, the addresses "John Doe
>> +	<me@example.com>" and "John Doe <john@doe.name>" will be
>> +	considered duplicate.
>> +
>> +	**addr** means that only the address part is compared (in
>> +	case-sensitive manner). For example, the addresses "John Doe
>> +	<john@example.com>" and "Dr. John Doe <john@example.com>" will
>> +	be considered duplicate.
>> +
>> +	**addrfold** is like **addr**, but comparison is done in
>> +	canse-insensitive manner. For example, the addresses "John Doe
>> +	<john@example.com>" and "Dr. John Doe <JOHN@EXAMPLE.COM>" will
>> +	be considered duplicate.
>> +
>> +	**nameaddrfold** is like **nameaddr**, but address comparison
>> +	is done in canse-insensitive manner. For example, the
>> +	addresses "John Doe <john@example.com>" and "John Doe
>> +	<JOHN@EXAMPLE.COM>" will be considered duplicate.
>> +
>>  EXIT STATUS
>>  ===========
>>  
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index ce3bfb2..47aa979 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -34,6 +34,14 @@ typedef enum {
>>  
>>  #define OUTPUT_ADDRESS_FLAGS (OUTPUT_SENDER | OUTPUT_RECIPIENTS)
>>  
>> +typedef enum {
>> +    FILTER_BY_NAMEADDR = 0,
>> +    FILTER_BY_NAME,
>> +    FILTER_BY_ADDR,
>> +    FILTER_BY_ADDRFOLD,
>> +    FILTER_BY_NAMEADDRFOLD,
>> +} filter_by_t;
>> +
>>  typedef struct {
>>      sprinter_t *format;
>>      notmuch_query_t *query;
>> @@ -42,6 +50,7 @@ typedef struct {
>>      int offset;
>>      int limit;
>>      int dupe;
>> +    filter_by_t filter_by;
>>  } search_options_t;
>>  
>>  typedef struct {
>> @@ -229,6 +238,52 @@ do_search_threads (search_options_t *opt)
>>      return 0;
>>  }
>>  
>> +/* Returns TRUE iff name and/or addr is considered duplicite. */
>
> A triviality; duplicite should be duplicate
>
>> +static notmuch_bool_t
>> +check_duplicite (const search_options_t *opt, GHashTable *addrs, const char *name, const char *addr)
>
> I am not sure on style but maybe is_duplicate would be clearer?

OK

-Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses
  2014-10-30  8:52     ` Tomi Ollila
  2014-10-30  9:00       ` Mark Walters
@ 2014-10-30 21:42       ` Michal Sojka
  1 sibling, 0 replies; 16+ messages in thread
From: Michal Sojka @ 2014-10-30 21:42 UTC (permalink / raw)
  To: Tomi Ollila, Mark Walters, notmuch

On Thu, Oct 30 2014, Tomi Ollila wrote:
> On Thu, Oct 30 2014, Mark Walters <markwalters1009@gmail.com> wrote:
>
>> On Mon, 27 Oct 2014, Michal Sojka <sojkam1@fel.cvut.cz> wrote:
>>> This adds an algorithm to filter out duplicate addresses from address
>>> outputs (sender, receivers). The algorithm can be configured with
>>> --filter-by command line option.
>>>
>>> The code here is an extended version of a patch from Jani Nikula.
>>
>> Hi
>>
>> As this is getting into the more controversial bike shedding region I
>> wonder if it would be worth splitting this into 2 patches: the first
>> could do the default dedupe based on name/address and the second could
>> do add the filter-by options. 
>>
>> I think the default deduping is obviously worth doing but I am not sure
>> about the rest. In any case I think the default deduping could go in
>> pre-freeze but I would recommend the rest is left until after.
>
> I can agree with that, but there is one hard thing to resolve: 
> "naming things"(*)
>
> (*) http://martinfowler.com/bliki/TwoHardThings.html
>
> With all rest ignored (sorry no time to work on this in more detail now),
> this default deduping could be done with single argument '--unique'...

I would agree that --unique is slightly better than --filter-by, but I
don't see what is so attractive on having no deduplication at all.
Anyway, I'll keep it in v5 as it is now and we can add
--unique=no/none/whatever later.

-Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-10-30 21:42 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-27 14:50 [PATCH v4 0/6] notmuch search --output=sender/recipients Michal Sojka
2014-10-27 14:50 ` [PATCH v4 1/6] cli: search: Refactor passing of command line options Michal Sojka
2014-10-30  7:57   ` Mark Walters
2014-10-30 21:14     ` Michal Sojka
2014-10-27 14:50 ` [PATCH v4 2/6] cli: Add support for parsing keyword-flag arguments Michal Sojka
2014-10-27 14:50 ` [PATCH v4 3/6] cli: search: Convert --output to keyword-flag argument Michal Sojka
2014-10-27 14:50 ` [PATCH v4 4/6] cli: search: Add --output={sender,recipients} Michal Sojka
2014-10-30  8:10   ` Mark Walters
2014-10-27 14:50 ` [PATCH v4 5/6] cli: search: Add configurable way to filter out duplicate addresses Michal Sojka
2014-10-30  8:16   ` Mark Walters
2014-10-30  8:52     ` Tomi Ollila
2014-10-30  9:00       ` Mark Walters
2014-10-30 21:42       ` Michal Sojka
2014-10-30 21:34     ` Michal Sojka
2014-10-27 14:50 ` [PATCH v4 6/6] cli: search: Add --output=count Michal Sojka
2014-10-28 21:19 ` [PATCH v4 0/6] notmuch search --output=sender/recipients Tomi Ollila

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).