unofficial mirror of notmuch@notmuchmail.org
* [PATCH] lib: Add a new prefix "list" to the search-terms syntax
@ 2013-04-03 13:46 Alexey I. Froloff
  2013-04-06 11:54 ` David Bremner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey I. Froloff @ 2013-04-03 13:46 UTC (permalink / raw)
  To: notmuch; +Cc: Alexey I. Froloff

From: "Alexey I. Froloff" <raorn@raorn.name>

Add support for indexing and searching the message's List-Id header.
This is useful when matching all the messages belonging to a particular
mailing list.

Rework of the patch by Pablo Oliveira <pablo@sifflez.org>

Cc: Pablo Oliveira <pablo@sifflez.org>
Signed-off-by: Alexey I. Froloff <raorn@raorn.name>
---
 lib/database.cc                 |  1 +
 lib/index.cc                    | 48 ++++++++++++++++++++++++++++++++++++++++-
 man/man7/notmuch-search-terms.7 |  8 +++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 91d4329..9311505 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -214,6 +214,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
     { "to",			"XTO" },
     { "attachment",		"XATTACHMENT" },
     { "subject",		"XSUBJECT"},
+    { "list",			"XLIST"},
     { "folder",			"XFOLDER"}
 };
 
diff --git a/lib/index.cc b/lib/index.cc
index a2edd6d..d79bd95 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -304,6 +304,49 @@ _index_address_list (notmuch_message_t *message,
     }
 }
 
+static void
+_index_list_id (notmuch_message_t *message,
+               const char *list_id_header)
+{
+    const char *begin_list_id, *end_list_id;
+
+    if (list_id_header == NULL)
+	return;
+
+    /* RFC2919 says that the list-id is found at the end of the header
+     * and enclosed between angle brackets. If we cannot find a
+     * matching pair of brackets containing at least one character,
+     * we ignore the list id header. */
+    begin_list_id = strrchr (list_id_header, '<');
+    if (!begin_list_id)
+	return;
+
+    end_list_id = strrchr(begin_list_id, '>');
+    if (!end_list_id || (end_list_id - begin_list_id < 2))
+	return;
+
+    void *local = talloc_new (message);
+
+    /* We extract the list id between the angle brackets */
+    const char *list_id = talloc_strndup(local, begin_list_id + 1,
+					 end_list_id - begin_list_id - 1);
+
+    /* All the text before is the description of the list */
+    const char *description = talloc_strndup(local, list_id_header,
+					     begin_list_id - list_id_header);
+
+    /* Description may be RFC2047 encoded */
+    char *decoded_desc = g_mime_utils_header_decode_phrase(description);
+
+    _notmuch_message_gen_terms(message, "list", list_id);
+
+    if (decoded_desc)
+	_notmuch_message_gen_terms(message, "list", decoded_desc);
+
+    free(decoded_desc);
+    talloc_free (local);
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -432,7 +475,7 @@ _notmuch_message_index_file (notmuch_message_t *message,
     GMimeMessage *mime_message = NULL;
     InternetAddressList *addresses;
     FILE *file = NULL;
-    const char *from, *subject;
+    const char *from, *subject, *list_id;
     notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
     static int initialized = 0;
     char from_buf[5];
@@ -500,6 +543,9 @@ mboxes is deprecated and may be removed in the future.\n", filename);
     subject = g_mime_message_get_subject (mime_message);
     _notmuch_message_gen_terms (message, "subject", subject);
 
+    list_id = g_mime_object_get_header (GMIME_OBJECT (mime_message), "List-Id");
+    _index_list_id (message, list_id);
+
     _index_mime_part (message, g_mime_message_get_mime_part (mime_message));
 
   DONE:
diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index eb417ba..9cae107 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -52,6 +52,8 @@ terms to match against specific portions of an email, (where
 
 	thread:<thread-id>
 
+	list:<list-id>
+
 	folder:<directory-path>
 
 	date:<since>..<until>
@@ -100,6 +102,12 @@ thread ID values can be seen in the first column of output from
 .B "notmuch search"
 
 The
+.BR list: ,
+is used to match mailing list ID of an email message \- contents of the
+List\-Id: header without the '<', '>' delimiters or decoded list
+description.
+
+The
 .B folder:
 prefix can be used to search for email message files that are
 contained within particular directories within the mail store. Only
-- 
1.8.1.4


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-03 13:46 [PATCH] lib: Add a new prefix "list" to the search-terms syntax Alexey I. Froloff
@ 2013-04-06 11:54 ` David Bremner
  2013-04-08 10:03   ` Alexey I. Froloff
  0 siblings, 1 reply; 13+ messages in thread
From: David Bremner @ 2013-04-06 11:54 UTC (permalink / raw)
  To: Alexey I. Froloff, notmuch

"Alexey I. Froloff" <raorn@raorn.name> writes:
> +
> +    /* Description may be RFC2047 encoded */
> +    char *decoded_desc = g_mime_utils_header_decode_phrase(description);

Surprisingly, the docs claim g_mime_utils_header_decode_phrase has no
error conditions, so I guess this is OK.

> +
> +    _notmuch_message_gen_terms(message, "list", list_id);
> +
> +    if (decoded_desc)
> +	_notmuch_message_gen_terms(message, "list", decoded_desc);
> 

On the other hand, _notmuch_message_gen_terms does return a status. I
agree that currently this status is not useful, but that could change in
the future.  I also agree that the existing code does the same thing in
a few places, but I think it's better not to introduce more.

We'll need a test or two before we introduce a core change.

Any objections to the list: syntax?  The only issue I see is that at
some point we will probably add a generic header search syntax, and this
implicitly says list-id is more important/common than other headers.

d


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-06 11:54 ` David Bremner
@ 2013-04-08 10:03   ` Alexey I. Froloff
  2013-04-08 21:56     ` David Bremner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey I. Froloff @ 2013-04-08 10:03 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch


On Sat, Apr 06, 2013 at 08:54:59AM -0300, David Bremner wrote:
> On the other hand, _notmuch_message_gen_terms does return a status. I
> agree that currently this status is not useful, but that could change in
> the future.  I also agree that the existing code does the same thing in
> a few places, but I think it's better not to introduce more.
Well, this is an adaptation of an earlier patch posted to this list
some time ago.  Personally, I see no reason to index the list
description.

> Any objections to the list: syntax?  The only issue I see is that at
> some point we will probably add a generic header search syntax, and this
> implicitly says list-id is more important/common than other headers.
The actual list ID differs from the List-Id header value.  I can't give
an example of another message header with similar syntax other than
From/To/Cc, but those headers are already specially processed.

I would also be happy if I could just assign a tag to a message using
the list ID.

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/



* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-08 10:03   ` Alexey I. Froloff
@ 2013-04-08 21:56     ` David Bremner
  2013-04-09  8:30       ` Alexey I. Froloff
  0 siblings, 1 reply; 13+ messages in thread
From: David Bremner @ 2013-04-08 21:56 UTC (permalink / raw)
  To: Alexey I. Froloff; +Cc: notmuch

"Alexey I. Froloff" <raorn@raorn.name> writes:

> On Sat, Apr 06, 2013 at 08:54:59AM -0300, David Bremner wrote:
>> On the other hand, _notmuch_message_gen_terms does return a status. I
>> agree that currently this status is not useful, but that could change in
>> the future.  I also agree that the existing code does the same thing in
>> a few places, but I think it's better not to introduce more.
> Well, this is an adaptation of an earlier patch posted to this list
> some time ago.

Sure, no blame attaches. But somebody still needs to fix the patch or
convince us it doesn't need fixing. 

>  Personally, I see no reason to index the list description.

That's an independent question.  I guess there is the question of how
much overhead this introduces into 

>> Any objections to the list: syntax?  The only issue I see is that at
>> some point we will probably add a generic header search syntax, and this
>> implicitly says list-id is more important/common than other headers.
> The actual list ID differs from the List-Id header value.  I can't give
> an example of another message header with similar syntax other than
> From/To/Cc, but those headers are already specially processed.

OK, that part seems relatively convincing to me.

d


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-08 21:56     ` David Bremner
@ 2013-04-09  8:30       ` Alexey I. Froloff
  2013-04-09 23:16         ` Alexey I. Froloff
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey I. Froloff @ 2013-04-09  8:30 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch


On Mon, Apr 08, 2013 at 06:56:26PM -0300, David Bremner wrote:
> Sure, no blame attaches. But somebody still needs to fix the patch or
> convince us it doesn't need fixing. 

According to git grep -C2 _notmuch_message_gen_terms, there are
seven calls to this function.  The returned status is checked zero
times :-)

> >  Personally I see no reason in indexing list description.
> That's an independent question.  I guess there is the question of how
> much overhead this introduces into 

In general, one or six unique descriptions per list (a wild guess
based on observing several of my long-running lists), depending
on list age and list administrator sanity.  Indexing names for
the From/To/Cc fields brings a bigger overhead, I guess.

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/



* [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-09  8:30       ` Alexey I. Froloff
@ 2013-04-09 23:16         ` Alexey I. Froloff
  2013-04-30  1:12           ` David Bremner
  2013-10-17 14:17           ` Jani Nikula
  0 siblings, 2 replies; 13+ messages in thread
From: Alexey I. Froloff @ 2013-04-09 23:16 UTC (permalink / raw)
  To: notmuch; +Cc: Alexey I. Froloff

From: "Alexey I. Froloff" <raorn@raorn.name>

Add support for indexing and searching the message's List-Id header.
This is useful when matching all the messages belonging to a particular
mailing list.

Rework of the patch by Pablo Oliveira <pablo@sifflez.org>

Differences from original patch:

The whole list ID indexed as boolean term, not split by words.
List description is not indexed at all.

Thanks to ojwb and amdragon from irc://irc.freenode.net/notmuch

Signed-off-by: Alexey I. Froloff <raorn@raorn.name>
---
 lib/database.cc                 |  1 +
 lib/index.cc                    | 45 ++++++++++++++++++++++++++++++++++++++++-
 man/man7/notmuch-search-terms.7 |  8 ++++++++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/lib/database.cc b/lib/database.cc
index 91d4329..6313913 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -203,6 +203,7 @@ static prefix_t BOOLEAN_PREFIX_INTERNAL[] = {
 };
 
 static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
+    { "list",			"XLIST"},
     { "thread",			"G" },
     { "tag",			"K" },
     { "is",			"K" },
diff --git a/lib/index.cc b/lib/index.cc
index a2edd6d..8b97ec3 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -304,6 +304,46 @@ _index_address_list (notmuch_message_t *message,
     }
 }
 
+static void
+_index_list_id (notmuch_message_t *message,
+               const char *list_id_header)
+{
+    const char *begin_list_id, *end_list_id;
+
+    if (list_id_header == NULL)
+	return;
+
+    /* RFC2919 says that the list-id is found at the end of the header
+     * and enclosed between angle brackets. If we cannot find a
+     * matching pair of brackets containing at least one character,
+     * we ignore the list id header. */
+    begin_list_id = strrchr (list_id_header, '<');
+    if (!begin_list_id) {
+	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
+	return;
+    }
+
+    end_list_id = strrchr(begin_list_id, '>');
+    if (!end_list_id || (end_list_id - begin_list_id < 2)) {
+	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
+	return;
+    }
+
+    void *local = talloc_new (message);
+
+    /* We extract the list id between the angle brackets */
+    const char *list_id = talloc_strndup (local, begin_list_id + 1,
+					  end_list_id - begin_list_id - 1);
+
+    /* _notmuch_message_add_term() may return
+     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
+     * this is not a reason to exit with error... */
+    if (_notmuch_message_add_term (message, "list", list_id))
+	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
+
+    talloc_free (local);
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -432,7 +472,7 @@ _notmuch_message_index_file (notmuch_message_t *message,
     GMimeMessage *mime_message = NULL;
     InternetAddressList *addresses;
     FILE *file = NULL;
-    const char *from, *subject;
+    const char *from, *subject, *list_id;
     notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
     static int initialized = 0;
     char from_buf[5];
@@ -500,6 +540,9 @@ mboxes is deprecated and may be removed in the future.\n", filename);
     subject = g_mime_message_get_subject (mime_message);
     _notmuch_message_gen_terms (message, "subject", subject);
 
+    list_id = g_mime_object_get_header (GMIME_OBJECT (mime_message), "List-Id");
+    _index_list_id (message, list_id);
+
     _index_mime_part (message, g_mime_message_get_mime_part (mime_message));
 
   DONE:
diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index eb417ba..9cae107 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -52,6 +52,8 @@ terms to match against specific portions of an email, (where
 
 	thread:<thread-id>
 
+	list:<list-id>
+
 	folder:<directory-path>
 
 	date:<since>..<until>
@@ -100,6 +102,12 @@ thread ID values can be seen in the first column of output from
 .B "notmuch search"
 
 The
+.BR list: ,
+is used to match mailing list ID of an email message \- contents of the
+List\-Id: header without the '<', '>' delimiters or decoded list
+description.
+
+The
 .B folder:
 prefix can be used to search for email message files that are
 contained within particular directories within the mail store. Only
-- 
1.8.1.4


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-09 23:16         ` Alexey I. Froloff
@ 2013-04-30  1:12           ` David Bremner
  2013-04-30  9:52             ` Alexey I. Froloff
  2013-10-17 14:17           ` Jani Nikula
  1 sibling, 1 reply; 13+ messages in thread
From: David Bremner @ 2013-04-30  1:12 UTC (permalink / raw)
  To: Alexey I. Froloff, notmuch


Hi Alexey, 

Thanks for working on this. I think the boolean prefix version makes
more sense, and it seems to work OK. I have a few comments below
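
For readers comparing the two versions of the patch, the difference
between a word-split ("probabilistic") prefix and a boolean prefix can
be modelled roughly as below. This is a simplified sketch, not Xapian's
or notmuch's actual term generation; the function names are
hypothetical, and real tokenization differs in detail.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified model of a word-splitting indexer: count the runs of
 * alphanumeric characters in text.  Each run would become its own
 * searchable term. */
static size_t
sketch_count_word_terms (const char *text)
{
    size_t count = 0;
    int in_word = 0;

    for (; *text != '\0'; text++) {
        if (isalnum ((unsigned char) *text)) {
            if (!in_word) {
                count++;
                in_word = 1;
            }
        } else {
            in_word = 0;
        }
    }
    return count;
}

/* Simplified model of a boolean term: the prefix followed by the whole
 * value, kept as a single unit.  Returns what snprintf returns. */
static int
sketch_boolean_term (char *buf, size_t size,
                     const char *prefix, const char *value)
{
    return snprintf (buf, size, "%s%s", prefix, value);
}
```

Under this model, "notmuch.notmuchmail.org" yields three word terms
with the first approach and exactly one term with the second, which is
why a boolean list: search has to give the full list ID.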

"Alexey I. Froloff" <raorn@raorn.name> writes:

> +    begin_list_id = strrchr (list_id_header, '<');
> +    if (!begin_list_id) {
> +	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
> +	return;
> +    }

- I guess this should say "malformed". 

- I got about 1800 lines of such messages when indexing 280k
  messages. That might strike some people as excessive. On the other
  hand, I guess we need to rethink error reporting overall.

  What do you think about printing the filename or message-id here, so
  that it's easier to double-check that it is not a bug?
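
A minimal sketch of the idea of naming the message in the warning,
assuming two hypothetical helpers, extract_list_id and
format_list_id_warning, neither of which exists in notmuch. The bracket
search mirrors the one in the patch; plain malloc is used here instead
of talloc to keep the example self-contained.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of notmuch: return a malloc'ed copy of
 * the text between the last '<' and the last '>' that follows it, or
 * NULL if there is no such pair enclosing at least one character (or
 * if allocation fails).  The caller frees the result. */
static char *
extract_list_id (const char *header)
{
    const char *begin, *end;
    size_t len;
    char *id;

    if (header == NULL)
        return NULL;

    begin = strrchr (header, '<');
    if (begin == NULL)
        return NULL;

    end = strrchr (begin, '>');
    if (end == NULL || end - begin < 2)
        return NULL;

    len = (size_t) (end - begin - 1);
    id = malloc (len + 1);
    if (id == NULL)
        return NULL;

    memcpy (id, begin + 1, len);
    id[len] = '\0';
    return id;
}

/* Hypothetical helper: format a warning that names the message, so a
 * reader can check the offending file by hand. */
static int
format_list_id_warning (char *buf, size_t size, const char *message_id)
{
    return snprintf (buf, size,
                     "Warning: Not indexing malformed List-Id header "
                     "in message <%s>.\n", message_id);
}
```

A caller would try extract_list_id first and, on NULL, emit the
formatted warning with whatever identifier it has at hand (message-id
or filename).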

> +    end_list_id = strrchr(begin_list_id, '>');
> +    if (!end_list_id || (end_list_id - begin_list_id < 2)) {
> +	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
> +	return;
> +    }
> +

Same comments here.

> +    void *local = talloc_new (message);
> +
> +    /* We extract the list id between the angle brackets */
> +    const char *list_id = talloc_strndup (local, begin_list_id + 1,
> +					  end_list_id - begin_list_id - 1);
> +
      we should handle ENOMEM here, I think.

> +    /* _notmuch_message_add_term() may return
> +     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
> +     * this is not a reason to exit with error... */
> +    if (_notmuch_message_add_term (message, "list", list_id))
> +	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);

This should say why the indexing failed.

Other than that:

- We need a couple tests for this code; tests/search should give some
  hints how to proceed.

- We need a patch for NEWS, explaining what people need to do to take
  advantage of the new functionality.  I think that adding new prefixes
  to an existing database is OK, but I'd welcome confirmation.

BTW, my not too scientific tests show no detectable bloat in the
database, at least after running xapian-compact. I'd be curious what
other people report.


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-30  1:12           ` David Bremner
@ 2013-04-30  9:52             ` Alexey I. Froloff
  2013-05-04  0:54               ` David Bremner
  0 siblings, 1 reply; 13+ messages in thread
From: Alexey I. Froloff @ 2013-04-30  9:52 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch


On Mon, Apr 29, 2013 at 10:12:16PM -0300, David Bremner wrote:
> > +    begin_list_id = strrchr (list_id_header, '<');
> > +    if (!begin_list_id) {
> > +	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
> > +	return;
> > +    }
> - I guess this should say "malformed". 
My bad.  English is not my native language ;-)

> - I got about 1800 lines of such messages when indexing 280k
>   messages. That might strike some people as excessive. On the other
>   hand, I guess we need to rethink error reporting overall.
If I understand correctly, this code belongs to the library and
should not print anything on either stderr or stdout.  OTOH,
surrounding functions do print messages on error, so I just did
as the others do.

>   What do you think about printing the filename or message-id here, so
>   that it's easier to double-check that it is not a bug?
Giving Message-Id makes sense.

> > +    void *local = talloc_new (message);
>       we should handle ENOMEM here, I think.
There are 16 talloc_new() calls and ENOMEM is not handled
anywhere.

> > +    /* _notmuch_message_add_term() may return
> > +     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
> > +     * this is not a reason to exit with error... */
> > +    if (_notmuch_message_add_term (message, "list", list_id))
> > +	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
> This should say why the indexing failed.
There should be a strerror-like function that can give a description
for a given status code.

> - We need a couple tests for this code; tests/search should give some
>   hints how to proceed.
OK

> - We need a patch for NEWS, explaining what people need to do to take
>   advantage of the new functionality.  I think that adding new prefixes
>   to an existing database is OK, but I'd welcome confirmation.
OK

-- 
Regards,    --
Sir Raorn.   --- http://thousandsofhate.blogspot.com/



* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-30  9:52             ` Alexey I. Froloff
@ 2013-05-04  0:54               ` David Bremner
  0 siblings, 0 replies; 13+ messages in thread
From: David Bremner @ 2013-05-04  0:54 UTC (permalink / raw)
  To: Alexey I. Froloff; +Cc: notmuch

"Alexey I. Froloff" <raorn@raorn.name> writes:

>
>> > +    void *local = talloc_new (message);
>>       we should handle ENOMEM here, I think.
> There are 16 talloc_new() calls and ENOMEM is not handled
> anywhere.

This makes me a bit sad but I agree it's not your fault ;).
>
>> > +    /* _notmuch_message_add_term() may return
>> > +     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
>> > +     * this is not a reason to exit with error... */
>> > +    if (_notmuch_message_add_term (message, "list", list_id))
>> > +	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
>> This should say why the indexing failed.
> There should be a strerror-like function that can give a description
> for a given status code.

There is, but only for the public status values. 

I guess the most correct thing would be to check for equality with
NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG, print an appropriate error message
if so, and otherwise use
notmuch_status_to_string(COERCE_STATUS(private_status))
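
A rough sketch of that approach follows. It assumes stand-in types and
names (everything prefixed sketch_ is hypothetical) rather than
notmuch's real status enums and helpers, so that the example compiles
on its own: the private-only "term too long" value gets its own
message, and every other value falls through to a string helper that
knows only the public ones.

```c
#include <stdio.h>

/* Stand-in status values for illustration only; the real enums live in
 * notmuch's headers and have more members. */
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_OUT_OF_MEMORY,
    SKETCH_STATUS_TERM_TOO_LONG /* models the private-only value */
} sketch_status_t;

/* Stand-in for a status-to-string helper that knows only the public
 * status values. */
static const char *
sketch_public_status_to_string (sketch_status_t status)
{
    switch (status) {
    case SKETCH_STATUS_SUCCESS:
        return "No error occurred";
    case SKETCH_STATUS_OUT_OF_MEMORY:
        return "Out of memory";
    default:
        return "Unknown error status value";
    }
}

/* Handle the private-only value first, then defer to the public
 * helper for everything else. */
static const char *
sketch_failure_reason (sketch_status_t status)
{
    if (status == SKETCH_STATUS_TERM_TOO_LONG)
        return "term too long";
    return sketch_public_status_to_string (status);
}

/* Format the warning so that it says why indexing failed. */
static int
sketch_format_failure (char *buf, size_t size,
                       const char *list_id, sketch_status_t status)
{
    return snprintf (buf, size,
                     "Warning: Not indexing List-Id: <%s>: %s\n",
                     list_id, sketch_failure_reason (status));
}
```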

d


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-04-09 23:16         ` Alexey I. Froloff
  2013-04-30  1:12           ` David Bremner
@ 2013-10-17 14:17           ` Jani Nikula
  2013-12-17 18:03             ` Kirill A. Shutemov
  1 sibling, 1 reply; 13+ messages in thread
From: Jani Nikula @ 2013-10-17 14:17 UTC (permalink / raw)
  To: Alexey I. Froloff, notmuch; +Cc: Alexey I. Froloff

On Wed, 10 Apr 2013, "Alexey I. Froloff" <raorn@raorn.name> wrote:
> From: "Alexey I. Froloff" <raorn@raorn.name>
>
> Add support for indexing and searching the message's List-Id header.
> This is useful when matching all the messages belonging to a particular
> mailing list.

There's an issue with our duplicate message-id handling that is likely
to cause confusion with List-Id: searches. If you receive several
duplicates of the same message (judged by the message-id), only the
first one of them gets indexed, and the rest are ignored. This means
that for messages you receive both directly and through a list, it will
be arbitrary whether the List-Id: gets indexed or not. Therefore a list:
search might not return all the messages you'd expect.
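
The effect can be illustrated with a toy model. This is only a sketch
under simplifying assumptions; the toy_ types and functions are
hypothetical and are not notmuch's data structures. Only the first
delivered copy of a message-id is indexed, so whether the list ID is
recorded depends on delivery order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A toy model of one indexed message; not notmuch's data structure. */
struct toy_message {
    const char *message_id;
    const char *list_id; /* NULL if the indexed copy had no List-Id */
};

struct toy_index {
    struct toy_message messages[16];
    size_t count;
};

/* Add one delivered copy.  Only the first copy of a message-id is
 * indexed; later copies are ignored.  Returns true if the copy was
 * indexed, false if it was a duplicate or the toy index is full. */
static bool
toy_add_copy (struct toy_index *index,
              const char *message_id, const char *list_id)
{
    size_t i;

    for (i = 0; i < index->count; i++)
        if (strcmp (index->messages[i].message_id, message_id) == 0)
            return false;

    if (index->count == sizeof (index->messages) / sizeof (index->messages[0]))
        return false;

    index->messages[index->count].message_id = message_id;
    index->messages[index->count].list_id = list_id;
    index->count++;
    return true;
}

/* Count the messages that a search for the given list ID would match. */
static size_t
toy_count_list (const struct toy_index *index, const char *list_id)
{
    size_t i, n = 0;

    for (i = 0; i < index->count; i++)
        if (index->messages[i].list_id != NULL &&
            strcmp (index->messages[i].list_id, list_id) == 0)
            n++;
    return n;
}
```

If the direct copy (no List-Id) arrives first, the list search misses
the message; if the list copy arrives first, it matches.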

BR,
Jani.




* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-10-17 14:17           ` Jani Nikula
@ 2013-12-17 18:03             ` Kirill A. Shutemov
  2013-12-17 19:46               ` Kirill A. Shutemov
  2014-01-01 12:04               ` Jani Nikula
  0 siblings, 2 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2013-12-17 18:03 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch, Alexey I. Froloff

On Thu, Oct 17, 2013 at 05:17:00PM +0300, Jani Nikula wrote:
> On Wed, 10 Apr 2013, "Alexey I. Froloff" <raorn@raorn.name> wrote:
> > From: "Alexey I. Froloff" <raorn@raorn.name>
> >
> > Add support for indexing and searching the message's List-Id header.
> > This is useful when matching all the messages belonging to a particular
> > mailing list.
> 
> There's an issue with our duplicate message-id handling that is likely
> to cause confusion with List-Id: searches. If you receive several
> duplicates of the same message (judged by the message-id), only the
> first one of them gets indexed, and the rest are ignored. This means
> that for messages you receive both directly and through a list, it will
> be arbitrary whether the List-Id: gets indexed or not. Therefore a list:
> search might not return all the messages you'd expect.

I've tried to address this. The patch also adds a few tests for the feature.

There's still missing functionality: re-indexing existing messages for
list-id, handling message removal, etc.

Any comments?

diff --git a/lib/database.cc b/lib/database.cc
index f395061e3a73..196243e15d1a 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -205,6 +205,7 @@ static prefix_t BOOLEAN_PREFIX_INTERNAL[] = {
 };
 
 static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
+    { "list",			"XLIST"},
     { "thread",			"G" },
     { "tag",			"K" },
     { "is",			"K" },
@@ -2025,10 +2026,13 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
 	    date = notmuch_message_file_get_header (message_file, "date");
 	    _notmuch_message_set_header_values (message, date, from, subject);
 
-	    ret = _notmuch_message_index_file (message, filename);
+	    ret = _notmuch_message_index_file (message, filename, false);
 	    if (ret)
 		goto DONE;
 	} else {
+	    ret = _notmuch_message_index_file (message, filename, true);
+	    if (ret)
+		goto DONE;
 	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
 	}
 
diff --git a/lib/index.cc b/lib/index.cc
index 78c18cf36d10..9fe1ad6502ed 100644
--- a/lib/index.cc
+++ b/lib/index.cc
@@ -304,6 +304,47 @@ _index_address_list (notmuch_message_t *message,
     }
 }
 
+static void
+_index_list_id (notmuch_message_t *message,
+               const char *list_id_header)
+{
+    const char *begin_list_id, *end_list_id, *list_id;
+    void *local;
+
+    if (list_id_header == NULL)
+	return;
+
+    /* RFC2919 says that the list-id is found at the end of the header
+     * and enclosed between angle brackets. If we cannot find a
+     * matching pair of brackets containing at least one character,
+     * we ignore the list id header. */
+    begin_list_id = strrchr (list_id_header, '<');
+    if (!begin_list_id) {
+	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
+	return;
+    }
+
+    end_list_id = strrchr(begin_list_id, '>');
+    if (!end_list_id || (end_list_id - begin_list_id < 2)) {
+	fprintf (stderr, "Warning: Not indexing mailformed List-Id tag.\n");
+	return;
+    }
+
+    local = talloc_new (message);
+
+    /* We extract the list id between the angle brackets */
+    list_id = talloc_strndup (local, begin_list_id + 1,
+			      end_list_id - begin_list_id - 1);
+
+    /* _notmuch_message_add_term() may return
+     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
+     * this is not a reason to exit with error... */
+    if (_notmuch_message_add_term (message, "list", list_id))
+	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
+
+    talloc_free (local);
+}
+
 /* Callback to generate terms for each mime part of a message. */
 static void
 _index_mime_part (notmuch_message_t *message,
@@ -425,14 +466,15 @@ _index_mime_part (notmuch_message_t *message,
 
 notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
-			     const char *filename)
+			     const char *filename,
+			     notmuch_bool_t duplicate)
 {
     GMimeStream *stream = NULL;
     GMimeParser *parser = NULL;
     GMimeMessage *mime_message = NULL;
     InternetAddressList *addresses;
     FILE *file = NULL;
-    const char *from, *subject;
+    const char *from, *subject, *list_id;
     notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
     static int initialized = 0;
     char from_buf[5];
@@ -485,6 +527,9 @@ mboxes is deprecated and may be removed in the future.\n", filename);
 
     from = g_mime_message_get_sender (mime_message);
 
+    if (duplicate)
+	goto DUP;
+
     addresses = internet_address_list_parse_string (from);
     if (addresses) {
 	_index_address_list (message, "from", addresses);
@@ -502,6 +547,10 @@ mboxes is deprecated and may be removed in the future.\n", filename);
 
     _index_mime_part (message, g_mime_message_get_mime_part (mime_message));
 
+  DUP:
+    list_id = g_mime_object_get_header (GMIME_OBJECT (mime_message), "List-Id");
+    _index_list_id (message, list_id);
+
   DONE:
     if (mime_message)
 	g_object_unref (mime_message);
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index af185c7c5ba8..138dfa58efc8 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -322,7 +322,8 @@ notmuch_message_get_author (notmuch_message_t *message);
 
 notmuch_status_t
 _notmuch_message_index_file (notmuch_message_t *message,
-			     const char *filename);
+			     const char *filename,
+			     notmuch_bool_t duplicate);
 
 /* message-file.c */
 
diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
index f1627b3488f8..29b30b7b0b00 100644
--- a/man/man7/notmuch-search-terms.7
+++ b/man/man7/notmuch-search-terms.7
@@ -52,6 +52,8 @@ terms to match against specific portions of an email, (where
 
 	thread:<thread-id>
 
+	list:<list-id>
+
 	folder:<directory-path>
 
 	date:<since>..<until>
@@ -109,6 +111,12 @@ within a matching directory. Only the directory components below the
 top-level mail database path are available to be searched.
 
 The
+.B list:
+prefix can be used to match the mailing list ID of an email message,
+that is, the contents of the List\-Id: header without the '<', '>'
+delimiters and without the decoded list description.
+
+The
 .B date:
 prefix can be used to restrict the results to only messages within a
 particular time range (based on the Date: header) with a range syntax
diff --git a/test/corpus/cur/18:2, b/test/corpus/cur/18:2,
index f522f69eb933..2b54925bd5d1 100644
--- a/test/corpus/cur/18:2,
+++ b/test/corpus/cur/18:2,
@@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
 Date: Tue, 17 Nov 2009 18:21:38 -0500
 Subject: [notmuch] archive
 Message-ID: <20091117232137.GA7669@griffis1.net>
+List-Id: <test1.example.com>
 
 Just subscribed, I'd like to catch up on the previous postings,
 but the archive link seems to be bogus?
diff --git a/test/corpus/cur/51:2, b/test/corpus/cur/51:2,
index f522f69eb933..b155e6ee64a5 100644
--- a/test/corpus/cur/51:2,
+++ b/test/corpus/cur/51:2,
@@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
 Date: Tue, 17 Nov 2009 18:21:38 -0500
 Subject: [notmuch] archive
 Message-ID: <20091117232137.GA7669@griffis1.net>
+List-Id: <test2.example.com>
 
 Just subscribed, I'd like to catch up on the previous postings,
 but the archive link seems to be bogus?
diff --git a/test/search b/test/search
index a7a0b18d2e48..bef42971226c 100755
--- a/test/search
+++ b/test/search
@@ -129,4 +129,28 @@ add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12
 output=$(notmuch search "bödý" | notmuch_search_sanitize)
 test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)"
 
+test_begin_subtest "Search by List-Id"
+notmuch search list:notmuch.notmuchmail.org | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX   2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] "notmuch help" outputs to stderr? (attachment inbox signed unread)
+thread:XXX   2009-11-18 [4/7] Lars Kellogg-Stedman, Mikhail Gusarov| Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
+thread:XXX   2009-11-18 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
+thread:XXX   2009-11-17 [1/3] Adrian Perez de Castro| Keith Packard, Carl Worth; [notmuch] Introducing myself (inbox signed unread)
+thread:XXX   2009-11-17 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "Search by List-Id, duplicated messages, step 1"
+notmuch search list:test1.example.com | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX   2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
+
+test_begin_subtest "Search by List-Id, duplicated messages, step 2"
+notmuch search list:test2.example.com | notmuch_search_sanitize > OUTPUT
+cat <<EOF >EXPECTED
+thread:XXX   2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
+EOF
+test_expect_equal_file OUTPUT EXPECTED
 test_done
diff --git a/test/test-lib.sh b/test/test-lib.sh
index d8e0d9115a69..981bde4a4004 100644
--- a/test/test-lib.sh
+++ b/test/test-lib.sh
@@ -576,9 +576,9 @@ test_expect_equal_json () {
     # The test suite forces LC_ALL=C, but this causes Python 3 to
     # decode stdin as ASCII.  We need to read JSON in UTF-8, so
     # override Python's stdio encoding defaults.
-    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python -mjson.tool \
+    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
         || echo "$1")
-    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python -mjson.tool \
+    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
         || echo "$2")
     shift 2
     test_expect_equal "$output" "$expected" "$@"
-- 
 Kirill A. Shutemov


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-12-17 18:03             ` Kirill A. Shutemov
@ 2013-12-17 19:46               ` Kirill A. Shutemov
  2014-01-01 12:04               ` Jani Nikula
  1 sibling, 0 replies; 13+ messages in thread
From: Kirill A. Shutemov @ 2013-12-17 19:46 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch, Alexey I. Froloff

On Tue, Dec 17, 2013 at 08:03:22PM +0200, Kirill A. Shutemov wrote:
> diff --git a/test/test-lib.sh b/test/test-lib.sh
> index d8e0d9115a69..981bde4a4004 100644
> --- a/test/test-lib.sh
> +++ b/test/test-lib.sh
> @@ -576,9 +576,9 @@ test_expect_equal_json () {
>      # The test suite forces LC_ALL=C, but this causes Python 3 to
>      # decode stdin as ASCII.  We need to read JSON in UTF-8, so
>      # override Python's stdio encoding defaults.
> -    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python -mjson.tool \
> +    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
>          || echo "$1")
> -    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python -mjson.tool \
> +    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
>          || echo "$2")
>      shift 2
>      test_expect_equal "$output" "$expected" "$@"

This part is not relevant.

-- 
 Kirill A. Shutemov


* Re: [PATCH] lib: Add a new prefix "list" to the search-terms syntax
  2013-12-17 18:03             ` Kirill A. Shutemov
  2013-12-17 19:46               ` Kirill A. Shutemov
@ 2014-01-01 12:04               ` Jani Nikula
  1 sibling, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2014-01-01 12:04 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: notmuch, Alexey I. Froloff

On Tue, 17 Dec 2013, "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> On Thu, Oct 17, 2013 at 05:17:00PM +0300, Jani Nikula wrote:
>> On Wed, 10 Apr 2013, "Alexey I. Froloff" <raorn@raorn.name> wrote:
>> > From: "Alexey I. Froloff" <raorn@raorn.name>
>> >
>> > Add support for indexing and searching the message's List-Id header.
>> > This is useful when matching all the messages belonging to a particular
>> > mailing list.
>> 
>> There's an issue with our duplicate message-id handling that is likely
>> to cause confusion with List-Id: searches. If you receive several
>> duplicates of the same message (judged by the message-id), only the
>> first one of them gets indexed, and the rest are ignored. This means
>> that for messages you receive both directly and through a list, it will
>> be arbitrary whether the List-Id: gets indexed or not. Therefore a list:
>> search might not return all the messages you'd expect.
>
> I've tried to address this. The patch also adds a few tests for the feature.
>
> There's still missing functionality: re-indexing existing messages for
> list-id, handling message removal, etc.
>
> Any comments?

Hi Kirill, sorry it took me so long to get to this!

I've looked into our duplicate message-id handling and indexing before,
and it's not very good.

First, we should pay more attention to checking whether the messages
really are duplicates or not. This is not trivial, but we should go a
bit further than just comparing the message-ids. Sadly, handling the
case of colliding message-ids on clearly different messages is not
trivial either, as we rely on the message-ids being unique all around.

Second, we should be more clever about indexing duplicates that we think
are the same message. This is orthogonal to the first point. Currently,
only the first duplicate gets indexed, and will remain indexed even if
it's deleted and other copies remain. A message that matches a search
might end up not having the matching search terms, for example. A
rebuild of the database might index a different duplicate from the last
time.

Having said that (partially just to write the thoughts down somewhere!),
I think your basic approach of indexing the list-id for duplicates is
sane, and we can grow more smarts to _notmuch_message_index_file() for
duplicate == true in the future, checking more headers etc. One thing I
wonder about though: what if more than one duplicate has list-id, and
_index_list_id() gets called multiple times on a message? (CC Austin, he
probably has more clues on this than me.)
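
To make that question concrete, here is a minimal standalone sketch in
plain C; it is not notmuch code. It assumes that adding a term behaves
like inserting into a per-message set, so that a repeated list-id is
harmless and distinct list-ids accumulate on the same message. Whether
the real _notmuch_message_add_term() behaves this way is exactly the
open question. The names extract_list_id(), index_list_id() and
term_set_t are hypothetical, and malloc() stands in for talloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TERMS 8

/* Hypothetical stand-in for the "list" terms attached to one message. */
typedef struct {
    char *terms[MAX_TERMS];
    size_t count;
} term_set_t;

/* Hypothetical helper mirroring the bracket scan in the patch: return a
 * malloc'd copy of the text between the last '<' and the last '>' that
 * follows it, or NULL if the header is missing or malformed. */
static char *
extract_list_id (const char *header)
{
    const char *begin, *end;
    size_t len;
    char *id;

    if (header == NULL)
	return NULL;

    begin = strrchr (header, '<');
    if (begin == NULL)
	return NULL;

    end = strrchr (begin, '>');
    if (end == NULL || end - begin < 2)
	return NULL;

    len = (size_t) (end - begin - 1);
    id = malloc (len + 1);
    if (id == NULL)
	return NULL;

    memcpy (id, begin + 1, len);
    id[len] = '\0';
    return id;
}

/* Add the list-id found in header to the set. Returns 1 if a new term
 * was stored, 0 if the header was unusable, the term was already
 * present, or the set is full. */
static int
index_list_id (term_set_t *set, const char *header)
{
    char *id;
    size_t i;

    assert (set != NULL);

    id = extract_list_id (header);
    if (id == NULL)
	return 0;

    for (i = 0; i < set->count; i++) {
	if (strcmp (set->terms[i], id) == 0) {
	    free (id);	/* Same list-id seen again: nothing to add. */
	    return 0;
	}
    }

    if (set->count == MAX_TERMS) {
	free (id);
	return 0;
    }

    set->terms[set->count++] = id;	/* The set takes ownership. */
    return 1;
}
```

Under that assumption, calling index_list_id() with the two corpus
headers from the patch, "<test1.example.com>" and then
"<test2.example.com>", leaves both ids in the set, which is what the two
"duplicated messages" subtests expect.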

For merging, you should also address the previous comments to the
original patch. There's been plenty of dropping the ball here it
seems... I think we've also agreed (perhaps only on IRC, I forget) that
we should use "listid" as the prefix, not "list" (sadly hyphens are not
allowed). Splitting the patch to code, test, and man parts might be a
good idea too.

BR,
Jani.


>
> diff --git a/lib/database.cc b/lib/database.cc
> index f395061e3a73..196243e15d1a 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -205,6 +205,7 @@ static prefix_t BOOLEAN_PREFIX_INTERNAL[] = {
>  };
>  
>  static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
> +    { "list",			"XLIST"},
>      { "thread",			"G" },
>      { "tag",			"K" },
>      { "is",			"K" },
> @@ -2025,10 +2026,13 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
>  	    date = notmuch_message_file_get_header (message_file, "date");
>  	    _notmuch_message_set_header_values (message, date, from, subject);
>  
> -	    ret = _notmuch_message_index_file (message, filename);
> +	    ret = _notmuch_message_index_file (message, filename, false);
>  	    if (ret)
>  		goto DONE;
>  	} else {
> +	    ret = _notmuch_message_index_file (message, filename, true);
> +	    if (ret)
> +		goto DONE;
>  	    ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
>  	}
>  
> diff --git a/lib/index.cc b/lib/index.cc
> index 78c18cf36d10..9fe1ad6502ed 100644
> --- a/lib/index.cc
> +++ b/lib/index.cc
> @@ -304,6 +304,47 @@ _index_address_list (notmuch_message_t *message,
>      }
>  }
>  
> +static void
> +_index_list_id (notmuch_message_t *message,
> +               const char *list_id_header)
> +{
> +    const char *begin_list_id, *end_list_id, *list_id;
> +    void *local;
> +
> +    if (list_id_header == NULL)
> +	return;
> +
> +    /* RFC2919 says that the list-id is found at the end of the header
> +     * and enclosed between angle brackets. If we cannot find a
> +     * matching pair of brackets containing at least one character,
> +     * we ignore the list id header. */
> +    begin_list_id = strrchr (list_id_header, '<');
> +    if (!begin_list_id) {
> +	fprintf (stderr, "Warning: Not indexing malformed List-Id header.\n");
> +	return;
> +    }
> +
> +    end_list_id = strrchr (begin_list_id, '>');
> +    if (!end_list_id || (end_list_id - begin_list_id < 2)) {
> +	fprintf (stderr, "Warning: Not indexing malformed List-Id header.\n");
> +	return;
> +    }
> +
> +    local = talloc_new (message);
> +
> +    /* We extract the list id between the angle brackets */
> +    list_id = talloc_strndup (local, begin_list_id + 1,
> +			      end_list_id - begin_list_id - 1);
> +
> +    /* _notmuch_message_add_term() may return
> +     * NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG here.  We can't fix it, but
> +     * this is not a reason to exit with error... */
> +    if (_notmuch_message_add_term (message, "list", list_id))
> +	fprintf (stderr, "Warning: Not indexing List-Id: <%s>\n", list_id);
> +
> +    talloc_free (local);
> +}
> +
>  /* Callback to generate terms for each mime part of a message. */
>  static void
>  _index_mime_part (notmuch_message_t *message,
> @@ -425,14 +466,15 @@ _index_mime_part (notmuch_message_t *message,
>  
>  notmuch_status_t
>  _notmuch_message_index_file (notmuch_message_t *message,
> -			     const char *filename)
> +			     const char *filename,
> +			     notmuch_bool_t duplicate)
>  {
>      GMimeStream *stream = NULL;
>      GMimeParser *parser = NULL;
>      GMimeMessage *mime_message = NULL;
>      InternetAddressList *addresses;
>      FILE *file = NULL;
> -    const char *from, *subject;
> +    const char *from, *subject, *list_id;
>      notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
>      static int initialized = 0;
>      char from_buf[5];
> @@ -485,6 +527,9 @@ mboxes is deprecated and may be removed in the future.\n", filename);
>  
>      from = g_mime_message_get_sender (mime_message);
>  
> +    if (duplicate)
> +	goto DUP;
> +
>      addresses = internet_address_list_parse_string (from);
>      if (addresses) {
>  	_index_address_list (message, "from", addresses);
> @@ -502,6 +547,10 @@ mboxes is deprecated and may be removed in the future.\n", filename);
>  
>      _index_mime_part (message, g_mime_message_get_mime_part (mime_message));
>  
> +  DUP:
> +    list_id = g_mime_object_get_header (GMIME_OBJECT (mime_message), "List-Id");
> +    _index_list_id (message, list_id);
> +
>    DONE:
>      if (mime_message)
>  	g_object_unref (mime_message);
> diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> index af185c7c5ba8..138dfa58efc8 100644
> --- a/lib/notmuch-private.h
> +++ b/lib/notmuch-private.h
> @@ -322,7 +322,8 @@ notmuch_message_get_author (notmuch_message_t *message);
>  
>  notmuch_status_t
>  _notmuch_message_index_file (notmuch_message_t *message,
> -			     const char *filename);
> +			     const char *filename,
> +			     notmuch_bool_t duplicate);
>  
>  /* message-file.c */
>  
> diff --git a/man/man7/notmuch-search-terms.7 b/man/man7/notmuch-search-terms.7
> index f1627b3488f8..29b30b7b0b00 100644
> --- a/man/man7/notmuch-search-terms.7
> +++ b/man/man7/notmuch-search-terms.7
> @@ -52,6 +52,8 @@ terms to match against specific portions of an email, (where
>  
>  	thread:<thread-id>
>  
> +	list:<list-id>
> +
>  	folder:<directory-path>
>  
>  	date:<since>..<until>
> @@ -109,6 +111,12 @@ within a matching directory. Only the directory components below the
>  top-level mail database path are available to be searched.
>  
>  The
> +.B list:
> +prefix can be used to match the mailing list ID of an email message,
> +that is, the contents of the List\-Id: header without the '<', '>'
> +delimiters and without the decoded list description.
> +
> +The
>  .B date:
>  prefix can be used to restrict the results to only messages within a
>  particular time range (based on the Date: header) with a range syntax
> diff --git a/test/corpus/cur/18:2, b/test/corpus/cur/18:2,
> index f522f69eb933..2b54925bd5d1 100644
> --- a/test/corpus/cur/18:2,
> +++ b/test/corpus/cur/18:2,
> @@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
>  Date: Tue, 17 Nov 2009 18:21:38 -0500
>  Subject: [notmuch] archive
>  Message-ID: <20091117232137.GA7669@griffis1.net>
> +List-Id: <test1.example.com>
>  
>  Just subscribed, I'd like to catch up on the previous postings,
>  but the archive link seems to be bogus?
> diff --git a/test/corpus/cur/51:2, b/test/corpus/cur/51:2,
> index f522f69eb933..b155e6ee64a5 100644
> --- a/test/corpus/cur/51:2,
> +++ b/test/corpus/cur/51:2,
> @@ -3,6 +3,7 @@ To: notmuch@notmuchmail.org
>  Date: Tue, 17 Nov 2009 18:21:38 -0500
>  Subject: [notmuch] archive
>  Message-ID: <20091117232137.GA7669@griffis1.net>
> +List-Id: <test2.example.com>
>  
>  Just subscribed, I'd like to catch up on the previous postings,
>  but the archive link seems to be bogus?
> diff --git a/test/search b/test/search
> index a7a0b18d2e48..bef42971226c 100755
> --- a/test/search
> +++ b/test/search
> @@ -129,4 +129,28 @@ add_message '[subject]="utf8-message-body-subject"' '[date]="Sat, 01 Jan 2000 12
>  output=$(notmuch search "bödý" | notmuch_search_sanitize)
>  test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; utf8-message-body-subject (inbox unread)"
>  
> +test_begin_subtest "Search by List-Id"
> +notmuch search list:notmuch.notmuchmail.org | notmuch_search_sanitize > OUTPUT
> +cat <<EOF >EXPECTED
> +thread:XXX   2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] "notmuch help" outputs to stderr? (attachment inbox signed unread)
> +thread:XXX   2009-11-18 [4/7] Lars Kellogg-Stedman, Mikhail Gusarov| Keith Packard, Carl Worth; [notmuch] Working with Maildir storage? (inbox signed unread)
> +thread:XXX   2009-11-18 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] [PATCH] Error out if no query is supplied to search instead of going into an infinite loop (attachment inbox unread)
> +thread:XXX   2009-11-17 [1/3] Adrian Perez de Castro| Keith Packard, Carl Worth; [notmuch] Introducing myself (inbox signed unread)
> +thread:XXX   2009-11-17 [1/2] Alex Botero-Lowry| Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "Search by List-Id, duplicated messages, step 1"
> +notmuch search list:test1.example.com | notmuch_search_sanitize > OUTPUT
> +cat <<EOF >EXPECTED
> +thread:XXX   2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
> +
> +test_begin_subtest "Search by List-Id, duplicated messages, step 2"
> +notmuch search list:test2.example.com | notmuch_search_sanitize > OUTPUT
> +cat <<EOF >EXPECTED
> +thread:XXX   2009-11-17 [1/3] Aron Griffis| Keith Packard, Carl Worth; [notmuch] archive (inbox unread)
> +EOF
> +test_expect_equal_file OUTPUT EXPECTED
>  test_done
> diff --git a/test/test-lib.sh b/test/test-lib.sh
> index d8e0d9115a69..981bde4a4004 100644
> --- a/test/test-lib.sh
> +++ b/test/test-lib.sh
> @@ -576,9 +576,9 @@ test_expect_equal_json () {
>      # The test suite forces LC_ALL=C, but this causes Python 3 to
>      # decode stdin as ASCII.  We need to read JSON in UTF-8, so
>      # override Python's stdio encoding defaults.
> -    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python -mjson.tool \
> +    output=$(echo "$1" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
>          || echo "$1")
> -    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python -mjson.tool \
> +    expected=$(echo "$2" | PYTHONIOENCODING=utf-8 python2 -mjson.tool \
>          || echo "$2")
>      shift 2
>      test_expect_equal "$output" "$expected" "$@"
> -- 
>  Kirill A. Shutemov


end of thread, other threads:[~2014-01-01 12:05 UTC | newest]

Thread overview: 13+ messages
2013-04-03 13:46 [PATCH] lib: Add a new prefix "list" to the search-terms syntax Alexey I. Froloff
2013-04-06 11:54 ` David Bremner
2013-04-08 10:03   ` Alexey I. Froloff
2013-04-08 21:56     ` David Bremner
2013-04-09  8:30       ` Alexey I. Froloff
2013-04-09 23:16         ` Alexey I. Froloff
2013-04-30  1:12           ` David Bremner
2013-04-30  9:52             ` Alexey I. Froloff
2013-05-04  0:54               ` David Bremner
2013-10-17 14:17           ` Jani Nikula
2013-12-17 18:03             ` Kirill A. Shutemov
2013-12-17 19:46               ` Kirill A. Shutemov
2014-01-01 12:04               ` Jani Nikula
