unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH] Add a few tests for searching LWN emails.
@ 2011-01-27 10:31 Thomas Schwinge
  2011-01-28 19:59 ` Carl Worth
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Schwinge @ 2011-01-27 10:31 UTC (permalink / raw)
  To: notmuch, notmuch; +Cc: Thomas Schwinge

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 4590 bytes --]

These tests should pass -- but they currently don't.

Signed-off-by: Thomas Schwinge <thomas@schwinge.name>

---

Hallo!

I reported this on IRC some weeks ago; here is a more elaborate report.

What we get from these emails, is an author named ``LWN.net'', and the
``Weekly Notification'' / ``Mailing Lists'' bits are stripped away.  I
suspect this may be a misinterpretation in the notmuch address parser,
related to the dot in the name.  I have not yet looked at the relevant
code.

The same problem might exist for the To: parser, which I have not yet
checked.


Grüße,
 Thomas

 test/search |   36 ++++++++++++++++++++++++++++++++++++
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/test/search b/test/search
index b180c7f..2ae5640 100755
--- a/test/search
+++ b/test/search
@@ -3,6 +3,12 @@ test_description='"notmuch search" in several variations'
 . ./test-lib.sh
 
 add_email_corpus
+add_message \
+  '[from]="LWN.net Weekly Notification <lwn@lwn.net>"' \
+  '[subject]="LWN.net Weekly Edition for January 27, 2011 available"'
+add_message \
+  '[from]="LWN.net Mailing Lists <lwn@lwn.net>"' \
+  '[subject]="LWN.net newly freed content for January 27, 2011"'
 
 test_begin_subtest "Search body"
 add_message '[subject]="body search"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' [body]=bodysearchtest
@@ -57,6 +63,34 @@ add_message '[subject]="search by from (name)"' '[date]="Sat, 01 Jan 2000 12:00:
 output=$(notmuch search from:"Search By From Name" | notmuch_search_sanitize)
 test_expect_equal "$output" "thread:XXX   2000-01-01 [1/1] Search By From Name; search by from (name) (inbox unread)"
 
+test_begin_subtest "LWN, I:"
+output=$(notmuch search from:'lwn.net weekly notification' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Weekly Notification; LWN.net Weekly Edition for January 27, 2011 available (inbox unread)"
+
+test_begin_subtest "LWN, II:"
+output=$(notmuch search from:'lwn.net mailing lists' | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Mailing Lists; LWN.net newly freed content for January 27, 2011 (inbox unread)"
+
+test_begin_subtest "LWN, III:"
+output=$(notmuch search from:lwn and from:weekly | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Weekly Notification; LWN.net Weekly Edition for January 27, 2011 available (inbox unread)"
+
+test_begin_subtest "LWN, IV:"
+output=$(notmuch search from:lwn and from:mailing | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Mailing Lists; LWN.net newly freed content for January 27, 2011 (inbox unread)"
+
+test_begin_subtest "LWN, V:"
+output=$(notmuch search from:lwn@lwn.net and subject:weekly | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Weekly Notification; LWN.net Weekly Edition for January 27, 2011 available (inbox unread)"
+
+test_begin_subtest "LWN, VI:"
+output=$(notmuch search from:lwn@lwn.net and subject:mailing | notmuch_search_sanitize)
+test_expect_equal "$output" "thread:XXX   2001-01-05 [1/1] LWN.net Mailing Lists; LWN.net newly freed content for January 27, 2011 (inbox unread)"
+
+test_begin_subtest "LWN, VII:"
+output=$(notmuch count from:lwn@lwn.net)
+test_expect_equal "$output" 2
+
 test_begin_subtest "Search by to: (address)"
 add_message '[subject]="search by to (address)"' '[date]="Sat, 01 Jan 2000 12:00:00 -0000"' [to]=searchbyto@example.com
 output=$(notmuch search to:searchbyto@example.com | notmuch_search_sanitize)
@@ -97,6 +131,8 @@ thread:XXX   2009-11-18 [1/1] Stewart Smith; [notmuch] [PATCH] Fix linking with
 thread:XXX   2009-11-18 [2/2] Lars Kellogg-Stedman; [notmuch] \"notmuch help\" outputs to stderr? (attachment inbox unread)
 thread:XXX   2009-11-17 [1/1] Mikhail Gusarov; [notmuch] [PATCH] Handle rename of message file (inbox unread)
 thread:XXX   2009-11-17 [2/2] Alex Botero-Lowry, Carl Worth; [notmuch] preliminary FreeBSD support (attachment inbox unread)
+thread:XXX   2001-01-05 [1/1] LWN.net Weekly Notification; LWN.net Weekly Edition for January 27, 2011 available (inbox unread)
+thread:XXX   2001-01-05 [1/1] LWN.net Mailing Lists; LWN.net newly freed content for January 27, 2011 (inbox unread)
 thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; body search (inbox unread)
 thread:XXX   2000-01-01 [1/1] searchbyfrom; search by from (inbox unread)
 thread:XXX   2000-01-01 [1/1] Notmuch Test Suite; search by to (inbox unread)
-- 
tg: (74cb76a..) t/from-lwn (depends on: master)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-27 10:31 [PATCH] Add a few tests for searching LWN emails Thomas Schwinge
@ 2011-01-28 19:59 ` Carl Worth
  2011-01-28 22:17   ` Jake Edge
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Carl Worth @ 2011-01-28 19:59 UTC (permalink / raw)
  To: Thomas Schwinge, notmuch; +Cc: lwn, Thomas Schwinge

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On Thu, 27 Jan 2011 03:31:49 -0700, Thomas Schwinge <thomas@schwinge.name> wrote:
> These tests should pass -- but they currently don't.

Hi Thomas,

Thanks for sending these test cases. This is actually my favorite way to
receive bug reports. I really appreciate it!

> What we get from these emails, is an author named ``LWN.net'', and the
> ``Weekly Notification'' / ``Mailing Lists'' bits are stripped away.  I
> suspect this may be a misinterpretation in the notmuch address parser,
> related to the dot in the name.  I have not yet looked at the relevant
> code.

Yes, I believe this is related to the dot in the name. From my
recollection a name with an address requires quoting. So the header that
is currently formatted as:

	From: LWN.net Weekly Notification <lwn@lwn.net>

should instead be:

	From: "LWN.net Weekly Notification" <lwn@lwn.net>

I verified that adding this quoting fixes the tests[*]. I've CCed
lwn@lwn.net in case the kind editors there would like to add quoting
here.

I don't have any of the relevant RFCs in front of me now, so I don't
know exactly how an address with this missing quoting should be
parsed. But I recall having failures trying to send mail to an address
like this formatted without the quoting.

So I'm not sure what could reasonably be changed here in GMime or not. I
definitely do want to fix notmuch so that it indexes all of the text
here regardless if it's formatted in an RFC-compliant way or not.

-Carl

[*] Except for test VI which has a bug in that it searches for the word
"mailing" in the subject header, but no such word exists in the
message's subject header.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-28 19:59 ` Carl Worth
@ 2011-01-28 22:17   ` Jake Edge
  2011-01-29 12:37     ` Chris Wilson
  2011-01-29 13:45   ` Matthieu Lemerre
  2011-01-30 11:01   ` Thomas Schwinge
  2 siblings, 1 reply; 8+ messages in thread
From: Jake Edge @ 2011-01-28 22:17 UTC (permalink / raw)
  To: Carl Worth; +Cc: lwn, notmuch, Thomas Schwinge

Hi Carl and Thomas, 

On Sat, 29 Jan 2011 05:59:38 +1000 Carl Worth wrote:

> Yes, I believe this is related to the dot in the name. From my
> recollection a name with an address requires quoting. So the header
> that is currently formatted as:
> 
> 	From: LWN.net Weekly Notification <lwn@lwn.net>
> 
> should instead be:
> 
> 	From: "LWN.net Weekly Notification" <lwn@lwn.net>

I am by no means an expert, but http://www.ietf.org/rfc/rfc2821.txt
would seem to indicate that names with a '.' in them don't need to be
quoted as there are several lines in the Scenarios section that look
like:

C: From: John Q. Public <JQP@bar.com>

unless the problem is XXX.yyy (i.e. no spaces on either side of the
'.'), but that seems like a pretty arbitrary differentiator (i.e. 'Q. '
is fine, but 'LWN.net' isn't)

aren't the '<' and '>' the real delimiters here?

If we do need to fix something, though, we'd be more than happy to do
so I suspect ...

jake

-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-28 22:17   ` Jake Edge
@ 2011-01-29 12:37     ` Chris Wilson
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2011-01-29 12:37 UTC (permalink / raw)
  To: Jake Edge, Carl Worth; +Cc: lwn, notmuch, Thomas Schwinge

On Fri, 28 Jan 2011 15:17:54 -0700, Jake Edge <jake@lwn.net> wrote:
> Hi Carl and Thomas, 
> 
> On Sat, 29 Jan 2011 05:59:38 +1000 Carl Worth wrote:
> 
> > Yes, I believe this is related to the dot in the name. From my
> > recollection a name with an address requires quoting. So the header
> > that is currently formatted as:
> > 
> > 	From: LWN.net Weekly Notification <lwn@lwn.net>
> > 
> > should instead be:
> > 
> > 	From: "LWN.net Weekly Notification" <lwn@lwn.net>
> 
> I am by no means an expert, but http://www.ietf.org/rfc/rfc2821.txt
> would seem to indicate that names with a '.' in them don't need to be
> quoted as there are several lines in the Scenarios section that look
> like:
> 
> C: From: John Q. Public <JQP@bar.com>
> 
> unless the problem is XXX.yyy (i.e. no spaces on either side of the
> '.'), but that seems like a pretty arbitrary differentiator (i.e. 'Q. '
> is fine, but 'LWN.net' isn't)

The syntax is defined in http://www.faqs.org/rfcs/rfc2822.html in
particular section 3.2.4. From that it appears the unquoted use of
[a-Z][.a-Z]* is valid. However, I shall leave the intricacies to those
whose understand and appreciate the whole problem...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-28 19:59 ` Carl Worth
  2011-01-28 22:17   ` Jake Edge
@ 2011-01-29 13:45   ` Matthieu Lemerre
  2011-01-30 10:54     ` Thomas Schwinge
  2011-01-30 11:01   ` Thomas Schwinge
  2 siblings, 1 reply; 8+ messages in thread
From: Matthieu Lemerre @ 2011-01-29 13:45 UTC (permalink / raw)
  To: Carl Worth, Thomas Schwinge, notmuch


> Yes, I believe this is related to the dot in the name. From my
> recollection a name with an address requires quoting. So the header that
> is currently formatted as:
> 
> 	From: LWN.net Weekly Notification <lwn@lwn.net>
> 
> should instead be:
> 
> 	From: "LWN.net Weekly Notification" <lwn@lwn.net>

Hi all,

I had already reported this problem in id:"87ipzvk2xh.fsf@free.fr".

Recent versions of GMime perform more robust parsing that fix the
problem, but unfortunately debian only ship old versions of the package.

I don't believe we will be able to make all people from whom we receive
email always send RFC2822-compliant email addresses :)

Matthieu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-29 13:45   ` Matthieu Lemerre
@ 2011-01-30 10:54     ` Thomas Schwinge
  2011-04-25 22:20       ` Carl Worth
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Schwinge @ 2011-01-30 10:54 UTC (permalink / raw)
  To: Matthieu Lemerre, Carl Worth, notmuch

[-- Attachment #1: Type: text/plain, Size: 1207 bytes --]

Hallo!

On Sat, 29 Jan 2011 14:45:19 +0100, Matthieu Lemerre <racin@free.fr> wrote:
> I had already reported this problem in id:"87ipzvk2xh.fsf@free.fr".

Oh, I see!

> Recent versions of GMime perform more robust parsing that fix the
> problem, but unfortunately debian only ship old versions of the package.

In this case, I'm not eager to work around this issue in notmuch -- this
would only complicate the local code, and yet would be redundand in some
months.  (Your mileage may vary.)  That is, we already do rely on the
GMIME library for this parsing (etc.), so that's the place where this
issue needs to be fixed (and has been).  Hopefully, after the imminent
Debian release, the version of the library will be updated, or someone
(we, I guess) will propose backporting this patch, and get it accepted.
(Which may be non-trivial, as that wasn't a simple patch, but instead
sort of a parser re-write, as I understood it.)


> I don't believe we will be able to make all people from whom we receive
> email always send RFC2822-compliant email addresses :)

Indeed.  :-)


Grüße,
 Thomas


PS: Matthieu, interesting how peoples' (that is, our) paths cross again.
:-)

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-28 19:59 ` Carl Worth
  2011-01-28 22:17   ` Jake Edge
  2011-01-29 13:45   ` Matthieu Lemerre
@ 2011-01-30 11:01   ` Thomas Schwinge
  2 siblings, 0 replies; 8+ messages in thread
From: Thomas Schwinge @ 2011-01-30 11:01 UTC (permalink / raw)
  To: Carl Worth, notmuch

[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]

Hallo!

On Sat, 29 Jan 2011 05:59:38 +1000, Carl Worth <cworth@cworth.org> wrote:
> On Thu, 27 Jan 2011 03:31:49 -0700, Thomas Schwinge <thomas@schwinge.name> wrote:
> > These tests should pass -- but they currently don't.

> Thanks for sending these test cases. This is actually my favorite way to
> receive bug reports. I really appreciate it!

I'm doing this for a simple resons: I won't manually have to remember (!)
and later re-run my testings in order to confirm that the issue has been
fixed.  :-)


> So I'm not sure what could reasonably be changed here in GMime or not. I
> definitely do want to fix notmuch so that it indexes all of the text
> here regardless if it's formatted in an RFC-compliant way or not.

Given Matthieu's comment, I wouldn't; as I explained in my reply to his
email.  Now, if this is indeed a GMIME issue -- do we want these tests in
notmuch at all, or should I rather (adapt and) submit them for GMIME?


> [*] Except for test VI which has a bug in that it searches for the word
> "mailing" in the subject header, but no such word exists in the
> message's subject header.

Ack.  Fixed locally.


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 489 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add a few tests for searching LWN emails.
  2011-01-30 10:54     ` Thomas Schwinge
@ 2011-04-25 22:20       ` Carl Worth
  0 siblings, 0 replies; 8+ messages in thread
From: Carl Worth @ 2011-04-25 22:20 UTC (permalink / raw)
  To: Thomas Schwinge, Matthieu Lemerre, notmuch

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Sun, 30 Jan 2011 11:54:46 +0100, Thomas Schwinge <thomas@schwinge.name> wrote:
> On Sat, 29 Jan 2011 14:45:19 +0100, Matthieu Lemerre <racin@free.fr> wrote:
>
> > Recent versions of GMime perform more robust parsing that fix the
> > problem, but unfortunately debian only ship old versions of the package.
> 
> In this case, I'm not eager to work around this issue in notmuch -- this
> would only complicate the local code, and yet would be redundand in some
> months.  (Your mileage may vary.)  That is, we already do rely on the
> GMIME library for this parsing (etc.), so that's the place where this
> issue needs to be fixed (and has been).  Hopefully, after the imminent
> Debian release, the version of the library will be updated,

Debian has indeed caught up here. I've now got version 2.4.23 of Gmime
installed from Debian unstable and these tests all pass now.

Thanks again for the test cases.

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2011-04-25 22:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-27 10:31 [PATCH] Add a few tests for searching LWN emails Thomas Schwinge
2011-01-28 19:59 ` Carl Worth
2011-01-28 22:17   ` Jake Edge
2011-01-29 12:37     ` Chris Wilson
2011-01-29 13:45   ` Matthieu Lemerre
2011-01-30 10:54     ` Thomas Schwinge
2011-04-25 22:20       ` Carl Worth
2011-01-30 11:01   ` Thomas Schwinge

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).