unofficial mirror of notmuch@notmuchmail.org
* [PATCH] Fix code extracting the MTA from Received: headers
@ 2010-04-07 20:38 Dirk Hohndel
  2010-04-07 22:56 ` Carl Worth
  2010-04-08  7:59 ` Sebastian Spaeth
  0 siblings, 2 replies; 6+ messages in thread
From: Dirk Hohndel @ 2010-04-07 20:38 UTC (permalink / raw)
  To: notmuch


The previous code made too many assumptions about the (sadly not
standardized) format of the Received headers. This version should
be more robust to deal with different variations.

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
---
 notmuch-reply.c |   23 +++++++++--------------
 1 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/notmuch-reply.c b/notmuch-reply.c
index 8eb4754..39377e1 100644
--- a/notmuch-reply.c
+++ b/notmuch-reply.c
@@ -296,28 +296,23 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
     received = notmuch_message_get_header (message, "received");
     by = strstr (received, " by ");
     if (by && *(by+4)) {
-	/* we know that there are 4 characters after by - either the 4th one
-	 * is '\0' (broken header) or it is the first letter of the hostname 
-	 * that last received this email - which we'll use to guess the right
-	 * from email address
+	/* sadly, the format of Received: headers is a bit inconsistent,
+	 * depending on the MTA used. So we try to extract just the MTA
+	 * here by removing leading whitespace and assuming that the MTA
+	 * name ends at the next whitespace
+	 * we test for *(by+4) to be non-'\0' to make sure there's something
+	 * there at all - and then assume that the first whitespace delimited
+	 * token that follows is the last receiving server
 	 */
 	mta = strdup (by+4);
 	if (mta == NULL)
 	    return NULL;
-
-	/* After the MTA comes its IP address (or HELO response) in parenthesis.
-	 * so let's terminate the string there
-	 */
-	if ((ptr = strchr (mta, '(')) == NULL) {
-	    free (mta);
+	token = strtok(mta," \t");
+	if (token == NULL)
 	    return NULL;
-	}
-	*ptr = '\0';
-
 	/* Now extract the last two components of the MTA host name
 	 * as domain and tld
 	 */
-	token = mta;
 	while ((ptr = strsep (&token, delim)) != NULL) {
 	    if (*ptr == '\0')
 		continue;
-- 
1.6.6.1


-- 
Dirk Hohndel
Intel Open Source Technology Center
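
For readers following along, here is a minimal standalone sketch of the
extraction the patch describes: take the first whitespace-delimited token
after " by " and keep its last two dot-separated components. The function
name mta_domain_from_received and its output-buffer interface are
hypothetical and not part of notmuch. It assumes an unfolded header value,
and it uses strspn/strcspn on the header text itself instead of
strdup/strtok/strsep, so there is no temporary buffer to free on the
early-return paths.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not notmuch API): writes "domain.tld" of the host
 * that follows " by " into out. Returns 0 on success, -1 otherwise. */
int
mta_domain_from_received (const char *received, char *out, size_t outlen)
{
    static const char ws[] = " \t\r\n";
    const char *by, *s, *end;
    const char *domain = NULL, *tld = NULL;
    size_t domain_len = 0, tld_len = 0;

    if (received == NULL || out == NULL)
        return -1;

    by = strstr (received, " by ");
    if (by == NULL)
        return -1;

    /* Skip any extra whitespace, then delimit the first token. */
    s = by + 4;
    s += strspn (s, ws);
    end = s + strcspn (s, ws);

    /* Walk the dot-separated components, remembering the last two
     * non-empty ones. */
    while (s < end) {
        const char *dot = memchr (s, '.', (size_t) (end - s));
        const char *stop = dot ? dot : end;

        if (stop > s) {
            domain = tld;
            domain_len = tld_len;
            tld = s;
            tld_len = (size_t) (stop - s);
        }
        s = dot ? dot + 1 : end;
    }

    /* Need two components plus '.' and the terminating NUL. */
    if (domain == NULL || domain_len + tld_len + 2 > outlen)
        return -1;

    memcpy (out, domain, domain_len);
    out[domain_len] = '.';
    memcpy (out + domain_len + 1, tld, tld_len);
    out[domain_len + 1 + tld_len] = '\0';
    return 0;
}
```

A caller would pass the header value and a small buffer, then compare the
result against the domains of the configured email addresses.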

* Re: [PATCH] Fix code extracting the MTA from Received: headers
  2010-04-07 20:38 [PATCH] Fix code extracting the MTA from Received: headers Dirk Hohndel
@ 2010-04-07 22:56 ` Carl Worth
  2010-04-08  7:59 ` Sebastian Spaeth
  1 sibling, 0 replies; 6+ messages in thread
From: Carl Worth @ 2010-04-07 22:56 UTC (permalink / raw)
  To: Dirk Hohndel, notmuch

On Wed, 07 Apr 2010 13:38:29 -0700, Dirk Hohndel <hohndel@infradead.org> wrote:
> The previous code made too many assumptions about the (sadly not
> standardized) format of the Received headers. This version should
> be more robust to deal with different variations.

Thanks for maintaining this. I'll have to fiddle with my mail setup
before this feature is useful for me. So I haven't tested this (other
than to verify that it hasn't broken "notmuch reply" for me).

But I've pushed this now at least.

-Carl

* Re: [PATCH] Fix code extracting the MTA from Received: headers
  2010-04-07 20:38 [PATCH] Fix code extracting the MTA from Received: headers Dirk Hohndel
  2010-04-07 22:56 ` Carl Worth
@ 2010-04-08  7:59 ` Sebastian Spaeth
  2010-04-08 15:07   ` Dirk Hohndel
  1 sibling, 1 reply; 6+ messages in thread
From: Sebastian Spaeth @ 2010-04-08  7:59 UTC (permalink / raw)
  To: Dirk Hohndel, notmuch

On 2010-04-07, Dirk Hohndel wrote:
> 
> The previous code made too many assumptions about the (sadly not
> standardized) format of the Received headers. This version should
> be more robust to deal with different variations.

This code might be useful for some, but it is not useful for me. I use
dreamhost.com, for example, as my mail provider, and my email domain name
never shows up after the "by" in the Received: headers.
See my Received headers for your message below.

On the other hand, it contains "for <sebastian@sspaeth.de>" stating the
intended email address explicitly. IMHO, we should use this before we
start some hand-wavy guessing.

Also, I have the "X-Original-To: sebastian@sspaeth.de" header. Is that
something that we could make use of before starting to guess?

Sebastian
-----------------------------------------------------------------------
Received: from segal.dreamhost.com (mx1.spunky.mail.dreamhost.com [208.97.132.47])
	by homiemail-mx12.g.dreamhost.com (Postfix) with ESMTP id 9A6602781BC
	for <sebastian@sspaeth.de>; Wed,  7 Apr 2010 13:38:48 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
	by segal.dreamhost.com (Postfix) with ESMTP id 9CF8A5341BE
	for <sebastian@sspaeth.de>; Wed,  7 Apr 2010 13:38:48 -0700 (PDT)
Received: from connor.dreamhost.com ([208.97.132.81])
	by localhost (segal.dreamhost.com [208.97.132.104]) (amavisd-new, port 10024)
	with ESMTP id S3IlsMcJewY1 for <sebastian@sspaeth.de>;
	Wed,  7 Apr 2010 13:38:39 -0700 (PDT)
Received: from olra.theworths.org (u15218177.onlinehome-server.com [82.165.184.25])
	by connor.dreamhost.com (Postfix) with ESMTP id 33B472C9806F
	for <sebastian@sspaeth.de>; Wed,  7 Apr 2010 13:38:39 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 1978741733A;
	Wed,  7 Apr 2010 13:38:38 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
Received: from olra.theworths.org ([127.0.0.1])
	by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id ZbcQaubefNY6; Wed,  7 Apr 2010 13:38:37 -0700 (PDT)
Received: from olra.theworths.org (localhost [127.0.0.1])
	by olra.theworths.org (Postfix) with ESMTP id 044574196F4;
	Wed,  7 Apr 2010 13:38:35 -0700 (PDT)
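
As an illustration of the "for <address>" idea raised in this message,
the following is a rough sketch of pulling that address out of a single
Received header. The function name for_address_from_received is
hypothetical, not something in notmuch. It assumes the address is wrapped
in angle brackets as in the headers above, and it accepts folded headers
by allowing any whitespace before "for".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not notmuch API): copies the address found in a
 * "for <user@host>" clause into out. Returns 0 on success, -1 if no
 * usable clause is present or the buffer is too small. */
int
for_address_from_received (const char *received, char *out, size_t outlen)
{
    const char *p;

    if (received == NULL || out == NULL || outlen == 0)
        return -1;

    for (p = strstr (received, "for <"); p != NULL;
         p = strstr (p + 1, "for <")) {
        const char *start = p + 5;
        const char *end = strchr (start, '>');
        size_t len;

        /* "for" must be a word of its own, not the tail of another. */
        if (p != received && !isspace ((unsigned char) p[-1]))
            continue;
        if (end == NULL)
            return -1;

        len = (size_t) (end - start);
        /* Skip empty brackets and anything that is not an address. */
        if (len == 0 || memchr (start, '@', len) == NULL)
            continue;
        if (len + 1 > outlen)
            return -1;

        memcpy (out, start, len);
        out[len] = '\0';
        return 0;
    }
    return -1;
}
```

The result still has to be compared against the user's configured
addresses; the sketch only extracts the text and does not judge whether
the address is meaningful.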

* Re: [PATCH] Fix code extracting the MTA from Received: headers
  2010-04-08  7:59 ` Sebastian Spaeth
@ 2010-04-08 15:07   ` Dirk Hohndel
  2010-04-13 17:37     ` Carl Worth
  0 siblings, 1 reply; 6+ messages in thread
From: Dirk Hohndel @ 2010-04-08 15:07 UTC (permalink / raw)
  To: Sebastian Spaeth, notmuch

On Thu, 08 Apr 2010 09:59:14 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> On 2010-04-07, Dirk Hohndel wrote:
> > 
> > The previous code made too many assumptions about the (sadly not
> > standardized) format of the Received headers. This version should
> > be more robust to deal with different variations.
> 
> This code might be useful for some, but it is not useful for me. I use
> dreamhost.com, for example, as my mail provider, and my email domain name
> never shows up after the "by" in the Received: headers.
> See my Received headers for your message below.

That's the funny thing about heuristics - they are always based on the
cases the author has access to. I run my own mail servers and they put
in useful Received lines. Dreamhost doesn't appear to do that - I'm sure
there are many other scenarios that I don't handle, yet.
Please keep them coming.
 
> On the other hand, it contains "for <sebastian@sspaeth.de>" stating the
> intended email address explicitly. IMHO, we should use this before we
> start some hand-wavy guessing.
> 
> Also, I have the "X-Original-To: sebastian@sspaeth.de" header. Is that
> something that we could make use of before starting to guess?

It's complicated. Some MTAs put bogus "for <user@localhost>" or "for
UID 1000" into Received headers. I haven't seen any incorrect
"X-Original-To" headers, but wouldn't be surprised to see those be faked
or wrong, either.
Right now my plan is to do something like this:

1) look for my email address in To/Cc
2) look for my email in "for <email@add.res>" in Received headers
3) look for my email in X-Original-To
4) look for the domain of my email in Received headers (not just 1st)
5) punt and use default email address

Does that sound sane?
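
The five steps above could be sketched roughly as follows. This is only
an illustration of the ordering, not the notmuch implementation: the
struct msg_headers type and the functions contains_nocase and
guess_from_address are hypothetical, the header values are assumed to be
plain strings already pulled from the message, and matching is a simple
case-insensitive substring test rather than real address parsing.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the header values the plan looks at. */
struct msg_headers {
    const char *to;
    const char *cc;
    const char *x_original_to;
    const char *const *received;   /* newest header first */
    size_t n_received;
};

/* Case-insensitive substring test; NULL or empty arguments never match. */
int
contains_nocase (const char *hay, const char *needle)
{
    size_t i, n;

    if (hay == NULL || needle == NULL || (n = strlen (needle)) == 0)
        return 0;
    for (; *hay != '\0'; hay++) {
        for (i = 0; i < n; i++)
            if (hay[i] == '\0' || tolower ((unsigned char) hay[i])
                                  != tolower ((unsigned char) needle[i]))
                break;
        if (i == n)
            return 1;
    }
    return 0;
}

/* Returns one of addrs[]; addrs[0] is treated as the default address. */
const char *
guess_from_address (const struct msg_headers *m,
                    const char *const *addrs, size_t n_addrs)
{
    char pattern[256];
    size_t a, r;

    if (m == NULL || addrs == NULL || n_addrs == 0)
        return NULL;

    /* 1) one of my addresses in To/Cc */
    for (a = 0; a < n_addrs; a++)
        if (contains_nocase (m->to, addrs[a]) ||
            contains_nocase (m->cc, addrs[a]))
            return addrs[a];

    /* 2) "for <address>" in any Received header */
    for (a = 0; a < n_addrs; a++) {
        int len = snprintf (pattern, sizeof pattern, "for <%s>", addrs[a]);
        if (len < 0 || (size_t) len >= sizeof pattern)
            continue;
        for (r = 0; r < m->n_received; r++)
            if (contains_nocase (m->received[r], pattern))
                return addrs[a];
    }

    /* 3) X-Original-To */
    for (a = 0; a < n_addrs; a++)
        if (contains_nocase (m->x_original_to, addrs[a]))
            return addrs[a];

    /* 4) the domain of one of my addresses in any Received header */
    for (r = 0; r < m->n_received; r++)
        for (a = 0; a < n_addrs; a++) {
            const char *at = strchr (addrs[a], '@');
            if (at != NULL && contains_nocase (m->received[r], at + 1))
                return addrs[a];
        }

    /* 5) punt and use the default address */
    return addrs[0];
}
```

Keeping each step as its own loop makes it easy to reorder the steps or
to add exclusions as more header samples come in.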

(and thanks for sending the headers - this really helps... can others
for whom the current code or the logic mentioned above wouldn't work
send their headers, too, please?)

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

* Re: [PATCH] Fix code extracting the MTA from Received: headers
  2010-04-08 15:07   ` Dirk Hohndel
@ 2010-04-13 17:37     ` Carl Worth
  2010-04-13 18:06       ` Dirk Hohndel
  0 siblings, 1 reply; 6+ messages in thread
From: Carl Worth @ 2010-04-13 17:37 UTC (permalink / raw)
  To: Dirk Hohndel, Sebastian Spaeth, notmuch

On Thu, 08 Apr 2010 08:07:48 -0700, Dirk Hohndel <hohndel@infradead.org> wrote:
> Right now my plan is to do something like this:
> 
> 1) look for my email address in To/Cc
> 2) look for my email in "for <email@add.res>" in Received headers
> 3) look for my email in X-Original-To
> 4) look for the domain of my email in Received headers (not just 1st)
> 5) punt and use default email address
> 
> Does that sound sane?

It sounds sane.

> (and thanks for sending the headers - this really helps... can others
> for whom the current code or the logic mentioned above wouldn't work
> send their headers, too, please?)

I started using fetchmail many years ago and have never really needed to
switch. So I'm still using that (but don't necessarily recommend it to
anyone).

It seems to break the above since it delivers mail locally, so the first
headers I get are:

	X-Original-To: cworth@localhost
	Delivered-To: cworth@localhost
	Received: from yoom.home.cworth.org (yoom.home.cworth.org [127.0.0.1])
		by yoom.home.cworth.org (Postfix) with ESMTP id D391B5883A6
		for <cworth@localhost>; Mon, 12 Apr 2010 09:11:18 -0700 (PDT)
	MIME-Version: 1.0
	Received: from 10.22.226.213 [10.22.226.213]
		by yoom.home.cworth.org with IMAP (fetchmail-6.3.16)
		for <cworth@localhost> (single-drop); Mon, 12 Apr 2010 09:11:18 -0700 (PDT)

And none of these are useful for your detection. Worse, the presence of
"cworth.org" in the above might throw your detection off before it could
find something useful like "intel.com" in a later Received header.

I'll send a complete message with full headers to you separately.

Perhaps I can just switch programs to transfer email and avoid this
problem. Anyone have a recommendation for something to transfer mail
from an imap server to the local machine (but *not* leaving it stored
on the imap server)[*]? I don't think offlineimap supports this mode,
does it?

-Carl

[*] I do separately want to start playing with remote notmuch, but I
won't use this with the imap servers currently accepting my
mail. Instead, I'd rather just rsync my mail from my local machine to a
server I own, (which could then export imap if needed), and do remote
notmuch stuff from there.

* Re: [PATCH] Fix code extracting the MTA from Received: headers
  2010-04-13 17:37     ` Carl Worth
@ 2010-04-13 18:06       ` Dirk Hohndel
  0 siblings, 0 replies; 6+ messages in thread
From: Dirk Hohndel @ 2010-04-13 18:06 UTC (permalink / raw)
  To: Carl Worth, Sebastian Spaeth, notmuch

On Tue, 13 Apr 2010 10:37:49 -0700, Carl Worth <cworth@cworth.org> wrote:
> On Thu, 08 Apr 2010 08:07:48 -0700, Dirk Hohndel <hohndel@infradead.org> wrote:
> > Right now my plan is to do something like this:
> > 
> > 1) look for my email address in To/Cc
> > 2) look for my email in "for <email@add.res>" in Received headers
> > 3) look for my email in X-Original-To
> > 4) look for the domain of my email in Received headers (not just 1st)
> > 5) punt and use default email address
> > 
> > Does that sound sane?
> 
> It sounds sane.

Good.
 
> > (and thanks for sending the headers - this really helps... can others
> > for whom the current code or the logic mentioned above wouldn't work
> > send their headers, too, please?)
> 
> I started using fetchmail many years ago and have never really needed to
> switch. So I'm still using that (but don't necessarily recommend it to
> anyone).
> 
> It seems to break the above since it delivers mail locally, so the first
> headers I get are:
> 
> 	X-Original-To: cworth@localhost

Easy to detect. I'll add that as an exclusion.

> 	Delivered-To: cworth@localhost
> 	Received: from yoom.home.cworth.org (yoom.home.cworth.org [127.0.0.1])
> 		by yoom.home.cworth.org (Postfix) with ESMTP id D391B5883A6
> 		for <cworth@localhost>; Mon, 12 Apr 2010 09:11:18 -0700 (PDT)
> 	MIME-Version: 1.0
> 	Received: from 10.22.226.213 [10.22.226.213]
> 		by yoom.home.cworth.org with IMAP (fetchmail-6.3.16)
> 		for <cworth@localhost> (single-drop); Mon, 12 Apr 2010 09:11:18 -0700 (PDT)

AHHHHHHHH
(he runs screaming out of the room)

> And none of these are useful for your detection. Worse, the presence of
> "cworth.org" in the above might throw your detection off before it could
> find something useful like "intel.com" in a later Received header.

I have some choice words for these headers...
And an idea for how to exclude these false positives as well. It's kind of
a hack, but I'm thinking that for the "Received: ... by ..."
part to be truly relevant to us, the from host should have a non-private
IP address.

Yes, I can envision within-your-own-network cases where none of the
systems has a non-private IP address... but then hopefully your last
hop is correct... if not, your setup is even more screwed up than Carl's.
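
A rough sketch of what such a check might look like, assuming the sending
host's IPv4 address appears in square brackets before the " by " part, as
in the headers quoted in this thread. The function name
received_from_is_private is hypothetical, and only loopback plus the
RFC 1918 ranges are treated as private; IPv6 and other special-purpose
ranges are ignored here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not notmuch API). Looks at the first "[a.b.c.d]"
 * that appears before " by " in an unfolded Received header.
 * Returns 1 for loopback/private, 0 for public, -1 if nothing usable. */
int
received_from_is_private (const char *received)
{
    const char *open, *by;
    unsigned int a, b, c, d;

    if (received == NULL)
        return -1;

    open = strchr (received, '[');
    by = strstr (received, " by ");
    /* A bracket after " by " belongs to the receiving host, not the sender. */
    if (open == NULL || (by != NULL && open > by))
        return -1;

    if (sscanf (open + 1, "%3u.%3u.%3u.%3u", &a, &b, &c, &d) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;

    if (a == 127)                           /* 127.0.0.0/8 loopback */
        return 1;
    if (a == 10)                            /* 10.0.0.0/8 */
        return 1;
    if (a == 172 && b >= 16 && b <= 31)     /* 172.16.0.0/12 */
        return 1;
    if (a == 192 && b == 168)               /* 192.168.0.0/16 */
        return 1;
    return 0;
}
```

With this, both of the fetchmail-style headers quoted above would be
classified as private and could be skipped before looking at the "by"
host.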

> I'll send a complete message with full headers to you separately.

Thanks
 
> Perhaps I can just switch programs to transfer email and avoid this
> problem. Anyone have a recommendation for something to transfer mail
> from an imap server to the local machine (but *not* leaving it stored
> on the imap server)[*]? I don't think offlineimap supports this mode,
> does it?

Don't think so. I'm not going to comment on the usefulness of this mode
in public :-)

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center
