unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: Carl Worth <cworth@cworth.org>,
	Mark Walters <markwalters1009@gmail.com>,
	 notmuch <notmuch@notmuchmail.org>
Subject: Re: [RFC PATCH] Re: excessive thread fusing
Date: Sun, 20 Apr 2014 21:59:26 +0900
Message-ID: <87fvl8upg1.fsf@maritornes.cs.unb.ca>
In-Reply-To: <87oazwjq1e.fsf@yoom.home.cworth.org>

Carl Worth <cworth@cworth.org> writes:
>
> Another idea would be to trigger specifically on common forms. Judging
> from the samples in this particular thread, it seems like a workable
> heuristic would be:
>
> 	If the In-Reply-To header begins with '<':
>
> 		Parse that initial portion as a message ID
>
> 	Else if it ends with '>':
>
> 		Parse that final portion as a message ID
>
> 	Else
>
> 		Ignore this garbage-valued header.
>
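
For concreteness, the quoted heuristic could be sketched in POSIX shell
roughly as follows. This is only an illustration, not the notmuch
implementation: the function name parse_in_reply_to is made up, the header
value is assumed to be already unfolded and trimmed, and a header that
begins with '<' but never closes it is treated as garbage here, which is
one possible reading of the first branch.

```shell
# Hypothetical helper: print one message ID from an In-Reply-To value,
# or return non-zero if the value does not match either common form.
parse_in_reply_to() {
    hdr=$1
    case $hdr in
        '<'*'>'*)
            # Begins with '<': take the initial <...> portion.
            rest=${hdr#<}
            printf '%s\n' "${rest%%>*}"
            ;;
        *'<'*'>')
            # Ends with '>': take the final <...> portion.
            rest=${hdr%>}
            printf '%s\n' "${rest##*<}"
            ;;
        *)
            # Anything else: ignore this garbage-valued header.
            return 1
            ;;
    esac
}
```

Called as parse_in_reply_to "$value", it prints the ID without the angle
brackets; the caller checks the exit status to decide whether to fall back
to other headers.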

Using the hacky script below, I scanned my own mail collection of about
300k messages. I can make the following observations:

- I have some RFC-compliant In-Reply-To headers with multiple ids.
- I have a non-trivial number of the form
  "Message from $NAME <address> of $date <id>".
- I didn't see any cases where using the last angle-bracketed thing
  would fail.
- I did see some cases where the header starts with '<' but the
  matching '>' was missing.
- I also noticed some RFC 2047 encoding of In-Reply-To headers.
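
As a rough illustration of the "last angle-bracketed thing" rule from the
observations above, something like the following would do. It is a sketch
only: last_msgid is a made-up name, the header value is assumed to be
unfolded already, and RFC 2047 encoded values are not decoded.

```shell
# Hypothetical helper: print the contents of the last complete <...>
# group in the argument; print nothing if there is no such group.
last_msgid() {
    # The leading greedy .* pushes the match to the last '<' that is
    # followed by a non-empty run of non-bracket characters and a '>'.
    printf '%s\n' "$1" | sed -n 's/.*<\([^<>][^<>]*\)>.*/\1/p'
}
```

With this rule a header holding several ids yields the final one, the
"Message from ... <address> of ... <id>" form yields the id rather than
the address, and a header with an unclosed '<' yields nothing.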


######################################################################
# hacky script follows
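dir=$1
echo "Scanning $dir"

tempdir=$(mktemp -d)
echo "Writing to ${tempdir}"

# Extract In-Reply-To from every regular file. The path is passed as $1
# so that odd file names are not interpreted by the inner shell.
find "$dir" -type f -exec sh -c 'formail -c -xIn-reply-to < "$1"' sh {} \; \
  > "${tempdir}/ids"

# Collapse whitespace, then mask ids and comments so that only the
# distinct header shapes remain. (\t in the pattern needs GNU sed.)
sed -e 's/\t/ /g' -e 's/   */ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' \
  < "${tempdir}/ids" | sort -u | tee "${tempdir}/report"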
