On Fri, Nov 07, 2014 at 11:03:21AM -0800, W. Trevor King wrote:
> Eric Wong has been working on some tools to store email in a Git
> repository, and his client-side code is ssoma [1]. I wanted a bit
> more metadata than the stock ssoma-mda [2], and ended up just
> writing a ssoma-mda in Python [3]…
>
> Then I grabbed the archives, and pulled them into Git:
> …
> The messages I dropped removed duplicate Message-IDs:
> …

ssoma and public-inbox came up recently (with the end of Gmane) in
[1]. I've brought my archives [2] up to speed with a fresh mbox
downloaded today [3]. Beyond the ignored messages mentioned in my
initial email, I had to ignore:

* id:67EEA3E1-918F-47AE-8AD7-EF0A5923D800@m0g.net
  The two copies had different headers up through:

    -X-List-Received-Date: Wed, 06 Jan 2016 15:49:49 -0000
    +X-List-Received-Date: Wed, 06 Jan 2016 15:50:34 -0000

  but the same body in both instances.

I also had to remove two control characters:

  $ tr -d '\034' <notmuch.mbox >notmuch-fixed.mbox

to get the mbox into a format that Python could parse without errors.
I've pushed the mbox → ssoma(ish) import script to the ‘import’ branch
of [2] if folks want to play around.

Cheers,
Trevor

[1]: id:20160820062931.GY30347@odin.tremily.us
[2]: git://tremily.us/notmuch-archives.git
[3]: http://notmuchmail.org/archives/notmuch.mbox

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
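
For anyone who wants to repeat the two cleanup steps above without
shelling out to tr, here is a rough Python sketch. It is not the import
script from the ‘import’ branch; the function names strip_bytes and
duplicate_message_ids are made up for illustration, and it assumes the
mbox fits in memory and that the duplicated messages are not multipart.

```python
import mailbox
from collections import defaultdict


def strip_bytes(data, unwanted=b"\x1c"):
    """Return `data` with every byte listed in `unwanted` removed.

    With the default argument this mirrors `tr -d '\\034'`
    (octal 034 is the byte 0x1c).
    """
    return data.translate(None, unwanted)


def duplicate_message_ids(mbox_path):
    """Map each Message-ID seen more than once to its list of bodies.

    For multipart messages get_payload() returns a list of parts, which
    this sketch does not try to compare meaningfully.
    """
    seen = defaultdict(list)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            message_id = message.get("Message-ID")
            if message_id is not None:
                seen[str(message_id).strip()].append(message.get_payload())
    finally:
        box.close()
    return {mid: bodies for mid, bodies in seen.items() if len(bodies) > 1}
```

Reading the mbox as bytes, passing it through strip_bytes, and writing
the result to a new file gives the cleaned copy; duplicate_message_ids
on that file then lists the Message-IDs that need a manual look, and
comparing the returned bodies shows whether only the headers differ.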