From: "W. Trevor King"
To: notmuch@notmuchmail.org
Cc: Eric Wong
Subject: Re: Mail archives in Git using ssoma
Date: Sat, 20 Aug 2016 21:36:31 -0700
Message-ID: <20160821043631.GA2338@odin.tremily.us>
In-Reply-To: <20141107190321.GL23609@odin.tremily.us>
References: <20141107190321.GL23609@odin.tremily.us>

On Fri, Nov 07, 2014 at 11:03:21AM -0800, W. Trevor King wrote:
> Eric Wong has been working on some tools to store email in a Git
> repository, and his client-side code is ssoma [1]. I wanted a bit
> more metadata than the stock ssoma-mda [2], and ended up just
> writing a ssoma-mda in Python [3]…
>
> Then I grabbed the archives, and pulled them into Git:
> …
> The messages I dropped removed duplicate Message-IDs:
> …

ssoma and public-inbox came up recently (with the end of Gmane) in
[1]. I've brought my archives [2] up to speed with a fresh mbox
downloaded today [3]. Beyond the ignored messages mentioned in my
initial email, I had to ignore:

* id:67EEA3E1-918F-47AE-8AD7-EF0A5923D800@m0g.net
  Which had different headers up through:

    -X-List-Received-Date: Wed, 06 Jan 2016 15:49:49 -0000
    +X-List-Received-Date: Wed, 06 Jan 2016 15:50:34 -0000

  but the same body in both instances.

I also had to remove two control characters:

  $ tr -d '\034' notmuch-fixed.mbox

to get the mbox into a format that Python could parse without errors.

I've pushed the mbox → ssoma(ish) import script to the ‘import’
branch of [2] if folks want to play around.

Cheers,
Trevor

[1]: id:20160820062931.GY30347@odin.tremily.us
[2]: git://tremily.us/notmuch-archives.git
[3]: http://notmuchmail.org/archives/notmuch.mbox

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
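
For anyone who wants to repeat the cleanup described above without
tr, here is a minimal Python sketch. It is not the import script from
the ‘import’ branch. The function names and any file names you pass in
are hypothetical, and it assumes the only problem byte is octal 034
(0x1C, the ASCII file separator). The second helper lists Message-IDs
that occur more than once, which is the kind of duplicate mentioned in
the quoted message. Only the standard-library mailbox module is used.

```python
import mailbox
from collections import Counter

# Octal 034 == 0x1C, the byte that `tr -d '\034'` deletes.
FILE_SEPARATOR = b"\x1c"


def strip_control_bytes(data, unwanted=FILE_SEPARATOR):
    """Return `data` (bytes) with every byte listed in `unwanted` removed."""
    return data.translate(None, unwanted)


def duplicate_message_ids(mbox_path):
    """Return a sorted list of Message-IDs that appear more than once.

    `mbox_path` is a hypothetical path to an already cleaned mbox file.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            message_id = message.get("Message-ID")
            if message_id is not None:
                # str() guards against non-str header objects.
                counts[str(message_id).strip()] += 1
    finally:
        box.close()
    return sorted(mid for mid, seen in counts.items() if seen > 1)
```

To use it, read the downloaded mbox in binary mode, pass the bytes
through strip_control_bytes(), write the result to a new file, and
then call duplicate_message_ids() on that file to see which messages
need a manual decision before the import.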