Date: Sun, 21 Aug 2016 18:28:58 +0000
From: Eric Wong
To: "W. Trevor King"
Cc: notmuch@notmuchmail.org, David Bremner, Steven Allen, Tomi Ollila, Carl Worth, meta@public-inbox.org
Subject: Re: Mail archives in Git using ssoma (Docker image)
Message-ID: <20160821182858.GA11495@dcvr>
References: <20141107190321.GL23609@odin.tremily.us> <20160821043631.GA2338@odin.tremily.us> <20160821094833.GB2338@odin.tremily.us> <20160821120852.GA12964@dcvr> <20160821173600.GB30347@odin.tremily.us>
In-Reply-To: <20160821173600.GB30347@odin.tremily.us>

"W. Trevor King" wrote:
> On Sun, Aug 21, 2016 at 12:08:52PM +0000, Eric Wong wrote:
> > "W. Trevor King" wrote:
> > > This is the ssoma archive (with the data in it).  I just set up a
> > > basic HTTP archive (following [1]) based on a Docker image [2] (Gentoo
> > > doesn't package all the Perl dependencies public-inbox needs).
> >
> > Ugh, that sucks (sorry, not a fan of Docker).
> >
> > What's missing from Gentoo?
>
> Gentoo doesn't package (or I couldn't find the package for)
> Encode::MIME::Header or Mail::Thread.  I tried installing things from
> CPAN, but ran into a compile-time error from the ‘cpan’ invocation and
> gave up ;).  I can try and reproduce the error if you're curious, but
> I don't have it handy at the moment.

Encode::MIME::Header is distributed with perl itself on Debian
and also the stock upstream install.  Not sure if there's an
option you missed or disabled.
Which perl version do you use?  perl 5.14 on Debian wheezy even
seems to have it.  I actually still want everything to work on 5.8,
since that seems to be the de-facto baseline in the wild.

Mail::Thread is one .pm, and I'll probably replace it with
something (same algorithm) which can use half the memory by
avoiding wrapper object abstractions (it's probably the biggest
memory hog at the moment).

lib/PublicInbox/Thread.pm already has 3 monkey patches to
work around upstream bugs in Mail::Thread.  It's dead upstream,
and not available on FreeBSD, either.

> > > $ git config -f srv/notmuch.git/config publicinbox.http http://tremily.us
> > > $ git config -f srv/notmuch.git/config publicinbox.email notmuch@notmuchmail.org
> >
> > That should probably be:
> >
> >   ; based on your [3]
> >   git config -f srv/notmuch.git/config \
> >     publicinbox.notmuch.url http://tremily.us/notmuch
> >
> >   git config -f srv/notmuch.git/config \
> >     publicinbox.notmuch.address notmuch@notmuchmail.org
> >
> >   ; this is crucial for all the public-inbox-* tools
> >   git config -f srv/notmuch.git/config \
> >     publicinbox.notmuch.mainrepo /path/to/notmuch.git
>
> I was using these in the Dockerfile's CMD:
>
>   (cd /srv;
>    for NAME in *;
>    do
>      CONF="/srv/${NAME}/config";
>      public-inbox-init "${NAME}" "/srv/${NAME}" $(git config -f "${CONF}" publicinbox.http) $(git config -f "${CONF}" publicinbox.email);
>    done) && …
>
> Are you saying that I can skip the ~/.public-inbox/config entries
> setup by public-inbox-init if I set publicinbox.{name}.* in the ssoma
> repository's config?  That would be nice.

Erm, sorry, no, I mean ~/.public-inbox/config as the "git config -f"
arg in the above commands.  Your original config was meaningless
in the context of public-inbox itself; I don't recall
public-inbox relying on $GIT_DIR/config much (if at all) outside
of standard git things.
Using ~/.public-inbox/config is required for multi-inbox lookups
(since you normally run MDA w/o args).

You can also override ~/.public-inbox/config by setting the
PI_CONFIG env (like GIT_CONFIG).

> I don't see a point to having {name} in ssoma-config settings though,
> since you're already in a single bucket by that point (using
> publicinbox.{name}.* makes sense in the multi-bucket
> ~/.public-inbox/config).
>
> > > It's not updating automatically yet, but that will probably look
> > > like:
> > >
> > > 1. Pull new mbox [4].
> > > 2. Import into notmuch-archives [5].
> > > 3. Re-run public-inbox-index (this could probably be via ‘docker exec …’).
> > >
> > > But I'll have to test that to confirm.  And ideally we'd be using
> > > ssoma-mda or similar directly, instead of going through mbox, but I'd
> > > rather get the official headers on the stored mail than be efficient
> > > ;).
> >
> > For mirroring existing lists, I started using public-inbox-watch
> > which currently watches Maildirs.
>
> If I had a Maildir locally, I'd just use procmail and push new
> messages into ssoma-mda.  I'm using the import script because my local
> mail has “how we delivered this to Trevor” headers (which I don't want
> to add) but the downloaded mbox has “how we delivered this to
> notmuch@notmuchmail.org” (which seems like a better fit for a shared
> ssoma repo).

I don't mind extra/different headers.  The majority of messages
in public-inbox.org/git/ were delivered to gmane; recent ones
are delivered to me, and some holes were filled in by
Jeff King's archives.  All of our mail systems add different headers.

> > I recommend public-inbox-watch for mirroring existing lists (such as
> > what I did with git@vger) but public-inbox-mda for self-hosted lists
> > (such as meta@public-inbox.org).
>
> Why is that?
> Procmail + public-inbox-mda (or my Python ssoma-mda fork
> [1,2]) seems simpler and equally effective if you want to insert a
> message that your mail system is delivering locally.

-watch is usable for importing big archives or bursts of traffic
since it doesn't have to reload Perl/Python on every mail
(this is probably not a problem for notmuch, but it is for vger lists).

The defaults are also less opinionated, so it won't reject attachments
that passed through the list server.  You can also copy messages into
the watched Maildir with your MUA (in case you missed some earlier,
and got them from another user or archive).

There's also a Filter interface I added (see
lib/PublicInbox/Filter/Vger.pm as an example) for dropping list
trailers before SA, so those trailers don't influence Bayes,
but you can do that in the MDA stage, too.

But *-mda is fine, too :>

> [1]: id:20141107190321.GL23609@odin.tremily.us
> [2]: id:af679af8257e250ac606e35a1307ad02907b8426.1413663212.git.wking@tremily.us
>      http://public-inbox.org/meta/af679af8257e250ac606e35a1307ad02907b8426.1413663212.git.wking@tremily.us/t/#u
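
A minimal sketch of the configuration discussed above, for anyone reading
this in the archives.  It assumes the three publicinbox.notmuch.* keys from
the quoted commands and writes them to a scratch file standing in for
~/.public-inbox/config.  The mktemp location is purely illustrative, the
url and address values are the ones quoted earlier in this thread, and
/path/to/notmuch.git remains a placeholder:

```shell
# Scratch file standing in for ~/.public-inbox/config (illustrative only).
PI_CONFIG="$(mktemp -d)/config"
export PI_CONFIG

# Same keys as in the quoted commands, written to the public-inbox
# config rather than to the repository's own config file.
git config -f "$PI_CONFIG" publicinbox.notmuch.url http://tremily.us/notmuch
git config -f "$PI_CONFIG" publicinbox.notmuch.address notmuch@notmuchmail.org
# Placeholder path, as in the original message.
git config -f "$PI_CONFIG" publicinbox.notmuch.mainrepo /path/to/notmuch.git

# List what ended up under the [publicinbox "notmuch"] section.
git config -f "$PI_CONFIG" --get-regexp '^publicinbox\.notmuch\.'
```

Exporting PI_CONFIG is the override mentioned above: it points at this
file in place of ~/.public-inbox/config.  Without it, the same
"git config -f" commands would simply target ~/.public-inbox/config.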