Date: Mon, 15 Jul 2024 21:03:40 +0000
From: Eric Wong
To: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Message-ID: <20240715210340.M929931@dcvr>

"Robin H. Johnson" wrote:
> Hi,
>
> After some long delays, we're trying to roll out public-inbox for
> Gentoo's mailing lists.
>
> This is the latest HEAD at 18f41f5af397f903898154591de2cd1cd514c920 2024/07/07,
> plus the AltID patch you sent before.
>
> It has mostly been smooth so far, but have run into weirdness that it
> seems to not read any files for many inboxes, and for other inboxes, it
> has recent mail, but refuses to reindex any older mail.
>
> Even many -vvvv options give no clue why it seems to skip entire folders.

/me notes `-v' isn't an option for public-inbox-watch...

> Here's one of the lists where previously indexed exactly one file - a
> very recent one, and ignored everything else. When working on a
> reproduction case for you, it went down to not indexing ANY files.
>
> The strace is really interesting in that it seems to not even open or stat
> anything in the /var/archives path.

Yeah, I've mainly used strace or similar tools for diagnostics
to avoid having to maintain code for tracing.

> The most frustrating variant of the output is this:
> $ public-inbox-index -vvvvv --reindex \
>     /var/public-inbox/eudev.lists.gentoo.org.git
> # indexing /var/public-inbox/eudev.lists.gentoo.org.git ...
>
> (Nothing about why it seemed to not scan the maildirs at all).

public-inbox-index doesn't touch Maildirs (or mbox, MH, etc.) at all.
-index only exists to handle mail already in git repos;
that is, -index is intended for freshly cloned inboxes,
adding search to old v1 inboxes, and/or changing indexlevel
after init.

Currently, public-inbox-watch is the only public-inbox-* tool
which works directly with Maildirs.

> gentoo-releng-autobuilds.lists.gentoo.org.git it indexed a single file - and not more.

Hmm... (more below)

> Deleting & recreating
> /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git make it
> go down from 1 file to not indexing any files.
>
> $ export PI_CONFIG=/etc/public-inbox/config
>
> $ public-inbox-init --indexlevel full \
>     --version 2 --jobs 2 \
>     gentoo-releng-autobuilds \
>     /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
>     https://public-inbox.gentoo.org/gentoo-releng-autobuilds \
>     gentoo-releng-autobuilds@lists.gentoo.org

sidenote: `.git' suffix is a bit confusing for v2 inboxes;
only v1 used a single bare git repo

> $ grep gentoo-releng-autobuilds /etc/public-inbox/config
> [publicinbox "gentoo-releng-autobuilds"]
>     address = gentoo-releng-autobuilds@lists.gentoo.org
>     url = https://public-inbox.gentoo.org/gentoo-releng-autobuilds
>     inboxdir = /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
>     altid = indexfilter:xarchiveshash:package=XArchivesHash
>     watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds
>     watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201101
>     watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201102
> ...

Those watch= directives are intended for public-inbox-watch.
I'm curious how you got a single message indexed, however...
is that from public-inbox-mda?

Fwiw, I started working on a public-inbox-(import/ctl) tool to
quickly import a bunch of messages a while back but got
sidetracked.  Been busy dealing with personal problems much of
this year :<

But public-inbox-watch works reasonably well for large imports
even if the git history ordering gets a bit wonky from readdir.
SIGHUP/SIGUSR1 + strace are useful for reloading and tracing
configuration problems with the -watch daemon.
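
A rough sketch of that signal + strace workflow, for anyone following
along. The function names are made up for illustration, and each one
expects the PID of an already-running public-inbox-watch as its argument.
As far as I recall from public-inbox-watch(1), SIGHUP reloads the config
and SIGUSR1 rescans the watched mailboxes; treat that split as an
assumption and check the manpage for your version.

```shell
# Hypothetical helpers; only the signals and tool names come from the thread.
# $1 is the PID of a running public-inbox-watch (started with PI_CONFIG set).

# Ask the daemon to re-read its config, e.g. after adding watch= lines.
reload_watch() {
    kill -HUP "$1"
}

# Ask the daemon to rescan the mailboxes it already watches
# (assumed meaning of SIGUSR1, per the manpage as I remember it).
rescan_watch() {
    kill -USR1 "$1"
}

# Attach strace and show file-related syscalls, to see whether the
# Maildir paths are opened at all.  Needs ptrace permission on the PID;
# %file is the syscall class name in reasonably recent strace releases.
trace_watch() {
    strace -f -p "$1" -e trace=%file
}
```

Typical use would be trace_watch in one terminal, then reload_watch and
rescan_watch from another, watching for the /var/archives paths to show up
in the trace.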