From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Date: Mon, 15 Jul 2024 21:03:40 +0000
Message-ID: <20240715210340.M929931@dcvr>
In-Reply-To: <robbat2-20240715T053654-315499756Z@orbis-terrarum.net>
"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> Hi,
>
> After some long delays, we're trying to roll out public-inbox for
> Gentoo's mailing lists.
>
> This is the latest HEAD at 18f41f5af397f903898154591de2cd1cd514c920 2024/07/07,
> plus the AltID patch you sent before.
>
> It has mostly been smooth so far, but we have run into weirdness: it
> seems to not read any files for many inboxes, and for other inboxes, it
> has recent mail, but refuses to reindex any older mail.
>
> Even many -vvvv options give no clue why it seems to skip entire folders.
/me notes `-v' isn't an option for public-inbox-watch...
> Here's one of the lists where it previously indexed exactly one file - a
> very recent one - and ignored everything else. When working on a
> reproduction case for you, it went down to not indexing ANY files.
>
> The strace is really interesting in that it seems to not even open or stat
> anything in the /var/archives path.
Yeah, I've mainly used strace or similar tools for diagnostics
to avoid having to maintain code for tracing.
> The most frustrating variant of the output is this:
> $ public-inbox-index -vvvvv --reindex \
> /var/public-inbox/eudev.lists.gentoo.org.git
> # indexing /var/public-inbox/eudev.lists.gentoo.org.git ...
>
> (Nothing about why it seemed to not scan the maildirs at all).
public-inbox-index doesn't touch Maildirs (or mbox, MH, etc.) at all.
-index only exists to handle mail already in git repos; that is,
it is intended for freshly cloned inboxes, adding search to
old v1 inboxes, and/or changing indexlevel after init.

Currently, public-inbox-watch is the only public-inbox-* tool which
works directly with Maildirs.
> For gentoo-releng-autobuilds.lists.gentoo.org.git, it indexed a single file - and not more.
Hmm... (more below)
> Deleting & recreating
> /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git makes it
> go down from 1 file to not indexing any files.
>
> $ export PI_CONFIG=/etc/public-inbox/config
>
> $ public-inbox-init --indexlevel full \
> --version 2 --jobs 2 \
> gentoo-releng-autobuilds \
> /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
> https://public-inbox.gentoo.org/gentoo-releng-autobuilds \
> gentoo-releng-autobuilds@lists.gentoo.org
sidenote: `.git' suffix is a bit confusing for v2 inboxes;
only v1 used a single bare git repo
> $ grep gentoo-releng-autobuilds /etc/public-inbox/config
> [publicinbox "gentoo-releng-autobuilds"]
> address = gentoo-releng-autobuilds@lists.gentoo.org
> url = https://public-inbox.gentoo.org/gentoo-releng-autobuilds
> inboxdir = /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
> altid = indexfilter:xarchiveshash:package=XArchivesHash
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201101
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201102
> ...
Those watch= directives are intended for public-inbox-watch.
I'm curious how you got a single message indexed, however...
is that from public-inbox-mda?
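
Since the watch= paths are only read by public-inbox-watch, it can help
to confirm that each one is a usable Maildir before starting the daemon.
A minimal sketch, assuming one `watch = maildir:` directive per line as in
the quoted config; the function names are hypothetical, and the sed match
is a naive text filter, not a real git-config parser:

```shell
#!/bin/sh
# Hypothetical helpers, not part of public-inbox.

# Print every path named by a `watch = maildir:PATH` line in config file $1.
# Assumption: one directive per line, no quoting or line continuations.
list_watch_maildirs() {
	sed -n 's/^[[:space:]]*watch[[:space:]]*=[[:space:]]*maildir:\(.*\)$/\1/p' "$1"
}

# For each watched path, report whether the standard Maildir new/ and cur/
# subdirectories exist, and how many regular files they hold in total.
check_watch_maildirs() {
	list_watch_maildirs "$1" | while IFS= read -r dir; do
		if [ -d "$dir/new" ] && [ -d "$dir/cur" ]; then
			n=$(find "$dir/new" "$dir/cur" -type f | wc -l)
			# $((n)) strips any padding some wc implementations add
			echo "ok $dir $((n))"
		else
			echo "missing $dir"
		fi
	done
}
```

Usage would be something like
`check_watch_maildirs "${PI_CONFIG:-$HOME/.public-inbox/config}"`;
any "missing" line points at a watch= path that is not a Maildir.
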
Fwiw, I started working on a public-inbox-(import/ctl) tool to
quickly import a bunch of messages a while back but got
sidetracked. Been busy dealing with personal problems much of
this year :<

But public-inbox-watch works reasonably well for large imports
even if the git history ordering gets a bit wonky from readdir.

SIGHUP/SIGUSR1 + strace are useful for reloading and tracing
configuration problems with the -watch daemon.
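
A minimal sketch of that workflow, assuming the PID of the running
public-inbox-watch process is already known (from a pidfile, service
manager, or ps; how to obtain it is deployment-specific). The function
names are hypothetical. Per my reading of public-inbox-watch(1), SIGHUP
reloads the config file and SIGUSR1 rescans all watched mailboxes:

```shell
#!/bin/sh
# Hypothetical wrappers; $1 is the PID of the public-inbox-watch process.

# SIGHUP: ask -watch to reload its config file.
watch_reload() {
	kill -HUP "$1"
}

# SIGUSR1: ask -watch to rescan all watched mailboxes.
watch_rescan() {
	kill -USR1 "$1"
}

# Attach strace to the daemon and any children it forks, showing only
# file-related syscalls so it is visible which Maildir paths get touched.
watch_trace() {
	strace -f -e trace=file -p "$1"
}
```

Running `watch_trace "$pid"` in one terminal and `watch_rescan "$pid"` in
another should show whether the /var/archives paths are opened at all.
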