From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Date: Mon, 15 Jul 2024 21:03:40 +0000
Message-ID: <20240715210340.M929931@dcvr>
In-Reply-To: <robbat2-20240715T053654-315499756Z@orbis-terrarum.net>

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> Hi,
> 
> After some long delays, we're trying to roll out public-inbox for
> Gentoo's mailing lists.
> 
> This is the latest HEAD at 18f41f5af397f903898154591de2cd1cd514c920 2024/07/07,
> plus the AltID patch you sent before.
> 
> It has mostly been smooth so far, but have run into weirdness that it
> seems to not read any files for many inboxes, and for other inboxes, it
> has recent mail, but refuses to reindex any older mail.
> 
> Even many -vvvv options give no clue why it seems to skip entire folders.

/me notes `-v' isn't an option for public-inbox-watch...

> Here's one of the lists where previously indexed exactly one file - a
> very recent one, and ignored everything else. When working on a
> reproduction case for you, it went down to not indexing ANY files.
> 
> The strace is really interesting in that it seems to not even open or stat
> anything in the /var/archives path.

Yeah, I've mainly used strace or similar tools for diagnostics
to avoid having to maintain code for tracing.

> The most frustrating variant of the output is this:
> $ public-inbox-index -vvvvv --reindex \
>   /var/public-inbox/eudev.lists.gentoo.org.git
> # indexing /var/public-inbox/eudev.lists.gentoo.org.git ...
> 
> (Nothing about why it seemed to not scan the maildirs at all).

public-inbox-index doesn't touch Maildirs (or mbox, MH, etc.) at all.
-index only exists to handle mail already in git repos; that is,
it is intended for freshly cloned inboxes, adding search to
old v1 inboxes, and/or changing indexlevel after init.

Currently, public-inbox-watch is the only public-inbox-* tool which
works directly with Maildirs.

> gentoo-releng-autobuilds.lists.gentoo.org.git it indexed a single file - and not more.

Hmm... (more below)

> Deleting & recreating
> /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git make it
> go down from 1 file to not indexing any files.
> 
> $ export PI_CONFIG=/etc/public-inbox/config
> 
> $ public-inbox-init --indexlevel full \
>   --version 2 --jobs 2 \
>   gentoo-releng-autobuilds \
>   /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
>   https://public-inbox.gentoo.org/gentoo-releng-autobuilds \
>   gentoo-releng-autobuilds@lists.gentoo.org

sidenote: the `.git' suffix is a bit confusing for v2 inboxes;
only v1 used a single bare git repo, while a v2 inboxdir is a
directory holding several git repos (epochs).

> $ grep gentoo-releng-autobuilds /etc/public-inbox/config
> [publicinbox "gentoo-releng-autobuilds"]
> address = gentoo-releng-autobuilds@lists.gentoo.org
> url = https://public-inbox.gentoo.org/gentoo-releng-autobuilds
> inboxdir = /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
> altid = indexfilter:xarchiveshash:package=XArchivesHash
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201101
> watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201102
> ...

Those watch= directives are intended for public-inbox-watch.
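
Since -watch is the tool that reads those directives, a quick sanity
check is to confirm that every watched path actually looks like a
Maildir (has new/ and cur/). The sketch below is not part of
public-inbox; `list_watch_dirs' and `check_watch_dirs' are hypothetical
helper names, and the awk parsing assumes the simple one-key-per-line
layout shown in the config excerpt above.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with public-inbox.

# list_watch_dirs CONFIG INBOX
# Print the path of every "watch = maildir:..." line found in the
# [publicinbox "INBOX"] section of CONFIG (simple layouts only).
list_watch_dirs() {
    awk -v sect="[publicinbox \"$2\"]" '
        # A section header: remember whether it is the one we want.
        /^[ \t]*\[/ {
            sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")
            in_sect = ($0 == sect)
            next
        }
        # Inside the wanted section: strip the key and the maildir: prefix.
        in_sect && /^[ \t]*watch[ \t]*=[ \t]*maildir:/ {
            sub(/^[ \t]*watch[ \t]*=[ \t]*maildir:/, "")
            print
        }
    ' "$1"
}

# check_watch_dirs CONFIG INBOX
# Report whether each watched path has the new/ and cur/ subdirectories.
check_watch_dirs() {
    list_watch_dirs "$1" "$2" | while IFS= read -r dir; do
        if [ -d "$dir/new" ] && [ -d "$dir/cur" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
        fi
    done
}
```

Usage would be something like
`check_watch_dirs "$PI_CONFIG" gentoo-releng-autobuilds'. The config
file is parseable by git-config(1), so
`git config -f "$PI_CONFIG" --get-all publicinbox.gentoo-releng-autobuilds.watch'
is a more robust way to list the same values if the awk parsing is too
naive for a given config.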

I'm curious how you got a single message indexed, however...
is that from public-inbox-mda?

Fwiw, I started working on a public-inbox-(import/ctl) tool to
quickly import a bunch of messages a while back but got
sidetracked.  Been busy dealing with personal problems much of
this year :<

But public-inbox-watch works reasonably well for large imports
even if the git history ordering gets a bit wonky from readdir.
SIGHUP/SIGUSR1 + strace are useful for reloading and tracing
configuration problems with the -watch daemon.
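
A minimal sketch of that workflow, assuming the PID of a running
public-inbox-watch process is already known. `signal_watch' and
`trace_watch' are hypothetical wrapper names; the signal meanings
(SIGHUP reloads the config, SIGUSR1 rescans all watched mailboxes) are
the ones documented in public-inbox-watch(1), and the strace filter is
just one reasonable choice for seeing which paths get opened.

```shell
#!/bin/sh
# Hypothetical wrappers, not shipped with public-inbox.

# signal_watch PID reload|rescan
# Send the matching signal to a running public-inbox-watch process.
signal_watch() {
    case "$2" in
        reload) kill -HUP "$1" ;;   # SIGHUP: re-read the config file
        rescan) kill -USR1 "$1" ;;  # SIGUSR1: rescan all watched mailboxes
        *)
            echo "usage: signal_watch PID reload|rescan" >&2
            return 64
            ;;
    esac
}

# trace_watch PID
# Attach strace to the process (and its children) and show only
# file-related syscalls. Needs strace and permission to attach.
trace_watch() {
    strace -f -e trace=file -p "$1"
}
```

Running `trace_watch PID' in one terminal and `signal_watch PID rescan'
in another should show whether anything under the watched Maildirs is
being opened or stat-ed at all.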
