From: Eric Wong <e@80x24.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Date: Wed, 17 Jul 2024 23:25:32 +0000
Message-ID: <20240717232532.M125694@dcvr>
In-Reply-To: <robbat2-20240717T025915-124684207Z@orbis-terrarum.net>
"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> On Tue, Jul 16, 2024 at 07:05:50PM +0000, Eric Wong wrote:
> > > It's definitely very busy after scan, but I can't tell if it's the full
> > > set.
> > OK, I think adding stderr diagnostic messages for full scans
> > shouldn't be too noisy.
> Thanks. I think it will be needed...
OK, will add...
> > > At an admin level, is there a way to dump out all of the paths it's
> > > indexed, to compare against the paths on disk?
> > No, path information isn't stored for public-facing inboxes
> > since it's too unstable. It should be possible to reverse map
> > things at real-time and add better diagnostic tools, but the
> > philosophy has always been to store||index as little as possible
> > and be able to infer/regenerate needed data on-the-fly to avoid
> > data consistency problems.
> Can I easily dump out every message-id at least? I can compare that
> against the files, other than the old messages with no message-ids.
$ sqlite3 /path/to/msgmap.sqlite3 'SELECT mid FROM msgmap'
For v2, old messages without Message-IDs or recycled+conflicting
Message-IDs will have Message-IDs synthesized
(<YYYYmmddHHMMSS.$base64_digest@z>) as allowed by RFC 3977.
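For the comparison itself, something along these lines might do.
It's only a rough sketch: the function names are made up for
illustration, it assumes msgmap stores Message-IDs without the
surrounding angle brackets (both sides get stripped to be safe),
and it only looks at new/ and cur/ of a single Maildir.
Messages lacking a Message-ID are skipped since they get
synthesized IDs as described above.

```python
import sqlite3
from email.parser import BytesParser
from pathlib import Path


def indexed_mids(msgmap_path):
    """Message-IDs recorded in msgmap.sqlite3 (opened read-only)."""
    uri = Path(msgmap_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute("SELECT mid FROM msgmap")
        return {str(row[0]).strip().strip("<>") for row in rows}
    finally:
        con.close()


def maildir_mids(maildir):
    """Message-IDs found in the headers of files under new/ and cur/."""
    mids = set()
    for sub in ("new", "cur"):
        subdir = Path(maildir) / sub
        if not subdir.is_dir():
            continue
        for path in subdir.iterdir():
            if not path.is_file():
                continue
            with path.open("rb") as fh:
                msg = BytesParser().parse(fh, headersonly=True)
            mid = msg.get("Message-ID")
            if mid:  # no Message-ID header: nothing to compare
                mids.add(str(mid).strip().strip("<>"))
    return mids


def missing_from_index(msgmap_path, maildir):
    """Message-IDs present on disk but absent from msgmap, sorted."""
    return sorted(maildir_mids(maildir) - indexed_mids(msgmap_path))
```

Calling missing_from_index('/path/to/msgmap.sqlite3', '/path/to/Maildir')
would then list whatever is on disk but not in msgmap.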
> > > > How many Maildirs are you watching? I wonder if it's hitting
> > > > RLIMIT_NOFILE... (errors should be logged to stderr).
> > > 6774 Maildirs right now.
> > > I should probably improve the OpenRC script for it, I think we're
> > > throwing away stderr right now for -watch.
> >
> > Yeah, watch stderr is important for diagnosing problems.
> >
> > Fwiw, I run it inside a screen(1) session on one system,
> > and rely on systemd to redirect stderr to syslog on another.
> I hacked in stderr: but bad luck, it doesn't dump anything useful before
> it seems to vanish. Nothing in dmesg either, so a mundane crash.
Not having anything in stderr on errors is really bad :x
Any fast_import_crash_* files in the [0-9]+\.git dirs?
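With that many inboxes, checking by hand is tedious. A quick
sketch to look for them (find_crash_reports is a hypothetical
helper, not something public-inbox ships; it assumes the crash
reports sit directly inside directories named like 0.git, 1.git,
etc. somewhere below the given root):

```python
import os
import re

# Matches epoch directory names such as "0.git" or "12.git".
EPOCH_DIR = re.compile(r"^[0-9]+\.git$")


def find_crash_reports(root):
    """Return sorted paths of fast_import_crash_* files in [0-9]+.git dirs."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(os.path.normpath(root)):
        if EPOCH_DIR.match(os.path.basename(dirpath)):
            for name in filenames:
                if name.startswith("fast_import_crash_"):
                    hits.append(os.path.join(dirpath, name))
            dirnames[:] = []  # no need to descend into git internals
    return sorted(hits)
```

An empty list from find_crash_reports('/path/to/inboxes') only
means no report was found under that root.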
-watch really shouldn't just vanish... I'm not familiar with
OpenRC, does/can it wait on processes so it can report exit codes?
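If not, a small wrapper could keep stderr and say how the process
ended. This is only a sketch under my own assumptions: the
wrapper, its function names, and the log path are hypothetical
and not part of public-inbox or OpenRC.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:  # negative means terminated by a signal
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(argv, stderr_log):
    """Run argv, append its stderr to stderr_log, and log how it exited."""
    with open(stderr_log, "ab") as log:
        proc = subprocess.run(argv, stderr=log)
        summary = describe_exit(proc.returncode)
        log.write(f"{argv[0]}: {summary}\n".encode())
    return proc.returncode
```

For example, run_and_report(['public-inbox-watch'], '/path/to/watch.err')
blocks until -watch is gone and leaves the reason at the end of the log.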
> > > > Curious, which architecture is that and is it using
> > > > Linux::Inotify2 or inotify via the `syscall' perlop?
> > > > (I expect 3 is the inotify FD).
> > > x86-64, dev-perl/Linux-Inotify2 is installed on the host, but I can't
> > > tell at a glance if -watch used perlop or package.
> >
> > Probably, yes; but it can/should favor the pure Perl version
> > soon. Since it's Gentoo I trust it's up-to-date with broadcast
> > and overflow support?
> dev-perl/Linux-Inotify2 will be up to date. No guarantees that the
> kernel is up to date - some legacy boxes at sponsors are pretty crufty
> and have been unsafe to reboot to new kernels when we lack any OOB
> management access: to that end, public-inbox's responsiveness is amazing
> even running on a 10+ year old RAID1 HDD spinner setup.
OK. The kernel shouldn't be a problem for inotify, just the
older XS versions lacked some things and the pure Perl version
reduces mmap||vm.max_map_count pressure. But I also noticed a
bug where we were favoring the XS :x.
Fwiw, I've actually struggled a lot with HDDs w/ Xapian||SQLite
but glad it's working out for you. I'm mainly working on ~15 year
old systems with SSDs that replaced dead HDDs. Still have
numerous performance and memory optimizations planned :>