unofficial mirror of meta@public-inbox.org
From: Eric Wong <e@80x24.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Date: Wed, 17 Jul 2024 23:25:32 +0000
Message-ID: <20240717232532.M125694@dcvr>
In-Reply-To: <robbat2-20240717T025915-124684207Z@orbis-terrarum.net>

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> On Tue, Jul 16, 2024 at 07:05:50PM +0000, Eric Wong wrote:
> > > It's definitely very busy after scan, but I can't tell if it's the full
> > > set.
> > OK, I think adding stderr diagnostic messages for full scans
> > shouldn't be too noisy.
> Thanks. I think it will be needed...

OK, will add...

> > > At an admin level, is there a way to dump out all of the paths it has
> > > indexed, to compare against the paths on disk?
> > No, path information isn't stored for public-facing inboxes
> > since it's too unstable.  It should be possible to reverse map
> > things at real-time and add better diagnostic tools, but the
> > philosophy has always been to store||index as little as possible
> > and be able to infer/regenerate needed data on-the-fly to avoid
> > data consistency problems.
> Can I easily dump out every message-id at least? I can compare that
> against the files, other than the old messages with no message-ids.

$ sqlite3 /path/to/msgmap.sqlite3 'SELECT mid FROM msgmap'

For v2, old messages without Message-IDs or recycled+conflicting
Message-IDs will have Message-IDs synthesized
(<YYYYmmddHHMMSS.$base64_digest@z>) as allowed by RFC 3977.
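
A minimal sketch of the comparison discussed above, assuming msgmap
stores the `mid' column without the surrounding angle brackets and
that each Maildir keeps its messages under new/ and cur/.  The
function names are made up for illustration and are not part of
public-inbox:

```python
import sqlite3
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path


def indexed_mids(msgmap_path):
    """Return the set of Message-IDs recorded in a msgmap.sqlite3 file."""
    # Open read-only so a live inbox is never modified by the check.
    uri = Path(msgmap_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return {row[0] for row in con.execute("SELECT mid FROM msgmap")}
    finally:
        con.close()


def maildir_mids(maildir):
    """Return (mids, files_without_mid) for messages in new/ and cur/."""
    parser = BytesHeaderParser(policy=policy.compat32)
    mids, missing = set(), []
    for sub in ("new", "cur"):
        subdir = Path(maildir) / sub
        if not subdir.is_dir():
            continue
        for path in sorted(subdir.iterdir()):
            if not path.is_file():
                continue
            with path.open("rb") as fh:
                mid = parser.parse(fh)["Message-ID"]
            if mid is None:
                # No Message-ID header: v2 would synthesize one, so
                # these files cannot be matched by Message-ID alone.
                missing.append(path)
            else:
                # Assumption: msgmap keeps the ID without "<" and ">".
                mids.add(str(mid).strip().strip("<>"))
    return mids, missing
```

`maildir_mids(path)[0] - indexed_mids(db)' then gives the Message-IDs
present on disk but absent from msgmap.  Files listed in the second
return value need to be checked some other way.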

> > > > How many Maildirs are you watching?  I wonder if it's hitting
> > > > RLIMIT_NOFILE... (errors should be logged to stderr).
> > > 6774 Maildirs right now.
> > > I should probably improve the OpenRC script for it, I think we're
> > > throwing away stderr right now for -watch.
> > 
> > Yeah, watch stderr is important for diagnosing problems.
> > 
> > Fwiw, I run it inside a screen(1) session on one system,
> > and rely on systemd to redirect stderr to syslog on another.
> I hacked in stderr: but bad luck, it doesn't dump anything useful before
> it seems to vanish. Nothing in dmesg either, so a mundane crash.

Not having anything in stderr on errors is really bad :x

Any fast_import_crash_* files in the [0-9]+\.git dirs?
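
One way to look for them across a large tree, as a rough sketch.  It
assumes the crash reports sit at the top level of each epoch
directory, which is where git fast-import writes them for a bare
repository.  The function name is made up:

```python
import os
import re

# Matches epoch directories such as 0.git, 1.git, 12.git
EPOCH_DIR = re.compile(r"[0-9]+\.git")


def find_crash_reports(top):
    """Return sorted paths of fast_import_crash_* files under `top`."""
    found = []
    for root, dirs, files in os.walk(top):
        if EPOCH_DIR.fullmatch(os.path.basename(root)):
            found.extend(
                os.path.join(root, name)
                for name in files
                if name.startswith("fast_import_crash_")
            )
            # Do not descend into objects/, refs/, etc.
            dirs[:] = []
    return sorted(found)
```

Calling it on the directory that holds all the inboxes returns an
empty list when no crash reports exist.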

-watch really shouldn't just vanish...  I'm not familiar with
OpenRC, does/can it wait on processes so it can report exit codes?
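
Independent of what OpenRC offers, a small wrapper can capture both
stderr and the exit status.  This is only a sketch of the idea, not
an OpenRC feature, and `run_and_report' is a made-up name:

```python
import subprocess


def run_and_report(argv, log_path):
    """Run argv with stderr appended to log_path; log how it ended."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(argv, stderr=log)
        rc = proc.wait()
        if rc < 0:
            # subprocess reports death-by-signal as a negative value
            desc = f"killed by signal {-rc}"
        else:
            desc = f"exited with status {rc}"
        log.write(f"{argv[0]}: {desc}\n".encode())
    return rc
```

For example, `run_and_report(["public-inbox-watch"], "/tmp/watch.err")'
(the log path is an arbitrary choice) would leave a final line in the
log saying whether -watch exited on its own or was killed by a signal
such as SIGKILL.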

> > > > Curious, which architecture is that and is it using
> > > > Linux::Inotify2 or inotify via the `syscall' perlop?
> > > > (I expect 3 is the inotify FD).
> > > x86-64, dev-perl/Linux-Inotify2 is installed on the host, but I can't
> > > tell at a glance if -watch used perlop or package.
> > 
> > Probably, yes; but it can/should favor the pure Perl version
> > soon.  Since it's Gentoo I trust it's up-to-date with broadcast
> > and overflow support?
> dev-perl/Linux-Inotify2 will be up to date. No guarantees that the
> kernel is up to date - some legacy boxes at sponsors are pretty crufty
> and have been unsafe to reboot to new kernels when we lack any OOB
> management access: to that end, public-inbox's responsiveness is amazing
> even running on a 10+ year old RAID1 HDD spinner setup.

OK.  The kernel shouldn't be a problem for inotify, just the
older XS versions lacked some things and the pure Perl version
reduces mmap||vm.max_map_count pressure.  But I also noticed a
bug where we were favoring the XS :x.
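
For reference, the limits mentioned in this thread can be read with a
short sketch like the one below.  It assumes Linux with /proc
mounted, and the function names are made up.  Note that getrlimit
reports the limits of the process running the sketch, so it should
be run from the same environment that starts -watch:

```python
import resource
from pathlib import Path


def read_sysctl(name, proc_sys="/proc/sys"):
    """Return a sysctl such as vm.max_map_count as int, or None."""
    path = Path(proc_sys, *name.split("."))
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def watch_limits(proc_sys="/proc/sys"):
    """Collect limits relevant to watching thousands of Maildirs."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    limits = {
        "RLIMIT_NOFILE soft": soft,
        "RLIMIT_NOFILE hard": hard,
    }
    for name in (
        "vm.max_map_count",
        "fs.inotify.max_user_watches",
        "fs.inotify.max_user_instances",
    ):
        limits[name] = read_sysctl(name, proc_sys)
    return limits
```

With 6774 Maildirs, and assuming one inotify watch per watched
directory, fs.inotify.max_user_watches is the value to compare
against the number of directories being watched.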

Fwiw, I've actually struggled a lot with HDDs w/ Xapian||SQLite
but glad it's working out for you.  I'm mainly working on ~15 year
old systems with SSDs that replaced dead HDDs.  Still have
numerous performance and memory optimizations planned :>
