Date: Tue, 16 Jul 2024 19:05:50 +0000
From: Eric Wong
To: "Robin H. Johnson"
Cc: meta@public-inbox.org, infra@gentoo.org
Subject: Re: public-inbox skipping new inboxes or many mails
Message-ID: <20240716190550.M242334@dcvr>
References: <20240715210340.M929931@dcvr> <20240715235808.M590962@dcvr>

"Robin H. Johnson" wrote:
> On Mon, Jul 15, 2024 at 11:58:08PM +0000, Eric Wong wrote:
> > > On Mon, Jul 15, 2024 at 09:03:40PM +0000, Eric Wong wrote:
> > > > Currently, public-inbox-watch is the only public-inbox-* tool which
> > > > works directly with Maildirs.
> > > Hmm, I had excluded public-inbox-watch initially because it didn't seem
> > > to be doing anything after the very long startup. I'm thinking that the
> > > inotify is not working as expected, maybe relating to the huge number of
> > > folders we watch.
> >
> > -watch is (or should be) doing a full scan every startup, but it
> > switches between inboxes every few messages and tries to
> > prioritize new messages from inotify.
> > Curious to see the strace immediately after startup to see if it's
> > indeed doing the full scan.  I should probably add a stderr
> > diagnostic for full scan completion...
> It's definitely very busy after scan, but I can't tell if it's the full
> set.

OK, I think adding stderr diagnostic messages for full scans
shouldn't be too noisy.

> At an admin level, is there a way to dump out all of the paths it's
> indexed, to compare against the paths on disk?

No, path information isn't stored for public-facing inboxes
since it's too unstable.  It should be possible to reverse map
things in real time and add better diagnostic tools, but the
philosophy has always been to store||index as little as possible
and be able to infer/regenerate needed data on-the-fly to avoid
data consistency problems.
(lei stores path info, but it's been a problematic implementation, too :<)

> > How many Maildirs are you watching?  I wonder if it's hitting
> > RLIMIT_NOFILE... (errors should be logged to stderr).
> 6774 Maildirs right now.
> I should probably improve the OpenRC script for it, I think we're
> throwing away stderr right now for -watch.

Yeah, watch stderr is important for diagnosing problems.  Fwiw,
I run it inside a screen(1) session on one system, and rely on
systemd to redirect stderr to syslog on another.

OK, various limits seem fine.

> > Curious, which architecture is that and is it using
> > Linux::Inotify2 or inotify via the `syscall' perlop?
> > (I expect 3 is the inotify FD).
> x86-64, dev-perl/Linux-Inotify2 is installed on the host, but I can't
> tell at a glance if -watch used perlop or package.

Probably, yes; but it can/should favor the pure Perl version
soon.  Since it's Gentoo I trust it's up-to-date with broadcast
and overflow support?
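
For anyone wanting to sanity-check the limits discussed above, here is a
minimal Python sketch (not part of public-inbox). It counts Maildirs
under a root directory and reports RLIMIT_NOFILE alongside the kernel's
per-user inotify watch cap. The helper names are hypothetical, and the
figure of two watched directories per Maildir (new/ and cur/) is an
assumption about how -watch registers watches, not something confirmed
in this thread. Note that inotify watches are capped by
fs.inotify.max_user_watches, not by the file descriptor limit, so both
numbers are reported.

```python
import os
import resource


def count_maildirs(root):
    """Count directories under root that contain both cur/ and new/."""
    count = 0
    for _dirpath, dirnames, _files in os.walk(root):
        if "cur" in dirnames and "new" in dirnames:
            count += 1
            # Do not descend into the message directories themselves.
            dirnames[:] = [d for d in dirnames
                           if d not in ("cur", "new", "tmp")]
    return count


def read_int(path):
    """Return the integer stored in path, or None if unavailable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def report(n_maildirs, watches_per_maildir=2):
    """Summarize limits relevant to watching n_maildirs Maildirs.

    watches_per_maildir=2 is an assumption (new/ and cur/).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = n_maildirs * watches_per_maildir
    # Linux-specific; None on systems without this file.
    max_watches = read_int("/proc/sys/fs/inotify/max_user_watches")
    return {
        "maildirs": n_maildirs,
        "watches_needed": needed,
        "nofile_soft": soft,
        "nofile_hard": hard,
        "max_user_watches": max_watches,
        "watches_ok": None if max_watches is None
                      else needed <= max_watches,
    }
```

Calling report(count_maildirs(root)) with root set to the directory
holding the Maildirs returns a dict that can be printed or logged. A
"watches_ok" value of False would suggest raising
fs.inotify.max_user_watches before looking elsewhere.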