Date: Mon, 20 Nov 2023 00:10:01 +0000
From: Eric Wong
To: "Robin H. Johnson"
Cc: meta@public-inbox.org
Subject: Re: publicinbox watch path globbing
Message-ID: <20231120001001.M311669@dcvr>

"Robin H. Johnson" wrote:
> Hi!
>
> Writing to see about work in converting Gentoo's (now-broken) other
> archives web interface over into using public-inbox instead.
>
> This is the first of a few questions/bumps along the way.
>
> For historical reasons on the scaling side, the archive maildirs are
> stored by date:
> watch = maildir:$REDACTED/$LISTNAME/.200001/
> watch = maildir:$REDACTED/$LISTNAME/.200102/
> watch = maildir:$REDACTED/$LISTNAME/.YYYYMM/
> watch = maildir:$REDACTED/$LISTNAME/.202311/
> etc.
> (over time, directories are moved to stable read-only storage)

Is there any reason to expect new messages to appear in the /.2000??/
and other old directories?
IOW, if somebody with a broken clock sends a message with a
past year/month in the Date: header, does it end up in an old bucket
or the current one?

If your old buckets are frozen, lei in public-inbox.git should
be able to start them off with:

	for d in $REDACTED/$LISTNAME/.??????
	do
		lei convert -o v2:/path/to/inbox-$LISTNAME maildir:$d
	done
	lei daemon-kill # optional, stops lei-daemon when done

And then you'd only have to watch the latest maildir.

I'll try to get public-inbox 2.0 released soon[1]; but the
lei convert stuff should be ready.

> If a given list is low traffic does NOT get traffic in a given month,
> the directory does not exist (it's created when the first mail arrives
> during a calendar month).
>
> Multiply this by ~120 lists, and it gets on the large side for a config
> file: 7500+ lines just for the "watch" entries.

I agree that sucks.

> While I could generate the config file, I'm wondering about better
> solution, to allow globbing the path.

I wanted to have recursive watches at some point but never got
around to it.  So I guess something like this could work
recursively:

	watchglob = maildir:$REDACTED/$LISTNAME/**

> I tried to locate a single place in the codebase where this would be
> applied, but it's not clear enough to me if there's a single place that
> it can easily modified.

The `new' sub in lib/PublicInbox/Watch.pm sets up maildirs/imap/nntp.

The glob2re function is better nowadays in public-inbox.git,
and the mdre regexp will probably need to be updated when
it sees a new maildir...

> If there's a consistent place, I think the cleanest syntax that doesn't
> break existing consumers would be something like this:
> [publicinbox "$LISTNAME"]
> watch = maildirglob:$REDACTED/$LISTNAME/.19????/
> watch = maildirglob:$REDACTED/$LISTNAME/.20????/

I think `watchglob = maildir:...' is preferable since I don't
want maildirglob: to be confused as a type.

[1] the release is mainly blocked on me trying to wrap my head around -cindex
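
Until something like watchglob exists, generating the "watch" lines
from the directories on disk is one stopgap.  Below is a minimal
sketch of that, not anything shipped with public-inbox: the
gen_watch_lines name and its two arguments are made up for
illustration, and it assumes every maildir is a direct child named
.YYYYMM as in the layout quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of public-inbox): print one "watch" line
# for every existing $base/$list/.YYYYMM/ directory.
gen_watch_lines() {
	base=$1
	list=$2
	# the trailing slash restricts the glob to directories
	for d in "$base/$list"/.??????/
	do
		# an unmatched glob stays literal in POSIX sh, so skip it
		[ -d "$d" ] || continue
		printf '\twatch = maildir:%s\n' "$d"
	done
}
```

The output is meant to go under the matching [publicinbox "$LISTNAME"]
section, and would have to be regenerated whenever a new month's
directory appears.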