From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: 
X-Spam-Status: No, score=-3.4 required=3.0 tests=AWL,BAYES_00,
	RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.6
Received: from smtp.gentoo.org (mail.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 7A02E1F452
	for ; Mon, 20 Nov 2023 00:17:00 +0000 (UTC)
Received: from grubbs.orbis-terrarum.net (localhost [127.0.0.1])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by smtp.gentoo.org (Postfix) with ESMTPS id 3DF4A335CEE
	for ; Mon, 20 Nov 2023 00:16:59 +0000 (UTC)
Received: from grubbs.orbis-terrarum.net (localhost [127.0.0.1])
	by grubbs.orbis-terrarum.net (Postfix) with ESMTP id 42D8B26019E
	for ; Mon, 20 Nov 2023 00:16:58 +0000 (UTC)
Received: (qmail 2080 invoked by uid 10000); 20 Nov 2023 00:16:57 -0000
Date: Mon, 20 Nov 2023 00:16:57 +0000
From: "Robin H. Johnson" 
To: Eric Wong 
Cc: "Robin H. Johnson" , meta@public-inbox.org
Subject: Re: publicinbox watch path globbing
Message-ID: 
References: <20231120001001.M311669@dcvr>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="u6/TeCT0z+c2Voeo"
Content-Disposition: inline
In-Reply-To: <20231120001001.M311669@dcvr>
List-Id: 

--u6/TeCT0z+c2Voeo
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Nov 20, 2023 at 12:10:01AM +0000, Eric Wong wrote:
> "Robin H. Johnson" wrote:
> > Hi!
> >
> > Writing to see about converting Gentoo's (now-broken) other
> > archives web interface over to public-inbox instead.
> >
> > This is the first of a few questions/bumps along the way.
> >
> > For historical reasons on the scaling side, the archive maildirs are
> > stored by date:
> > watch = maildir:$REDACTED/$LISTNAME/.200001/
> > watch = maildir:$REDACTED/$LISTNAME/.200102/
> > watch = maildir:$REDACTED/$LISTNAME/.YYYYMM/
> > watch = maildir:$REDACTED/$LISTNAME/.202311/
> > etc.
> > (over time, directories are moved to stable read-only storage)
>
> Is there any reason to expect new messages to appear in the /.2000??/
> and other old directories?
>
> IOW, if somebody with a broken clock sends a message from a past
> year/month in the Date: header, does it end up in an old bucket
> or the current one?

The current one: the bucket is chosen by arrival time at the archive
ingest, not by the Date: header.

For some of the very old lists, we do have a list of Message-IDs that we
know existed but weren't captured in the archive. If those mails are ever
found (maybe once a year), they are added to the old locations.

>
> If your old buckets are frozen, lei in public-inbox.git should be
> able to start them off with:
>
> for d in $REDACTED/$LISTNAME/.??????
> do
> 	lei convert -o v2:/path/to/inbox-$LISTNAME maildir:$d
> done
> lei daemon-kill # optional, stops lei-daemon when done
>
> And then you'd only have to watch the latest maildir.

Any concerns about the month rollover period? E.g. making sure both
.202310 and .202311 are watched as the month turns over from October to
November: the archive ingest is likely already writing to .202311, but
public-inbox may still be picking up the last few new messages in .202310.

> > While I could generate the config file, I'm wondering about a better
> > solution that allows globbing the path.
>
> I wanted to have recursive watches at some point but never got
> around to it.
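Until a glob or recursive watch exists, generating the config is straightforward. A minimal Python sketch, assuming month buckets are named exactly .YYYYMM and that watching the previous and current month together is enough to cover the rollover window; rollover_buckets and watch_lines are hypothetical helpers, not part of public-inbox:

```python
import re
from datetime import date

# Hypothetical helpers for generating [publicinbox] watch lines; only the
# "watch = maildir:..." line format is taken from this thread. Assumes month
# buckets are named exactly ".YYYYMM" (e.g. ".202311").
BUCKET_RE = re.compile(r"\.\d{4}(0[1-9]|1[0-2])")


def rollover_buckets(today: date) -> list[str]:
    """Return [previous month, current month] bucket names for `today`."""
    if today.month == 1:
        prev_year, prev_month = today.year - 1, 12
    else:
        prev_year, prev_month = today.year, today.month - 1
    return [
        f".{prev_year:04d}{prev_month:02d}",
        f".{today.year:04d}{today.month:02d}",
    ]


def watch_lines(list_root: str, folder_names: list[str]) -> list[str]:
    """Build sorted watch lines, skipping folders that are not .YYYYMM
    buckets (e.g. a .spam.YYYYMM folder or a maildir's own cur/new/tmp)."""
    buckets = sorted(n for n in folder_names if BUCKET_RE.fullmatch(n))
    return [f"\twatch = maildir:{list_root}/{name}/" for name in buckets]


if __name__ == "__main__":
    # Example: the two buckets to keep watched across the Oct/Nov rollover.
    print('[publicinbox "$LISTNAME"]')
    for line in watch_lines("$REDACTED/$LISTNAME", rollover_buckets(date(2023, 11, 1))):
        print(line)
```

Run directly, it prints a [publicinbox "$LISTNAME"] section with watch lines for .202310 and .202311. Passing it the full directory listing instead of rollover_buckets() output would regenerate every bucket's watch line.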
> So I guess something like this could work recursively:
> watchglob = maildir:$REDACTED/$LISTNAME/**
>
> > I tried to locate a single place in the codebase where this would be
> > applied, but it's not clear enough to me whether there's a single place
> > where it can easily be modified.
>
> The `new' sub in lib/PublicInbox/Watch.pm sets up maildirs/imap/nntp
>
> The glob2re function is better nowadays in public-inbox.git,
> and the mdre regexp will probably need to be updated when it sees
> a new maildir...

Thanks. I'd want to explicitly scope the glob to the dates: our spam
processing moves spam into .spam.YYYYMM, which a recursive ** glob
would also pick up.

> > If there's a consistent place, I think the cleanest syntax that doesn't
> > break existing consumers would be something like this:
> > [publicinbox "$LISTNAME"]
> > watch = maildirglob:$REDACTED/$LISTNAME/.19????/
> > watch = maildirglob:$REDACTED/$LISTNAME/.20????/
>
> I think `watchglob = maildir:...' is preferable since I don't
> want maildirglob: to be confused as a type.

Agreed, I see the concern there.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

--u6/TeCT0z+c2Voeo
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Robbat2 @ Orbis-Terrarum Networks - The text below is a digital signature. If it doesn't make any sense to you, ignore it.
iQKTBAABCgB9FiEEveu2pS8Vb98xaNkRGTlfI8WIJsQFAmVapXlfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldEJE
RUJCNkE1MkYxNTZGREYzMTY4RDkxMTE5Mzk1RjIzQzU4ODI2QzQACgkQGTlfI8WI
JsTabRAAlrkERO2Gosyj0OKvIoqTvYYaLfvwzM5QydBP42z74RMY6DPm15NMOK/U
KOlOLl2MNAroBs01sliHKCyf2OoleMlXo5eZIuwqFSLR2vO9iN+dsTi3ohOtA/fo
XXgO3vKB0IdfcJDCUzLDkvcoZzc2P9eA+noZVi0QDiSRs+2YD2uFBQm+7fcwWbgd
VdCKVGOvRdMbuPuEupw02Z2yQhWpU03+hgkjtrQT+eBitdDBbbbIpbJwCQa3uJ8t
3JF1EFtm3O3zyUkWXMn9Ow4PDZoxA4QmrEF65sJdBdf5dAZKBsMWIaRRtfpulm7r
3lg6uLF56Y0hLLgcElMEsGEcHCRJWqKwcXJN9tulavHUl1VGwdQHwSsyydYF7IlJ
uQYCUeTKv3rS/VpgpE3ErHNDXvGJL2XJBLCh7vcLudBZnTp7QdpvGKZ9vRjZUHgJ
IvjVkgCj6GDN8iGv68tn9+umQRSlKzfC6UOJ03SvMbZn5JQ8Dz4mqrQ3G5i4fAHA
BmHGKbFdKiPwyX1u8+sRWGnakFkj7XeRicYxqWN5Qrf4R26qqBKCqSAjvIhBCGjj
Kf062OKsoKh1yiOQtlytuMma4JgzOEJU1Kxjp0ElArysY2DpoF68z9DrSg8KVzbL
R7KF16WWI8IqBzDoCkyebfSCpj/dU9UpybAdh7svOFRUnY6n7hw=
=KA0t
-----END PGP SIGNATURE-----

--u6/TeCT0z+c2Voeo--