Date: Mon, 15 Jul 2024 06:15:38 +0000
From: "Robin H. Johnson"
To: meta@public-inbox.org
Cc: infra@gentoo.org
Subject: public-inbox skipping new inboxes or many mails
Reply-To: meta@public-inbox.org, infra@gentoo.org

Hi,

After some long delays, we're trying to roll out public-inbox for Gentoo's
mailing lists.
This is the latest HEAD at 18f41f5af397f903898154591de2cd1cd514c920 2024/07/07,
plus the AltID patch you sent before.

It has mostly been smooth so far, but we have run into weirdness: it seems to
not read any files for many inboxes, and for other inboxes it has recent
mail but refuses to reindex any older mail.

Even many -vvvv options give no clue why it seems to skip entire folders.

Here's one of the lists where it previously indexed exactly one file - a very
recent one - and ignored everything else. When working on a reproduction case
for you, it went down to not indexing ANY files.

The strace is really interesting in that it seems to not even open or stat
anything in the /var/archives path.

The most frustrating variant of the output is this:
$ public-inbox-index -vvvvv --reindex \
    /var/public-inbox/eudev.lists.gentoo.org.git
# indexing /var/public-inbox/eudev.lists.gentoo.org.git ...
(Nothing about why it seemed to not scan the maildirs at all).

For gentoo-releng-autobuilds.lists.gentoo.org.git, it indexed a single file
and no more.
Deleting & recreating
/var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
made it go down from 1 file to not indexing any files.

$ export PI_CONFIG=/etc/public-inbox/config
$ public-inbox-init --indexlevel full \
    --version 2 --jobs 2 \
    gentoo-releng-autobuilds \
    /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
    https://public-inbox.gentoo.org/gentoo-releng-autobuilds \
    gentoo-releng-autobuilds@lists.gentoo.org

$ grep gentoo-releng-autobuilds /etc/public-inbox/config
[publicinbox "gentoo-releng-autobuilds"]
    address = gentoo-releng-autobuilds@lists.gentoo.org
    url = https://public-inbox.gentoo.org/gentoo-releng-autobuilds
    inboxdir = /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
    altid = indexfilter:xarchiveshash:package=XArchivesHash
    watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds
    watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201101
    watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201102
...
    watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.202406
    watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.202407

$ public-inbox-index -vvvvv --reindex /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
# indexing /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git ...
# 0.git indexing all of b0ecbb6f63ab5505707fbba7079980c9f7fc6e51
# gentoo-releng-autobuilds.lists.gentoo.org.git 0.git counting b0ecbb6f63ab5505707fbba7079980c9f7fc6e51 ...
# 1
# all.git 1/1

$ find /var/archives/.maildir/.gentoo-releng-autobuilds/ -type f -printf '%h\n' |sort | uniq -c
     14 /var/archives/.maildir/.gentoo-releng-autobuilds/.201101/cur
     34 /var/archives/.maildir/.gentoo-releng-autobuilds/.201102/cur
...
    113 /var/archives/.maildir/.gentoo-releng-autobuilds/.202406/new
     48 /var/archives/.maildir/.gentoo-releng-autobuilds/.202407/new
     39 /var/archives/.maildir/.gentoo-releng-autobuilds/new

$ find /var/archives/.maildir/.gentoo-releng-autobuilds/ -type f |wc -l
14146

$ sqlite3 /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git/msgmap.sqlite3
SQLite version 3.45.3 2024-04-15 13:34:05
Enter ".help" for usage hints.
sqlite> .tables
meta    msgmap
sqlite> select * from meta;
created_at|1721012200
num_highwater|1
last_xap15-0|b0ecbb6f63ab5505707fbba7079980c9f7fc6e51
sqlite> select * from msgmap;
1|20240715052316.61817748FCA@milou.amd64.dev.gentoo.org

$ strace -s 65535 -ff \
    public-inbox-index -vvvvv --reindex /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
    2>&1 |grep -e /var/archives -e /etc/public-inbox \
    |grep -v -e ' read(' -e ' write(' -e 'read resumed'
newfstatat(AT_FDCWD, "/var/archives/.cache/public-inbox/inline-c", 0x7fdb0bae2840, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/etc/public-inbox/config", {st_mode=S_IFREG|0644, st_size=464073, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/public-inbox/config", {st_mode=S_IFREG|0644, st_size=464073, ...}, 0) = 0
[pid 226525] execve("/usr/bin/git", ["/usr/bin/git", "config", "-z", "-l", "--includes", "-f", "/etc/public-inbox/config"], 0x562cd071a960 /* 32 vars */
[pid 226525] access("/var/archives/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.gitconfig", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.gitconfig", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] openat(AT_FDCWD, "/etc/public-inbox/config", O_RDONLY) = 3

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
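
The gap reported above (14146 files under the maildir tree versus a single row in msgmap) could be checked the same way for other lists with a rough sketch like the one below. The function names count_maildir_files and count_msgmap_rows are hypothetical helpers and not part of public-inbox. Unlike the plain find above, the first helper only counts files under cur/ and new/, on the assumption that those are the only places message files live. The second relies on the sqlite3 CLI and the msgmap table shown in the session above.

```shell
#!/bin/sh
# Hypothetical helper (not part of public-inbox): count files that sit in a
# cur/ or new/ directory anywhere below the given maildir, so nested
# folders such as .201101/cur are included and tmp/ or index files are not.
# Assumes file names contain no newlines.
count_maildir_files() {
    find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}

# Hypothetical helper: count the rows recorded in msgmap.sqlite3 of an
# inbox directory. Prints nothing and returns non-zero if the file is missing.
count_msgmap_rows() {
    db="$1/msgmap.sqlite3"
    [ -f "$db" ] || return 1
    sqlite3 "$db" 'select count(*) from msgmap;'
}

# Usage with the paths from this report:
#   count_maildir_files /var/archives/.maildir/.gentoo-releng-autobuilds
#   count_msgmap_rows /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
```

A large difference between the two numbers for a list would point at the same symptom described in this report. A small difference may just be duplicates or mail that has not been imported yet.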