unofficial mirror of meta@public-inbox.org
* public-inbox skipping new inboxes or many mails
@ 2024-07-15  6:15 Robin H. Johnson
  2024-07-15 21:03 ` Eric Wong
From: Robin H. Johnson @ 2024-07-15  6:15 UTC
  To: meta; +Cc: infra

Hi,

After some long delays, we're trying to roll out public-inbox for
Gentoo's mailing lists.

This is the latest HEAD at 18f41f5af397f903898154591de2cd1cd514c920 2024/07/07,
plus the AltID patch you sent before.

It has mostly been smooth so far, but we have run into some weirdness:
for many inboxes it seems to not read any files at all, and for other
inboxes it has recent mail but refuses to reindex any older mail.

Even passing many -v flags (-vvvv) gives no clue why it seems to skip
entire folders.

Here's one of the lists where it previously indexed exactly one file (a
very recent one) and ignored everything else. While I was working on a
reproduction case for you, it went down to not indexing ANY files.

The strace is really interesting in that it seems to not even open or stat
anything in the /var/archives path.

The most frustrating variant of the output is this:
$ public-inbox-index -vvvvv --reindex \
  /var/public-inbox/eudev.lists.gentoo.org.git
# indexing /var/public-inbox/eudev.lists.gentoo.org.git ...

(Nothing about why it seemed to not scan the maildirs at all).

For gentoo-releng-autobuilds.lists.gentoo.org.git, it indexed a single
file and nothing more.

Deleting & recreating
/var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git made it
go down from 1 file to not indexing any files.

$ export PI_CONFIG=/etc/public-inbox/config

$ public-inbox-init --indexlevel full \
  --version 2 --jobs 2 \
  gentoo-releng-autobuilds \
  /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
  https://public-inbox.gentoo.org/gentoo-releng-autobuilds \
  gentoo-releng-autobuilds@lists.gentoo.org

$ grep gentoo-releng-autobuilds /etc/public-inbox/config
[publicinbox "gentoo-releng-autobuilds"]
address = gentoo-releng-autobuilds@lists.gentoo.org
url = https://public-inbox.gentoo.org/gentoo-releng-autobuilds
inboxdir = /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
altid = indexfilter:xarchiveshash:package=XArchivesHash
watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds
watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201101
watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.201102
...
watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.202406
watch = maildir:/var/archives/.maildir/.gentoo-releng-autobuilds/.202407
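
With this many watch entries, a quick sanity check is to confirm that each
one points at a directory that really looks like a Maildir. A minimal
sketch, assuming the entries use the "maildir:PATH" form shown above;
check_watch_dirs is a hypothetical helper, not part of public-inbox.

```shell
# check_watch_dirs: hypothetical helper, not part of public-inbox.
# Reads one watch value per line on stdin (e.g. "maildir:/path/to/folder")
# and prints "ok PATH" or "missing PATH" for each maildir entry.
check_watch_dirs() {
	while IFS= read -r entry; do
		case $entry in
		maildir:*) dir=${entry#maildir:} ;;
		*) continue ;;	# skip any non-maildir watch types
		esac
		# A Maildir folder has cur/, new/ and tmp/ subdirectories.
		if [ -d "$dir/cur" ] && [ -d "$dir/new" ] && [ -d "$dir/tmp" ]; then
			printf 'ok %s\n' "$dir"
		else
			printf 'missing %s\n' "$dir"
		fi
	done
}

# Example usage (config path and section name taken from the output above):
#   git config -f /etc/public-inbox/config \
#     --get-all publicinbox.gentoo-releng-autobuilds.watch | check_watch_dirs
```

This only checks the directory layout; it says nothing about whether any
tool actually reads those folders.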

$ public-inbox-index -vvvvv  --reindex /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git 
# indexing /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git ...
# 0.git indexing all of b0ecbb6f63ab5505707fbba7079980c9f7fc6e51
# gentoo-releng-autobuilds.lists.gentoo.org.git 0.git counting b0ecbb6f63ab5505707fbba7079980c9f7fc6e51 ... # 1
# all.git  1/1
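
The "# 1" and "1/1" above read as if the indexer saw a single commit in
epoch 0.git. A minimal sketch for checking how many commits each epoch
actually holds, assuming the v2 layout keeps epochs under
INBOXDIR/git/N.git (an assumption here); count_epoch_commits is a
hypothetical helper, not part of public-inbox.

```shell
# count_epoch_commits: hypothetical helper, not part of public-inbox.
# Prints "GITDIR COUNT" for every epoch repository under INBOXDIR/git/.
# Assumes the v2 layout INBOXDIR/git/N.git.
count_epoch_commits() {
	inboxdir=$1
	for gitdir in "$inboxdir"/git/*.git; do
		[ -d "$gitdir" ] || continue	# glob did not match: no epochs
		# rev-list --count --all counts commits reachable from any ref
		n=$(git --git-dir="$gitdir" rev-list --count --all)
		printf '%s %s\n' "$gitdir" "$n"
	done
}

# Example usage:
#   count_epoch_commits /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
```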

$ find /var/archives/.maildir/.gentoo-releng-autobuilds/ -type f -printf '%h\n' |sort | uniq -c 
 14 /var/archives/.maildir/.gentoo-releng-autobuilds/.201101/cur
 34 /var/archives/.maildir/.gentoo-releng-autobuilds/.201102/cur
...
113 /var/archives/.maildir/.gentoo-releng-autobuilds/.202406/new
 48 /var/archives/.maildir/.gentoo-releng-autobuilds/.202407/new
 39 /var/archives/.maildir/.gentoo-releng-autobuilds/new

$ find /var/archives/.maildir/.gentoo-releng-autobuilds/ -type f |wc -l
14146

$ sqlite3 /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git/msgmap.sqlite3
SQLite version 3.45.3 2024-04-15 13:34:05
Enter ".help" for usage hints.
sqlite> .tables
meta    msgmap
sqlite> select * from meta;
created_at|1721012200
num_highwater|1
last_xap15-0|b0ecbb6f63ab5505707fbba7079980c9f7fc6e51
sqlite> select * from msgmap;
1|20240715052316.61817748FCA@milou.amd64.dev.gentoo.org
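
The mismatch is easier to see when both numbers come out of one script. A
minimal sketch, assuming the sqlite3 CLI is installed and that messages
live only in the cur/ and new/ directories of each Maildir folder; both
function names are hypothetical helpers, not part of public-inbox.

```shell
# Hypothetical helpers, not part of public-inbox.

# Count regular files under the cur/ and new/ directories of a Maildir tree.
count_maildir_files() {
	find "$1" -type f \( -path '*/cur/*' -o -path '*/new/*' \) | wc -l | tr -d ' '
}

# Count rows in the msgmap table of an inbox (requires the sqlite3 CLI).
count_msgmap_rows() {
	sqlite3 "$1/msgmap.sqlite3" 'SELECT COUNT(*) FROM msgmap;'
}

# Example usage:
#   count_maildir_files /var/archives/.maildir/.gentoo-releng-autobuilds
#   count_msgmap_rows /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git
```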

$ strace -s 65535 -ff \
  public-inbox-index -vvvvv --reindex /var/public-inbox/gentoo-releng-autobuilds.lists.gentoo.org.git \
  2>&1 |grep -e /var/archives -e /etc/public-inbox \
 |grep -v -e ' read(' -e ' write(' -e 'read resumed' 

newfstatat(AT_FDCWD, "/var/archives/.cache/public-inbox/inline-c", 0x7fdb0bae2840, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/etc/public-inbox/config", {st_mode=S_IFREG|0644, st_size=464073, ...}, 0) = 0
newfstatat(AT_FDCWD, "/etc/public-inbox/config", {st_mode=S_IFREG|0644, st_size=464073, ...}, 0) = 0
[pid 226525] execve("/usr/bin/git", ["/usr/bin/git", "config", "-z", "-l", "--includes", "-f", "/etc/public-inbox/config"], 0x562cd071a960 /* 32 vars */ <unfinished ...>
[pid 226525] access("/var/archives/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.gitconfig", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.config/git/config", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] access("/var/archives/.gitconfig", R_OK) = -1 ENOENT (No such file or directory)
[pid 226525] openat(AT_FDCWD, "/etc/public-inbox/config", O_RDONLY) = 3
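
The grep pipeline above can be wrapped up for reuse on other inboxes. A
minimal sketch: filter_path_syscalls is a hypothetical helper that keeps
strace lines mentioning any of the given path prefixes and drops
read/write noise. Unlike the original pipeline, it matches the prefixes as
fixed strings (grep -F) rather than as regular expressions.

```shell
# filter_path_syscalls: hypothetical helper mirroring the grep pipeline above.
# Reads strace output on stdin; arguments are path prefixes to keep.
filter_path_syscalls() {
	# One pattern per line; grep -F matches a line containing any of them.
	patterns=$(printf '%s\n' "$@")
	grep -F -e "$patterns" |
		grep -v -e ' read(' -e ' write(' -e 'read resumed'
}

# Example usage (same command as above):
#   strace -s 65535 -ff public-inbox-index -vvvvv --reindex "$INBOXDIR" 2>&1 |
#     filter_path_syscalls /var/archives /etc/public-inbox
# strace's own "-e trace=file" option is another way to limit the output to
# syscalls that take a file name.
```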

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

