unofficial mirror of meta@public-inbox.org
* more debugging for gentoo usage & supporting feature requests
@ 2024-07-22  1:27 Robin H. Johnson
  2024-07-22  6:23 ` more debugging for gentoo usage & supporting feature requests - infinite loop on disk full Robin H. Johnson
  2024-07-22 19:40 ` more debugging for gentoo usage & supporting feature requests Eric Wong
  0 siblings, 2 replies; 5+ messages in thread
From: Robin H. Johnson @ 2024-07-22  1:27 UTC (permalink / raw)
  To: meta

Hi,

I moved the Gentoo instance to a much beefier machine & newer kernel;
ingest is a lot faster, but there are still some hiccups.

1. Request for more debugging details about mails: it seems that many of
our oldest mails don't get ingested, and there's no output about why.
I don't know whether -watch actually scanned that folder.

1.1. Possibly related:
The intended config is that mail should be ingested regardless of the
email address in the headers. Way back in time, the Gentoo lists were
renamed a few times, and the files are sorted into the correct folders.
I think this impacted any attempted ingest via -mda, because there's no
other way to override which list a given mail on stdin should be
associated with.

The headers may be inconsistent (changed in style or name), or even
absent in a few cases.

2.
What's the intended way for public-inbox-mda to function with no
SpamAssassin installed at all? "spamcheck = " doesn't seem to do it.

3.
As a formal feature request:
Change the arguments of public-inbox-watch:
- Add --all to mean all lists in the config
- no arguments => implicit --all
- $LISTNAME/$INBOXPATH => one *OR* more inboxes manually specified.

I did a hacky split of the configuration for Gentoo, and things are a
LOT more stable with 120 instances, but it's a little wasteful: I'd like
to give the high-traffic lists their own instances and group the
low-traffic lists together.
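
One way to approach that grouping, sketched in shell: given a per-list traffic figure, emit an instance label for each inbox, giving busy lists a dedicated instance and packing the rest into shared ones. The input format, the `THRESHOLD` and `GROUP_SIZE` knobs, and the function name `group_inboxes` are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: assign inboxes to public-inbox-watch instances.
# Input lines:  <inbox-name> <messages-per-day>
# Output lines: <instance-label> <inbox-name>
# Lists at or above THRESHOLD get their own instance; the remaining
# low-traffic lists are packed GROUP_SIZE at a time into shared ones.
group_inboxes() {
	awk -v threshold="${THRESHOLD:-100}" -v size="${GROUP_SIZE:-20}" '
	$2 + 0 >= threshold + 0 {
		dedicated++
		print "dedicated-" dedicated, $1
		next
	}
	{
		# low counts the low-traffic inboxes seen so far (starts at 0)
		print "shared-" (int(low / size) + 1), $1
		low++
	}'
}
```

Each label could then become one split config file and one -watch process, instead of one per list.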

The downside of my hacky code is that I have 120 processes that just say
"/usr/bin/public-inbox-watch", and I have to be creative to see which
list a given process is linked to.
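
On Linux, one way to tell such instances apart is to read each process's environment. This sketch assumes (hypothetically) that every split instance was started with its own `PI_CONFIG` file; `pi_config_of` and `list_watch_instances` are illustrative names, and reading another user's `/proc/PID/environ` requires root.

```shell
#!/bin/sh
# Print the PI_CONFIG value recorded in a NUL-separated environ file.
pi_config_of() {
	tr '\0' '\n' < "$1" | sed -n 's/^PI_CONFIG=//p'
}

# List "PID<TAB>PI_CONFIG" for every running public-inbox-watch (Linux only).
list_watch_instances() {
	for pid in $(pgrep -f public-inbox-watch); do
		printf '%s\t%s\n' "$pid" \
			"$(pi_config_of "/proc/$pid/environ" 2>/dev/null)"
	done
}
```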

4.
Make public-inbox-init NOT attempt to write to any configuration files.

I'm trying to implement segregation of roles:
- config files owned by root only; readable by public-inbox users.
- source maildirs read-only to user running public-inbox-watch
- public-inbox dirs writable to user running public-inbox-watch
- public-inbox dirs readable to user running public-inbox-httpd

Intent:
- public-inbox-httpd CANNOT read or write the source maildirs.
- public-inbox-watch CANNOT write the source maildirs.
- neither process can write the config file
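
The role split above maps onto plain Unix ownership and modes. This is a minimal sketch, assuming hypothetical accounts `pi-watch` (runs -watch) and `pi-httpd` (runs -httpd), a shared group `public-inbox` containing both, maildirs group-owned by `pi-watch` only, and placeholder paths under a prefix so it can be tried in a scratch directory. `setup_roles` is an illustrative name.

```shell
#!/bin/sh
# Hypothetical layout: config, source maildirs, and inboxes under $1.
setup_roles() {
	prefix=$1
	config=$prefix/etc/public-inbox/config
	maildirs=$prefix/srv/maildir
	inboxes=$prefix/var/lib/public-inbox

	mkdir -p "${config%/*}" "$maildirs" "$inboxes"
	touch "$config"

	# Config: owner root; group public-inbox (watch + httpd) may only read.
	chmod 0640 "$config"
	# Maildirs: group pi-watch may list and read but not write; httpd is
	# not in that group, so it gets nothing. Message files need 0640 too.
	chmod 0750 "$maildirs"
	# Inboxes: owner pi-watch writes; everyone, including httpd, reads.
	chmod 0755 "$inboxes"

	# Ownership changes need root and the (hypothetical) accounts.
	if [ "$(id -u)" -eq 0 ] && id pi-watch >/dev/null 2>&1; then
		chown root:public-inbox "$config"
		chown -R root:pi-watch "$maildirs"
		chown -R pi-watch:public-inbox "$inboxes"
	fi
}
```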

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

* Re: more debugging for gentoo usage & supporting feature requests - infinite loop on disk full
  2024-07-22  1:27 more debugging for gentoo usage & supporting feature requests Robin H. Johnson
@ 2024-07-22  6:23 ` Robin H. Johnson
  2024-07-22 19:44   ` Eric Wong
  2024-07-23 21:30   ` Eric Wong
  2024-07-22 19:40 ` more debugging for gentoo usage & supporting feature requests Eric Wong
  1 sibling, 2 replies; 5+ messages in thread
From: Robin H. Johnson @ 2024-07-22  6:23 UTC (permalink / raw)
  To: meta

On Mon, Jul 22, 2024 at 01:27:41AM +0000, Robin H. Johnson wrote:
> Hi,
> 
> I moved the Gentoo instance to a much beefier machine & newer kernel;
> ingest is a lot faster, but there are still some hiccups.
Another weird case I ran into: infinite loops when the destination for
$INBOX_DIR got full, after which it started spamming the logs heavily
(leading to /var/log also getting full).

```
$ f=/var/www/archives.gentoo.org/.maildir/.gentoo-dev/.200209/cur/1032012623.011914.mbox\:2\,\:2\,S
$ PI_CONFIG=/etc/public-inbox/config \
ORIGINAL_RECIPIENT=gentoo-dev@lists.gentoo.org \
public-inbox-mda --no-precheck <$f
...
using random Message-ID <20020914141022.8EILh0MRbQpdjOwovs1hfDrKNO9nmyvsLZh_8lkm3Go@z> as fallback
using random Message-ID <20020914141022.VX0VHB0f3Miq3hvnAl17lMRVALJ4oQNp9dgfpsljr2k@z> as fallback
using random Message-ID <20020914141022.ZGTncZHNC4jy3Yvj9iKCguTYRvwikTj5R6CWgPHvDj4@z> as fallback
using random Message-ID <20020914141022.RzOBkiht44TmjOYhRZydnv8QyUD1aAegYFXbQJpXMZA@z> as fallback
using random Message-ID <20020914141022.aieLHBsuflGLngvMmnlj0PnUAm7kg5OyuvWQ2cY8bgs@z> as fallback
using random Message-ID <20020914141022.PoYIWpcZ7JIP_hlF1Y8YwkVV8e2VjvHQJdUNmS741Bc@z> as fallback
using random Message-ID <20020914141022.DAqenKa6o_xSvrpv1sJSik1iPO0RrauMojnK6F2c0-c@z> as fallback
using random Message-ID <20020914141022.Hz5gNFKUIMWg66QgTQpBeTlcjPmuXgNF4C1lfJ3o_FY@z> as fallback
...
```

But the mail DID include a Message-ID:
```
$ f=/var/www/archives.gentoo.org/.maildir/.gentoo-dev/.200209/cur/1032012623.011914.mbox\:2\,\:2\,S
$ grep -i -e message-id $f
Message-ID: <Pine.LNX.4.44.0209141505570.24517-100000@kitt.york.ac.uk>
$ spamc <$f |diff $f -
1a2,5
> X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-14) on
> finch.gentoo.org
> X-Spam-Level: 
> X-Spam-Status: No, score=0.8 required=5.0 tests=DMARC_REJECT,
>	MAILING_LIST_MULTI autolearn=no autolearn_force=no version=4.0.0
```

The -watch instance kept looping as well, printing a random fallback Message-ID between mails.
```
# grep path: ./public-inbox-watch.gentoo-commits.stderr.log  |uniq -c | tail -n4
      1 path: /etc/public-inbox/maildir-root/.gentoo-commits/.202101/new/1609861031.6347_0.finch
      1 path: /etc/public-inbox/maildir-root/.gentoo-commits/.202309/new/1694522732.28713_0.finch
29556540 path: /etc/public-inbox/maildir-root/.gentoo-commits/.202210/new/1665418739.8290_0.finch
4785340 path: /etc/public-inbox/maildir-root/.gentoo-commits/.202407/new/1721627006.150418_0.finch
```
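
As a stopgap, a delivery wrapper could check free space on the inbox's filesystem first and return a temporary failure instead of retrying. This is a sketch, not public-inbox behaviour: `enough_space`, its 1 GiB default, and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Succeed only if the filesystem holding directory $1 has at least
# $2 KiB available (default: 1048576 KiB = 1 GiB).
enough_space() {
	# -P: POSIX single-line output; -k: 1024-byte blocks; field 4 = available
	avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
	[ "${avail:-0}" -ge "${2:-1048576}" ]
}

# Usage before a delivery (75 is EX_TEMPFAIL, asking the MTA to retry later):
#   enough_space /var/lib/public-inbox/gentoo-dev || exit 75
#   PI_CONFIG=/etc/public-inbox/config public-inbox-mda --no-precheck <"$f"
```

The same check could gate a -watch restart, so a full disk stops the process rather than filling /var/log.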

-- 
Robin Hugh Johnson
Pronouns   : They/he
E-Mail     : robbat2@orbis-terrarum.net


* Re: more debugging for gentoo usage & supporting feature requests
  2024-07-22  1:27 more debugging for gentoo usage & supporting feature requests Robin H. Johnson
  2024-07-22  6:23 ` more debugging for gentoo usage & supporting feature requests - infinite loop on disk full Robin H. Johnson
@ 2024-07-22 19:40 ` Eric Wong
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-07-22 19:40 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: meta

"Robin H. Johnson" <robbat2@gentoo.org> wrote:
> Hi,
> 
> I moved the Gentoo instance to a much beefier machine & newer kernel;
> ingest is a lot faster, but there are still some hiccups.
> 
> 1. Request for more debugging details about mails: it seems that many of
> our oldest mails don't get ingested, and there's no output about why.
> I don't know whether -watch actually scanned that folder.

I started working on it, but got sidetracked with some bugs in the
FakeInotify implementation on low-time-resolution FS :x

> 1.1. Possibly related:
> The intended config is that mail should be ingested regardless of the
> email address in the headers. Way back in time, the Gentoo lists were
> renamed a few times, and the files are sorted into the correct folders.
> I think this impacted any attempted ingest via -mda, because there's no
> other way to override which list a given mail on stdin should be
> associated with.
> 
> The headers may be inconsistent (changed in style or name), or even
> absent in a few cases.

Fwiw, there can be multiple publicinbox.*.address directives for
a given inbox.  You can also use publicinbox.*.watchheader to
match arbitrary headers (e.g. List-Id, X-BeenThere, etc.).
I think "public-inbox-ctl import" will be needed to handle
odd messages without any matching headers.
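
A minimal sketch of such a stanza, written out with a shell here-document so it can be inspected before merging into the real config. The inbox name matches a Gentoo list, but the second address, the paths, and the List-Id value are hypothetical placeholders.

```shell
#!/bin/sh
# Write a sample stanza to $CFG (default: a scratch file in the current dir).
cfg=${CFG:-./public-inbox.config.example}

cat > "$cfg" <<'EOF'
[publicinbox "gentoo-dev"]
	inboxdir = /var/lib/public-inbox/gentoo-dev
	; address may be repeated, once per name the list has had
	address = gentoo-dev@lists.gentoo.org
	address = gentoo-dev@old-name.example.org
	watch = maildir:/srv/maildir/.gentoo-dev
	; for -watch: only import mail carrying this header
	watchheader = List-Id:<gentoo-dev.gentoo.org>
EOF
```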

> 2.
> What's the intended way for public-inbox-mda to function with no
> SpamAssassin installed at all? "spamcheck = " doesn't seem to do it.

spamcheck=none

You can also use --no-precheck to disable some builtin rules.
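
Spelled out as a config fragment: the key lives in the global `[publicinbox]` section (i.e. `publicinbox.spamcheck`), normally in the same file as the inbox definitions. The scratch file name here is a placeholder, and the mda invocation mirrors Robin's earlier test.

```shell
#!/bin/sh
cfg=${CFG:-./spamcheck.config.example}

# publicinbox.spamcheck = none: skip spamc(1) entirely
cat > "$cfg" <<'EOF'
[publicinbox]
	spamcheck = none
EOF

# Then, for example (not run here):
#   PI_CONFIG="$cfg" ORIGINAL_RECIPIENT=gentoo-dev@lists.gentoo.org \
#     public-inbox-mda --no-precheck <"$f"
```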

> 3.
> As a formal feature request:
> Change the arguments of public-inbox-watch:
> - Add --all to mean all lists in the config
> - no arguments => implicit --all
> - $LISTNAME/$INBOXPATH => one *OR* more inboxes manually specified.
> 
> I did a hacky split of the configuration for Gentoo, and things are a
> LOT more stable with 120 instances, but it's a little wasteful: I'd like
> to give the high-traffic lists their own instances and group the
> low-traffic lists together.

Fwiw, the IMAP code for watch is already 1:1
process:IMAP-mailbox because of the Mail::IMAPClient API.

How about making that an option for Maildirs, too, and at least getting
some benefit from copy-on-write memory savings?

> 4.
> Make public-inbox-init NOT attempt to write to any configuration files.
> 
> I'm trying to implement segregation of roles:
> - config files owned by root only; readable by public-inbox users.

OK, it'd probably have to write a $INBOX_DIR/config.snippet.sample
file with comments, then.
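
Roughly what that could look like, as a hypothetical shell helper (not part of public-inbox-init): it writes the commented sample into the inbox directory, which the unprivileged user can write, and leaves merging into the root-owned config to an admin. The function name and argument order are illustrative.

```shell
#!/bin/sh
# write_config_sample NAME INBOXDIR ADDRESS URL
write_config_sample() {
	name=$1 inboxdir=$2 address=$3 url=$4
	# Unquoted EOF: the variables above are expanded into the file.
	cat > "$inboxdir/config.snippet.sample" <<EOF
# Sample stanza for inbox "$name", generated instead of editing the
# shared config. Review it, then have root append it to the
# public-inbox config file.
[publicinbox "$name"]
	inboxdir = $inboxdir
	address = $address
	url = $url
EOF
}
```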

> - source maildirs read-only to user running public-inbox-watch
> - public-inbox dirs writable to user running public-inbox-watch
> - public-inbox dirs readable to user running public-inbox-httpd

The last three are what I've been doing since the beginning.

* Re: more debugging for gentoo usage & supporting feature requests - infinite loop on disk full
  2024-07-22  6:23 ` more debugging for gentoo usage & supporting feature requests - infinite loop on disk full Robin H. Johnson
@ 2024-07-22 19:44   ` Eric Wong
  2024-07-23 21:30   ` Eric Wong
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-07-22 19:44 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: meta

"Robin H. Johnson" <robbat2@orbis-terrarum.net> wrote:
> On Mon, Jul 22, 2024 at 01:27:41AM +0000, Robin H. Johnson wrote:
> > Hi,
> > 
> > I moved the Gentoo instance to a much beefier machine & newer kernel;
> > ingest is a lot faster, but there are still some hiccups.
> Another weird case I ran into: infinite loops when the destination for
> $INBOX_DIR got full, after which it started spamming the logs heavily
> (leading to /var/log also getting full).

Eep.  Unfortunately, none of the code has been tested under
ENOSPC conditions...  Will try to get to that first.

* Re: more debugging for gentoo usage & supporting feature requests - infinite loop on disk full
  2024-07-22  6:23 ` more debugging for gentoo usage & supporting feature requests - infinite loop on disk full Robin H. Johnson
  2024-07-22 19:44   ` Eric Wong
@ 2024-07-23 21:30   ` Eric Wong
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2024-07-23 21:30 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: meta

"Robin H. Johnson" <robbat2@orbis-terrarum.net> wrote:
> Another weird case I ran into: infinite loops when the destination for
> $INBOX_DIR got full, after which it started spamming the logs heavily
> (leading to /var/log also getting full).

Should be fixed in patch 2/4 of this series (oops, I forgot to Cc you :x)

https://public-inbox.org/meta/20240723212837.3931413-1-e@80x24.org/T/
