unofficial mirror of meta@public-inbox.org
* Showcasing lei at Linux Plumbers
@ 2021-09-02 21:12 Konstantin Ryabitsev
  2021-09-02 21:58 ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-02 21:12 UTC (permalink / raw)
  To: meta

Eric:

I am getting ready for my presentation to the Linux Plumbers (happening in a
few weeks, eek), which is based around lore, lei (I see what you did there)
and search-based subscriptions. I want to make it hands-on with practical
examples, which is what developers would appreciate more than just dry
manpages.

I am in the process of wrapping my head around lei tooling, but I may have
some questions in the process, so I wanted to start this thread as a record of
my poking at it. :)

What I currently have:

- an imap mailbox 
- lei configured and installed locally (in a debian container)

The goal is to illustrate how to use this to start "receiving" mail for a
subsystem without subscribing to any of the lists. The example I have in mind
is the LANDLOCK subsystem, and the reason I picked it is because it already
has a well-defined set of search criteria we can use:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS#n10462

    LANDLOCK SECURITY MODULE
    ...
    F:  Documentation/security/landlock.rst
    F:  Documentation/userspace-api/landlock.rst
    F:  include/uapi/linux/landlock.h
    F:  samples/landlock/
    F:  security/landlock/
    F:  tools/testing/selftests/landlock/
    K:  landlock
    K:  LANDLOCK

This means we want to configure lei to grab any mail from lore.kernel.org/all/
that matches this query:

    dfn:Documentation/security/landlock.rst OR
    dfn:Documentation/userspace-api/landlock.rst OR
    dfn:include/uapi/linux/landlock.h OR
    dfn:samples/landlock/ OR
    dfn:security/landlock/ OR
    dfn:tools/testing/selftests/landlock/ OR
    dfhh:landlock

https://lore.kernel.org/all/?q=dfn%3ADocumentation%2Fsecurity%2Flandlock.rst+OR+dfn%3ADocumentation%2Fuserspace-api%2Flandlock.rst+OR+dfn%3Ainclude%2Fuapi%2Flinux%2Flandlock.h+OR+dfn%3Asamples%2Flandlock%2F+OR+dfn%3Asecurity%2Flandlock%2F+OR+dfn%3Atools%2Ftesting%2Fselftests%2Flandlock%2F+OR+dfhh%3Alandlock

I'll want to retrieve any threads and follow-ups and upload them to my imap
landlock folder, and then have lei run in the background and continuously
update things as more mail comes in, so I don't have to remember to run
anything manually.

What succession of lei commands would accomplish this?

Thanks for your continued help.

-K

* Re: Showcasing lei at Linux Plumbers
  2021-09-02 21:12 Showcasing lei at Linux Plumbers Konstantin Ryabitsev
@ 2021-09-02 21:58 ` Eric Wong
  2021-09-03 15:15   ` Konstantin Ryabitsev
  2021-09-07 21:33   ` Showcasing lei at Linux Plumbers Konstantin Ryabitsev
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Wong @ 2021-09-02 21:58 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Eric:
> 
> I am getting ready for my presentation to the Linux Plumbers (happening in a
> few weeks, eek), which is based around lore, lei (I see what you did there)
> and search-based subscriptions. I want to make it hands-on with practical
> examples, which is what developers would appreciate more than just dry
> manpages.
> 
> I am in the process of wrapping my head around lei tooling, but I may have
> some questions in the process, so I wanted to start this thread as a record of
> my poking at it. :)

Yeah, I'm still trying to figure out how some things are
supposed to work myself...

> What I currently have:
> 
> - an imap mailbox 
> - lei configured and installed locally (in a debian container)

Fwiw, most of the functionality works much better with Maildir,
because IMAP may need password prompts and the interactivity
that comes with them.

> The goal is to illustrate how to use this to start "receiving" mail for a
> subsystem without subscribing to any of the lists. The example I have in mind
> is the LANDLOCK subsystem, and the reason I picked it is because it already
> has a well-defined set of search criteria we can use:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/MAINTAINERS#n10462
> 
>     LANDLOCK SECURITY MODULE
>     ...
>     F:  Documentation/security/landlock.rst
>     F:  Documentation/userspace-api/landlock.rst
>     F:  include/uapi/linux/landlock.h
>     F:  samples/landlock/
>     F:  security/landlock/
>     F:  tools/testing/selftests/landlock/
>     K:  landlock
>     K:  LANDLOCK
> 
> This means we want to configure lei to grab any mail from lore.kernel.org/all/
> that matches this query:
> 
>     dfn:Documentation/security/landlock.rst OR
>     dfn:Documentation/userspace-api/landlock.rst OR
>     dfn:include/uapi/linux/landlock.h OR
>     dfn:samples/landlock/ OR
>     dfn:security/landlock/ OR
>     dfn:tools/testing/selftests/landlock/ OR
>     dfhh:landlock
> 
> https://lore.kernel.org/all/?q=dfn%3ADocumentation%2Fsecurity%2Flandlock.rst+OR+dfn%3ADocumentation%2Fuserspace-api%2Flandlock.rst+OR+dfn%3Ainclude%2Fuapi%2Flinux%2Flandlock.h+OR+dfn%3Asamples%2Flandlock%2F+OR+dfn%3Asecurity%2Flandlock%2F+OR+dfn%3Atools%2Ftesting%2Fselftests%2Flandlock%2F+OR+dfhh%3Alandlock

For HTTP(S)-based queries, I would wrap the whole thing in
parentheses and AND it with an rt: (received-time) range, and
maybe use "lei edit-search" to tweak it for subsequent runs.
I'm not sure whether the rt: handling should be automatic for
HTTP(S); local Xapian searches track the max docid instead.

> I'll want to retrieve any threads and follow-ups and upload them to my imap
> landlock folder -- and then run in the background and just continuously update
> things as more mail comes in, so I don't have to remember to run anything
> manually.
> 
> What succession of lei commands would accomplish this?

OK, there are two main commands, "lei q" and "lei up".
Both may prompt for passwords depending on how
git-credential is set up:

	# the destination, could be Maildir
	MFOLDER=imaps://user@example.com/INBOX.landlock

	# initial search ("<<-" strips the leading tabs used for indentation):
	lei q -o "$MFOLDER" -t -I https://lore.kernel.org/all/ --stdin <<-EOF
	(
		dfn:Documentation/security/landlock.rst OR
		dfn:Documentation/userspace-api/landlock.rst OR
		dfn:include/uapi/linux/landlock.h OR
		dfn:samples/landlock/ OR
		dfn:security/landlock/ OR
		dfn:tools/testing/selftests/landlock/ OR
		dfhh:landlock
	) AND rt:2.months.ago..
	EOF

	# update whenever, may prompt for IMAP password, but could be
	# cron-ed or similar if passwords are cached
	lei up "$MFOLDER"

	# Optional: tweaking the search parameters can be done via
	lei edit-search "$MFOLDER"

For Maildirs, "lei up --all=local" works as it should.
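
For the cron case mentioned above, a wrapper along these lines might keep
scheduled runs from overlapping. This is only a sketch: the function name
lei_up_once, the LEI and LOCKDIR variables and the lock location are all made
up for illustration. The only lei invocation it relies on is the
"lei up <folder>" form shown above.

```shell
#!/bin/sh
# lei_up_once MFOLDER
# Hypothetical wrapper: run "lei up" for one folder, skipping the run if a
# previous one still holds the lock. LEI and LOCKDIR are illustrative knobs,
# not lei features.
lei_up_once() {
	mfolder=$1
	lockdir=${LOCKDIR:-${TMPDIR:-/tmp}/lei-up.lock}
	# mkdir either creates the directory or fails, so it works as a lock
	if ! mkdir "$lockdir" 2>/dev/null; then
		echo "lei up still running (or stale lock at $lockdir), skipping" >&2
		return 0
	fi
	${LEI:-lei} up "$mfolder"
	rc=$?
	rmdir "$lockdir"
	return $rc
}
```

A cron job could source this file and call lei_up_once with a Maildir path.
If a run is killed before rmdir, the lock directory has to be removed by hand.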

"lei up --all" and "lei up --all=remote" don't work, yet,
because prompting for multiple IMAP folders (with potentially
different accounts) can get a bit complicated.  But
"lei up $ONE_IMAP_FOLDER" already works.

> Thanks for your continued help.

No problem, thanks for your patience since everything seems
overwhelming :<

* Re: Showcasing lei at Linux Plumbers
  2021-09-02 21:58 ` Eric Wong
@ 2021-09-03 15:15   ` Konstantin Ryabitsev
  2021-09-04 21:36     ` [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails Eric Wong
  2021-09-07 21:33   ` Showcasing lei at Linux Plumbers Konstantin Ryabitsev
  1 sibling, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-03 15:15 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Thu, Sep 02, 2021 at 09:58:50PM +0000, Eric Wong wrote:
> Fwiw, most of the functionality works much better with Maildir
> because of potential password prompts needed for IMAP and
> interactivity required.

Okay, I'll try this out with maildir for now -- it's easy to hook mbsync into
the process if desired.

> OK, there's two main commands, "lei q" and "lei up".
> Both of which may prompt for passwords depending on how
> git-credential is set up:
> 
> 	# the destination, could be Maildir
> 	MFOLDER=imaps://user@example.com/INBOX.landlock
> 
> 	# initial search:
> 	lei q -o $MFOLDER -t -I https://lore.kernel.org/all/ --stdin <<EOF
> 	(
> 		dfn:Documentation/security/landlock.rst OR
> 		dfn:Documentation/userspace-api/landlock.rst OR
> 		dfn:include/uapi/linux/landlock.h OR
> 		dfn:samples/landlock/ OR
> 		dfn:security/landlock/ OR
> 		dfn:tools/testing/selftests/landlock/ OR
> 		dfhh:landlock
> 	) AND rt:2.months.ago..
> 	EOF
> 
> 	# update whenever, may prompt for IMAP password, but could be
> 	# cron-ed or similar if passwords are cached
> 	lei up $MFOLDER
> 
> 	# Optional: tweaking the search parameters can be done via
> 	lei edit-search $MFOLDER

Yep, that seems to work fine. Question -- I noticed that lei just issues a
regular query, retrieves results with curl and then parses the output. Is
there a danger of potentially running into issues with parsing the regular
HTML output if it changes in the future?

-K

* [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails
  2021-09-03 15:15   ` Konstantin Ryabitsev
@ 2021-09-04 21:36     ` Eric Wong
  2021-09-07 18:17       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2021-09-04 21:36 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Yep, that seems to work fine. Question -- I noticed that lei just issues a
> regular query, retrieves results with curl and then parses the output. Is
> there a danger of potentially running into issues with parsing the regular
> HTML output if it changes in the future?

It's actually parsing gzipped mboxrd (&x=m).  But you're right
we could use stronger safeguards in case we see gzipped HTML or
something else...
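
For anyone inspecting the curl output by hand, a quick check like the one
below can tell a gzipped mbox from gzipped HTML. It is a client-side heuristic
written for illustration and is not what the patch below does; the function
name looks_like_mboxrd_gz is hypothetical. It also rejects an empty (zero
message) result, which may or may not be what you want.

```shell
#!/bin/sh
# looks_like_mboxrd_gz FILE
# Succeeds only if FILE is a valid gzip stream whose first decompressed
# line starts with "From " (the mbox message separator).
looks_like_mboxrd_gz() {
	# reject anything that is not an intact gzip stream
	gzip -t "$1" 2>/dev/null || return 1
	# peek at the first line only; the exit status is grep's
	gzip -dc "$1" | head -n 1 | grep -q '^From '
}
```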

----------8<---------
Subject: [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails

We may be handling invalid mboxes, so just return no objects in
that case.  While "lei q" on HTTP(S) externals expects a gzipped
mboxrd, there's always a chance something else gzipped can be
sent to us.

There's also changes to lei_to_mail to better handle emails
which lack a body and/or headers (e.g. t/solve/bare.patch)

Link: https://public-inbox.org/meta/20210903151500.h72mzcpqixgtytjs@meerkat.local/
---
 lib/PublicInbox/Eml.pm        |  8 ++++++++
 lib/PublicInbox/LeiToMail.pm  | 21 +++++++--------------
 lib/PublicInbox/MboxReader.pm |  3 ++-
 t/mbox_reader.t               | 23 +++++++++++++++++++++++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 955d6a96..0867a016 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -480,6 +480,14 @@ sub charset_set {
 
 sub crlf { $_[0]->{crlf} // "\n" }
 
+sub raw_size {
+	my ($self) = @_;
+	my $len = length(${$self->{hdr}});
+	defined($self->{bdy}) and
+		$len += length(${$self->{bdy}}) + length($self->{crlf});
+	$len;
+}
+
 # warnings to ignore when handling spam mailboxes and maybe other places
 sub warn_ignore {
 	my $s = "@_";
diff --git a/lib/PublicInbox/LeiToMail.pm b/lib/PublicInbox/LeiToMail.pm
index 6e102a1d..1221d3c7 100644
--- a/lib/PublicInbox/LeiToMail.pm
+++ b/lib/PublicInbox/LeiToMail.pm
@@ -109,32 +109,25 @@ sub _mboxcl_common ($$$) {
 	$$buf .= 'Content-Length: '.length($$bdy).$crlf.
 		'Lines: '.$lines.$crlf.$crlf;
 	substr($$bdy, 0, 0, $$buf); # prepend header
-	$_[0] = $bdy;
+	$$bdy .= $crlf;
+	$bdy;
 }
 
 # mboxcl still escapes "From " lines
 sub eml2mboxcl {
 	my ($eml, $smsg) = @_;
 	my $buf = _mbox_hdr_buf($eml, 'mboxcl', $smsg);
-	my $crlf = $eml->{crlf};
-	if (my $bdy = delete $eml->{bdy}) {
-		$$bdy =~ s/^From />From /gm;
-		_mboxcl_common($buf, $bdy, $crlf);
-	}
-	$$buf .= $crlf;
-	$buf;
+	my $bdy = delete($eml->{bdy}) // \(my $empty = '');
+	$$bdy =~ s/^From />From /gm;
+	_mboxcl_common($buf, $bdy, $eml->{crlf});
 }
 
 # mboxcl2 has no "From " escaping
 sub eml2mboxcl2 {
 	my ($eml, $smsg) = @_;
 	my $buf = _mbox_hdr_buf($eml, 'mboxcl2', $smsg);
-	my $crlf = $eml->{crlf};
-	if (my $bdy = delete $eml->{bdy}) {
-		_mboxcl_common($buf, $bdy, $crlf);
-	}
-	$$buf .= $crlf;
-	$buf;
+	my $bdy = delete($eml->{bdy}) // \(my $empty = '');
+	_mboxcl_common($buf, $bdy, $eml->{crlf});
 }
 
 sub git_to_mail { # git->cat_async callback
diff --git a/lib/PublicInbox/MboxReader.pm b/lib/PublicInbox/MboxReader.pm
index 9291f00b..5a754cb8 100644
--- a/lib/PublicInbox/MboxReader.pm
+++ b/lib/PublicInbox/MboxReader.pm
@@ -41,7 +41,7 @@ sub _mbox_from {
 			$raw =~ s/^\r?\n\z//ms;
 			$raw =~ s/$from_re/$1/gms;
 			my $eml = PublicInbox::Eml->new(\$raw);
-			$eml_cb->($eml, @arg);
+			$eml_cb->($eml, @arg) if $eml->raw_size;
 		}
 		return if $r == 0; # EOF
 	}
@@ -96,6 +96,7 @@ sub _mbox_cl ($$$;@) {
 			$$hdr =~ s/\A[\r\n]*From [^\n]*\n//s or
 				die "E: no 'From ' line in:\n", Dumper($hdr);
 			my $eml = PublicInbox::Eml->new($hdr);
+			next unless $eml->raw_size;
 			my @cl = $eml->header_raw('Content-Length');
 			my $n = scalar(@cl);
 			$n == 0 and die "E: Content-Length missing in:\n",
diff --git a/t/mbox_reader.t b/t/mbox_reader.t
index da0ce7f1..e5f57d7b 100644
--- a/t/mbox_reader.t
+++ b/t/mbox_reader.t
@@ -71,6 +71,12 @@ my $check_fmt = sub {
 				"Content-Length is correct $fmt $cur");
 			# clobber for ->as_string comparison below
 			$eml->header_set('Content-Length');
+
+			# special case for t/solve/bare.patch, not sure if we
+			# should even handle it...
+			if ($cl[0] eq '0' && ${$eml->{hdr}} eq '') {
+				delete $eml->{bdy};
+			}
 		} else {
 			is(scalar(@cl), 0, "Content-Length unset $fmt $cur");
 		}
@@ -121,4 +127,21 @@ exit 1
 	is(scalar(grep(/Final/, @x)), 0, 'no incomplete bit');
 }
 
+{
+	my $html = <<EOM;
+<html><head><title>hi,</title></head><body>how are you</body></html>
+EOM
+	for my $m (qw(mboxrd mboxcl mboxcl2 mboxo)) {
+		my (@w, @x);
+		local $SIG{__WARN__} = sub { push @w, @_ };
+		open my $fh, '<', \$html or xbail 'PerlIO::scalar';
+		PublicInbox::MboxReader->$m($fh, sub {
+			push @x, $_[0]->as_string
+		});
+		is_deeply(\@x, [], "messages in invalid $m");
+		is_deeply([grep(!/^W: leftover/, @w)], [],
+			"no extra warnings besides leftover ($m)");
+	}
+}
+
 done_testing;

* Re: [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails
  2021-09-04 21:36     ` [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails Eric Wong
@ 2021-09-07 18:17       ` Konstantin Ryabitsev
  2021-09-07 20:56         ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-07 18:17 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Sat, Sep 04, 2021 at 09:36:58PM +0000, Eric Wong wrote:
> Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > Yep, that seems to work fine. Question -- I noticed that lei just issues a
> > regular query, retrieves results with curl and then parses the output. Is
> > there a danger of potentially running into issues with parsing the regular
> > HTML output if it changes in the future?
> 
> It's actually parsing gzipped mboxrd (&x=m).  But you're right
> we could use stronger safeguards in case we see gzipped HTML or
> something else...

Ooh, okay, I guess I should actually look at the output of the curl call. :)
The questions I have, then:

1. this means that each "lei up" call will grow larger and larger,
   since when we init the search with rt:, it gets resolved into a datestamp
   (e.g. rt:2.weeks.ago becomes rt:1625699031). I'm worried that this will be
   increasingly hard on the server side, especially if someone
   fires-and-forgets a cronjob that ends up downloading ever-growing mboxes
   every 5 minutes.
2. is there some sanity limit on the server side that would prevent someone's
   overly broad search query from gzipping and downloading gigabytes of mail?

-K

* Re: [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails
  2021-09-07 18:17       ` Konstantin Ryabitsev
@ 2021-09-07 20:56         ` Eric Wong
  2021-09-07 21:20           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2021-09-07 20:56 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Sat, Sep 04, 2021 at 09:36:58PM +0000, Eric Wong wrote:
> > Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> > > Yep, that seems to work fine. Question -- I noticed that lei just issues a
> > > regular query, retrieves results with curl and then parses the output. Is
> > > there a danger of potentially running into issues with parsing the regular
> > > HTML output if it changes in the future?
> > 
> > It's actually parsing gzipped mboxrd (&x=m).  But you're right
> > we could use stronger safeguards in case we see gzipped HTML or
> > something else...
> 
> Ooh, okay, I guess I should actually look at the output of the curl call. :)
> The questions I have, then:
> 
> 1. this means that each "lei up" call will be increasingly larger and larger,
>    since when we init the search with rt:, it gets resolved into a datestamp
>    (e.g. rt:2.weeks.ago becomes rt:1625699031). I'm worried that this will be
>    increasingly hard on the server side, especially if someone
>    fires-and-forgets a cronjob that ends up downloading ever-growing mboxes
>    every 5 minutes.

"rt:2.weeks.ago" stays "rt:2.weeks.ago" in saved searches :>

It was one of my primary annoyances when I initially implemented
this and commit 2e4e4b0d6f30d9d4612066395ba694c7c7d61e6e solved it.
https://public-inbox.org/meta/20210416231035.31807-2-e@80x24.org/
("lei q: --save preserves relative time queries")

> 2. is there some sanity limit on the server side that would prevent someone's
>    overly broad search query from gzipping and downloading gigabytes of mail?

Not right now.  With public-inbox-httpd, the actual git fetches
are handled fairly w.r.t. other requests (and I could
deprioritize them further, if needed...).  The Xapian query OTOH...

* Re: [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails
  2021-09-07 20:56         ` Eric Wong
@ 2021-09-07 21:20           ` Konstantin Ryabitsev
  2021-09-07 22:22             ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-07 21:20 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Tue, Sep 07, 2021 at 08:56:17PM +0000, Eric Wong wrote:
> > 1. this means that each "lei up" call will be increasingly larger and larger,
> >    since when we init the search with rt:, it gets resolved into a datestamp
> >    (e.g. rt:2.weeks.ago becomes rt:1625699031). I'm worried that this will be
> >    increasingly hard on the server side, especially if someone
> >    fires-and-forgets a cronjob that ends up downloading ever-growing mboxes
> >    every 5 minutes.
> 
> "rt:2.weeks.ago" stays "rt:2.weeks.ago" in saved searches :>

Oh, you're right. Apologies for not digging deeper.

> > 2. is there some sanity limit on the server side that would prevent someone's
> >    overly broad search query from gzipping and downloading gigabytes of mail?
> 
> Not right now.  With public-inbox-httpd, the actual git fetches
> are handled fairly w.r.t. other requests (and I could
> deprioritize them further, if needed...).  The Xapian query OTOH...

Okay, I guess it's not any different from someone doing the same thing over
the web interface. It would be nice to have a way to limit how many messages
are returned for gzipped mailbox downloads, seeing as they cannot be paginated
in the same way web views are, but it's not a priority right away.

Thanks,
-K

* Re: Showcasing lei at Linux Plumbers
  2021-09-02 21:58 ` Eric Wong
  2021-09-03 15:15   ` Konstantin Ryabitsev
@ 2021-09-07 21:33   ` Konstantin Ryabitsev
  2021-09-07 22:14     ` Eric Wong
  1 sibling, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-07 21:33 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Thu, Sep 02, 2021 at 09:58:50PM +0000, Eric Wong wrote:
> OK, there's two main commands, "lei q" and "lei up".
> Both of which may prompt for passwords depending on how
> git-credential is set up:
> 
> 	# the destination, could be Maildir
> 	MFOLDER=imaps://user@example.com/INBOX.landlock
> 
> 	# initial search:
> 	lei q -o $MFOLDER -t -I https://lore.kernel.org/all/ --stdin <<EOF
> 	(
> 		dfn:Documentation/security/landlock.rst OR
> 		dfn:Documentation/userspace-api/landlock.rst OR
> 		dfn:include/uapi/linux/landlock.h OR
> 		dfn:samples/landlock/ OR
> 		dfn:security/landlock/ OR
> 		dfn:tools/testing/selftests/landlock/ OR
> 		dfhh:landlock
> 	) AND rt:2.months.ago..
> 	EOF
> 
> 	# update whenever, may prompt for IMAP password, but could be
> 	# cron-ed or similar if passwords are cached
> 	lei up $MFOLDER
> 
> 	# Optional: tweaking the search parameters can be done via
> 	lei edit-search $MFOLDER

If I had a local mirror with extindex and I wanted to do the same thing, would
I just modify the -I flag to point at the extindex location? One of the
options I want to investigate is making individual mailboxes fed by lei
accessible over IMAP/POP3, such that a new subsystem maintainer could have a
ready-made mailbox available to them without needing to subscribe/unsubscribe
to a bunch of mailing lists. (This would be different from the read-only imap
mailboxes offered by public-inbox-imapd, since we'll be tracking individual
message state. The POP3 bit would allow them to plug it into something like
Gmail, which allows sucking down remote POPs.)

-K

* Re: Showcasing lei at Linux Plumbers
  2021-09-07 21:33   ` Showcasing lei at Linux Plumbers Konstantin Ryabitsev
@ 2021-09-07 22:14     ` Eric Wong
  2021-09-08 13:36       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2021-09-07 22:14 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Sep 02, 2021 at 09:58:50PM +0000, Eric Wong wrote:
> > 	# the destination, could be Maildir
> > 	MFOLDER=imaps://user@example.com/INBOX.landlock
> > 
> > 	# initial search:
> > 	lei q -o $MFOLDER -t -I https://lore.kernel.org/all/ --stdin <<EOF
> 
> If I had a local mirror with extindex and I wanted to do the same thing, would
> I just modify the -I flag to point at the extindex location?

Yes.  For local stuff that's permanently mounted, I tend to do
"lei add-external $PATHNAME" so it's included by default.

> One of the
> options I want to investigate is making IMAP/POP3 accessible individual
> mailboxes fed by lei, such that a new subsystem maintainer could have a
> ready-made mailbox available to them without needing to subscribe/unsubscribe
> to a bunch of mailing lists. (This would be different from read-only imap
> mailboxes offered by public-inbox-imapd, since we'll be tracking individual
> message state. The POP3 bit would allow them to plug it into something like
> Gmail which allows sucking down remote POPs.)

I think using the "-o v2:..." option for now would be the way to
go for making a v2 inbox available via -imapd (and it'll get
JMAP/POP3 support in the future).

We don't have POP3 support in either client or server form yet.
I'm not sure how account/state management would work, nor how to
prioritize it vs JMAP support.  I'm thinking POP3 takes priority
since there are more clients for it...

Existing POP3 servers would work, too; since lei can output
to Maildir/mbox* which can work with them.


On a side note, I'm not aware of IMAP sync tools accounting for
read-only IMAP servers well, since they attempt bidirectional
sync.  "lei import" seems alright in that regard, treating IMAP
the same way it will (eventually) treat POP3.

* Re: [PATCH] lei_to_mail+mbox_reader: fix handling of empty/bogus emails
  2021-09-07 21:20           ` Konstantin Ryabitsev
@ 2021-09-07 22:22             ` Eric Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2021-09-07 22:22 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Okay, I guess it's not any different from someone doing the same thing over
> the web interface. It would be nice to have a way to limit how many messages
> are returned for gzipped mailbox downloads, seeing as they cannot be paginated
> in the same way web views are, but it's not a priority right away.

I'm thinking pagination would cause unnecessary hardship for
legitimate users.

The mbox.gz streaming doesn't hurt -httpd any more than
aggressive bots do.  HTML pagination is mainly needed to avoid
performance problems on the client/rendering side.

* Re: Showcasing lei at Linux Plumbers
  2021-09-07 22:14     ` Eric Wong
@ 2021-09-08 13:36       ` Konstantin Ryabitsev
  2021-09-08 14:49         ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-08 13:36 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Tue, Sep 07, 2021 at 10:14:04PM +0000, Eric Wong wrote:
> > One of the
> > options I want to investigate is making IMAP/POP3 accessible individual
> > mailboxes fed by lei, such that a new subsystem maintainer could have a
> > ready-made mailbox available to them without needing to subscribe/unsubscribe
> > to a bunch of mailing lists. (This would be different from read-only imap
> > mailboxes offered by public-inbox-imapd, since we'll be tracking individual
> > message state. The POP3 bit would allow them to plug it into something like
> > Gmail which allows sucking down remote POPs.)
> 
> I think using the "-o v2:..." option for now would be the way to
> go for making a v2 inbox available via -imapd (and it'll get
> JMAP/POP3 support in the future).

I'm worried that read-only imap folders are going to cause problems for dumber
imap clients, including mbsync. My goal is to make it easy for folks to use
existing tools to which they are already accustomed, since my experience is
that if the learning curve is too steep or requires too much fiddling to
configure, the uptake is going to be extremely limited.

On the other hand, a service that offers full search-based imap/pop3 folders
is going to be an easy sell:

- it works with any imap client as a simple extra account
- it can be mirrored locally and synced two-ways via mbsync
- it can be incorporated into existing services like gmail, so people can
  monitor things on the go
- I can do clever things like suspend "lei up" runs if there was no access to
  the folder for over N weeks
- we can use FS dedupe features since all messages are going to be
  identical after writing them out to maildirs
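
The "suspend if unused" idea in the list above could be approximated with a
directory timestamp check. This is a rough heuristic and an assumption on my
part, not anything lei provides: it treats a recent mtime on the Maildir's
cur/ directory as a sign that a mail client touched the folder, which also
fires when anything else (possibly lei itself) writes there. The function
name maildir_recently_used is hypothetical.

```shell
#!/bin/sh
# maildir_recently_used DIR DAYS
# Succeeds if DIR/cur was modified less than DAYS days ago. Renaming or
# moving a message inside cur/ (e.g. a client marking it read) updates
# that directory's mtime.
maildir_recently_used() {
	# -prune stops find from descending, so only DIR/cur itself is tested;
	# a missing directory yields no output and therefore failure
	[ -n "$(find "$1/cur" -prune -mtime "-$2" 2>/dev/null)" ]
}
```

A scheduler could call this before each update and skip the "lei up" run when
it fails.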

The slightly harder part is making it easy for people to configure their
search parameters, but I'm hoping to expose this via a git repo.

I'm not implementing this right away, but I'm going to float this idea at
plumbers to see what the reception is going to be. I believe this will be of
interest to many devs, since this would allow them to no longer depend on
their corporate mail servers and their mail-mangling ways.

-K

* Re: Showcasing lei at Linux Plumbers
  2021-09-08 13:36       ` Konstantin Ryabitsev
@ 2021-09-08 14:49         ` Eric Wong
  2021-09-08 17:17           ` Konstantin Ryabitsev
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Wong @ 2021-09-08 14:49 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Sep 07, 2021 at 10:14:04PM +0000, Eric Wong wrote:
> > > One of the
> > > options I want to investigate is making IMAP/POP3 accessible individual
> > > mailboxes fed by lei, such that a new subsystem maintainer could have a
> > > ready-made mailbox available to them without needing to subscribe/unsubscribe
> > > to a bunch of mailing lists. (This would be different from read-only imap
> > > mailboxes offered by public-inbox-imapd, since we'll be tracking individual
> > > message state. The POP3 bit would allow them to plug it into something like
> > > Gmail which allows sucking down remote POPs.)
> > 
> > I think using the "-o v2:..." option for now would be the way to
> > go for making a v2 inbox available via -imapd (and it'll get
> > JMAP/POP3 support in the future).
> 
> I'm worried that read-only imap folders are going to cause problems for dumber
> imap clients, including mbsync. My goal is to make it easy for folks to use
> existing tools to which they are already accustomed, since my experience is
> that if the learning curve is too steep or requires too much fiddling to
> configure, the uptake is going to be extremely limited.

Agreed with read-only IMAP being a problem for existing clients.

lei takes a gradual approach in that you can pick and choose
which parts to use.  It's actually close to being able to offer
<mbsync||offlineimap> functionality, but it's a bit clumsy
usage-wise atm:

	lei import imaps://example.com/folder

	# lei <q|lcat> results dumped to Maildir
	# inotify reads Maildir keyword updates done by MUA

	lei export-kw imaps://example.com/folder

I'm working on making the "export-kw" part transparent like it
mostly is with Maildirs.

The one thing lei doesn't do right now is deleting messages
from IMAP folders (unless it's overwriting search results).
That will probably be a separate command:

	lei prune-mfolder [--expire=...]

I hope to stop using <mbsync||offlineimap> myself, soon...

> On the other hand, a service that offers full search-based imap/pop3 folders
> is going to be an easy sell:
> 
> - it works with any imap client as a simple extra account
> - it can be mirrored locally and synced two-ways via mbsync

POP3 would be significantly easier to support server-side with
multiple users since it won't need to store per-user keywords.

Since lei is a daemon and can support multiple users, it could
have an R/W JMAP||IMAP front-end, though...

> - it can be incorporated into existing services like gmail, so people can
>   monitor things on the go

POP3 seems excellent for integrating into large mail providers.
I mainly haven't gotten around to implementing it, nor figuring
out how to deal with account management...

> - I can do clever things like suspend "lei up" runs if there was no access to
>   the folder for over N weeks
> - we can use FS dedupe features since all messages are going to be
>   identical after writing them out to maildirs

I've been thinking of making lei storage accessible as Maildirs
via FUSE, as well.

> The slightly harder part is making it easy for people to configure their
> search parameters, but I'm hoping to expose this via a git repo.

*shrug*  I've been trying to keep the learning curve as low as
possible by using most of the same prefixes as mairix (lei only
adds L: and kw: for labels and keywords).

* Re: Showcasing lei at Linux Plumbers
  2021-09-08 14:49         ` Eric Wong
@ 2021-09-08 17:17           ` Konstantin Ryabitsev
  2021-09-08 17:32             ` Eric Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Konstantin Ryabitsev @ 2021-09-08 17:17 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

On Wed, Sep 08, 2021 at 02:49:48PM +0000, Eric Wong wrote:
> > On the other hand, a service that offers full search-based imap/pop3 folders
> > is going to be an easy sell:
> > 
> > - it works with any imap client as a simple extra account
> > - it can be mirrored locally and synced two-ways via mbsync
> 
> POP3 would be significantly easier to support server-side with
> multiple users since it won't need to store per-user keywords.

Okay, then perhaps I should sit on my hands for a bit. I'll showcase lei with
remote searches as a feature preview, but buffer it with the following
statements:

- We're working on making it easy to add search-based inboxes that would allow
  developers to closely match subsystem MAINTAINERS entries. In fact, we can
  probably automate the creation of such feeds by watching the MAINTAINERS
  file and automatically converting F:/X: lines into queries (not so easily
  done for K: and N: lines unless they avoid actual regular expressions).
 
- Developers will be able to easily access these feeds in multiple ways, e.g.:

  - read-only imap folders
  - pseudo mailing list subscriptions
  - nntp groups
  - pop3 mailboxes (coming in the future)
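
The F: conversion mentioned above could start out as something like the
following. It is a minimal sketch under stated assumptions: the function name
maintainers_to_query is hypothetical, it only handles plain F: paths (mapped
to dfn: terms, as in the hand-written LANDLOCK query earlier in the thread),
and it silently skips X:, K:, N: lines and any F: pattern containing * or ?.

```shell
#!/bin/sh
# maintainers_to_query: read one MAINTAINERS entry on stdin and print a
# query that ORs together a dfn: term for every plain F: path.
# Hypothetical helper; X:, K:, N: lines and glob patterns are skipped.
maintainers_to_query() {
	awk '
	$1 == "F:" && $2 !~ /[*?]/ {
		terms[++n] = "dfn:" $2
	}
	END {
		# one term per line, " OR" after every term but the last
		for (i = 1; i <= n; i++)
			printf "%s%s\n", terms[i], (i < n ? " OR" : "")
	}'
}
```

The output is meant to be wrapped in parentheses and combined with an rt:
range before being passed to "lei q --stdin", as in the earlier example.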

The goal is to solve the following several problems:

- take content-mangling corporate mail gateways out of the picture
- make it unnecessary for patch submitters to know where they should send the
  patches ("just send them to patches@linux.dev").
- reduce the need for new mailing lists as new subsystems are introduced
  ("just send email to discuss@linux.dev with somekeyword: in the subject")

I think that sounds pretty reasonable and I can get most of it done by EOY.

> > - I can do clever things like suspend "lei up" runs if there was no access to
> >   the folder for over N weeks
> > - we can use FS dedupe features since all messages are going to be
> >   identical after writing them out to maildirs
> 
> I've been thinking of making lei storage accessible as Maildirs
> via FUSE, as well.

That's a pretty cool idea, actually -- would that be readonly or with full
deletes/renames support?

-K

* Re: Showcasing lei at Linux Plumbers
  2021-09-08 17:17           ` Konstantin Ryabitsev
@ 2021-09-08 17:32             ` Eric Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Wong @ 2021-09-08 17:32 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Wed, Sep 08, 2021 at 02:49:48PM +0000, Eric Wong wrote:
> > I've been thinking of making lei storage accessible as Maildirs
> > via FUSE, as well.
> 
> That's a pretty cool idea, actually -- would that be readonly or with full
> deletes/renames support?

Renames for sure.  Likely deletes, at least on a per-label basis.
Haven't thought much about deletes/purge...
