I have figured out IMAP IDLE

unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed

* I have figured out IMAP IDLE
@ 2019-10-29 14:40 Eric W. Biederman
  2019-10-29 22:31 ` Eric Wong
  2020-05-13 19:31 ` Eric Wong
  0 siblings, 2 replies; 17+ messages in thread
From: Eric W. Biederman @ 2019-10-29 14:40 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

A few days ago I stumbled upon this magic decoder ring for IMAP.
The "Ten Commandments of How to Write an IMAP client"

https://www.washington.edu/imap/documentation/commndmt.txt

The part I was most clearly missing was that for IMAP it is better to
open multiple sockets (one per mail folder on the server) than it is to
switch between folders with the same ssl socket.

The core loop of my imap code now does:

	for (;;) {
		my $more;
		do {
			$more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity);
		} while ($more > 0);

		my @idle_data;
		do {
			my $max_idle = 15;
			$client->idle() or die("idle failed!\n");
			@idle_data = $client->idle_data($max_idle*60);
			$client->done();
		} while (scalar(@idle_data) == 0);
	}

Where all I do with idle is wait until it gives me something, and
reissue it every 15 minutes so the connection doesn't time out.

The only filter I have found useful to add is when imap idle returns
nothing to just loop instead of trying to fetch mail.  I have not found
anything in the data idle returns that would save me work in the mailbox
fetching loop.

In my limited testing firing up a bunch of ssl sockets and fetching from
the all simultaneously appears much faster, and maybe a little more
stable.  Than switching between mailboxes.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman
@ 2019-10-29 22:31 ` Eric Wong
  2019-10-29 23:12   ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong
  2019-11-03 16:28   ` I have figured out IMAP IDLE Eric W. Biederman
  2020-05-13 19:31 ` Eric Wong
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-29 22:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> A few days ago I stumbled upon this magic decoder ring for IMAP.
> The "Ten Commandments of How to Write an IMAP client"
> 
> https://www.washington.edu/imap/documentation/commndmt.txt
> 
> The part I was most clearly missing was that for IMAP it is better to
> open multiple sockets (one per mail folder on the server) than it is to
> switch between folders with the same ssl socket.
> 
> The core loop of my imap code now does:
> 
> 	for (;;) {
> 		my $more;
> 		do {
> 			$more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity);
> 		} while ($more > 0);
> 
> 		my @idle_data;
> 		do {
> 			my $max_idle = 15;
> 			$client->idle() or die("idle failed!\n");

			b) window closed

> 			@idle_data = $client->idle_data($max_idle*60);
> 			$client->done();

			a) window opens

> 		} while (scalar(@idle_data) == 0);
> 	}

It seems between a) and b), there's a small window where a message
can appear and not be noticed right away.

I think the only reliable way to do this without experiencing
delays or an opportunity for missed messages is to have two(!)
connections per folder:

1) fetcher - this connection only fetches, either at startup or
   when triggered by the idler described below.

2) idler - just runs ->idle, ->idle_data and ->done
   If ->idle times out or sees ->idle_data, it triggers
   fetcher.  The trigger should probably have a timestamp
   so the fetcher can merge overlapping triggers.

Unfortunately, using widely-available Perl IMAP client libraries
means two Perl processes for every folder.  But I suppose
fetcher can be lazy-started to not waste resources.

I wonder if WWW::Curl can do it all in one Perl process with an
event loop.  curl can issue arbitrary commands for HTTP and I've
been using it to learn some XS, so maybe it can do IDLE.

WWW::Curl is also packaged for CentOS/RHEL7, so it should not be
tough to install.

> Where all I do with idle is wait until it gives me something, and
> reissue it every 15 minutes so the connection doesn't time out.

Btw, why not 29 minutes? (accounting for 30 minute timeout).

> The only filter I have found useful to add is when imap idle returns
> nothing to just loop instead of trying to fetch mail.  I have not found
> anything in the data idle returns that would save me work in the mailbox
> fetching loop.

Yeah, I recall that being the case when I last worked with
IMAP in other places.

> In my limited testing firing up a bunch of ssl sockets and fetching from
> the all simultaneously appears much faster, and maybe a little more
> stable.  Than switching between mailboxes.

Thanks for the info!

Fwiw, I favor using stunnel for developing/testing client-side
mail tools with TLS.  It's easier to debug/trace a tool that's
talking over loopback to stunnel using tools like strace or tcpdump.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* WWW::Curl [was: I have figured out IMAP IDLE]
  2019-10-29 22:31 ` Eric Wong
@ 2019-10-29 23:12   ` Eric Wong
  2019-11-03 16:28   ` I have figured out IMAP IDLE Eric W. Biederman
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-29 23:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

Eric Wong <e@80x24.org> wrote:
> WWW::Curl is also packaged for CentOS/RHEL7, so it should not be
> tough to install.

But still a pain to build from source, if need be:

	https://bugs.debian.org/843432

And it looks like upstream's been missing for a few years and
distros have mostly kept it alive...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2019-10-29 22:31 ` Eric Wong
  2019-10-29 23:12   ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong
@ 2019-11-03 16:28   ` Eric W. Biederman
  1 sibling, 0 replies; 17+ messages in thread
From: Eric W. Biederman @ 2019-11-03 16:28 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> A few days ago I stumbled upon this magic decoder ring for IMAP.
>> The "Ten Commandments of How to Write an IMAP client"
>> 
>> https://www.washington.edu/imap/documentation/commndmt.txt
>> 
>> The part I was most clearly missing was that for IMAP it is better to
>> open multiple sockets (one per mail folder on the server) than it is to
>> switch between folders with the same ssl socket.
>> 
>> The core loop of my imap code now does:
>> 
>> 	for (;;) {
>> 		my $more;
>> 		do {
>> 			$more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity);
>> 		} while ($more > 0);
>> 
>> 		my @idle_data;
>> 		do {
>> 			my $max_idle = 15;
>> 			$client->idle() or die("idle failed!\n");
>
> 			b) window closed
>
>> 			@idle_data = $client->idle_data($max_idle*60);
>> 			$client->done();
>
> 			a) window opens
>
>> 		} while (scalar(@idle_data) == 0);
>> 	}
>
> It seems between a) and b), there's a small window where a message
> can appear and not be noticed right away.

Hmm.

Looking at the source of IMAPClient point (b) can be moved
to after "$client->done()" by reading the output "$client->done()".

About the window for unsolicited untagged responses before the
"$client->idle()" command.  Those unsolicited responses should queue
up in the imap socket until the next read.  That next read should
happen after the "$client->idle()" command.  So in practice unsolicited
untagged responses should just look like a reply to imap idle.

So the race you have noticed does not look fundamental rather it looks
like it simply takes bug fixing to make go away.

> I think the only reliable way to do this without experiencing
> delays or an opportunity for missed messages is to have two(!)
> connections per folder:

That sounds nice.  But it is also _not_ recommended.  Quoting from
the "Ten Commandments of How to Write an IMAP Client"

    1. Thou shalt not assume that it is alright to open multiple IMAP
    sessions selected on the same mailbox simultaneously, lest thou face
    the righteous wrath of mail stores that doth not permit such access.
    Instead, thou shalt labor mightily, even unto having to use thy brain
    to thinketh the matter through, such that thy client use existing
    sessions that are already open.

> 1) fetcher - this connection only fetches, either at startup or
>    when triggered by the idler described below.
>
> 2) idler - just runs ->idle, ->idle_data and ->done
>    If ->idle times out or sees ->idle_data, it triggers
>    fetcher.  The trigger should probably have a timestamp
>    so the fetcher can merge overlapping triggers.
>
> Unfortunately, using widely-available Perl IMAP client libraries
> means two Perl processes for every folder.  But I suppose
> fetcher can be lazy-started to not waste resources.

>
> I wonder if WWW::Curl can do it all in one Perl process with an
> event loop.  curl can issue arbitrary commands for HTTP and I've
> been using it to learn some XS, so maybe it can do IDLE.
>
> WWW::Curl is also packaged for CentOS/RHEL7, so it should not be
> tough to install.
>
>> Where all I do with idle is wait until it gives me something, and
>> reissue it every 15 minutes so the connection doesn't time out.
>
> Btw, why not 29 minutes? (accounting for 30 minute timeout).

Silliness.  In the example I started from someone had a comment
about firewalls not honoring a full 30 minutes so I picked 15 minutes
arbitrarily.

>> The only filter I have found useful to add is when imap idle returns
>> nothing to just loop instead of trying to fetch mail.  I have not found
>> anything in the data idle returns that would save me work in the mailbox
>> fetching loop.
>
> Yeah, I recall that being the case when I last worked with
> IMAP in other places.
>
>> In my limited testing firing up a bunch of ssl sockets and fetching from
>> the all simultaneously appears much faster, and maybe a little more
>> stable.  Than switching between mailboxes.
>
> Thanks for the info!
>
> Fwiw, I favor using stunnel for developing/testing client-side
> mail tools with TLS.  It's easier to debug/trace a tool that's
> talking over loopback to stunnel using tools like strace or tcpdump.

I may check that out.  Certainly something that gives you the raw
protocol messages is a little bit cleaner.

I haven't had enough challenges that I have been looking for the raw
protocol messages yet.

Eric




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman
  2019-10-29 22:31 ` Eric Wong
@ 2020-05-13 19:31 ` Eric Wong
  2020-05-13 21:48   ` Eric W. Biederman
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-05-13 19:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> A few days ago I stumbled upon this magic decoder ring for IMAP.
> The "Ten Commandments of How to Write an IMAP client"
> 
> https://www.washington.edu/imap/documentation/commndmt.txt

That URL already expired, fortunately archive.org has it:

https://web.archive.org/web/20191029013002/https://www.washington.edu/imap/documentation/commndmt.txt.html

> The part I was most clearly missing was that for IMAP it is better to
> open multiple sockets (one per mail folder on the server) than it is to
> switch between folders with the same ssl socket.
> 
> The core loop of my imap code now does:

Is your stuff based on Mail::IMAPClient still working well?

I'm starting to integrate Mail::IMAPClient support into
public-inbox-watch, but not sure how to go about with
automated test cases...

I guess it's reason to implement a read-only IMAP server along
the lines of public-inbox-nntpd :>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2020-05-13 19:31 ` Eric Wong
@ 2020-05-13 21:48   ` Eric W. Biederman
  2020-05-13 22:17     ` Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-13 21:48 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> 
>> A few days ago I stumbled upon this magic decoder ring for IMAP.
>> The "Ten Commandments of How to Write an IMAP client"
>> 
>> https://www.washington.edu/imap/documentation/commndmt.txt
>
> That URL already expired, fortunately archive.org has it:
>
> https://web.archive.org/web/20191029013002/https://www.washington.edu/imap/documentation/commndmt.txt.html

Wow.  It looks like all of their imap related documentation went away.

It looks like wikipedia has the story:
https://en.wikipedia.org/wiki/UW_IMAP

In short UW_IMAP written by Mark Crispin was the reference IMAP
implementation but he died in 2012.  So the fact everything stayed up
for as long as it did was a bit of testament to it's importance.

>> The part I was most clearly missing was that for IMAP it is better to
>> open multiple sockets (one per mail folder on the server) than it is to
>> switch between folders with the same ssl socket.
>> 
>> The core loop of my imap code now does:
>
> Is your stuff based on Mail::IMAPClient still working well?

Yes.  I have fixed a few things to make the code more robust.
Which is mostly me learning how Mail::IMAPClient works.

I also keep tweaking the ckde to make it more generic/simpler
so it has the potential to be usable by someone else.

Would be interested if I rebased onto the lastest release and
posted my code?  Perhaps you could put into contrib until you
have something that gets properly tested? 

> I'm starting to integrate Mail::IMAPClient support into
> public-inbox-watch, but not sure how to go about with
> automated test cases...
>
> I guess it's reason to implement a read-only IMAP server along
> the lines of public-inbox-nntpd :>

That sounds like a pain but it does sound like a good plan.  I think
there are a lot of people that if there was even a simple IMAP server
would find it easier to get started using public-inbox.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2020-05-13 21:48   ` Eric W. Biederman
@ 2020-05-13 22:17     ` Eric Wong
  2020-05-14 12:32       ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-05-13 22:17 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@yhbt.net> writes:
> > Is your stuff based on Mail::IMAPClient still working well?
> 
> Yes.  I have fixed a few things to make the code more robust.
> Which is mostly me learning how Mail::IMAPClient works.
> 
> I also keep tweaking the ckde to make it more generic/simpler
> so it has the potential to be usable by someone else.
> 
> Would be interested if I rebased onto the lastest release and
> posted my code?  Perhaps you could put into contrib until you
> have something that gets properly tested? 

Definitely.  Just having it here in the archives as an
attachment ought to be enough.  Thanks in advance.

> > I'm starting to integrate Mail::IMAPClient support into
> > public-inbox-watch, but not sure how to go about with
> > automated test cases...
> >
> > I guess it's reason to implement a read-only IMAP server along
> > the lines of public-inbox-nntpd :>
> 
> That sounds like a pain but it does sound like a good plan.  I think
> there are a lot of people that if there was even a simple IMAP server
> would find it easier to get started using public-inbox.

Agree on the pain vs payoff part; I wonder if I can just
implement the UID commands and ignore the unstable sequence
number madness.  Mail::IMAPClient does the right thing by
defaulting to Uid=>1, at least.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2020-05-13 22:17     ` Eric Wong
@ 2020-05-14 12:32       ` Eric W. Biederman
  2020-05-14 16:15         ` Eric Wong
  2020-05-15 21:00         ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
  0 siblings, 2 replies; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-14 12:32 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong <e@yhbt.net> writes:
>> > Is your stuff based on Mail::IMAPClient still working well?
>> 
>> Yes.  I have fixed a few things to make the code more robust.
>> Which is mostly me learning how Mail::IMAPClient works.
>> 
>> I also keep tweaking the ckde to make it more generic/simpler
>> so it has the potential to be usable by someone else.
>> 
>> Would be interested if I rebased onto the lastest release and
>> posted my code?  Perhaps you could put into contrib until you
>> have something that gets properly tested? 
>
> Definitely.  Just having it here in the archives as an
> attachment ought to be enough.  Thanks in advance.
>
>> > I'm starting to integrate Mail::IMAPClient support into
>> > public-inbox-watch, but not sure how to go about with
>> > automated test cases...
>> >
>> > I guess it's reason to implement a read-only IMAP server along
>> > the lines of public-inbox-nntpd :>
>> 
>> That sounds like a pain but it does sound like a good plan.  I think
>> there are a lot of people that if there was even a simple IMAP server
>> would find it easier to get started using public-inbox.
>
> Agree on the pain vs payoff part; I wonder if I can just
> implement the UID commands and ignore the unstable sequence
> number madness.  Mail::IMAPClient does the right thing by
> defaulting to Uid=>1, at least.

Just implementing the UID=>1 commands definitely sounds like a place to
start.  At least that should be enough for testing.

I will rebase and post my code in the next day or so, depending on how
my schedule goes.

Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: I have figured out IMAP IDLE
  2020-05-14 12:32       ` Eric W. Biederman
@ 2020-05-14 16:15         ` Eric Wong
  2020-05-15 21:00         ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-05-14 16:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Just implementing the UID=>1 commands definitely sounds like a place to
> start.  At least that should be enough for testing.

Heh, except "UID FETCH" returns sequence numbers :<

Since we're only doing read-only IMAP, I think it's safe to make
sequence numbers equivalent to UIDs (which are NNTP article
numbers).  Dummy messages can be returned if somebody starts
trying to retrieve by non-existent sequence numbers.

> I will rebase and post my code in the next day or so, depending on how
> my schedule goes.

Thanks.  For the purposes of public-inbox-watch, I think the
only item UID FETCH really needs is the full message
("RFC822" aka "BODY[]").

I'll have to see what other IMAP clients out there expect;
but I'm not looking forward to translating all search queries
to SQL or Xapian; just "ALL", $UID_LO:$UID_HI and maybe some
other simple cases which don't require a recursive descent
parser.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes
  2020-05-14 12:32       ` Eric W. Biederman
  2020-05-14 16:15         ` Eric Wong
@ 2020-05-15 21:00         ` Eric W. Biederman
  2020-05-15 21:02           ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman
  1 sibling, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-15 21:00 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


By tracking which messages in an imap mailbox have been already downloaded
they imap messages do not need to be deleted immediately.

It is recommended to implement one connection per IMAP mailbox,
so this tracker caches the entire url for a mailbox at creation time.

If you need to process multiple IMAP mailboxes you will need multiple trackers.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 lib/PublicInbox/IMAPTracker.pm

diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm
new file mode 100644
index 000000000000..d445555c0fd9
--- /dev/null
+++ b/lib/PublicInbox/IMAPTracker.pm
@@ -0,0 +1,73 @@
+# Copyright (C) 2018 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+
+package PublicInbox::IMAPTracker;
+use strict;
+use warnings;
+use DBI;
+use DBD::SQLite;
+use PublicInbox::Config;
+
+sub create_tables ($)
+{
+	my ($dbh) = @_;
+
+	$dbh->do(<<'');
+CREATE TABLE IF NOT EXISTS imap_last (
+	url VARCHAR PRIMARY KEY NOT NULL,
+	uid_validity INTEGER NOT NULL,
+	uid INTEGER NOT NULL,
+	UNIQUE (url)
+)
+
+}
+
+
+sub dbh_new ($)
+{
+	my ($dbname) = @_;
+	my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", '', '', {
+		AutoCommit => 1,
+		RaiseError => 1,
+		PrintError => 0,
+		ReadOnly => 0,
+		sqlite_use_immediate_transaction => 1,
+	});
+	$dbh->{sqlite_unicode} = 1;
+	$dbh->do('PRAGMA journal_mode = TRUNCATE');
+	$dbh->do('PRAGMA cache_size = 80000');
+	create_tables($dbh);
+	$dbh;
+}
+
+sub get_last($)
+{
+	my ($self) = @_;
+	my $dbh = $self->{dbh};
+
+	my $sth = $dbh->prepare_cached(<<'', undef, 1);
+SELECT uid_validity, uid FROM imap_last WHERE url = ?
+
+	$sth->execute($self->{url});
+	$sth->fetchrow_array;
+}
+
+sub update_last($$$)
+{
+	my ($self, $validity, $last) = @_;
+	my $dbh = $self->{dbh};
+	my $sth = $dbh->prepare_cached(<<'');
+INSERT OR REPLACE INTO imap_last (url, uid_validity, uid)
+VALUES (?, ?, ?)
+
+	$sth->execute($self->{url}, $validity, $last);
+}
+
+sub new ($) {
+	my ($class, $url) = @_;
+	my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3";
+	my $dbh = dbh_new($dbname);
+	bless { dbname => $dbname, dbh => $dbh, url => $url }, $class;
+}
+
+1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-15 21:00         ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
@ 2020-05-15 21:02           ` Eric W. Biederman
  2020-05-15 21:26             ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-15 21:02 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


The command imap_fetch connects to the specified imap mailbox and
fetches any unfetch messages than waits with imap idle until there are
more messages to fetch.

By default messages are placed in the specified public inbox mailbox.
The value of List-ID is consulted and if it is present used to
select an alternate public-inbox mailbox to place the messages in.

The email messages are placed without modification into the public
inbox repository so minimize changes of corruption or of loosing
valuable information.  I use the command imap_fetch for all of my
email and not just a mailling list mirror so I don't want automation
to accidentally cause something important to be lost.

No email messages are deleted from the server instead IMAPTracker
is used to remember which messages were downloaded.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 scripts/imap_fetch | 336 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)
 create mode 100755 scripts/imap_fetch

diff --git a/scripts/imap_fetch b/scripts/imap_fetch
new file mode 100755
index 000000000000..007f78a71b52
--- /dev/null
+++ b/scripts/imap_fetch
@@ -0,0 +1,336 @@
+#!/usr/bin/perl -w
+# Script to fetch IMAP messages and put then into a public-inbox
+=begin usage
+	./imap_fetch imap://username@hostname/mailbox inbox
+=cut
+use strict;
+use warnings;
+use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
+use Mail::IMAPClient;
+use IO::Socket;
+use IO::Socket::SSL;
+use File::Sync qw(sync);
+use Term::ReadKey;
+use PublicInbox::IMAPTracker;
+use PublicInbox::InboxWritable;
+use POSIX qw(strftime);
+sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) }
+my $verify_ssl = 1;
+my %opts = (
+	'--verify-ssl!' => \$verify_ssl,
+);
+GetOptions(%opts) or die usage();
+
+my $mail_url = shift @ARGV or die usage();
+my $inbox_name = shift @ARGV or die usage();
+my $mail_hostname;
+my $mail_username;
+my $mail_password;
+my $mailbox;
+if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)/(.+)\z$) {
+	$mail_username = $1;
+	$mail_hostname = $2;
+	$mailbox = $3;
+} else {
+	die usage();
+}
+
+my $url = 'imap://' . $mail_username . '@' . $mail_hostname . '/' . $mailbox ;
+
+sub list_hdr_ibx($$)
+{
+	my ($config, $list_hdr) = @_;
+	my $list_id;
+	if ($list_hdr =~ m/\0/) {
+		warn("Bad List-ID: $list_hdr contains a null\n");
+		return undef;
+	} elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) {
+		$list_id = $1;
+	} else {
+		warn("Bad List-ID: $list_hdr\n");
+		return undef;
+	}
+	my $ibx = $config->lookup_list_id($list_id);
+	if (!defined($ibx)) {
+		warn("Cound not find inbox for List-ID: $list_id\n");
+	}
+
+	print(" List-ID: $list_id\n");
+	$ibx;
+}
+
+sub email_dest($$)
+{
+	my ($config, $mime) = @_;
+	my %ibxs;
+	my $hdr = $mime->header_obj;
+	my @list_hdrs = $hdr->header_raw('List-ID');
+	for my $list_hdr (@list_hdrs) {
+		my $ibx = list_hdr_ibx($config, $list_hdr);
+		if (defined($ibx)) {
+			$ibxs{$ibx->{name}} = $ibx;
+		}
+	}
+	my @ibxs = values %ibxs;
+	return @ibxs;
+}
+
+if (-t STDIN) {
+	print("Enter your imap password: ");
+	ReadMode('noecho');
+	$mail_password = ReadLine(0);
+	ReadMode('normal');
+	print("\n");
+} else {
+	print("Not a tty\n");
+	$mail_password = readline();
+}
+die("No password") unless defined($mail_password);
+chomp($mail_password);
+
+sub imap_ssl_client()
+{
+	my %ca = eval { IO::Socket::SSL::default_ca(); };
+	my $socket = IO::Socket::SSL->new(
+		PeerAddr => $mail_hostname,
+		PeerPort => 993,
+		Timeout  => 5,
+		SSL_verify_mode => $verify_ssl ? SSL_VERIFY_PEER : SSL_VERIFY_NONE,
+		%ca,
+	);
+	if (!defined($socket)) {
+		warn("Could not open socket to mailserver: $@\n");
+		return undef;
+	}
+	my $client = Mail::IMAPClient->new(
+		Socket   => $socket,
+		User     => $mail_username,
+		Password => $mail_password,
+		Timeout  => 5,
+	);
+	if (!defined($client)) {
+		warn("Could not initialize imap client $@\n");
+		$socket->close();
+		return undef;
+	}
+	if (!$client->IsConnected()) {
+		warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n");
+		warn("LastError: " . $client->LastError . "\n");
+		warn("Could not connect to the IMAP server: $@\n");
+		$client = undef;
+		$socket->close();
+		return undef;
+	}
+	if (!$client->IsAuthenticated()) {
+		warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n");
+		warn("LastError: " . $client->LastError . "\n");
+		warn("Could not authenticate against IMAP: $@\n");
+		$client->logout();
+		$client = undef;
+		$socket->close();
+		return undef;
+	}
+
+	return $client;
+}
+
+sub setup_mailbox($$)
+{
+	my ($client, $mailbox) = @_;
+
+	$client->Peek(1);
+	$client->Uid(1);
+
+	$client->select($mailbox);
+	my @results = $client->Results();
+	my $validity = undef;
+	foreach (@results) {
+		if ($_ =~ /^\* OK \[UIDVALIDITY ([0-9]+)\].*$/) {
+			$validity = $1;
+			last;
+		}
+	}
+	if (!defined($validity) && $client->IsConnected()) {
+		$validity = $client->uidvalidity($mailbox);
+	}
+	die("No uid validity for $mailbox") unless $validity;
+
+	return ($validity);
+}
+
+sub fetch_mailbox ($$$$$$)
+{
+	my ($config, $tracker, $client, $mailbox, $validity, $default_ibx) = @_;
+	my $now = time();
+	print("mailbox: $mailbox @ " .
+	      strftime("%Y-%m-%d %H:%M:%S %z", localtime(time()))
+	      . "\n");
+
+	my %importers;
+	my ($last_validity, $last_uid) = $tracker->get_last();
+
+	if (defined($last_validity) and ($validity ne $last_validity)) {
+		die ("Unexpected uid validity $validity expected $last_validity");
+	}
+
+	my $search_str="ALL";
+	if (defined($last_uid)) {
+		# Find the last seen and all higher articles
+		$search_str = "UID $last_uid:*";
+	}
+	my $uids = $client->search($search_str);
+	if (!defined($uids) || (scalar(@$uids) == 0)) {
+		print("$mailbox: No uids found for '$search_str'! $@\n");
+		return 0;
+	}
+
+	my $last = undef;
+	my @sorted_uids = sort { $a <=> $b } @$uids;
+	# Cap the number of uids to process at once
+	my $more = 0;
+	my $uid_count = scalar(@sorted_uids);
+	if ($uid_count > 100) {
+		@sorted_uids = @sorted_uids[0..99];
+		$more = $uid_count - 100;
+	}
+	for my $uid (@sorted_uids) {
+		last unless $client->IsConnected();
+
+		print("$mailbox UID: $validity $uid\n");
+		if (defined($last_uid)) {
+			if ($uid == $last_uid) {
+				next;
+			}
+			if ($uid < $last_uid) {
+				print("$mailbox: uid $uid not below last $last_uid, skipping.\n");
+				next;
+			}
+		}
+		my $email_str = $client->message_string($uid) or die "Could not message_string $@\n";
+		my $email_len = length($email_str);
+		my $mime = Email::MIME->new($email_str);
+		$mime->{-public_inbox_raw} = $email_str;
+
+		my @dests = email_dest($config, $mime);
+		if (scalar(@dests) == 0) {
+			push(@dests, $default_ibx);
+		}
+		die ("no destination for the email") unless scalar(@dests) > 0;
+		#printf("$mailbox dests: %d\n", scalar(@dests));
+		for my $ibx (@dests) {
+			my $name = $ibx->{name};
+			my $im;
+			if (exists($importers{$name})) {
+				$im = $importers{$name}->[0];
+			} else {
+				my $wibx = PublicInbox::InboxWritable->new($ibx);
+				die "no wibx" unless defined($wibx);
+				$im = $wibx->importer(1);
+				die "no im" unless defined($im);
+				my @arr = ( $im, $ibx );
+				$importers{$name} = \@arr;
+			}
+			$im->add($mime);
+		}
+		$last = $uid;
+	}
+
+	if ($last) {
+		die ("no ibx's for $tracker->{url}") unless scalar(keys %importers) > 0;
+		for my $name (keys %importers) {
+			my $ref = delete $importers{$name};
+			my ($im, $ibx) = @$ref;
+			$im->done();
+		}
+		print("updating tracker for $tracker->{url}...\n");
+		$tracker->update_last($validity, $last);
+	}
+
+	return $more;
+}
+
+
+sub fetch_mailbox_loop($)
+{
+	my ($mailbox) = @_;
+	my $config = eval { PublicInbox::Config->new };
+	die("No public inbox config found!") unless $config;
+
+	my $ibx = $config->lookup_name($inbox_name);
+	die("Public inbox $inbox_name not found!") unless defined($ibx);
+
+	my $tracker = PublicInbox::IMAPTracker->new($url);
+	my $client = imap_ssl_client() || die("No imap connection");
+	my $validity = setup_mailbox($client, $mailbox);
+
+	for (;;) {
+		return unless $client->IsConnected();
+		my $more;
+		do {
+			$more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity, $ibx);
+			return unless $client->IsConnected();
+		} while ($more > 0);
+
+		my @untagged;
+		do {
+			my $max_idle = 15;
+			$client->idle() or die("idle failed!\n");
+			my @results = $client->idle_data($max_idle*60);
+			return unless $client->IsConnected();
+			my @ret = $client->done();
+			push(@results, @ret);
+			for my $line (@results) {
+				next if (!defined($line));
+				if ($line =~ m/^[*].*$/) {
+					push(@untagged, $line);
+				}
+			}
+		} while (scalar(@untagged) == 0);
+		print("$mailbox: untagged: '@untagged'\n");
+	}
+
+	$client->close($mailbox);
+	$client->logout();
+
+}
+
+sub handle_mailbox($)
+{
+	# Run fetch_mailbox_loop in it's own separate process so
+	# that if something goes wrong the process exits
+	# and everything cleans up properly.
+	#
+	# Running fetch_mailbox_loop in an eval block is not enough
+	# to prevent leaks of locks and other resources.
+	my ($mailbox) = @_;
+
+	for (;;) {
+		my $child = fork();
+		if (!defined($child)) {
+			warn("fork failed: for $mailbox\n");
+			continue;
+		}
+		elsif ($child == 0) {
+			# in the child
+			fetch_mailbox_loop($mailbox);
+			exit(0);
+		}
+		else {
+			my $sleep = 5;
+			warn("------------------------- CHILD: $child $mailbox -------------------------\n");
+			my $pid = waitpid($child, 0);
+			if ($pid != $child) {
+				exit(1);
+			}
+			print("$mailbox done\n");
+			sync();
+			print("\n$mailbox: Sleeping for $sleep minutes\n\n");
+			sleep($sleep*60);
+		}
+	}
+	exit(2);
+}
+
+handle_mailbox($mailbox);
+
+1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-15 21:02           ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman
@ 2020-05-15 21:26             ` Eric W. Biederman
  2020-05-15 22:56               ` Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-15 21:26 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> The command imap_fetch connects to the specified imap mailbox and
> fetches any unfetch messages than waits with imap idle until there are
> more messages to fetch.
>
> By default messages are placed in the specified public inbox mailbox.
> The value of List-ID is consulted and if it is present used to
> select an alternate public-inbox mailbox to place the messages in.
>
> The email messages are placed without modification into the public
> inbox repository so minimize changes of corruption or of loosing
> valuable information.  I use the command imap_fetch for all of my
> email and not just a mailling list mirror so I don't want automation
> to accidentally cause something important to be lost.
>
> No email messages are deleted from the server instead IMAPTracker
> is used to remember which messages were downloaded.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---


Bah.  I sent this a little too soon.  The patch needs this small
incremental fix to actually work.

Anyway I hope this is enough to help you with your integration of imap
functionality into public-inbox-watch.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-15 21:26             ` Eric W. Biederman
@ 2020-05-15 22:56               ` Eric Wong
  2020-05-16 10:47                 ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-05-15 22:56 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> > The command imap_fetch connects to the specified imap mailbox and
> > fetches any unfetch messages than waits with imap idle until there are
> > more messages to fetch.
> >
> > By default messages are placed in the specified public inbox mailbox.
> > The value of List-ID is consulted and if it is present used to
> > select an alternate public-inbox mailbox to place the messages in.
> >
> > The email messages are placed without modification into the public
> > inbox repository so minimize changes of corruption or of loosing
> > valuable information.  I use the command imap_fetch for all of my
> > email and not just a mailling list mirror so I don't want automation
> > to accidentally cause something important to be lost.

Btw, Email::MIME usage is gone from 1.5.0 due to nasty
performance problems and replaced by PublicInbox::Eml.  Eml
should be completely non-destructive unless somebody sends an
abusive message which exceeds the new safety limits; in which
case it won't OOM or burn CPU like E::M did.

That said, {-public_inbox_raw} still works and Eml looks
like a drop-in replacement as far as imap_fetch is concerned.

> > No email messages are deleted from the server instead IMAPTracker
> > is used to remember which messages were downloaded.

Yup.  I've integrated IMAPTracker into a local branch, already.
I think it could be reused for tracking NNTP fetches, too,
since UIDs and NNTP article numbers seem interchangeable.

> Bah.  I sent this a little too soon.  The patch needs this small
> incremental fix to actually work.

No worries.

Btw, any reason you create the SSLSocket yourself instead of
passing (Ssl => \@SSL_Socket_options) to IMAPClient->new?

Creating and maintaining client-side config options for TLS is
another thing that'll need to be done for -watch.  I guess we'll
steal configuration names/switches from git config/CLI tools to
ease the learning curve.

I'm already reusing "git credential" for retrieving/storing
login information and it seems nice.

> Anyway I hope this is enough to help you with your integration of imap
> functionality into public-inbox-watch.

Yup, thanks again!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-15 22:56               ` Eric Wong
@ 2020-05-16 10:47                 ` Eric W. Biederman
  2020-05-16 19:12                   ` Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-16 10:47 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> > The command imap_fetch connects to the specified imap mailbox and
>> > fetches any unfetch messages than waits with imap idle until there are
>> > more messages to fetch.
>> >
>> > By default messages are placed in the specified public inbox mailbox.
>> > The value of List-ID is consulted and if it is present used to
>> > select an alternate public-inbox mailbox to place the messages in.
>> >
>> > The email messages are placed without modification into the public
>> > inbox repository so minimize changes of corruption or of loosing
>> > valuable information.  I use the command imap_fetch for all of my
>> > email and not just a mailling list mirror so I don't want automation
>> > to accidentally cause something important to be lost.
>
> Btw, Email::MIME usage is gone from 1.5.0 due to nasty
> performance problems and replaced by PublicInbox::Eml.  Eml
> should be completely non-destructive unless somebody sends an
> abusive message which exceeds the new safety limits; in which
> case it won't OOM or burn CPU like E::M did.
>
> That said, {-public_inbox_raw} still works and Eml looks
> like a drop-in replacement as far as imap_fetch is concerned.

I almost did that. But I looked and saw PublicInbox::MIME still present
and a number of other references to Email::MIME so I wasn't certain
exactly how that was being handled.  But since Email::MIME still
worked I didn't mess with that.

>> > No email messages are deleted from the server instead IMAPTracker
>> > is used to remember which messages were downloaded.
>
> Yup.  I've integrated IMAPTracker into a local branch, already.
> I think it could be reused for tracking NNTP fetches, too,
> since UIDs and NNTP article numbers seem interchangeable.

True.  I guess it is possible to make the equivalent of my imap_fetch
for NNTP queries as well.  Does NNTP have the equivalent of IDLE where
it is possible to get timely updates of new messages?

I suppose I should see if the git protocol does as well.  I am trying to
figure out what the best way to track other public inbox git repositories.

Currently I have a script that every N minutes does git remote update.
Which is ok.  But I think something like imap_fetch might be nicer.

>> Bah.  I sent this a little too soon.  The patch needs this small
>> incremental fix to actually work.
>
> No worries.
>
> Btw, any reason you create the SSLSocket yourself instead of
> passing (Ssl => \@SSL_Socket_options) to IMAPClient->new?

When I read the documentation it looked like that was the way to do
things.  Even now when I reread the documentation that looks like the
way to go.  Especially if I wanted to be certain the connection was
encrypted.

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-16 10:47                 ` Eric W. Biederman
@ 2020-05-16 19:12                   ` Eric Wong
  2020-05-16 20:09                     ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2020-05-16 19:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@yhbt.net> writes:
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> > The email messages are placed without modification into the public
> >> > inbox repository so minimize changes of corruption or of loosing
> >> > valuable information.  I use the command imap_fetch for all of my
> >> > email and not just a mailling list mirror so I don't want automation
> >> > to accidentally cause something important to be lost.
> >
> > Btw, Email::MIME usage is gone from 1.5.0 due to nasty
> > performance problems and replaced by PublicInbox::Eml.  Eml
> > should be completely non-destructive unless somebody sends an
> > abusive message which exceeds the new safety limits; in which
> > case it won't OOM or burn CPU like E::M did.
> >
> > That said, {-public_inbox_raw} still works and Eml looks
> > like a drop-in replacement as far as imap_fetch is concerned.
> 
> I almost did that. But I looked and saw PublicInbox::MIME still present
> and a number of other references to Email::MIME so I wasn't certain
> exactly how that was being handled.  But since Email::MIME still
> worked I didn't mess with that.

I think the Import .pod documentation is the only place aside
from some random comments and maintainer tests in xt/*, right?

I'll definitely keep the import + indexing parts compatible with
Email::MIME.

I'm not sure how much of a Perl API I want to expose in case
this gets reimplemented/migrated to another language someday,
but maybe exposing the most heavily-used parts is OK.

I also don't think another language is likely, at this point.

> >> > No email messages are deleted from the server instead IMAPTracker
> >> > is used to remember which messages were downloaded.
> >
> > Yup.  I've integrated IMAPTracker into a local branch, already.
> > I think it could be reused for tracking NNTP fetches, too,
> > since UIDs and NNTP article numbers seem interchangeable.
> 
> True.  I guess it is possible to make the equivalent of my imap_fetch
> for NNTP queries as well.  Does NNTP have the equivalent of IDLE where
> it is possible to get timely updates of new messages?

I haven't seen anything in NNTP like IDLE.  Maybe there's
something in the draft stages, but AFAIK nothing widely
implemented.

> I suppose I should see if the git protocol does as well.  I am trying to
> figure out what the best way to track other public inbox git repositories.

I'll likely implement a custom HTTP endpoint and usable with
(curl -XIDLE ...); but the timeout will need to vary depending
on firewalls.

> Currently I have a script that every N minutes does git remote update.
> Which is ok.  But I think something like imap_fetch might be nicer.

Definitely.

> >> Bah.  I sent this a little too soon.  The patch needs this small
> >> incremental fix to actually work.
> >
> > No worries.
> >
> > Btw, any reason you create the SSLSocket yourself instead of
> > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new?
> 
> When I read the documentation it looked like that was the way to do
> things.  Even now when I reread the documentation that looks like the
> way to go.  Especially if I wanted to be certain the connection was
> encrypted.

There seems more than one way to do it, but `Starttls' and `Ssl'
are just as documented from what I tell (in v3.38).
Socket/RawSocket seem useful for using an external command to
connect/launch an IMAP tunnel or server; so it'll be used to
mimic the `imap.tunnel' support of git-imap-send.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox
  2020-05-16 19:12                   ` Eric Wong
@ 2020-05-16 20:09                     ` Eric W. Biederman
  2020-05-16 22:53                       ` [PATCH] confine Email::MIME use even further Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2020-05-16 20:09 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> Eric Wong <e@yhbt.net> writes:
>> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> >> > The email messages are placed without modification into the public
>> >> > inbox repository so minimize changes of corruption or of loosing
>> >> > valuable information.  I use the command imap_fetch for all of my
>> >> > email and not just a mailling list mirror so I don't want automation
>> >> > to accidentally cause something important to be lost.
>> >
>> > Btw, Email::MIME usage is gone from 1.5.0 due to nasty
>> > performance problems and replaced by PublicInbox::Eml.  Eml
>> > should be completely non-destructive unless somebody sends an
>> > abusive message which exceeds the new safety limits; in which
>> > case it won't OOM or burn CPU like E::M did.
>> >
>> > That said, {-public_inbox_raw} still works and Eml looks
>> > like a drop-in replacement as far as imap_fetch is concerned.
>> 
>> I almost did that. But I looked and saw PublicInbox::MIME still present
>> and a number of other references to Email::MIME so I wasn't certain
>> exactly how that was being handled.  But since Email::MIME still
>> worked I didn't mess with that.
>
> I think the Import .pod documentation is the only place aside
> from some random comments and maintainer tests in xt/*, right?

I am looking at 1.5.0 so you may have made a bit more progress
but Import.pm still uses Email::MIME,
and PublicInbox::MIME still uses Email::MIME as a base class.

> I'll definitely keep the import + indexing parts compatible with
> Email::MIME.
>
> I'm not sure how much of a Perl API I want to expose in case
> this gets reimplemented/migrated to another language someday,
> but maybe exposing the most heavily-used parts is OK.
>
> I also don't think another language is likely, at this point.


>> >> > No email messages are deleted from the server instead IMAPTracker
>> >> > is used to remember which messages were downloaded.
>> >
>> > Yup.  I've integrated IMAPTracker into a local branch, already.
>> > I think it could be reused for tracking NNTP fetches, too,
>> > since UIDs and NNTP article numbers seem interchangeable.
>> 
>> True.  I guess it is possible to make the equivalent of my imap_fetch
>> for NNTP queries as well.  Does NNTP have the equivalent of IDLE where
>> it is possible to get timely updates of new messages?
>
> I haven't seen anything in NNTP like IDLE.  Maybe there's
> something in the draft stages, but AFAIK nothing widely
> implemented.
>
>> I suppose I should see if the git protocol does as well.  I am trying to
>> figure out what the best way to track other public inbox git repositories.
>
> I'll likely implement a custom HTTP endpoint and usable with
> (curl -XIDLE ...); but the timeout will need to vary depending
> on firewalls.
>
>> Currently I have a script that every N minutes does git remote update.
>> Which is ok.  But I think something like imap_fetch might be nicer.
>
> Definitely.
>
>> >> Bah.  I sent this a little too soon.  The patch needs this small
>> >> incremental fix to actually work.
>> >
>> > No worries.
>> >
>> > Btw, any reason you create the SSLSocket yourself instead of
>> > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new?
>> 
>> When I read the documentation it looked like that was the way to do
>> things.  Even now when I reread the documentation that looks like the
>> way to go.  Especially if I wanted to be certain the connection was
>> encrypted.
>
> There seems more than one way to do it, but `Starttls' and `Ssl'
> are just as documented from what I tell (in v3.38).
> Socket/RawSocket seem useful for using an external command to
> connect/launch an IMAP tunnel or server; so it'll be used to
> mimic the `imap.tunnel' support of git-imap-send.

Now that you point it out I can see it.  Commands like starttls
are a bit dangerous as they are subject to man in the middle attacks.

But I think that is the difference of just tossing something together
for yourself versus making something that works with everyone's setup.

The one challenge I ran into was getting ssl verification to work on
RHEL7.  Apparently IO::Socket::SSL::default_ca() does not exist in
the old version of perl that comes with RHEL7.  Which is why I have
the %ca and the eval.

Eric


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] confine Email::MIME use even further
  2020-05-16 20:09                     ` Eric W. Biederman
@ 2020-05-16 22:53                       ` Eric Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2020-05-16 22:53 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@yhbt.net> writes:
> 
> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> Eric Wong <e@yhbt.net> writes:
> >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> >> > The email messages are placed without modification into the public
> >> >> > inbox repository so minimize changes of corruption or of loosing
> >> >> > valuable information.  I use the command imap_fetch for all of my
> >> >> > email and not just a mailling list mirror so I don't want automation
> >> >> > to accidentally cause something important to be lost.
> >> >
> >> > Btw, Email::MIME usage is gone from 1.5.0 due to nasty
> >> > performance problems and replaced by PublicInbox::Eml.  Eml
> >> > should be completely non-destructive unless somebody sends an
> >> > abusive message which exceeds the new safety limits; in which
> >> > case it won't OOM or burn CPU like E::M did.
> >> >
> >> > That said, {-public_inbox_raw} still works and Eml looks
> >> > like a drop-in replacement as far as imap_fetch is concerned.
> >> 
> >> I almost did that. But I looked and saw PublicInbox::MIME still present
> >> and a number of other references to Email::MIME so I wasn't certain
> >> exactly how that was being handled.  But since Email::MIME still
> >> worked I didn't mess with that.
> >
> > I think the Import .pod documentation is the only place aside
> > from some random comments and maintainer tests in xt/*, right?
> 
> I am looking at 1.5.0 so you may have made a bit more progress
> but Import.pm still uses Email::MIME,
> and PublicInbox::MIME still uses Email::MIME as a base class.

Yeah, PublicInbox::MIME only existed to workaround old bugs in
Email::MIME, so we used it everywhere for years and and will
keep it in old tests.

Below is a patch to remove most references to Email::MIME;
but I guess PublicInbox::Eml will need POD docs at some point...

> >> > Btw, any reason you create the SSLSocket yourself instead of
> >> > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new?
> >> 
> >> When I read the documentation it looked like that was the way to do
> >> things.  Even now when I reread the documentation that looks like the
> >> way to go.  Especially if I wanted to be certain the connection was
> >> encrypted.
> >
> > There seems more than one way to do it, but `Starttls' and `Ssl'
> > are just as documented from what I tell (in v3.38).
> > Socket/RawSocket seem useful for using an external command to
> > connect/launch an IMAP tunnel or server; so it'll be used to
> > mimic the `imap.tunnel' support of git-imap-send.
> 
> Now that you point it out I can see it.  Commands like starttls
> are a bit dangerous as they are subject to man in the middle attacks.
> 
> But I think that is the difference of just tossing something together
> for yourself versus making something that works with everyone's setup.
> 
> The one challenge I ran into was getting ssl verification to work on
> RHEL7.  Apparently IO::Socket::SSL::default_ca() does not exist in
> the old version of perl that comes with RHEL7.  Which is why I have
> the %ca and the eval.

Ouch.  Yes, I remember that being a problem for testing NNTPS,
too.  Net::NNTP doesn't support old IO::Socket::SSL, either.

Don't feel obligated to figure this out; but how did
IO::Socket::SSL work before it got default_ca()?

Did it force the user to configure that on their own,
set it behind-the-scenes as a default, or did it (*gasp*)
skip verification?

-----------8<-----------
Subject: [PATCH] confine Email::MIME use even further

To avoid confusing future readers and users, recommend
PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml
comments at the top of PublicInbox::MIME.

mime_load() confined to t/eml.t, since we won't be using
it anywhere else in our tests.
---
 lib/PublicInbox/Import.pm     | 22 +++++++++++++---------
 lib/PublicInbox/MIME.pm       |  4 +++-
 lib/PublicInbox/TestCommon.pm | 10 +---------
 t/eml.t                       |  6 ++++++
 4 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index fc61d062..792570c8 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -648,7 +648,10 @@ version 1.0
 
 =head1 SYNOPSIS
 
-	use Email::MIME;
+	use PublicInbox::Eml;
+	# PublicInbox::Eml exists as of public-inbox 1.5.0,
+	# Email::MIME was used in older versions
+
 	use PublicInbox::Git;
 	use PublicInbox::Import;
 
@@ -664,7 +667,7 @@ version 1.0
 		"Date: Thu, 01 Jan 1970 00:00:00 +0000\n" .
 		"Message-ID: <m\@example.org>\n".
 		"\ntest message";
-	my $parsed = Email::MIME->new($message);
+	my $parsed = PublicInbox::Eml->new($message);
 	my $ret = $im->add($parsed);
 	if (!defined $ret) {
 		warn "duplicate: ",
@@ -675,7 +678,7 @@ version 1.0
 	$im->done;
 
 	# to remove a message
-	my $junk = Email::MIME->new($message);
+	my $junk = PublicInbox::Eml->new($message);
 	my ($mark, $orig) = $im->remove($junk);
 	if ($mark eq 'MISSING') {
 		print "not found\n";
@@ -690,8 +693,8 @@ version 1.0
 
 =head1 DESCRIPTION
 
-An importer and remover for public-inboxes which takes L<Email::MIME>
-messages as input and stores them in a git repository as
+An importer and remover for public-inboxes which takes C<PublicInbox::Eml>
+or L<Email::MIME> messages as input and stores them in a git repository as
 documented in L<https://public-inbox.org/public-inbox-v1-format.txt>,
 except it does not allow duplicate Message-IDs.
 
@@ -709,7 +712,7 @@ Initialize a new PublicInbox::Import object.
 
 =head2 add
 
-	my $parsed = Email::MIME->new($message);
+	my $parsed = PublicInbox::Eml->new($message);
 	$im->add($parsed);
 
 Adds a message to to the git repository.  This will acquire
@@ -720,12 +723,13 @@ is called, but L</remove> may be called on them.
 
 =head2 remove
 
-	my $junk = Email::MIME->new($message);
+	my $junk = PublicInbox::Eml->new($message);
 	my ($code, $orig) = $im->remove($junk);
 
 Removes a message from the repository.  On success, it returns
 a ':'-prefixed numeric code representing the git-fast-import
-mark and the original messages as an Email::MIME object.
+mark and the original messages as a PublicInbox::Eml
+(or Email::MIME) object.
 If the message could not be found, the code is "MISSING"
 and the original message is undef.  If there is a mismatch where
 the "Message-ID" is matched but the subject and body do not match,
@@ -749,7 +753,7 @@ The mail archives are hosted at L<https://public-inbox.org/meta/>
 
 =head1 COPYRIGHT
 
-Copyright (C) 2016 all contributors L<mailto:meta@public-inbox.org>
+Copyright (C) 2016-2020 all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt>
 
diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm
index 9077386a..831a3d19 100644
--- a/lib/PublicInbox/MIME.pm
+++ b/lib/PublicInbox/MIME.pm
@@ -4,7 +4,9 @@
 # The license for this file differs from the rest of public-inbox.
 #
 # We no longer load this in any of our code outside of maintainer
-# tests for compatibility.
+# tests for compatibility.  PublicInbox::Eml is favored throughout
+# our codebase for performance and safety reasons, though we maintain
+# Email::MIME-compatibility in mail injection and indexing code paths.
 #
 # It monkey patches the "parts_multipart" subroutine with patches
 # from Matthew Horsfall <wolfsage@gmail.com> at:
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index d952ee6d..79e597f5 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -9,15 +9,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
 use POSIX qw(dup2);
 use IO::Socket::INET;
 our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
-	run_script start_script key2sub xsys xqx mime_load eml_load);
-
-sub mime_load ($) {
-	my ($path) = @_;
-	open(my $fh, '<', $path) or die "open $path: $!";
-	# test should've called: require_mods('Email::MIME')
-	require PublicInbox::MIME;
-	PublicInbox::MIME->new(\(do { local $/; <$fh> }));
-}
+	run_script start_script key2sub xsys xqx eml_load);
 
 sub eml_load ($) {
 	my ($path, $cb) = @_;
diff --git a/t/eml.t b/t/eml.t
index b7f58ac7..1892b001 100644
--- a/t/eml.t
+++ b/t/eml.t
@@ -12,6 +12,12 @@ SKIP: {
 };
 use_ok $_ for @classes;
 
+sub mime_load ($) {
+	my ($path) = @_;
+	open(my $fh, '<', $path) or die "open $path: $!";
+	PublicInbox::MIME->new(\(do { local $/; <$fh> }));
+}
+
 {
 	my $eml = PublicInbox::Eml->new(\(my $str = "a: b\n\nhi\n"));
 	is($str, "hi\n", '->new modified body like Email::Simple');

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-05-16 22:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman
2019-10-29 22:31 ` Eric Wong
2019-10-29 23:12   ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong
2019-11-03 16:28   ` I have figured out IMAP IDLE Eric W. Biederman
2020-05-13 19:31 ` Eric Wong
2020-05-13 21:48   ` Eric W. Biederman
2020-05-13 22:17     ` Eric Wong
2020-05-14 12:32       ` Eric W. Biederman
2020-05-14 16:15         ` Eric Wong
2020-05-15 21:00         ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman
2020-05-15 21:02           ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman
2020-05-15 21:26             ` Eric W. Biederman
2020-05-15 22:56               ` Eric Wong
2020-05-16 10:47                 ` Eric W. Biederman
2020-05-16 19:12                   ` Eric Wong
2020-05-16 20:09                     ` Eric W. Biederman
2020-05-16 22:53                       ` [PATCH] confine Email::MIME use even further Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).