* I have figured out IMAP IDLE @ 2019-10-29 14:40 Eric W. Biederman 2019-10-29 22:31 ` Eric Wong 2020-05-13 19:31 ` Eric Wong 0 siblings, 2 replies; 17+ messages in thread From: Eric W. Biederman @ 2019-10-29 14:40 UTC (permalink / raw) To: Eric Wong; +Cc: meta A few days ago I stumbled upon this magic decoder ring for IMAP. The "Ten Commandments of How to Write an IMAP client" https://www.washington.edu/imap/documentation/commndmt.txt The part I was most clearly missing was that for IMAP it is better to open multiple sockets (one per mail folder on the server) than it is to switch between folders with the same ssl socket. The core loop of my imap code now does: for (;;) { my $more; do { $more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity); } while ($more > 0); my @idle_data; do { my $max_idle = 15; $client->idle() or die("idle failed!\n"); @idle_data = $client->idle_data($max_idle*60); $client->done(); } while (scalar(@idle_data) == 0); } Where all I do with idle is wait until it gives me something, and reissue it every 15 minutes so the connection doesn't time out. The only filter I have found useful to add is when imap idle returns nothing to just loop instead of trying to fetch mail. I have not found anything in the data idle returns that would save me work in the mailbox fetching loop. In my limited testing firing up a bunch of ssl sockets and fetching from the all simultaneously appears much faster, and maybe a little more stable. Than switching between mailboxes. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman @ 2019-10-29 22:31 ` Eric Wong 2019-10-29 23:12 ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong 2019-11-03 16:28 ` I have figured out IMAP IDLE Eric W. Biederman 2020-05-13 19:31 ` Eric Wong 1 sibling, 2 replies; 17+ messages in thread From: Eric Wong @ 2019-10-29 22:31 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > A few days ago I stumbled upon this magic decoder ring for IMAP. > The "Ten Commandments of How to Write an IMAP client" > > https://www.washington.edu/imap/documentation/commndmt.txt > > The part I was most clearly missing was that for IMAP it is better to > open multiple sockets (one per mail folder on the server) than it is to > switch between folders with the same ssl socket. > > The core loop of my imap code now does: > > for (;;) { > my $more; > do { > $more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity); > } while ($more > 0); > > my @idle_data; > do { > my $max_idle = 15; > $client->idle() or die("idle failed!\n"); b) window closed > @idle_data = $client->idle_data($max_idle*60); > $client->done(); a) window opens > } while (scalar(@idle_data) == 0); > } It seems between a) and b), there's a small window where a message can appear and not be noticed right away. I think the only reliable way to do this without experiencing delays or an opportunity for missed messages is to have two(!) connections per folder: 1) fetcher - this connection only fetches, either at startup or when triggered by the idler described below. 2) idler - just runs ->idle, ->idle_data and ->done If ->idle times out or sees ->idle_data, it triggers fetcher. The trigger should probably have a timestamp so the fetcher can merge overlapping triggers. Unfortunately, using widely-available Perl IMAP client libraries means two Perl processes for every folder. But I suppose fetcher can be lazy-started to not waste resources. I wonder if WWW::Curl can do it all in one Perl process with an event loop. curl can issue arbitrary commands for HTTP and I've been using it to learn some XS, so maybe it can do IDLE. WWW::Curl is also packaged for CentOS/RHEL7, so it should not be tough to install. > Where all I do with idle is wait until it gives me something, and > reissue it every 15 minutes so the connection doesn't time out. Btw, why not 29 minutes? (accounting for 30 minute timeout). > The only filter I have found useful to add is when imap idle returns > nothing to just loop instead of trying to fetch mail. I have not found > anything in the data idle returns that would save me work in the mailbox > fetching loop. Yeah, I recall that being the case when I last worked with IMAP in other places. > In my limited testing firing up a bunch of ssl sockets and fetching from > the all simultaneously appears much faster, and maybe a little more > stable. Than switching between mailboxes. Thanks for the info! Fwiw, I favor using stunnel for developing/testing client-side mail tools with TLS. It's easier to debug/trace a tool that's talking over loopback to stunnel using tools like strace or tcpdump. ^ permalink raw reply [flat|nested] 17+ messages in thread
* WWW::Curl [was: I have figured out IMAP IDLE] 2019-10-29 22:31 ` Eric Wong @ 2019-10-29 23:12 ` Eric Wong 2019-11-03 16:28 ` I have figured out IMAP IDLE Eric W. Biederman 1 sibling, 0 replies; 17+ messages in thread From: Eric Wong @ 2019-10-29 23:12 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta Eric Wong <e@80x24.org> wrote: > WWW::Curl is also packaged for CentOS/RHEL7, so it should not be > tough to install. But still a pain to build from source, if need be: https://bugs.debian.org/843432 And it looks like upstream's been missing for a few years and distros have mostly kept it alive... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2019-10-29 22:31 ` Eric Wong 2019-10-29 23:12 ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong @ 2019-11-03 16:28 ` Eric W. Biederman 1 sibling, 0 replies; 17+ messages in thread From: Eric W. Biederman @ 2019-11-03 16:28 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@80x24.org> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> A few days ago I stumbled upon this magic decoder ring for IMAP. >> The "Ten Commandments of How to Write an IMAP client" >> >> https://www.washington.edu/imap/documentation/commndmt.txt >> >> The part I was most clearly missing was that for IMAP it is better to >> open multiple sockets (one per mail folder on the server) than it is to >> switch between folders with the same ssl socket. >> >> The core loop of my imap code now does: >> >> for (;;) { >> my $more; >> do { >> $more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity); >> } while ($more > 0); >> >> my @idle_data; >> do { >> my $max_idle = 15; >> $client->idle() or die("idle failed!\n"); > > b) window closed > >> @idle_data = $client->idle_data($max_idle*60); >> $client->done(); > > a) window opens > >> } while (scalar(@idle_data) == 0); >> } > > It seems between a) and b), there's a small window where a message > can appear and not be noticed right away. Hmm. Looking at the source of IMAPClient point (b) can be moved to after "$client->done()" by reading the output "$client->done()". About the window for unsolicited untagged responses before the "$client->idle()" command. Those unsolicited responses should queue up in the imap socket until the next read. That next read should happen after the "$client->idle()" command. So in practice unsolicited untagged responses should just look like a reply to imap idle. So the race you have noticed does not look fundamental rather it looks like it simply takes bug fixing to make go away. > I think the only reliable way to do this without experiencing > delays or an opportunity for missed messages is to have two(!) > connections per folder: That sounds nice. But it is also _not_ recommended. Quoting from the "Ten Commandments of How to Write an IMAP Client" 1. Thou shalt not assume that it is alright to open multiple IMAP sessions selected on the same mailbox simultaneously, lest thou face the righteous wrath of mail stores that doth not permit such access. Instead, thou shalt labor mightily, even unto having to use thy brain to thinketh the matter through, such that thy client use existing sessions that are already open. > 1) fetcher - this connection only fetches, either at startup or > when triggered by the idler described below. > > 2) idler - just runs ->idle, ->idle_data and ->done > If ->idle times out or sees ->idle_data, it triggers > fetcher. The trigger should probably have a timestamp > so the fetcher can merge overlapping triggers. > > Unfortunately, using widely-available Perl IMAP client libraries > means two Perl processes for every folder. But I suppose > fetcher can be lazy-started to not waste resources. > > I wonder if WWW::Curl can do it all in one Perl process with an > event loop. curl can issue arbitrary commands for HTTP and I've > been using it to learn some XS, so maybe it can do IDLE. > > WWW::Curl is also packaged for CentOS/RHEL7, so it should not be > tough to install. > >> Where all I do with idle is wait until it gives me something, and >> reissue it every 15 minutes so the connection doesn't time out. > > Btw, why not 29 minutes? (accounting for 30 minute timeout). Silliness. In the example I started from someone had a comment about firewalls not honoring a full 30 minutes so I picked 15 minutes arbitrarily. >> The only filter I have found useful to add is when imap idle returns >> nothing to just loop instead of trying to fetch mail. I have not found >> anything in the data idle returns that would save me work in the mailbox >> fetching loop. > > Yeah, I recall that being the case when I last worked with > IMAP in other places. > >> In my limited testing firing up a bunch of ssl sockets and fetching from >> the all simultaneously appears much faster, and maybe a little more >> stable. Than switching between mailboxes. > > Thanks for the info! > > Fwiw, I favor using stunnel for developing/testing client-side > mail tools with TLS. It's easier to debug/trace a tool that's > talking over loopback to stunnel using tools like strace or tcpdump. I may check that out. Certainly something that gives you the raw protocol messages is a little bit cleaner. I haven't had enough challenges that I have been looking for the raw protocol messages yet. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman 2019-10-29 22:31 ` Eric Wong @ 2020-05-13 19:31 ` Eric Wong 2020-05-13 21:48 ` Eric W. Biederman 1 sibling, 1 reply; 17+ messages in thread From: Eric Wong @ 2020-05-13 19:31 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > > A few days ago I stumbled upon this magic decoder ring for IMAP. > The "Ten Commandments of How to Write an IMAP client" > > https://www.washington.edu/imap/documentation/commndmt.txt That URL already expired, fortunately archive.org has it: https://web.archive.org/web/20191029013002/https://www.washington.edu/imap/documentation/commndmt.txt.html > The part I was most clearly missing was that for IMAP it is better to > open multiple sockets (one per mail folder on the server) than it is to > switch between folders with the same ssl socket. > > The core loop of my imap code now does: Is your stuff based on Mail::IMAPClient still working well? I'm starting to integrate Mail::IMAPClient support into public-inbox-watch, but not sure how to go about with automated test cases... I guess it's reason to implement a read-only IMAP server along the lines of public-inbox-nntpd :> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2020-05-13 19:31 ` Eric Wong @ 2020-05-13 21:48 ` Eric W. Biederman 2020-05-13 22:17 ` Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-13 21:48 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@yhbt.net> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> A few days ago I stumbled upon this magic decoder ring for IMAP. >> The "Ten Commandments of How to Write an IMAP client" >> >> https://www.washington.edu/imap/documentation/commndmt.txt > > That URL already expired, fortunately archive.org has it: > > https://web.archive.org/web/20191029013002/https://www.washington.edu/imap/documentation/commndmt.txt.html Wow. It looks like all of their imap related documentation went away. It looks like wikipedia has the story: https://en.wikipedia.org/wiki/UW_IMAP In short UW_IMAP written by Mark Crispin was the reference IMAP implementation but he died in 2012. So the fact everything stayed up for as long as it did was a bit of testament to it's importance. >> The part I was most clearly missing was that for IMAP it is better to >> open multiple sockets (one per mail folder on the server) than it is to >> switch between folders with the same ssl socket. >> >> The core loop of my imap code now does: > > Is your stuff based on Mail::IMAPClient still working well? Yes. I have fixed a few things to make the code more robust. Which is mostly me learning how Mail::IMAPClient works. I also keep tweaking the ckde to make it more generic/simpler so it has the potential to be usable by someone else. Would be interested if I rebased onto the lastest release and posted my code? Perhaps you could put into contrib until you have something that gets properly tested? > I'm starting to integrate Mail::IMAPClient support into > public-inbox-watch, but not sure how to go about with > automated test cases... > > I guess it's reason to implement a read-only IMAP server along > the lines of public-inbox-nntpd :> That sounds like a pain but it does sound like a good plan. I think there are a lot of people that if there was even a simple IMAP server would find it easier to get started using public-inbox. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2020-05-13 21:48 ` Eric W. Biederman @ 2020-05-13 22:17 ` Eric Wong 2020-05-14 12:32 ` Eric W. Biederman 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2020-05-13 22:17 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@yhbt.net> writes: > > Is your stuff based on Mail::IMAPClient still working well? > > Yes. I have fixed a few things to make the code more robust. > Which is mostly me learning how Mail::IMAPClient works. > > I also keep tweaking the ckde to make it more generic/simpler > so it has the potential to be usable by someone else. > > Would be interested if I rebased onto the lastest release and > posted my code? Perhaps you could put into contrib until you > have something that gets properly tested? Definitely. Just having it here in the archives as an attachment ought to be enough. Thanks in advance. > > I'm starting to integrate Mail::IMAPClient support into > > public-inbox-watch, but not sure how to go about with > > automated test cases... > > > > I guess it's reason to implement a read-only IMAP server along > > the lines of public-inbox-nntpd :> > > That sounds like a pain but it does sound like a good plan. I think > there are a lot of people that if there was even a simple IMAP server > would find it easier to get started using public-inbox. Agree on the pain vs payoff part; I wonder if I can just implement the UID commands and ignore the unstable sequence number madness. Mail::IMAPClient does the right thing by defaulting to Uid=>1, at least. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2020-05-13 22:17 ` Eric Wong @ 2020-05-14 12:32 ` Eric W. Biederman 2020-05-14 16:15 ` Eric Wong 2020-05-15 21:00 ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman 0 siblings, 2 replies; 17+ messages in thread From: Eric W. Biederman @ 2020-05-14 12:32 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@yhbt.net> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@yhbt.net> writes: >> > Is your stuff based on Mail::IMAPClient still working well? >> >> Yes. I have fixed a few things to make the code more robust. >> Which is mostly me learning how Mail::IMAPClient works. >> >> I also keep tweaking the ckde to make it more generic/simpler >> so it has the potential to be usable by someone else. >> >> Would be interested if I rebased onto the lastest release and >> posted my code? Perhaps you could put into contrib until you >> have something that gets properly tested? > > Definitely. Just having it here in the archives as an > attachment ought to be enough. Thanks in advance. > >> > I'm starting to integrate Mail::IMAPClient support into >> > public-inbox-watch, but not sure how to go about with >> > automated test cases... >> > >> > I guess it's reason to implement a read-only IMAP server along >> > the lines of public-inbox-nntpd :> >> >> That sounds like a pain but it does sound like a good plan. I think >> there are a lot of people that if there was even a simple IMAP server >> would find it easier to get started using public-inbox. > > Agree on the pain vs payoff part; I wonder if I can just > implement the UID commands and ignore the unstable sequence > number madness. Mail::IMAPClient does the right thing by > defaulting to Uid=>1, at least. Just implementing the UID=>1 commands definitely sounds like a place to start. At least that should be enough for testing. I will rebase and post my code in the next day or so, depending on how my schedule goes. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: I have figured out IMAP IDLE 2020-05-14 12:32 ` Eric W. Biederman @ 2020-05-14 16:15 ` Eric Wong 2020-05-15 21:00 ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman 1 sibling, 0 replies; 17+ messages in thread From: Eric Wong @ 2020-05-14 16:15 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Just implementing the UID=>1 commands definitely sounds like a place to > start. At least that should be enough for testing. Heh, except "UID FETCH" returns sequence numbers :< Since we're only doing read-only IMAP, I think it's safe to make sequence numbers equivalent to UIDs (which are NNTP article numbers). Dummy messages can be returned if somebody starts trying to retrieve by non-existent sequence numbers. > I will rebase and post my code in the next day or so, depending on how > my schedule goes. Thanks. For the purposes of public-inbox-watch, I think the only item UID FETCH really needs is the full message ("RFC822" aka "BODY[]"). I'll have to see what other IMAP clients out there expect; but I'm not looking forward to translating all search queries to SQL or Xapian; just "ALL", $UID_LO:$UID_HI and maybe some other simple cases which don't require a recursive descent parser. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes 2020-05-14 12:32 ` Eric W. Biederman 2020-05-14 16:15 ` Eric Wong @ 2020-05-15 21:00 ` Eric W. Biederman 2020-05-15 21:02 ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman 1 sibling, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-15 21:00 UTC (permalink / raw) To: Eric Wong; +Cc: meta By tracking which messages in an imap mailbox have been already downloaded they imap messages do not need to be deleted immediately. It is recommended to implement one connection per IMAP mailbox, so this tracker caches the entire url for a mailbox at creation time. If you need to process multiple IMAP mailboxes you will need multiple trackers. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- lib/PublicInbox/IMAPTracker.pm | 73 ++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 lib/PublicInbox/IMAPTracker.pm diff --git a/lib/PublicInbox/IMAPTracker.pm b/lib/PublicInbox/IMAPTracker.pm new file mode 100644 index 000000000000..d445555c0fd9 --- /dev/null +++ b/lib/PublicInbox/IMAPTracker.pm @@ -0,0 +1,73 @@ +# Copyright (C) 2018 all contributors <meta@public-inbox.org> +# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt> + +package PublicInbox::IMAPTracker; +use strict; +use warnings; +use DBI; +use DBD::SQLite; +use PublicInbox::Config; + +sub create_tables ($) +{ + my ($dbh) = @_; + + $dbh->do(<<''); +CREATE TABLE IF NOT EXISTS imap_last ( + url VARCHAR PRIMARY KEY NOT NULL, + uid_validity INTEGER NOT NULL, + uid INTEGER NOT NULL, + UNIQUE (url) +) + +} + + +sub dbh_new ($) +{ + my ($dbname) = @_; + my $dbh = DBI->connect("dbi:SQLite:dbname=$dbname", '', '', { + AutoCommit => 1, + RaiseError => 1, + PrintError => 0, + ReadOnly => 0, + sqlite_use_immediate_transaction => 1, + }); + $dbh->{sqlite_unicode} = 1; + $dbh->do('PRAGMA journal_mode = TRUNCATE'); + $dbh->do('PRAGMA cache_size = 80000'); + create_tables($dbh); + $dbh; +} + +sub get_last($) +{ + my ($self) = @_; + my $dbh = $self->{dbh}; + + my $sth = $dbh->prepare_cached(<<'', undef, 1); +SELECT uid_validity, uid FROM imap_last WHERE url = ? + + $sth->execute($self->{url}); + $sth->fetchrow_array; +} + +sub update_last($$$) +{ + my ($self, $validity, $last) = @_; + my $dbh = $self->{dbh}; + my $sth = $dbh->prepare_cached(<<''); +INSERT OR REPLACE INTO imap_last (url, uid_validity, uid) +VALUES (?, ?, ?) + + $sth->execute($self->{url}, $validity, $last); +} + +sub new ($) { + my ($class, $url) = @_; + my $dbname = PublicInbox::Config->config_dir() . "/imap.sqlite3"; + my $dbh = dbh_new($dbname); + bless { dbname => $dbname, dbh => $dbh, url => $url }, $class; +} + +1; -- 2.20.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-15 21:00 ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman @ 2020-05-15 21:02 ` Eric W. Biederman 2020-05-15 21:26 ` Eric W. Biederman 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-15 21:02 UTC (permalink / raw) To: Eric Wong; +Cc: meta The command imap_fetch connects to the specified imap mailbox and fetches any unfetch messages than waits with imap idle until there are more messages to fetch. By default messages are placed in the specified public inbox mailbox. The value of List-ID is consulted and if it is present used to select an alternate public-inbox mailbox to place the messages in. The email messages are placed without modification into the public inbox repository so minimize changes of corruption or of loosing valuable information. I use the command imap_fetch for all of my email and not just a mailling list mirror so I don't want automation to accidentally cause something important to be lost. No email messages are deleted from the server instead IMAPTracker is used to remember which messages were downloaded. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- scripts/imap_fetch | 336 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 336 insertions(+) create mode 100755 scripts/imap_fetch diff --git a/scripts/imap_fetch b/scripts/imap_fetch new file mode 100755 index 000000000000..007f78a71b52 --- /dev/null +++ b/scripts/imap_fetch @@ -0,0 +1,336 @@ +#!/usr/bin/perl -w +# Script to fetch IMAP messages and put then into a public-inbox +=begin usage + ./imap_fetch imap://username@hostname/mailbox inbox +=cut +use strict; +use warnings; +use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev); +use Mail::IMAPClient; +use IO::Socket; +use IO::Socket::SSL; +use File::Sync qw(sync); +use Term::ReadKey; +use PublicInbox::IMAPTracker; +use PublicInbox::InboxWritable; +use POSIX qw(strftime); +sub usage { "Usage:\n".join('', grep(/\t/, `head -n 24 $0`)) } +my $verify_ssl = 1; +my %opts = ( + '--verify-ssl!' => \$verify_ssl, +); +GetOptions(%opts) or die usage(); + +my $mail_url = shift @ARGV or die usage(); +my $inbox_name = shift @ARGV or die usage(); +my $mail_hostname; +my $mail_username; +my $mail_password; +my $mailbox; +if ($mail_url =~ m$\Aimap://([^@]+)[@]([^@]+)/(.+)\z$) { + $mail_username = $1; + $mail_hostname = $2; + $mailbox = $3; +} else { + die usage(); +} + +my $url = 'imap://' . $mail_username . '@' . $mail_hostname . '/' . $mailbox ; + +sub list_hdr_ibx($$) +{ + my ($config, $list_hdr) = @_; + my $list_id; + if ($list_hdr =~ m/\0/) { + warn("Bad List-ID: $list_hdr contains a null\n"); + return undef; + } elsif ($list_hdr =~ m/\A[^<>]*<(\S*)>\s*\z/) { + $list_id = $1; + } else { + warn("Bad List-ID: $list_hdr\n"); + return undef; + } + my $ibx = $config->lookup_list_id($list_id); + if (!defined($ibx)) { + warn("Cound not find inbox for List-ID: $list_id\n"); + } + + print(" List-ID: $list_id\n"); + $ibx; +} + +sub email_dest($$) +{ + my ($config, $mime) = @_; + my %ibxs; + my $hdr = $mime->header_obj; + my @list_hdrs = $hdr->header_raw('List-ID'); + for my $list_hdr (@list_hdrs) { + my $ibx = list_hdr_ibx($config, $list_hdr); + if (defined($ibx)) { + $ibxs{$ibx->{name}} = $ibx; + } + } + my @ibxs = values %ibxs; + return @ibxs; +} + +if (-t STDIN) { + print("Enter your imap password: "); + ReadMode('noecho'); + $mail_password = ReadLine(0); + ReadMode('normal'); + print("\n"); +} else { + print("Not a tty\n"); + $mail_password = readline(); +} +die("No password") unless defined($mail_password); +chomp($mail_password); + +sub imap_ssl_client() +{ + my %ca = eval { IO::Socket::SSL::default_ca(); }; + my $socket = IO::Socket::SSL->new( + PeerAddr => $mail_hostname, + PeerPort => 993, + Timeout => 5, + SSL_verify_mode => $verify_ssl ? SSL_VERIFY_PEER : SSL_VERIFY_NONE, + %ca, + ); + if (!defined($socket)) { + warn("Could not open socket to mailserver: $@\n"); + return undef; + } + my $client = Mail::IMAPClient->new( + Socket => $socket, + User => $mail_username, + Password => $mail_password, + Timeout => 5, + ); + if (!defined($client)) { + warn("Could not initialize imap client $@\n"); + $socket->close(); + return undef; + } + if (!$client->IsConnected()) { + warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n"); + warn("LastError: " . $client->LastError . "\n"); + warn("Could not connect to the IMAP server: $@\n"); + $client = undef; + $socket->close(); + return undef; + } + if (!$client->IsAuthenticated()) { + warn("LastIMAPCommand: " . $client->LastIMAPCommand . "\n"); + warn("LastError: " . $client->LastError . "\n"); + warn("Could not authenticate against IMAP: $@\n"); + $client->logout(); + $client = undef; + $socket->close(); + return undef; + } + + return $client; +} + +sub setup_mailbox($$) +{ + my ($client, $mailbox) = @_; + + $client->Peek(1); + $client->Uid(1); + + $client->select($mailbox); + my @results = $client->Results(); + my $validity = undef; + foreach (@results) { + if ($_ =~ /^\* OK \[UIDVALIDITY ([0-9]+)\].*$/) { + $validity = $1; + last; + } + } + if (!defined($validity) && $client->IsConnected()) { + $validity = $client->uidvalidity($mailbox); + } + die("No uid validity for $mailbox") unless $validity; + + return ($validity); +} + +sub fetch_mailbox ($$$$$$) +{ + my ($config, $tracker, $client, $mailbox, $validity, $default_ibx) = @_; + my $now = time(); + print("mailbox: $mailbox @ " . + strftime("%Y-%m-%d %H:%M:%S %z", localtime(time())) + . "\n"); + + my %importers; + my ($last_validity, $last_uid) = $tracker->get_last(); + + if (defined($last_validity) and ($validity ne $last_validity)) { + die ("Unexpected uid validity $validity expected $last_validity"); + } + + my $search_str="ALL"; + if (defined($last_uid)) { + # Find the last seen and all higher articles + $search_str = "UID $last_uid:*"; + } + my $uids = $client->search($search_str); + if (!defined($uids) || (scalar(@$uids) == 0)) { + print("$mailbox: No uids found for '$search_str'! $@\n"); + return 0; + } + + my $last = undef; + my @sorted_uids = sort { $a <=> $b } @$uids; + # Cap the number of uids to process at once + my $more = 0; + my $uid_count = scalar(@sorted_uids); + if ($uid_count > 100) { + @sorted_uids = @sorted_uids[0..99]; + $more = $uid_count - 100; + } + for my $uid (@sorted_uids) { + last unless $client->IsConnected(); + + print("$mailbox UID: $validity $uid\n"); + if (defined($last_uid)) { + if ($uid == $last_uid) { + next; + } + if ($uid < $last_uid) { + print("$mailbox: uid $uid not below last $last_uid, skipping.\n"); + next; + } + } + my $email_str = $client->message_string($uid) or die "Could not message_string $@\n"; + my $email_len = length($email_str); + my $mime = Email::MIME->new($email_str); + $mime->{-public_inbox_raw} = $email_str; + + my @dests = email_dest($config, $mime); + if (scalar(@dests) == 0) { + push(@dests, $default_ibx); + } + die ("no destination for the email") unless scalar(@dests) > 0; + #printf("$mailbox dests: %d\n", scalar(@dests)); + for my $ibx (@dests) { + my $name = $ibx->{name}; + my $im; + if (exists($importers{$name})) { + $im = $importers{$name}->[0]; + } else { + my $wibx = PublicInbox::InboxWritable->new($ibx); + die "no wibx" unless defined($wibx); + $im = $wibx->importer(1); + die "no im" unless defined($im); + my @arr = ( $im, $ibx ); + $importers{$name} = \@arr; + } + $im->add($mime); + } + $last = $uid; + } + + if ($last) { + die ("no ibx's for $tracker->{url}") unless scalar(keys %importers) > 0; + for my $name (keys %importers) { + my $ref = delete $importers{$name}; + my ($im, $ibx) = @$ref; + $im->done(); + } + print("updating tracker for $tracker->{url}...\n"); + $tracker->update_last($validity, $last); + } + + return $more; +} + + +sub fetch_mailbox_loop($) +{ + my ($mailbox) = @_; + my $config = eval { PublicInbox::Config->new }; + die("No public inbox config found!") unless $config; + + my $ibx = $config->lookup_name($inbox_name); + die("Public inbox $inbox_name not found!") unless defined($ibx); + + my $tracker = PublicInbox::IMAPTracker->new($url); + my $client = imap_ssl_client() || die("No imap connection"); + my $validity = setup_mailbox($client, $mailbox); + + for (;;) { + return unless $client->IsConnected(); + my $more; + do { + $more = fetch_mailbox($config, $tracker, $client, $mailbox, $validity, $ibx); + return unless $client->IsConnected(); + } while ($more > 0); + + my @untagged; + do { + my $max_idle = 15; + $client->idle() or die("idle failed!\n"); + my @results = $client->idle_data($max_idle*60); + return unless $client->IsConnected(); + my @ret = $client->done(); + push(@results, @ret); + for my $line (@results) { + next if (!defined($line)); + if ($line =~ m/^[*].*$/) { + push(@untagged, $line); + } + } + } while (scalar(@untagged) == 0); + print("$mailbox: untagged: '@untagged'\n"); + } + + $client->close($mailbox); + $client->logout(); + +} + +sub handle_mailbox($) +{ + # Run fetch_mailbox_loop in it's own separate process so + # that if something goes wrong the process exits + # and everything cleans up properly. + # + # Running fetch_mailbox_loop in an eval block is not enough + # to prevent leaks of locks and other resources. + my ($mailbox) = @_; + + for (;;) { + my $child = fork(); + if (!defined($child)) { + warn("fork failed: for $mailbox\n"); + continue; + } + elsif ($child == 0) { + # in the child + fetch_mailbox_loop($mailbox); + exit(0); + } + else { + my $sleep = 5; + warn("------------------------- CHILD: $child $mailbox -------------------------\n"); + my $pid = waitpid($child, 0); + if ($pid != $child) { + exit(1); + } + print("$mailbox done\n"); + sync(); + print("\n$mailbox: Sleeping for $sleep minutes\n\n"); + sleep($sleep*60); + } + } + exit(2); +} + +handle_mailbox($mailbox); + +1; -- 2.20.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-15 21:02 ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman @ 2020-05-15 21:26 ` Eric W. Biederman 2020-05-15 22:56 ` Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-15 21:26 UTC (permalink / raw) To: Eric Wong; +Cc: meta ebiederm@xmission.com (Eric W. Biederman) writes: > The command imap_fetch connects to the specified imap mailbox and > fetches any unfetch messages than waits with imap idle until there are > more messages to fetch. > > By default messages are placed in the specified public inbox mailbox. > The value of List-ID is consulted and if it is present used to > select an alternate public-inbox mailbox to place the messages in. > > The email messages are placed without modification into the public > inbox repository so minimize changes of corruption or of loosing > valuable information. I use the command imap_fetch for all of my > email and not just a mailling list mirror so I don't want automation > to accidentally cause something important to be lost. > > No email messages are deleted from the server instead IMAPTracker > is used to remember which messages were downloaded. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- Bah. I sent this a little too soon. The patch needs this small incremental fix to actually work. Anyway I hope this is enough to help you with your integration of imap functionality into public-inbox-watch. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-15 21:26 ` Eric W. Biederman @ 2020-05-15 22:56 ` Eric Wong 2020-05-16 10:47 ` Eric W. Biederman 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2020-05-15 22:56 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > ebiederm@xmission.com (Eric W. Biederman) writes: > > > The command imap_fetch connects to the specified imap mailbox and > > fetches any unfetch messages than waits with imap idle until there are > > more messages to fetch. > > > > By default messages are placed in the specified public inbox mailbox. > > The value of List-ID is consulted and if it is present used to > > select an alternate public-inbox mailbox to place the messages in. > > > > The email messages are placed without modification into the public > > inbox repository so minimize changes of corruption or of loosing > > valuable information. I use the command imap_fetch for all of my > > email and not just a mailling list mirror so I don't want automation > > to accidentally cause something important to be lost. Btw, Email::MIME usage is gone from 1.5.0 due to nasty performance problems and replaced by PublicInbox::Eml. Eml should be completely non-destructive unless somebody sends an abusive message which exceeds the new safety limits; in which case it won't OOM or burn CPU like E::M did. That said, {-public_inbox_raw} still works and Eml looks like a drop-in replacement as far as imap_fetch is concerned. > > No email messages are deleted from the server instead IMAPTracker > > is used to remember which messages were downloaded. Yup. I've integrated IMAPTracker into a local branch, already. I think it could be reused for tracking NNTP fetches, too, since UIDs and NNTP article numbers seem interchangeable. > Bah. I sent this a little too soon. The patch needs this small > incremental fix to actually work. No worries. Btw, any reason you create the SSLSocket yourself instead of passing (Ssl => \@SSL_Socket_options) to IMAPClient->new? Creating and maintaining client-side config options for TLS is another thing that'll need to be done for -watch. I guess we'll steal configuration names/switches from git config/CLI tools to ease the learning curve. I'm already reusing "git credential" for retrieving/storing login information and it seems nice. > Anyway I hope this is enough to help you with your integration of imap > functionality into public-inbox-watch. Yup, thanks again! ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-15 22:56 ` Eric Wong @ 2020-05-16 10:47 ` Eric W. Biederman 2020-05-16 19:12 ` Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-16 10:47 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@yhbt.net> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> ebiederm@xmission.com (Eric W. Biederman) writes: >> >> > The command imap_fetch connects to the specified imap mailbox and >> > fetches any unfetch messages than waits with imap idle until there are >> > more messages to fetch. >> > >> > By default messages are placed in the specified public inbox mailbox. >> > The value of List-ID is consulted and if it is present used to >> > select an alternate public-inbox mailbox to place the messages in. >> > >> > The email messages are placed without modification into the public >> > inbox repository so minimize changes of corruption or of loosing >> > valuable information. I use the command imap_fetch for all of my >> > email and not just a mailling list mirror so I don't want automation >> > to accidentally cause something important to be lost. > > Btw, Email::MIME usage is gone from 1.5.0 due to nasty > performance problems and replaced by PublicInbox::Eml. Eml > should be completely non-destructive unless somebody sends an > abusive message which exceeds the new safety limits; in which > case it won't OOM or burn CPU like E::M did. > > That said, {-public_inbox_raw} still works and Eml looks > like a drop-in replacement as far as imap_fetch is concerned. I almost did that. But I looked and saw PublicInbox::MIME still present and a number of other references to Email::MIME so I wasn't certain exactly how that was being handled. But since Email::MIME still worked I didn't mess with that. >> > No email messages are deleted from the server instead IMAPTracker >> > is used to remember which messages were downloaded. > > Yup. I've integrated IMAPTracker into a local branch, already. > I think it could be reused for tracking NNTP fetches, too, > since UIDs and NNTP article numbers seem interchangeable. True. I guess it is possible to make the equivalent of my imap_fetch for NNTP queries as well. Does NNTP have the equivalent of IDLE where it is possible to get timely updates of new messages? I suppose I should see if the git protocol does as well. I am trying to figure out what the best way to track other public inbox git repositories. Currently I have a script that every N minutes does git remote update. Which is ok. But I think something like imap_fetch might be nicer. >> Bah. I sent this a little too soon. The patch needs this small >> incremental fix to actually work. > > No worries. > > Btw, any reason you create the SSLSocket yourself instead of > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new? When I read the documentation it looked like that was the way to do things. Even now when I reread the documentation that looks like the way to go. Especially if I wanted to be certain the connection was encrypted. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-16 10:47 ` Eric W. Biederman @ 2020-05-16 19:12 ` Eric Wong 2020-05-16 20:09 ` Eric W. Biederman 0 siblings, 1 reply; 17+ messages in thread From: Eric Wong @ 2020-05-16 19:12 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@yhbt.net> writes: > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> > The email messages are placed without modification into the public > >> > inbox repository so minimize changes of corruption or of loosing > >> > valuable information. I use the command imap_fetch for all of my > >> > email and not just a mailling list mirror so I don't want automation > >> > to accidentally cause something important to be lost. > > > > Btw, Email::MIME usage is gone from 1.5.0 due to nasty > > performance problems and replaced by PublicInbox::Eml. Eml > > should be completely non-destructive unless somebody sends an > > abusive message which exceeds the new safety limits; in which > > case it won't OOM or burn CPU like E::M did. > > > > That said, {-public_inbox_raw} still works and Eml looks > > like a drop-in replacement as far as imap_fetch is concerned. > > I almost did that. But I looked and saw PublicInbox::MIME still present > and a number of other references to Email::MIME so I wasn't certain > exactly how that was being handled. But since Email::MIME still > worked I didn't mess with that. I think the Import .pod documentation is the only place aside from some random comments and maintainer tests in xt/*, right? I'll definitely keep the import + indexing parts compatible with Email::MIME. I'm not sure how much of a Perl API I want to expose in case this gets reimplemented/migrated to another language someday, but maybe exposing the most heavily-used parts is OK. I also don't think another language is likely, at this point. > >> > No email messages are deleted from the server instead IMAPTracker > >> > is used to remember which messages were downloaded. > > > > Yup. I've integrated IMAPTracker into a local branch, already. > > I think it could be reused for tracking NNTP fetches, too, > > since UIDs and NNTP article numbers seem interchangeable. > > True. I guess it is possible to make the equivalent of my imap_fetch > for NNTP queries as well. Does NNTP have the equivalent of IDLE where > it is possible to get timely updates of new messages? I haven't seen anything in NNTP like IDLE. Maybe there's something in the draft stages, but AFAIK nothing widely implemented. > I suppose I should see if the git protocol does as well. I am trying to > figure out what the best way to track other public inbox git repositories. I'll likely implement a custom HTTP endpoint and usable with (curl -XIDLE ...); but the timeout will need to vary depending on firewalls. > Currently I have a script that every N minutes does git remote update. > Which is ok. But I think something like imap_fetch might be nicer. Definitely. > >> Bah. I sent this a little too soon. The patch needs this small > >> incremental fix to actually work. > > > > No worries. > > > > Btw, any reason you create the SSLSocket yourself instead of > > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new? > > When I read the documentation it looked like that was the way to do > things. Even now when I reread the documentation that looks like the > way to go. Especially if I wanted to be certain the connection was > encrypted. There seems more than one way to do it, but `Starttls' and `Ssl' are just as documented from what I tell (in v3.38). Socket/RawSocket seem useful for using an external command to connect/launch an IMAP tunnel or server; so it'll be used to mimic the `imap.tunnel' support of git-imap-send. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox 2020-05-16 19:12 ` Eric Wong @ 2020-05-16 20:09 ` Eric W. Biederman 2020-05-16 22:53 ` [PATCH] confine Email::MIME use even further Eric Wong 0 siblings, 1 reply; 17+ messages in thread From: Eric W. Biederman @ 2020-05-16 20:09 UTC (permalink / raw) To: Eric Wong; +Cc: meta Eric Wong <e@yhbt.net> writes: > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> Eric Wong <e@yhbt.net> writes: >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote: >> >> > The email messages are placed without modification into the public >> >> > inbox repository so minimize changes of corruption or of loosing >> >> > valuable information. I use the command imap_fetch for all of my >> >> > email and not just a mailling list mirror so I don't want automation >> >> > to accidentally cause something important to be lost. >> > >> > Btw, Email::MIME usage is gone from 1.5.0 due to nasty >> > performance problems and replaced by PublicInbox::Eml. Eml >> > should be completely non-destructive unless somebody sends an >> > abusive message which exceeds the new safety limits; in which >> > case it won't OOM or burn CPU like E::M did. >> > >> > That said, {-public_inbox_raw} still works and Eml looks >> > like a drop-in replacement as far as imap_fetch is concerned. >> >> I almost did that. But I looked and saw PublicInbox::MIME still present >> and a number of other references to Email::MIME so I wasn't certain >> exactly how that was being handled. But since Email::MIME still >> worked I didn't mess with that. > > I think the Import .pod documentation is the only place aside > from some random comments and maintainer tests in xt/*, right? I am looking at 1.5.0 so you may have made a bit more progress but Import.pm still uses Email::MIME, and PublicInbox::MIME still uses Email::MIME as a base class. > I'll definitely keep the import + indexing parts compatible with > Email::MIME. > > I'm not sure how much of a Perl API I want to expose in case > this gets reimplemented/migrated to another language someday, > but maybe exposing the most heavily-used parts is OK. > > I also don't think another language is likely, at this point. >> >> > No email messages are deleted from the server instead IMAPTracker >> >> > is used to remember which messages were downloaded. >> > >> > Yup. I've integrated IMAPTracker into a local branch, already. >> > I think it could be reused for tracking NNTP fetches, too, >> > since UIDs and NNTP article numbers seem interchangeable. >> >> True. I guess it is possible to make the equivalent of my imap_fetch >> for NNTP queries as well. Does NNTP have the equivalent of IDLE where >> it is possible to get timely updates of new messages? > > I haven't seen anything in NNTP like IDLE. Maybe there's > something in the draft stages, but AFAIK nothing widely > implemented. > >> I suppose I should see if the git protocol does as well. I am trying to >> figure out what the best way to track other public inbox git repositories. > > I'll likely implement a custom HTTP endpoint and usable with > (curl -XIDLE ...); but the timeout will need to vary depending > on firewalls. > >> Currently I have a script that every N minutes does git remote update. >> Which is ok. But I think something like imap_fetch might be nicer. > > Definitely. > >> >> Bah. I sent this a little too soon. The patch needs this small >> >> incremental fix to actually work. >> > >> > No worries. >> > >> > Btw, any reason you create the SSLSocket yourself instead of >> > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new? >> >> When I read the documentation it looked like that was the way to do >> things. Even now when I reread the documentation that looks like the >> way to go. Especially if I wanted to be certain the connection was >> encrypted. > > There seems more than one way to do it, but `Starttls' and `Ssl' > are just as documented from what I tell (in v3.38). > Socket/RawSocket seem useful for using an external command to > connect/launch an IMAP tunnel or server; so it'll be used to > mimic the `imap.tunnel' support of git-imap-send. Now that you point it out I can see it. Commands like starttls are a bit dangerous as they are subject to man in the middle attacks. But I think that is the difference of just tossing something together for yourself versus making something that works with everyone's setup. The one challenge I ran into was getting ssl verification to work on RHEL7. Apparently IO::Socket::SSL::default_ca() does not exist in the old version of perl that comes with RHEL7. Which is why I have the %ca and the eval. Eric ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] confine Email::MIME use even further 2020-05-16 20:09 ` Eric W. Biederman @ 2020-05-16 22:53 ` Eric Wong 0 siblings, 0 replies; 17+ messages in thread From: Eric Wong @ 2020-05-16 22:53 UTC (permalink / raw) To: Eric W. Biederman; +Cc: meta "Eric W. Biederman" <ebiederm@xmission.com> wrote: > Eric Wong <e@yhbt.net> writes: > > > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> Eric Wong <e@yhbt.net> writes: > >> > "Eric W. Biederman" <ebiederm@xmission.com> wrote: > >> >> > The email messages are placed without modification into the public > >> >> > inbox repository so minimize changes of corruption or of loosing > >> >> > valuable information. I use the command imap_fetch for all of my > >> >> > email and not just a mailling list mirror so I don't want automation > >> >> > to accidentally cause something important to be lost. > >> > > >> > Btw, Email::MIME usage is gone from 1.5.0 due to nasty > >> > performance problems and replaced by PublicInbox::Eml. Eml > >> > should be completely non-destructive unless somebody sends an > >> > abusive message which exceeds the new safety limits; in which > >> > case it won't OOM or burn CPU like E::M did. > >> > > >> > That said, {-public_inbox_raw} still works and Eml looks > >> > like a drop-in replacement as far as imap_fetch is concerned. > >> > >> I almost did that. But I looked and saw PublicInbox::MIME still present > >> and a number of other references to Email::MIME so I wasn't certain > >> exactly how that was being handled. But since Email::MIME still > >> worked I didn't mess with that. > > > > I think the Import .pod documentation is the only place aside > > from some random comments and maintainer tests in xt/*, right? > > I am looking at 1.5.0 so you may have made a bit more progress > but Import.pm still uses Email::MIME, > and PublicInbox::MIME still uses Email::MIME as a base class. Yeah, PublicInbox::MIME only existed to workaround old bugs in Email::MIME, so we used it everywhere for years and and will keep it in old tests. Below is a patch to remove most references to Email::MIME; but I guess PublicInbox::Eml will need POD docs at some point... > >> > Btw, any reason you create the SSLSocket yourself instead of > >> > passing (Ssl => \@SSL_Socket_options) to IMAPClient->new? > >> > >> When I read the documentation it looked like that was the way to do > >> things. Even now when I reread the documentation that looks like the > >> way to go. Especially if I wanted to be certain the connection was > >> encrypted. > > > > There seems more than one way to do it, but `Starttls' and `Ssl' > > are just as documented from what I tell (in v3.38). > > Socket/RawSocket seem useful for using an external command to > > connect/launch an IMAP tunnel or server; so it'll be used to > > mimic the `imap.tunnel' support of git-imap-send. > > Now that you point it out I can see it. Commands like starttls > are a bit dangerous as they are subject to man in the middle attacks. > > But I think that is the difference of just tossing something together > for yourself versus making something that works with everyone's setup. > > The one challenge I ran into was getting ssl verification to work on > RHEL7. Apparently IO::Socket::SSL::default_ca() does not exist in > the old version of perl that comes with RHEL7. Which is why I have > the %ca and the eval. Ouch. Yes, I remember that being a problem for testing NNTPS, too. Net::NNTP doesn't support old IO::Socket::SSL, either. Don't feel obligated to figure this out; but how did IO::Socket::SSL work before it got default_ca()? Did it force the user to configure that on their own, set it behind-the-scenes as a default, or did it (*gasp*) skip verification? -----------8<----------- Subject: [PATCH] confine Email::MIME use even further To avoid confusing future readers and users, recommend PublicInbox::Eml in our Import POD and refer to PublicInbox::Eml comments at the top of PublicInbox::MIME. mime_load() confined to t/eml.t, since we won't be using it anywhere else in our tests. --- lib/PublicInbox/Import.pm | 22 +++++++++++++--------- lib/PublicInbox/MIME.pm | 4 +++- lib/PublicInbox/TestCommon.pm | 10 +--------- t/eml.t | 6 ++++++ 4 files changed, 23 insertions(+), 19 deletions(-) diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm index fc61d062..792570c8 100644 --- a/lib/PublicInbox/Import.pm +++ b/lib/PublicInbox/Import.pm @@ -648,7 +648,10 @@ version 1.0 =head1 SYNOPSIS - use Email::MIME; + use PublicInbox::Eml; + # PublicInbox::Eml exists as of public-inbox 1.5.0, + # Email::MIME was used in older versions + use PublicInbox::Git; use PublicInbox::Import; @@ -664,7 +667,7 @@ version 1.0 "Date: Thu, 01 Jan 1970 00:00:00 +0000\n" . "Message-ID: <m\@example.org>\n". "\ntest message"; - my $parsed = Email::MIME->new($message); + my $parsed = PublicInbox::Eml->new($message); my $ret = $im->add($parsed); if (!defined $ret) { warn "duplicate: ", @@ -675,7 +678,7 @@ version 1.0 $im->done; # to remove a message - my $junk = Email::MIME->new($message); + my $junk = PublicInbox::Eml->new($message); my ($mark, $orig) = $im->remove($junk); if ($mark eq 'MISSING') { print "not found\n"; @@ -690,8 +693,8 @@ version 1.0 =head1 DESCRIPTION -An importer and remover for public-inboxes which takes L<Email::MIME> -messages as input and stores them in a git repository as +An importer and remover for public-inboxes which takes C<PublicInbox::Eml> +or L<Email::MIME> messages as input and stores them in a git repository as documented in L<https://public-inbox.org/public-inbox-v1-format.txt>, except it does not allow duplicate Message-IDs. @@ -709,7 +712,7 @@ Initialize a new PublicInbox::Import object. =head2 add - my $parsed = Email::MIME->new($message); + my $parsed = PublicInbox::Eml->new($message); $im->add($parsed); Adds a message to to the git repository. This will acquire @@ -720,12 +723,13 @@ is called, but L</remove> may be called on them. =head2 remove - my $junk = Email::MIME->new($message); + my $junk = PublicInbox::Eml->new($message); my ($code, $orig) = $im->remove($junk); Removes a message from the repository. On success, it returns a ':'-prefixed numeric code representing the git-fast-import -mark and the original messages as an Email::MIME object. +mark and the original messages as a PublicInbox::Eml +(or Email::MIME) object. If the message could not be found, the code is "MISSING" and the original message is undef. If there is a mismatch where the "Message-ID" is matched but the subject and body do not match, @@ -749,7 +753,7 @@ The mail archives are hosted at L<https://public-inbox.org/meta/> =head1 COPYRIGHT -Copyright (C) 2016 all contributors L<mailto:meta@public-inbox.org> +Copyright (C) 2016-2020 all contributors L<mailto:meta@public-inbox.org> License: AGPL-3.0+ L<http://www.gnu.org/licenses/agpl-3.0.txt> diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm index 9077386a..831a3d19 100644 --- a/lib/PublicInbox/MIME.pm +++ b/lib/PublicInbox/MIME.pm @@ -4,7 +4,9 @@ # The license for this file differs from the rest of public-inbox. # # We no longer load this in any of our code outside of maintainer -# tests for compatibility. +# tests for compatibility. PublicInbox::Eml is favored throughout +# our codebase for performance and safety reasons, though we maintain +# Email::MIME-compatibility in mail injection and indexing code paths. # # It monkey patches the "parts_multipart" subroutine with patches # from Matthew Horsfall <wolfsage@gmail.com> at: diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm index d952ee6d..79e597f5 100644 --- a/lib/PublicInbox/TestCommon.pm +++ b/lib/PublicInbox/TestCommon.pm @@ -9,15 +9,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek); use POSIX qw(dup2); use IO::Socket::INET; our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods - run_script start_script key2sub xsys xqx mime_load eml_load); - -sub mime_load ($) { - my ($path) = @_; - open(my $fh, '<', $path) or die "open $path: $!"; - # test should've called: require_mods('Email::MIME') - require PublicInbox::MIME; - PublicInbox::MIME->new(\(do { local $/; <$fh> })); -} + run_script start_script key2sub xsys xqx eml_load); sub eml_load ($) { my ($path, $cb) = @_; diff --git a/t/eml.t b/t/eml.t index b7f58ac7..1892b001 100644 --- a/t/eml.t +++ b/t/eml.t @@ -12,6 +12,12 @@ SKIP: { }; use_ok $_ for @classes; +sub mime_load ($) { + my ($path) = @_; + open(my $fh, '<', $path) or die "open $path: $!"; + PublicInbox::MIME->new(\(do { local $/; <$fh> })); +} + { my $eml = PublicInbox::Eml->new(\(my $str = "a: b\n\nhi\n")); is($str, "hi\n", '->new modified body like Email::Simple'); ^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-05-16 22:53 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-10-29 14:40 I have figured out IMAP IDLE Eric W. Biederman 2019-10-29 22:31 ` Eric Wong 2019-10-29 23:12 ` WWW::Curl [was: I have figured out IMAP IDLE] Eric Wong 2019-11-03 16:28 ` I have figured out IMAP IDLE Eric W. Biederman 2020-05-13 19:31 ` Eric Wong 2020-05-13 21:48 ` Eric W. Biederman 2020-05-13 22:17 ` Eric Wong 2020-05-14 12:32 ` Eric W. Biederman 2020-05-14 16:15 ` Eric Wong 2020-05-15 21:00 ` [PATCH 1/2] IMAPTracker: Add a helper to track our place in reading imap mailboxes Eric W. Biederman 2020-05-15 21:02 ` [PATCH 2/2] imap_fetch: Add a command to continuously fetch from an imap mailbox Eric W. Biederman 2020-05-15 21:26 ` Eric W. Biederman 2020-05-15 22:56 ` Eric Wong 2020-05-16 10:47 ` Eric W. Biederman 2020-05-16 19:12 ` Eric Wong 2020-05-16 20:09 ` Eric W. Biederman 2020-05-16 22:53 ` [PATCH] confine Email::MIME use even further Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).