unofficial mirror of meta@public-inbox.org
* [PATCH 00/13] eml: pure-Perl replacement for Email::MIME
@ 2020-05-07 21:05 Eric Wong
  2020-05-07 21:05 ` [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME Eric Wong
                   ` (12 more replies)
  0 siblings, 13 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Eric Wong (13):
  msg_iter: make ->each_part method for PublicInbox::MIME
  msg_iter: pass $idx as a scalar, not array
  filter/rubylang: avoid recursing subparts to strip trailers
  smsg: use capitalization for header retrieval
  eml: pure-Perl replacement for Email::MIME
  switch read-only Email::Simple users to Eml
  replace most uses of PublicInbox::MIME with Eml
  EmlContentFoo: Email::MIME::ContentType replacement
  EmlContentFoo: relax Encode version requirement
  eml: remove dependency on Email::MIME::Encodings
  xt: eml comparison tests
  remove most internal Email::MIME usage
  eml: drop trailing blank line on missing epilogue

 Documentation/mknews.perl          |   4 +-
 INSTALL                            |  26 +-
 MANIFEST                           |   7 +
 Makefile.PL                        |   7 +-
 ci/deps.perl                       |   3 -
 lib/PublicInbox/Admin.pm           |   2 +-
 lib/PublicInbox/Eml.pm             | 421 +++++++++++++++++++++++++++++
 lib/PublicInbox/EmlContentFoo.pm   | 317 ++++++++++++++++++++++
 lib/PublicInbox/Filter/RubyLang.pm |  32 ++-
 lib/PublicInbox/Filter/Vger.pm     |   4 +-
 lib/PublicInbox/Import.pm          |  11 +-
 lib/PublicInbox/Inbox.pm           |   4 +-
 lib/PublicInbox/InboxWritable.pm   |   4 +-
 lib/PublicInbox/MDA.pm             |   1 -
 lib/PublicInbox/MIME.pm            |   6 +
 lib/PublicInbox/Mbox.pm            |  16 +-
 lib/PublicInbox/MboxGz.pm          |   4 +-
 lib/PublicInbox/MsgIter.pm         |  21 +-
 lib/PublicInbox/MsgTime.pm         |   8 +-
 lib/PublicInbox/NNTP.pm            |  19 +-
 lib/PublicInbox/SearchIdx.pm       |   8 +-
 lib/PublicInbox/SearchIdxShard.pm  |   3 +-
 lib/PublicInbox/Smsg.pm            |  24 +-
 lib/PublicInbox/SolverGit.pm       |   4 +-
 lib/PublicInbox/TestCommon.pm      |  11 +-
 lib/PublicInbox/V2Writable.pm      |  17 +-
 lib/PublicInbox/View.pm            |  28 +-
 lib/PublicInbox/WWW.pm             |   8 +-
 lib/PublicInbox/WatchMaildir.pm    |   4 +-
 lib/PublicInbox/WwwAttach.pm       |  15 +-
 script/public-inbox-edit           |   8 +-
 script/public-inbox-learn          |   4 +-
 script/public-inbox-mda            |  16 +-
 script/public-inbox-purge          |   4 +-
 t/altid.t                          |   4 +-
 t/altid_v2.t                       |   4 +-
 t/cgi.t                            |   8 +-
 t/content_id.t                     |   6 +-
 t/convert-compact.t                |   4 +-
 t/edit.t                           |  20 +-
 t/eml.t                            | 363 +++++++++++++++++++++++++
 t/eml_content_disposition.t        | 102 +++++++
 t/eml_content_type.t               | 289 ++++++++++++++++++++
 t/feed.t                           |   6 +-
 t/filter_base.t                    |   4 +-
 t/filter_mirror.t                  |   2 +-
 t/filter_rubylang.t                |   8 +-
 t/filter_subjecttag.t              |   4 +-
 t/filter_vger.t                    |   6 +-
 t/html_index.t                     |   4 +-
 t/httpd.t                          |   4 +-
 t/import.t                         |   6 +-
 t/indexlevels-mirror.t             |   4 +-
 t/mda.t                            |   4 +-
 t/mda_filter_rubylang.t            |   2 +-
 t/mid.t                            |   4 +-
 t/mime.t                           |  82 +++---
 t/msg_iter.t                       |  10 +-
 t/msgtime.t                        |   6 +-
 t/multi-mid.t                      |   6 +-
 t/nntp.t                           |   4 +-
 t/nntpd-tls.t                      |   4 +-
 t/nntpd.t                          |   6 +-
 t/nulsubject.t                     |   2 +-
 t/plack.t                          |  10 +-
 t/precheck.t                       |  10 +-
 t/psgi_attach.t                    |   2 +-
 t/psgi_bad_mids.t                  |   4 +-
 t/psgi_mount.t                     |   4 +-
 t/psgi_multipart_not.t             |   4 +-
 t/psgi_scan_all.t                  |   4 +-
 t/psgi_search.t                    |   8 +-
 t/psgi_text.t                      |   2 +-
 t/psgi_v2.t                        |   6 +-
 t/purge.t                          |   2 +-
 t/replace.t                        |  12 +-
 t/reply.t                          |   4 +-
 t/search-thr-index.t               |   6 +-
 t/search.t                         |  26 +-
 t/solver_git.t                     |   4 +-
 t/spamcheck_spamc.t                |   8 +-
 t/thread-cycle.t                   |   3 +-
 t/time.t                           |   4 +-
 t/v1-add-remove-add.t              |   4 +-
 t/v1reindex.t                      |   4 +-
 t/v2-add-remove-add.t              |   4 +-
 t/v2mda.t                          |   4 +-
 t/v2mirror.t                       |   4 +-
 t/v2reindex.t                      |   8 +-
 t/v2writable.t                     |   8 +-
 t/watch_filter_rubylang.t          |   2 +-
 t/watch_maildir.t                  |   2 +-
 t/watch_maildir_v2.t               |   2 +-
 t/www_altid.t                      |   2 +-
 t/xcpdb-reshard.t                  |   4 +-
 xt/cmp-msgstr.t                    | 108 ++++++++
 xt/cmp-msgview.t                   |  95 +++++++
 xt/msgtime_cmp.t                   |  12 +-
 xt/perf-msgview.t                  |   2 +-
 99 files changed, 2084 insertions(+), 353 deletions(-)
 create mode 100644 lib/PublicInbox/Eml.pm
 create mode 100644 lib/PublicInbox/EmlContentFoo.pm
 create mode 100644 t/eml.t
 create mode 100644 t/eml_content_disposition.t
 create mode 100644 t/eml_content_type.t
 create mode 100644 xt/cmp-msgstr.t
 create mode 100644 xt/cmp-msgview.t



* [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 02/13] msg_iter: pass $idx as a scalar, not array Eric Wong
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Relying on Email::MIME->subparts would be inefficient for the
work-in-progress module intended to replace Email::MIME.  So
move towards ->each_part as a class-specific iterator, which can
take advantage of class-specific optimizations in the
yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime
classes.

The msg_iter() sub remains for compatibility with existing
3rd-party scripts/modules which use our small public Perl API
and Email::MIME.
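
A minimal sketch of the dispatch described above, assuming two
hypothetical stand-in classes (Demo::WithEachPart and Demo::Plain)
rather than the real PublicInbox::MIME and Email::MIME.  The wrapper
prefers an object's own ->each_part and otherwise falls back to a
generic walker built on ->subparts:

```perl
use strict;
use warnings;

# Hypothetical class which provides its own iterator
package Demo::WithEachPart {
	sub new { bless { parts => $_[1] }, $_[0] }
	sub each_part {
		my ($self, $cb, $cb_arg) = @_;
		my $i = 0;
		$cb->([ $_, 1, ++$i ], $cb_arg) for @{$self->{parts}};
	}
}

# Hypothetical class which only knows ->subparts
package Demo::Plain {
	sub new { bless { parts => $_[1] }, $_[0] }
	sub subparts { @{$_[0]->{parts}} }
}

package main;

# generic fallback, flat (one level) to keep the sketch short
sub fallback_each_part {
	my ($obj, $cb, $cb_arg) = @_;
	my $i = 0;
	$cb->([ $_, 1, ++$i ], $cb_arg) for $obj->subparts;
}

# prefer the class-specific iterator when the object has one
sub demo_iter {
	my ($obj, $cb, $cb_arg) = @_;
	if (my $ep = $obj->can('each_part')) {
		$ep->($obj, $cb, $cb_arg);
	} else {
		fallback_each_part($obj, $cb, $cb_arg);
	}
}

my @seen;
my $cb = sub { my ($p, $arg) = @_; push @$arg, "$p->[0]:$p->[2]" };
demo_iter(Demo::WithEachPart->new([qw(a b)]), $cb, \@seen);
demo_iter(Demo::Plain->new([qw(c)]), $cb, \@seen);
print join(' ', @seen), "\n";
```

Callers that only ever see our own classes can call ->each_part
directly and skip the ->can lookup.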
---
 lib/PublicInbox/MIME.pm      |  3 +++
 lib/PublicInbox/MsgIter.pm   | 15 +++++++++++++--
 lib/PublicInbox/SolverGit.pm |  4 ++--
 lib/PublicInbox/View.pm      | 10 +++++-----
 lib/PublicInbox/WwwAttach.pm |  4 ++--
 5 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm
index 456eed64..b795b93b 100644
--- a/lib/PublicInbox/MIME.pm
+++ b/lib/PublicInbox/MIME.pm
@@ -24,6 +24,7 @@ use strict;
 use warnings;
 use base qw(Email::MIME);
 use Email::MIME::ContentType;
+use PublicInbox::MsgIter ();
 $Email::MIME::ContentType::STRICT_PARAMS = 0;
 
 if ($Email::MIME::VERSION <= 1.937) {
@@ -101,4 +102,6 @@ sub parts_multipart {
 }
 }
 
+no warnings 'once';
+*each_part = \&PublicInbox::MsgIter::em_each_part;
 1;
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index fa25564a..cd5a5d99 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -7,12 +7,12 @@ use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT = qw(msg_iter msg_part_text);
-use PublicInbox::MIME;
 
+# This becomes PublicInbox::MIME->each_part:
 # Like Email::MIME::walk_parts, but this is:
 # * non-recursive
 # * passes depth and indices to the iterator callback
-sub msg_iter ($$;$$) {
+sub em_each_part ($$;$$) {
 	my ($mime, $cb, $cb_arg, $do_undef) = @_;
 	my @parts = $mime->subparts;
 	if (@parts) {
@@ -36,6 +36,17 @@ sub msg_iter ($$;$$) {
 	}
 }
 
+# Use this when we may accept Email::MIME from user scripts
+# (not just PublicInbox::MIME)
+sub msg_iter ($$;$$) { # $_[0] = PublicInbox::MIME/Email::MIME-like obj
+	my (undef, $cb, $cb_arg, $once) = @_;
+	if (my $ep = $_[0]->can('each_part')) { # PublicInbox::{MIME,*}
+		$ep->($_[0], $cb, $cb_arg, $once);
+	} else { # for compatibility with existing Email::MIME users:
+		em_each_part($_[0], $cb, $cb_arg, $once);
+	}
+}
+
 sub msg_part_text ($$) {
 	my ($part, $ct) = @_;
 
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index c32a5bae..f718e28c 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -14,7 +14,7 @@ use 5.010_001;
 use File::Temp 0.19 (); # 0.19 for ->newdir
 use Fcntl qw(SEEK_SET);
 use PublicInbox::Git qw(git_unquote git_quote);
-use PublicInbox::MsgIter qw(msg_iter msg_part_text);
+use PublicInbox::MsgIter qw(msg_part_text);
 use PublicInbox::Qspawn;
 use PublicInbox::Tmpfile;
 use URI::Escape qw(uri_escape_utf8);
@@ -234,7 +234,7 @@ sub find_extract_diffs ($$$) {
 	my $diffs = [];
 	foreach my $smsg (@$msgs) {
 		$ibx->smsg_mime($smsg) or next;
-		msg_iter(delete $smsg->{mime}, \&extract_diff,
+		delete($smsg->{mime})->each_part(\&extract_diff,
 				[$self, $diffs, $pre, $post, $ibx, $smsg], 1);
 	}
 	@$diffs ? $diffs : undef;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index f7a8ae32..e42fb362 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -243,7 +243,7 @@ sub index_entry {
 	# scan through all parts, looking for displayable text
 	$ctx->{mhref} = $mhref;
 	$ctx->{obuf} = \$rv;
-	msg_iter($mime, \&add_text_body, $ctx, 1);
+	$mime->each_part(\&add_text_body, $ctx, 1);
 	delete $ctx->{obuf};
 
 	# add the footer
@@ -474,10 +474,10 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 }
 
 sub multipart_text_as_html {
-	# ($mime, $ctx) = @_; # msg_iter will do "$_[0] = undef"
+	# ($mime, $ctx) = @_; # each_part may do "$_[0] = undef"
 
 	# scan through all parts, looking for displayable text
-	msg_iter($_[0], \&add_text_body, $_[1], 1);
+	$_[0]->each_part(\&add_text_body, $_[1], 1);
 }
 
 sub attach_link ($$$$;$) {
@@ -515,11 +515,11 @@ EOF
 	undef;
 }
 
-sub add_text_body { # callback for msg_iter
+sub add_text_body { # callback for each_part
 	my ($p, $ctx) = @_;
 	my $upfx = $ctx->{mhref};
 	my $ibx = $ctx->{-inbox};
-	# $p - from msg_iter: [ Email::MIME, depth, @idx ]
+	# $p - from each_part: [ Email::MIME-like, depth, @idx ]
 	my ($part, $depth, @idx) = @$p;
 	my $ct = $part->content_type || 'text/plain';
 	my $fn = $part->filename;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index f795618e..774b38ae 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -10,7 +10,7 @@ use Email::MIME::ContentType qw(parse_content_type);
 use PublicInbox::MIME;
 use PublicInbox::MsgIter;
 
-sub get_attach_i { # msg_iter callback
+sub get_attach_i { # ->each_part callback
 	my ($part, $depth, @idx) = @{$_[0]};
 	my $res = $_[1];
 	return if join('.', @idx) ne $res->[3]; # $idx
@@ -40,7 +40,7 @@ sub get_attach ($$$) {
 	my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
 	$mime = PublicInbox::MIME->new($mime);
 	$res->[3] = $idx;
-	msg_iter($mime, \&get_attach_i, $res, 1);
+	$mime->each_part(\&get_attach_i, $res, 1);
 	pop @$res; # cleanup before letting PSGI server see it
 	$res
 }


* [PATCH 02/13] msg_iter: pass $idx as a scalar, not array
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
  2020-05-07 21:05 ` [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 03/13] filter/rubylang: avoid recursing subparts to strip trailers Eric Wong
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

This makes no difference for most multipart messages
(or any single-part message).  However, it starts
saving space once parts nest, since each part then
carries one scalar instead of a growing list of indices.

It also slightly simplifies callers.
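
As a rough illustration, assuming a hypothetical nested structure
where each node is [ name, @children ] (not a real MIME object), the
dotted scalar index can be built like this:

```perl
use strict;
use warnings;

# Hypothetical tree: a multipart with a nested multipart and a signature
my $tree = [ 'root', [ 'mixed', ['a'], ['b'] ], ['sig'] ];

# Non-recursive walk; calls $cb->($name, $depth, $idx) for each leaf
sub walk_leaves {
	my ($root, $cb) = @_;
	my $i = 0;
	my (undef, @kids) = @$root;
	my @queue = map { [ $_, 1, ++$i ] } @kids;
	while (my $p = shift @queue) {
		my ($node, $depth, $idx) = @$p;
		my ($name, @sub) = @$node;
		if (@sub) {
			$i = 0;
			# one scalar per part: "parent.child"
			@sub = map {
				[ $_, $depth + 1, "$idx." . (++$i) ]
			} @sub;
			unshift @queue, @sub;
		} else {
			$cb->($name, $depth, $idx);
		}
	}
}

my @out;
walk_leaves($tree, sub { push @out, join(' ', @_) });
print "$_\n" for @out;
```

The leaves come out as "a 2 1.1", "b 2 1.2" and "sig 1 2", which is
the same shape t/msg_iter.t expects after this change.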
---
 lib/PublicInbox/MsgIter.pm   |  6 ++++--
 lib/PublicInbox/SearchIdx.pm |  2 +-
 lib/PublicInbox/View.pm      | 18 +++++++++---------
 lib/PublicInbox/WwwAttach.pm |  4 ++--
 t/mime.t                     |  5 +++--
 t/msg_iter.t                 |  2 +-
 6 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index cd5a5d99..7c28d019 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -20,12 +20,14 @@ sub em_each_part ($$;$$) {
 		my $i = 0;
 		@parts = map { [ $_, 1, ++$i ] } @parts;
 		while (my $p = shift @parts) {
-			my ($part, $depth, @idx) = @$p;
+			my ($part, $depth, $idx) = @$p;
 			my @sub = $part->subparts;
 			if (@sub) {
 				$depth++;
 				$i = 0;
-				@sub = map { [ $_, $depth, @idx, ++$i ] } @sub;
+				@sub = map {
+					[ $_, $depth, "$idx.".(++$i) ]
+				} @sub;
 				@parts = (@sub, @parts);
 			} else {
 				$cb->($p, $cb_arg);
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 25118f43..a7e31b71 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -277,7 +277,7 @@ sub index_diff ($$$) {
 }
 
 sub index_xapian { # msg_iter callback
-	my $part = $_[0]->[0]; # ignore $depth and @idx
+	my $part = $_[0]->[0]; # ignore $depth and $idx
 	my ($self, $doc) = @{$_[1]};
 	my $ct = $part->content_type || 'text/plain';
 	my $fn = $part->filename;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index e42fb362..3328c865 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -482,9 +482,8 @@ sub multipart_text_as_html {
 
 sub attach_link ($$$$;$) {
 	my ($ctx, $ct, $p, $fn, $err) = @_;
-	my ($part, $depth, @idx) = @$p;
-	my $nl = $idx[-1] > 1 ? "\n" : '';
-	my $idx = join('.', @idx);
+	my ($part, $depth, $idx) = @$p;
+	my $nl = substr($idx, -2) eq '.1' ? '' : "\n"; # like join("\n", ...)
 	my $size = bytes::length($part->body);
 
 	# hide attributes normally, unless we want to aid users in
@@ -519,8 +518,8 @@ sub add_text_body { # callback for each_part
 	my ($p, $ctx) = @_;
 	my $upfx = $ctx->{mhref};
 	my $ibx = $ctx->{-inbox};
-	# $p - from each_part: [ Email::MIME-like, depth, @idx ]
-	my ($part, $depth, @idx) = @$p;
+	# $p - from each_part: [ Email::MIME-like, depth, $idx ]
+	my ($part, $depth, $idx) = @$p;
 	my $ct = $part->content_type || 'text/plain';
 	my $fn = $part->filename;
 	my ($s, $err) = msg_part_text($part, $ct);
@@ -537,13 +536,14 @@ sub add_text_body { # callback for each_part
 	# headers for solver unless some coderepo are configured:
 	my $diff;
 	if ($s =~ /^--- [^\n]+\n\+{3} [^\n]+\n@@ /ms) {
-		# diffstat anchors do not link across attachments or messages:
-		$idx[0] = $upfx . $idx[0] if $upfx ne '';
-		$ctx->{-apfx} = join('/', @idx);
+		# diffstat anchors do not link across attachments or messages,
+		# -apfx is just a stable prefix for making diffstat anchors
+		# linkable to the first diff hunk w/o crossing attachments
+		$idx =~ tr!.!/!; # compatibility with previous versions
+		$ctx->{-apfx} = $upfx . $idx;
 
 		# do attr => filename mappings for diffstats in git diffs:
 		$ctx->{-anchors} = {} if $s =~ /^diff --git /sm;
-
 		$diff = 1;
 		delete $ctx->{-long_path};
 		my $spfx;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index 774b38ae..b1009907 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -11,9 +11,9 @@ use PublicInbox::MIME;
 use PublicInbox::MsgIter;
 
 sub get_attach_i { # ->each_part callback
-	my ($part, $depth, @idx) = @{$_[0]};
+	my ($part, $depth, $idx) = @{$_[0]};
 	my $res = $_[1];
-	return if join('.', @idx) ne $res->[3]; # $idx
+	return if $idx ne $res->[3]; # [0-9]+(?:\.[0-9]+)+
 	$res->[0] = 200;
 	my $ct = $part->content_type;
 	$ct = parse_content_type($ct) if $ct;
diff --git a/t/mime.t b/t/mime.t
index 0d478ace..b9a4d66b 100644
--- a/t/mime.t
+++ b/t/mime.t
@@ -98,9 +98,10 @@ $msg = PublicInbox::MIME->new($raw);
 my $nr = 0;
 msg_iter($msg, sub {
 	my ($part, $level, @ex) = @{$_[0]};
-	if ($ex[0] == 1) {
+	is($level, 1, 'at expected level');
+	if (join('fail if $#ex > 0', @ex) eq '1') {
 		is($part->body_str, "your tree directly? \r\n", 'body OK');
-	} elsif ($ex[0] == 2) {
+	} elsif (join('fail if $#ex > 0', @ex) eq '2') {
 		is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
 				"=7wIb\n" .
 				"-----END PGP SIGNATURE-----\n",
diff --git a/t/msg_iter.t b/t/msg_iter.t
index 5c57e043..e8115e25 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -28,7 +28,7 @@ use_ok('PublicInbox::MsgIter');
 		$s =~ s/\s+//s;
 		push @parts, [ $s, $level, @ex ];
 	});
-	is_deeply(\@parts, [ [qw(a 2 1 1)], [qw(b 2 1 2)], [qw(sig 1 2)] ],
+	is_deeply(\@parts, [ [qw(a 2 1.1)], [qw(b 2 1.2)], [qw(sig 1 2)] ],
 		'nested part shows up properly');
 }
 


* [PATCH 03/13] filter/rubylang: avoid recursing subparts to strip trailers
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
  2020-05-07 21:05 ` [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME Eric Wong
  2020-05-07 21:05 ` [PATCH 02/13] msg_iter: pass $idx as a scalar, not array Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 04/13] smsg: use capitalization for header retrieval Eric Wong
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Mailman only seems to add trailers (or signatures) as
attachments at the top-level of MIME messages.  So don't bother
recursing with ->walk_parts since ->walk_parts is non-trivial to
recreate in the Email::MIME replacement I'm working on.
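
A sketch of the top-level-only approach, operating on plain strings
instead of MIME parts.  The trailer patterns here are hypothetical
placeholders; the real $l1/$l2 are defined elsewhere in RubyLang.pm
and are not part of this patch:

```perl
use strict;
use warnings;

# Hypothetical trailer patterns, for illustration only
my $l1 = qr/Unsubscribe: <mailto:[^>]+>/;
my $l2 = qr/<https?:[^>]+>/;

# Returns 1 if the trailer was stripped from $$bref, 0 otherwise
sub scrub_body {
	my ($bref) = @_;
	$$bref =~ s/\n?$l1\n$l2\n\z//s ? 1 : 0;
}

# Only look at the bodies given (scalar refs), never recurse
sub scrub_top_level {
	my (@bodies) = @_;
	my $changed = 0;
	$changed |= scrub_body($_) for @bodies;
	$changed;
}

my $trailer = "Unsubscribe: <mailto:list-request\@example.com>\n" .
	"<http://lists.example.com/options/list>\n";
my $text = "hello world\n\n$trailer";
my $sig = "-- \nunrelated signature\n";
my $changed = scrub_top_level(\$text, \$sig);
print "changed=$changed\n", $text;
```

Tracking $changed lets the caller skip rewriting the parent message
when no part was touched.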
---
 lib/PublicInbox/Filter/RubyLang.pm | 32 ++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/Filter/RubyLang.pm b/lib/PublicInbox/Filter/RubyLang.pm
index a65a5971..06e4ea75 100644
--- a/lib/PublicInbox/Filter/RubyLang.pm
+++ b/lib/PublicInbox/Filter/RubyLang.pm
@@ -28,19 +28,29 @@ sub new {
 	$self;
 }
 
+sub scrub_part ($) {
+	my ($part) = @_;
+	my $ct = $part->content_type;
+	if (!$ct || $ct =~ m{\btext/plain\b}i) {
+		my $s = eval { $part->body_str };
+		if (defined $s && $s =~ s/\n?$l1\n$l2\n\z//os) {
+			$part->body_str_set($s);
+			return 1;
+		}
+	}
+	0;
+}
+
 sub scrub {
 	my ($self, $mime, $for_remove) = @_;
-	# no msg_iter here, that is only for read-only access
-	$mime->walk_parts(sub {
-		my ($part) = $_[0];
-		my $ct = $part->content_type;
-		if (!$ct || $ct =~ m{\btext/plain\b}i) {
-			my $s = eval { $part->body_str };
-			if (defined $s && $s =~ s/\n?$l1\n$l2\n\z//os) {
-				$part->body_str_set($s);
-			}
-		}
-	});
+	# no msg_iter here, msg_iter is only for read-only access
+	if (my @sub = $mime->subparts) {
+		my $changed = 0;
+		$changed |= scrub_part($_) for @sub;
+		$mime->parts_set(\@sub) if $changed;
+	} else {
+		scrub_part($mime);
+	}
 	my $altid = $self->{-altid};
 	if ($altid && !$for_remove) {
 		my $hdr = $mime->header_obj;


* [PATCH 04/13] smsg: use capitalization for header retrieval
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (2 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 03/13] filter/rubylang: avoid recursing subparts to strip trailers Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 05/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

PublicInbox::Eml will have case-sensitive memoization to
avoid the need to call `lc' to retrieve common headers,
so ensure we call $mime->header() with the common
capitalization.

Unfortunately, we need to continue using lowercase for field
names for smsg, since NNTP requires case-insensitivity when
matching headers and method dispatch is expensive.
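
The caching idea can be sketched as below, assuming a hypothetical
Demo::Smsg class with a made-up ->_raw lookup and a separate {cache}
hash (the real Smsg stores values directly in $self).  Lookups use
the given capitalization, while the cache key is lowercased:

```perl
use strict;
use warnings;

package Demo::Smsg {
	sub new { bless { hdr => $_[1], lookups => 0 }, $_[0] }

	# hypothetical case-sensitive raw lookup, counts its calls
	sub _raw {
		my ($self, $field) = @_;
		$self->{lookups}++;
		@{$self->{hdr}->{$field} // []};
	}

	sub hdr {
		my ($self, $field) = @_;
		# lowercase key for storage, common capitalization for lookup
		$self->{cache}->{lc($field)} //= do {
			my $val = join(', ', $self->_raw($field));
			$val =~ tr/\r//d;
			$val =~ tr/\t\n/  /;
			$val;
		};
	}
}

package main;

my $smsg = Demo::Smsg->new({ Subject => [ "hello\r\n\tworld" ] });
print $smsg->hdr('Subject'), "\n"; # populates the cache
print $smsg->hdr('subject'), "\n"; # served from the cache
```

The second call never reaches ->_raw, since both spellings map to
the same lowercase cache key.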
---
 lib/PublicInbox/Smsg.pm | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 7c90b92d..7a2766d8 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -106,20 +106,18 @@ sub lines ($) { $_[0]->{lines} }
 
 sub __hdr ($$) {
 	my ($self, $field) = @_;
-	my $val = $self->{$field};
-	return $val if defined $val;
-
-	my $mime = $self->{mime} or return;
-	my @raw = $mime->header($field);
-	$val = join(', ', @raw);
-	$val =~ tr/\t\n/  /;
-	$val =~ tr/\r//d;
-	$self->{$field} = $val;
+	$self->{lc($field)} //= do {
+		my $mime = $self->{mime} or return;
+		my $val = join(', ', $mime->header($field));
+		$val =~ tr/\r//d;
+		$val =~ tr/\t\n/  /;
+		$val;
+	};
 }
 
-sub subject ($) { __hdr($_[0], 'subject') }
-sub to ($) { __hdr($_[0], 'to') }
-sub cc ($) { __hdr($_[0], 'cc') }
+sub subject ($) { __hdr($_[0], 'Subject') }
+sub to ($) { __hdr($_[0], 'To') }
+sub cc ($) { __hdr($_[0], 'Cc') }
 
 # no strftime, that is locale-dependent and not for RFC822
 my @DoW = qw(Sun Mon Tue Wed Thu Fri Sat);
@@ -137,7 +135,7 @@ sub date ($) {
 
 sub from ($) {
 	my ($self) = @_;
-	my $from = __hdr($self, 'from');
+	my $from = __hdr($self, 'From');
 	if (defined $from && !defined $self->{from_name}) {
 		my @n = PublicInbox::Address::names($from);
 		$self->{from_name} = join(', ', @n);


* [PATCH 05/13] eml: pure-Perl replacement for Email::MIME
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (3 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 04/13] smsg: use capitalization for header retrieval Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 06/13] switch read-only Email::Simple users to Eml Eric Wong
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Email::MIME eats memory, wastes time parsing out all the
headers, and some problems can't be fixed without breaking
compatibility for other projects which depend on it.

Informal benchmarks show a ~2x improvement in general
stats gathering scripts and ~10% improvement in HTML
view rendering.

We also don't need the ability to create MIME messages, just
parse them and maybe drop an attachment.

While this isn't the zero-copy or streaming MIME parser of my
dreams, it's still an improvement in that it doesn't keep a
scalar copy of the raw body around along with subparts.  It also
doesn't parse subparts up front, so it can also replace our uses
of Email::Simple.
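
A simplified sketch of the lazy approach: split the header from the
body once, then extract individual headers on demand with a regexp.
The sub names and the continuation-line pattern are assumptions made
for illustration and are looser than what Eml.pm itself does:

```perl
use strict;
use warnings;

# Returns (\$hdr, \$body, $crlf); $body is undef without a blank line
sub split_msg {
	my ($raw) = @_; # a copy; the real code can work on a caller's ref
	if ($raw =~ /\r?\n(\r?\n)/g) {
		my $crlf = $1;
		# 4-arg substr removes the header from $raw in place
		my $hdr = substr($raw, 0, pos($raw), '');
		substr($hdr, -length($crlf)) = ''; # drop the blank line
		return (\$hdr, \$raw, $crlf);
	}
	(\$raw, undef, "\n");
}

# Extract one header on demand, unfolding continuation lines
sub header_raw {
	my ($hdr, $name) = @_;
	my @v = ($$hdr =~ /^\Q$name\E:[ \t]*
			([^\n]*\r?\n(?:[ \t]+[^\n]*\r?\n)*)/imgx);
	for (@v) {
		s/\s+\z//s;
		s/\A\s+//s;
		s/\r?\n[ \t]*/ /gs;
	}
	wantarray ? @v : $v[0];
}

my ($hdr, $body, $crlf) = split_msg(
	"Subject: hi\n there\nTo: a\@example.com\n\nbody\n");
print scalar(header_raw($hdr, 'Subject')), "\n";
print $$body;
```

Nothing beyond the header/body split happens until a caller asks for
a specific header, which is where the savings over parsing every
header up front would come from.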
---
 MANIFEST                      |   2 +
 lib/PublicInbox/Eml.pm        | 393 ++++++++++++++++++++++++++++++++++
 lib/PublicInbox/TestCommon.pm |   9 +-
 t/eml.t                       | 363 +++++++++++++++++++++++++++++++
 4 files changed, 766 insertions(+), 1 deletion(-)
 create mode 100644 lib/PublicInbox/Eml.pm
 create mode 100644 t/eml.t

diff --git a/MANIFEST b/MANIFEST
index 90a05d33..0906448e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -105,6 +105,7 @@ lib/PublicInbox/DSKQXS.pm
 lib/PublicInbox/DSPoll.pm
 lib/PublicInbox/Daemon.pm
 lib/PublicInbox/Emergency.pm
+lib/PublicInbox/Eml.pm
 lib/PublicInbox/ExtMsg.pm
 lib/PublicInbox/Feed.pm
 lib/PublicInbox/Filter/Base.pm
@@ -229,6 +230,7 @@ t/ds-leak.t
 t/ds-poll.t
 t/edit.t
 t/emergency.t
+t/eml.t
 t/epoll.t
 t/fail-bin/spamc
 t/feed.t
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
new file mode 100644
index 00000000..0c23bed0
--- /dev/null
+++ b/lib/PublicInbox/Eml.pm
@@ -0,0 +1,393 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Lazy MIME parser, it still slurps the full message but keeps short
+# lifetimes.  Unlike Email::MIME, it doesn't pre-split multipart
+# messages or do any up-front parsing of headers besides splitting
+# the header string from the body.
+#
+# Contains ideas and code from Email::Simple and Email::MIME
+# (Perl Artistic License, GPL-1+)
+#
+# This aims to replace Email::MIME for our purposes, similar API
+# but internal field names differ if they're not 100%-compatible.
+#
+# Includes some proposed fixes for Email::MIME:
+# - header-less sub parts - https://github.com/rjbs/Email-MIME/issues/14
+# - "0" as boundary - https://github.com/rjbs/Email-MIME/issues/63
+#
+# $self = {
+#	bdy => scalar ref for body (may be undef),
+#	hdr => scalar ref for header,
+#	crlf => "\n" or "\r\n" (scalar, not a ref),
+#
+#	# filled in during ->each_part
+#	ct => hash ref returned by parse_content_type
+# }
+package PublicInbox::Eml;
+use strict;
+use v5.10.1;
+use Carp qw(croak);
+use Encode qw(find_encoding decode encode); # stdlib
+use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
+
+my $MIME_Header = find_encoding('MIME-Header');
+
+# TODO remove these dependencies
+use Email::MIME::ContentType;
+use Email::MIME::Encodings;
+$Email::MIME::ContentType::STRICT_PARAMS = 0;
+
+our $MAXPARTS = 1000; # same as SpamAssassin
+our $MAXDEPTH = 20; # seems enough, Perl sucks, here
+our $MAXBOUNDLEN = 2048; # same as postfix
+
+my $NO_ENCODE_RE = qr/\A(?:7bit|8bit|binary)[ \t]*(?:;|$)?/i;
+my %DECODE_ADDRESS = map { $_ => 1 } qw(From To Cc Sender Reply-To);
+my %DECODE_FULL = (
+	Subject => 1,
+	'Content-Description' => 1,
+	'Content-Type' => 1, # not correct, but needed, oh well
+);
+our %STR_TYPE = (text => 1);
+our %STR_SUBTYPE = (plain => 1, html => 1);
+
+my %re_memo;
+sub re_memo ($) {
+	my ($k) = @_;
+	# Do not normalize $k with lc/uc; instead strive to keep
+	# capitalization in our codebase consistent.
+	$re_memo{$k} ||= qr/^\Q$k\E:[ \t]*([^\n]*\r?\n # 1st line
+					# continuation lines:
+					(?:[^:\n]*?[ \t]+[^\n]*\r?\n)*)
+					/ismx
+}
+
+# compatible with our uses of Email::MIME
+sub new {
+	my $ref = ref($_[1]) ? $_[1] : \(my $cpy = $_[1]);
+	if ($$ref =~ /(?:\r?\n(\r?\n))/gs) { # likely
+		# This can modify $$ref in-place to avoid memcpy/memmove
+		# on a potentially large $$ref.  It does need to make a
+		# copy for $hdr, though.  Idea stolen from Email::Simple
+		my $hdr = substr($$ref, 0, pos($$ref), ''); # sv_chop on $$ref
+		substr($hdr, -(length($1))) = ''; # lower SvCUR
+		bless { hdr => \$hdr, crlf => $1, bdy => $ref }, __PACKAGE__;
+	} elsif ($$ref =~ /^[a-z0-9-]+[ \t]*:/ims && $$ref =~ /(\r?\n)\z/s) {
+		# body is optional :P
+		bless { hdr => \($$ref), crlf => $1 }, __PACKAGE__;
+	} else { # nothing useful
+		my $hdr = $$ref = '';
+		bless { hdr => \$hdr, crlf => "\n" }, __PACKAGE__;
+	}
+}
+
+sub new_sub {
+	my (undef, $ref) = @_;
+	# special case for messages like <85k5su9k59.fsf_-_@lola.goethe.zz>
+	$$ref =~ /\A(?:(\r?\n))/gs or goto &new;
+	my $hdr = substr($$ref, 0, pos($$ref), ''); # sv_chop on $$ref
+	bless { hdr => \$hdr, crlf => $1, bdy => $ref }, __PACKAGE__;
+}
+
+# same output as Email::Simple::Header::header_raw, but we extract
+# headers on-demand instead of parsing them into a list which
+# requires O(n) lookups anyways
+sub header_raw {
+	my $re = re_memo($_[1]);
+	my @v = (${ $_[0]->{hdr} } =~ /$re/g);
+	for (@v) {
+		# for compatibility w/ Email::Simple::Header,
+		s/\s+\z//s;
+		s/\A\s+//s;
+		s/\r?\n[ \t]*/ /gs;
+	}
+	wantarray ? @v : $v[0];
+}
+
+# pick the first Content-Type header to match Email::MIME behavior.
+# It's usually the right one based on historical archives.
+sub ct ($) {
+	# Email::MIME::ContentType::content_type:
+	$_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
+}
+
+sub body_decode ($$) {
+	my $cte = header_raw($_[0], 'Content-Transfer-Encoding');
+	($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) if $cte; # For S/MIME, etc
+	(!$cte || $cte =~ $NO_ENCODE_RE) ?
+		$_[1] : Email::MIME::Encodings::decode($cte, $_[1], '7bit');
+}
+
+# returns a queue of sub-parts iff it's worth descending into
+# TODO: descend into message/rfc822 parts (Email::MIME didn't)
+sub mp_descend ($$) {
+	my ($self, $nr) = @_; # or $once for top-level
+	my $bnd = ct($self)->{attributes}->{boundary} // return; # single-part
+	return if $bnd eq '' || length($bnd) >= $MAXBOUNDLEN;
+	$bnd = quotemeta($bnd);
+
+	# "multipart" messages can exist w/o a body
+	my $bdy = ($nr ? delete($self->{bdy}) : \(body_raw($self))) or return;
+
+	# Cut at the first epilogue, not subsequent ones.
+	# *sigh* just the regexp match alone seems to bump RSS by
+	# length($$bdy) on a ~30M string:
+	$$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm and
+		substr($$bdy, pos($$bdy) - length($1)) = '';
+
+	# *Sigh* split() doesn't work in-place and return CoW strings
+	# because Perl wants to "\0"-terminate strings.  So split()
+	# again bumps RSS by length($$bdy)
+
+	# Quiet warning for "Complex regular subexpression recursion limit"
+	# in case we get many empty parts, it's harmless in this case
+	no warnings 'regexp';
+	my ($pre, @parts) = split(/(?:\r?\n)?(?:^--$bnd[ \t]*\r?\n)+/ms,
+				$$bdy,
+				# + 3 since we don't want the last part
+				# processed to include any other excluded
+				# parts ($nr starts at 1, and I suck at math)
+				$MAXPARTS + 3 - $nr);
+
+	if (@parts) { # the usual path if we got this far:
+		undef $bdy; # release memory ASAP if $nr > 0
+		@parts = grep /[^ \t\r\n]/s, @parts; # ignore empty parts
+
+		# Keep "From: someone..." from preamble in old,
+		# buggy versions of git-send-email, otherwise drop it
+		# There's also a case where quoted text showed up in the
+		# preamble
+		# <20060515162817.65F0F1BBAE@citi.umich.edu>
+		unshift(@parts, $pre) if $pre =~ /:/s;
+		return \@parts;
+	}
+	# "multipart", but no boundary found, treat as single part
+	$self->{bdy} //= $bdy;
+	undef;
+}
+
+# $p = [ \@parts, $depth, $idx ]
+# $idx[0] grows as $depth grows, $idx[1] == $p->[-1] == current part
+# (callers need to be updated)
+# \@parts is a queue which empties when we're done with a parent part
+
+# same usage as PublicInbox::MsgIter::msg_iter
+# $cb - user-supplied callback sub
+# $arg - user-supplied arg (think pthread_create)
+# $once - unref body scalar during iteration
+sub each_part {
+	my ($self, $cb, $arg, $once) = @_;
+	my $p = mp_descend($self, $once // 0) or
+					return $cb->([$self, 0, 0], $arg);
+	$p = [ $p, 0 ];
+	my @s; # our virtual stack
+	my $nr = 0;
+	while ((scalar(@{$p->[0]}) || ($p = pop @s)) && ++$nr <= $MAXPARTS) {
+		++$p->[-1]; # bump index
+		my (undef, @idx) = @$p;
+		@idx = (join('.', @idx));
+		my $depth = ($idx[0] =~ tr/././) + 1;
+		my $sub = new_sub(undef, \(shift @{$p->[0]}));
+		if ($depth < $MAXDEPTH && (my $nxt = mp_descend($sub, $nr))) {
+			push(@s, $p) if scalar @{$p->[0]};
+			$p = [ $nxt, @idx, 0 ];
+		} else { # a leaf node
+			$cb->([$sub, $depth, @idx], $arg);
+		}
+	}
+}
+
+########### compatibility section for existing Email::MIME uses #########
+
+sub header_obj {
+	bless { hdr => $_[0]->{hdr}, crlf => $_[0]->{crlf} }, __PACKAGE__;
+}
+
+sub subparts {
+	my ($self) = @_;
+	my $parts = mp_descend($self, 0) or return ();
+	my $bnd = ct($self)->{attributes}->{boundary} // die 'BUG: no boundary';
+	my $bdy = $self->{bdy};
+	if ($$bdy =~ /\A(.*?)(?:\r?\n)?^--\Q$bnd\E[ \t]*\r?$/sm) {
+		$self->{preamble} = $1;
+	}
+	if ($$bdy =~ /^--\Q$bnd\E--[ \t]*\r?\n(.+)\z/sm) {
+		$self->{epilogue} = $1;
+	}
+	map { new_sub(undef, \$_) } @$parts;
+}
+
+sub parts_set {
+	my ($self, $parts) = @_;
+
+	# we can't fully support what Email::MIME does,
+	# just what our filter code needs:
+	my $bnd = ct($self)->{attributes}->{boundary} // die <<EOF;
+->parts_set not supported for single-part messages
+EOF
+	my $crlf = $self->{crlf};
+	my $fin_bnd = "$crlf--$bnd--$crlf";
+	$bnd = "$crlf--$bnd$crlf";
+	${$self->{bdy}} = join($bnd,
+				delete($self->{preamble}) // '',
+				map { $_->as_string } @$parts
+				) .
+				$fin_bnd .
+				(delete($self->{epilogue}) // '');
+	undef;
+}
+
+sub body_set {
+	my ($self, $body) = @_;
+	my $bdy = $self->{bdy} = ref($body) ? $body : \$body;
+	my $cte = header_raw($self, 'Content-Transfer-Encoding');
+	if ($cte && $cte !~ $NO_ENCODE_RE) {
+		$$bdy = Email::MIME::Encodings::encode($cte, $$bdy)
+	}
+	undef;
+}
+
+sub body_str_set {
+	my ($self, $body_str) = @_;
+	my $charset = ct($self)->{attributes}->{charset} or
+		Carp::confess('body_str was given, but no charset is defined');
+	body_set($self, \(encode($charset, $body_str, Encode::FB_CROAK)));
+}
+
+sub content_type { scalar header($_[0], 'Content-Type') }
+
+# we only support raw header_set
+sub header_set {
+	my ($self, $pfx, @vals) = @_;
+	my $re = re_memo($pfx);
+	my $hdr = $self->{hdr};
+	return $$hdr =~ s!$re!!g if !@vals;
+	$pfx .= ': ';
+	my $len = 78 - length($pfx);
+	@vals = map {;
+		# folding differs from Email::Simple::Header,
+		# we favor tabs for visibility (and space savings :P)
+		if (length($_) >= $len && (/\n[^ \t]/s || !/\n/s)) {
+			local $Text::Wrap::columns = $len;
+			local $Text::Wrap::huge = 'overflow';
+			$pfx . wrap('', "\t", $_) . $self->{crlf};
+		} else {
+			$pfx . $_ . $self->{crlf};
+		}
+	} @vals;
+	$$hdr =~ s!$re!shift(@vals) // ''!ge; # replace current headers, first
+	$$hdr .= join('', @vals); # append any leftovers not replaced
+	# wantarray ? @_[2..$#_] : $_[2]; # Email::Simple::Header compat
+	undef; # we don't care for the return value
+}
+
+# note: we only call this method on Subject
+sub header_str_set {
+	my ($self, $name, @vals) = @_;
+	for (@vals) {
+		next unless /[^\x20-\x7e]/;
+		utf8::encode($_); # to octets
+		# 39: int((75 - length("Subject: =?UTF-8?B?".'?=') ) / 4) * 3;
+		s/(.{1,39})/'=?UTF-8?B?'.encode_base64($1, '').'?='/ges;
+	}
+	header_set($self, $name, @vals);
+}
+
+sub mhdr_decode ($) { eval { $MIME_Header->decode($_[0]) } // $_[0] }
+
+sub filename {
+	my $dis = header_raw($_[0], 'Content-Disposition');
+	my $attrs = parse_content_disposition($dis)->{attributes};
+	my $fn = $attrs->{filename};
+	$fn = ct($_[0])->{attributes}->{name} if !defined($fn) || $fn eq '';
+	(defined($fn) && $fn =~ /=\?/) ? mhdr_decode($fn) : $fn;
+}
+
+sub xs_addr_str { # helper for ->header / ->header_str
+	for (@_) { # array from header_raw()
+		next unless /=\?/;
+		my @g = parse_email_groups($_); # (foo => [ E::A::X, ... ], ...)
+		for (my $i = 0; $i < @g; $i += 2) {
+			if (defined($g[$i]) && $g[$i] =~ /=\?/) {
+				$g[$i] = mhdr_decode($g[$i]);
+			}
+			my $addrs = $g[$i + 1];
+			for my $eax (@$addrs) {
+				for my $m (qw(phrase comment)) {
+					my $v = $eax->$m;
+					$eax->$m(mhdr_decode($v)) if
+							$v && $v =~ /=\?/;
+				}
+			}
+		}
+		$_ = format_email_groups(@g);
+	}
+}
+
+eval {
+	require Email::Address::XS;
+	Email::Address::XS->import(qw(parse_email_groups format_email_groups));
+	1;
+} or do {
+	# fallback to just decoding everything, because parsing
+	# email addresses correctly w/o C/XS is slow
+	%DECODE_FULL = (%DECODE_FULL, %DECODE_ADDRESS);
+	%DECODE_ADDRESS = ();
+};
+
+*header = \&header_str;
+sub header_str {
+	my ($self, $name) = @_;
+	my @v = header_raw($self, $name);
+	if ($DECODE_ADDRESS{$name}) {
+		xs_addr_str(@v);
+	} elsif ($DECODE_FULL{$name}) {
+		for (@v) {
+			$_ = mhdr_decode($_) if /=\?/;
+		}
+	}
+	wantarray ? @v : $v[0];
+}
+
+sub body_raw { ${$_[0]->{bdy} // \''}; }
+
+sub body { body_decode($_[0], body_raw($_[0])) }
+
+sub body_str {
+	my ($self) = @_;
+	my $ct = ct($self);
+	my $charset = $ct->{attributes}->{charset};
+	if (!$charset) {
+		if ($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) {
+			return body($self);
+		}
+		Carp::confess("can't get body as a string for ",
+			join("\n\t", header_raw($self, 'Content-Type')));
+	}
+	decode($charset, body($self), Encode::FB_CROAK);
+}
+
+sub as_string {
+	my ($self) = @_;
+	my $ret = ${ $self->{hdr} };
+	return $ret unless defined($self->{bdy});
+	$ret .= $self->{crlf};
+	$ret .= ${$self->{bdy}};
+}
+
+# Unlike Email::MIME::charset_set, this only changes the parsed
+# representation of charset used for search indexing and HTML display.
+# This does NOT affect what ->as_string returns.
+sub charset_set {
+	ct($_[0])->{attributes}->{charset} = $_[1];
+}
+
+sub crlf { $_[0]->{crlf} // "\n" }
+
+sub willneed { re_memo($_) for @_ }
+
+willneed(qw(From To Cc Date Subject Content-Type In-Reply-To References
+		Message-ID X-Alt-Message-ID));
+
+1;
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index cd73b5b6..600843f0 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -9,7 +9,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
 use POSIX qw(dup2);
 use IO::Socket::INET;
 our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
-	run_script start_script key2sub xsys xqx mime_load);
+	run_script start_script key2sub xsys xqx mime_load eml_load);
 
 sub mime_load ($) {
 	my ($path) = @_;
@@ -17,6 +17,13 @@ sub mime_load ($) {
 	PublicInbox::MIME->new(\(do { local $/; <$fh> }));
 }
 
+sub eml_load ($) {
+	my ($path) = @_;
+	open(my $fh, '<', $path) or die "open $path: $!";
+	binmode $fh;
+	PublicInbox::Eml->new(\(do { local $/; <$fh> }));
+}
+
 sub tmpdir (;$) {
 	my ($base) = @_;
 	require File::Temp;
diff --git a/t/eml.t b/t/eml.t
new file mode 100644
index 00000000..43c735e7
--- /dev/null
+++ b/t/eml.t
@@ -0,0 +1,363 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::MsgIter qw(msg_part_text);
+my @classes = qw(PublicInbox::Eml);
+SKIP: {
+	require_mods('Email::MIME', 1);
+	push @classes, 'PublicInbox::MIME';
+};
+use_ok $_ for @classes;
+
+{
+	my $eml = PublicInbox::Eml->new(\(my $str = "a: b\n\nhi\n"));
+	is($str, "hi\n", '->new modified body like Email::Simple');
+	is($eml->body, "hi\n", '->body works');
+	is($eml->as_string, "a: b\n\nhi\n", '->as_string');
+}
+
+for my $cls (@classes) {
+	my $mime = $cls->new(my $orig = "From: x\n\nb");
+	is($mime->as_string, $orig, '->as_string works');
+	is($mime->header_obj->as_string, "From: x\n",
+			'header ->as_string works');
+
+	# headers
+	is($mime->header_raw('From'), 'x', 'header_raw scalar context');
+	$mime = $cls->new("R:\n\tx\nR:\n 1\n");
+	is_deeply([$mime->header_raw('r')], [ 'x', '1' ], 'multi-value');
+	$mime = $cls->new("R:x\nR: 1\n");
+	is_deeply([$mime->header_raw('r')], [ 'x', '1' ], 'multi-value header');
+	$mime = $cls->new("R:x\n R: 1\nR:\n f\n");
+	is_deeply([$mime->header_raw('r')], [ 'x R: 1', 'f' ],
+		'multi-line, multi-value header');
+
+	$mime->header_set('r');
+	is_deeply([$mime->header_raw('r')], [], 'header_set clears');
+	$mime->header_set('r');
+	is_deeply([$mime->header_raw('r')], [], 'header_set clears idempotent');
+	$mime->header_set('r', 'h');
+	is_deeply([$mime->header_raw('r')], ['h'], 'header_set');
+	$mime->header_set('r', 'h', 'i');
+	is_deeply([$mime->header_raw('r')], ['h', 'i'], 'header_set ary');
+	$mime->header_set('rr', 'b');
+	is_deeply([$mime->header_raw('r')], ['h', 'i'],
+				"header_set `rr' did not clobber `r'");
+	is($mime->header_raw('rr'), 'b', 'got set scalar');
+	$mime->header_set('rr', 'b'x100);
+	is($mime->header_raw('rr'), 'b'x100, 'got long set scalar');
+	if ($cls eq 'PublicInbox::Eml') {
+		like($mime->as_string, qr/^rr: b{100}\n(?:\n|\z)/sm,
+			'single token not wrapped');
+	}
+	$mime->header_set('rr', ('b'x100) . ' wrap me');
+	if ($cls eq 'PublicInbox::Eml') {
+		like($mime->as_string, qr/^rr: b{100}\n\twrap me\n/sm,
+			'wrapped after long token');
+	}
+	my $exp = "pre\tformatted\n with\n breaks";
+	$mime->header_set('r', $exp);
+	like($mime->as_string, qr/^r: \Q$exp\E/sm, 'preformatted preserved');
+} # for @classes
+
+for my $cls (@classes) { # make sure we don't add quotes if not needed
+	my $eml = $cls->new("From: John Smith <j\@example.com>\n\n");
+	is($eml->header('From'), 'John Smith <j@example.com>',
+		"name not unnecessarily quoted $cls");
+}
+
+for my $cls (@classes) {
+	my $eml = $cls->new("Subject: foo\n\n");
+	$eml->header_str_set('Subject', "\x{100}");
+	like($eml->header_raw('Subject'), qr/utf-8\?B\?/i,
+		'MIME-B encoded UTF-8 Subject');
+	is_deeply([$eml->header_str('Subject')], [ "\x{100}" ],
+		'got wide character back');
+}
+
+# linux-mips apparently got some messages injected w/o Message-ID
+# and long Subject: lines w/o leading whitespace.
+# What appears in the blobs was generated by V2Writable.
+for my $cls (@classes) {
+	my $eml = $cls->new(<<'EOF');
+Message-ID: <20101130193431@z>
+Subject: something really long
+and really wrong
+From: linux-mips archive injection
+Object-Id: 8c56b7abdd551b1264e6522ededbbed9890cccd0
+EOF
+	is_deeply([ $eml->header('Subject') ],
+		[ 'something really long and really wrong' ],
+		'continued long line w/o leading spaces '.$cls);
+	is_deeply([ $eml->header('From') ],
+		[ 'linux-mips archive injection' ],
+		'subsequent line not corrupted');
+	is_deeply([ $eml->header('Message-ID') ],
+		['<20101130193431@z>'],
+		'preceding line readable');
+} # for @classes
+
+{
+	my $eml = eml_load 't/msg_iter-order.eml';
+	my @parts;
+	my $orig = $eml->as_string;
+	$eml->each_part(sub {
+		my ($part, $level, @ex) = @{$_[0]};
+		my $s = $part->body_str;
+		$s =~ s/\s+//sg;
+		push @parts, [ $s, $level, @ex ];
+	});
+	is_deeply(\@parts, [ [ qw(a 1 1) ], [ qw(b 1 2) ] ], 'order is fine');
+	is($eml->as_string, $orig, 'unchanged by ->each_part');
+	$eml->each_part(sub {}, undef, 1);
+	is(defined($eml) ? $eml->body_raw : '', # old msg_iter clobbers $eml
+		'', 'each_part can clobber body');
+}
+
+# body-less, boundary-less
+for my $cls (@classes) {
+	my $call = 0;
+	$cls->new(<<'EOF')->each_part(sub { $call++ }, 0, 1);
+Content-Type: multipart/mixed; boundary="body-less"
+
+EOF
+	is($call, 1, 'called on bodyless multipart');
+
+	my @tmp;
+	$cls->new(<<'EOF')->each_part(sub { push @tmp, \@_; }, 0, 1);
+Content-Type: multipart/mixed; boundary="boundary-less"
+
+hello world
+EOF
+	is(scalar(@tmp), 1, 'got one part even w/o boundary');
+	is($tmp[0]->[0]->[0]->body, "hello world\n", 'body preserved');
+	is($tmp[0]->[0]->[1], 0, '$depth is zero');
+	is($tmp[0]->[0]->[2], 0, '@idx is zero');
+}
+
+# I guess the following only worked in PI::M because of a happy accident
+# involving inheritance:
+for my $cls (@classes) {
+	my @tmp;
+	my $header_less = <<'EOF';
+Archived-At: <85k5su9k59.fsf_-_@lola.goethe.zz>
+Content-Type: multipart/mixed; boundary="header-less"
+
+--header-less
+
+this is the body
+
+--header-less
+i-haz: header
+
+something else
+
+--header-less--
+EOF
+	my $expect = "this is the body\n";
+	$cls->new($header_less)->each_part(sub { push @tmp, \@_  }, 0, 1);
+	my $body = $tmp[0]->[0]->[0]->body;
+	if ($cls eq 'PublicInbox::Eml') {
+		is($body, $expect, 'body-only subpart in '.$cls);
+	} elsif ($body ne $expect) {
+		diag "W: $cls `$body' != `$expect'";
+	}
+	is($tmp[1]->[0]->[0]->body, "something else\n");
+	is(scalar(@tmp), 2, 'two parts');
+}
+
+if ('one newline before headers') {
+	my $eml = PublicInbox::Eml->new("\nNewline: no Header \n");
+	my @v = $eml->header_raw('Newline');
+	is_deeply(\@v, ['no Header'], 'no header');
+	is($eml->crlf, "\n", 'got CRLF as "\n"');
+	is($eml->body, "");
+}
+
+for my $cls (@classes) { # XXX: matching E::M, but not sure about this
+	my $s = <<EOF;
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+header: only
+--b--
+EOF
+	my $eml = $cls->new(\$s);
+	my $nr = 0;
+	my @v;
+	$eml->each_part(sub {
+		@v = $_[0]->[0]->header_raw('Header');
+		$nr++;
+	});
+	is($nr, 1, 'only one part');
+	is_deeply(\@v, [], "nothing w/o body $cls");
+}
+
+for my $cls (@classes) {
+	my $s = <<EOF; # double epilogue, double the fun
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+should: appear
+
+yes
+
+--b--
+
+--b
+should: not appear
+
+nope
+--b--
+EOF
+	my $eml = $cls->new(\$s);
+	my $nr = 0;
+	$eml->each_part(sub {
+		my $part = $_[0]->[0];
+		is_deeply([$part->header_raw('should')], ['appear'],
+			'only got one header');
+		is($part->body, "yes\n", 'got expected body');
+		$nr++;
+	});
+	is($nr, 1, 'only one part');
+}
+
+for my $cls (@classes) {
+	my $s = <<EOF; # buggy git-send-email versions, again?
+Content-Type: text/plain; =?ISO-8859-1?Q?=20charset=3D=1BOF?=
+Content-Transfer-Encoding: 8bit
+Object-Id: ab0440d8cd6d843bee9a27709a459ce3b2bdb94d (lore/kvm)
+
+\xc4\x80
+EOF
+	my $eml = $cls->new(\$s);
+	my ($str, $err) = msg_part_text($eml, $eml->content_type);
+	is($str, "\x{100}\n", "got wide character by assuming utf-8");
+}
+
+if ('we differ from Email::MIME with final "\n" on missing epilogue') {
+	my $s = <<EOF;
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+header: but
+
+no epilogue
+EOF
+	my $eml = PublicInbox::Eml->new(\$s);
+	is(($eml->subparts)[-1]->body, "no epilogue\n",
+		'final "\n" preserved on missing epilogue');
+}
+
+if ('maxparts is a feature unique to us') {
+	my $eml = eml_load 't/psgi_attach.eml';
+	my @orig;
+	$eml->each_part(sub { push @orig, $_[0]->[0] });
+
+	local $PublicInbox::Eml::MAXPARTS = scalar(@orig);
+	my $i = 0;
+	$eml->each_part(sub {
+		my $cur = $_[0]->[0];
+		my $prv = $orig[$i++];
+		is($cur->body_raw, $prv->body_raw, "part #$i matches");
+	});
+	is($i, scalar(@orig), 'maxparts honored');
+	$PublicInbox::Eml::MAXPARTS--;
+	my @ltd;
+	$eml->each_part(sub { push @ltd, $_[0]->[0] });
+	for ($i = 0; $i <= $#ltd; $i++) {
+		is($ltd[$i]->body_raw, $orig[$i]->body_raw,
+			"part[$i] matches");
+	}
+	is(scalar(@ltd), scalar(@orig) - 1, 'maxparts honored');
+}
+
+SKIP: {
+	require_mods('PublicInbox::MIME', 1);
+	my $eml = eml_load 't/utf8.eml';
+	my $mime = mime_load 't/utf8.eml';
+	for my $h (qw(Subject From To)) {
+		my $v = $eml->header($h);
+		my $m = $mime->header($h);
+		is($v, $m, "decoded UTF-8 $h matches Email::MIME");
+		ok(utf8::is_utf8($v), "$h is UTF-8");
+		ok(utf8::valid($v), "UTF-8 valid $h");
+	}
+	my $s = $eml->body_str;
+	ok(utf8::is_utf8($s), 'body_str is UTF-8');
+	ok(utf8::valid($s), 'UTF-8 valid body_str');
+	my $ref = \(my $x = 'ref');
+	for my $msg ($eml, $mime) {
+		$msg->body_str_set($s .= "\nHI\n");
+		ok(!utf8::is_utf8($msg->body_raw),
+				'raw octets after body_str_set');
+		$s = $msg->body_str;
+		ok(utf8::is_utf8($s), 'body_str is UTF-8 after set');
+		ok(utf8::valid($s), 'UTF-8 valid body_str after set');
+		$msg->body_set($ref);
+		is($msg->body_raw, $$ref, 'body_set worked on scalar ref');
+		$msg->body_set($$ref);
+		is($msg->body_raw, $$ref, 'body_set worked on scalar');
+	}
+	$eml = eml_load 't/iso-2202-jp.eml';
+	$mime = mime_load 't/iso-2202-jp.eml';
+	$s = $eml->body_str;
+	is($s, $mime->body_str, 'ISO-2022-JP body_str');
+	ok(utf8::is_utf8($s), 'ISO-2022-JP => UTF-8 body_str');
+	ok(utf8::valid($s), 'UTF-8 valid body_str');
+
+	$eml = eml_load 't/psgi_attach.eml';
+	$mime = mime_load 't/psgi_attach.eml';
+	is_deeply([ map { $_->body_raw } $eml->subparts ],
+		[ map { $_->body_raw } $mime->subparts ],
+		'raw ->subparts match deeply');
+	is_deeply([ map { $_->body } $eml->subparts ],
+		[ map { $_->body } $mime->subparts ],
+		'->subparts match deeply');
+	for my $msg ($eml, $mime) {
+		my @old = $msg->subparts;
+		$msg->parts_set([]);
+		is_deeply([$msg->subparts], [], 'parts_set can clear');
+		$msg->parts_set([$old[-1]]);
+		is(scalar $msg->subparts, 1, 'only last remains');
+	}
+	is($eml->as_string, $mime->as_string,
+		'as_string matches after parts_set');
+}
+
+for my $cls (@classes) {
+	my $s = <<'EOF';
+Content-Type: text/x-patch; name="=?utf-8?q?vtpm-fakefile.patch?="
+Content-Disposition: attachment; filename="=?utf-8?q?vtpm-makefile.patch?="
+
+EOF
+	is($cls->new($s)->filename, 'vtpm-makefile.patch', 'filename decoded');
+	$s =~ s/^Content-Disposition:.*$//sm;
+	is($cls->new($s)->filename, 'vtpm-fakefile.patch', 'filename fallback');
+	is($cls->new($s)->content_type,
+		'text/x-patch; name="vtpm-fakefile.patch"',
+		'matches Email::MIME output, "correct" or not');
+
+	$s = <<'EOF';
+Content-Type: multipart/foo; boundary=b
+
+--b
+Content-Disposition: attachment; filename="=?utf-8?q?vtpm-makefile.patch?="
+
+a
+--b
+Content-Type: text/x-patch; name="=?utf-8?q?vtpm-fakefile.patch?="
+
+b
+--b--
+EOF
+	my @tmp;
+	$cls->new($s)->each_part(sub { push @tmp, $_[0]->[0]->filename });
+	is_deeply(['vtpm-makefile.patch', 'vtpm-fakefile.patch'], \@tmp,
+		'got filename for both attachments');
+}
+
+done_testing;
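
Note on ->each_part: the comments in lib/PublicInbox/Eml.pm describe a
queue of parts plus a "virtual stack" instead of recursion.  Below is a
minimal, self-contained sketch of that traversal order.  It is not the
code from this patch: walk_parts and the nested-array "message" are
hypothetical stand-ins (an array ref plays a multipart container, a
string plays a leaf part), and the $MAXPARTS/$MAXDEPTH limits are left
out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ->each_part: array refs are containers,
# plain strings are leaf parts.  Limits ($MAXPARTS etc.) are omitted.
sub walk_parts {
	my ($tree, $cb) = @_;
	my $p = [ [ @$tree ], 0 ]; # frame: [ \@queue, @idx ]
	my @s; # virtual stack of suspended parent frames
	while (scalar(@{$p->[0]}) || ($p = pop @s)) {
		++$p->[-1]; # bump index of the current part
		my (undef, @idx) = @$p;
		my $path = join('.', @idx); # e.g. "2.1"
		my $depth = scalar(@idx);
		my $part = shift @{$p->[0]};
		if (ref($part) eq 'ARRAY') { # container: descend
			# only remember the parent if it has parts left
			push(@s, $p) if scalar @{$p->[0]};
			$p = [ [ @$part ], @idx, 0 ];
		} else { # a leaf node
			$cb->($part, $depth, $path);
		}
	}
}

my @seen;
walk_parts([ 'a', [ 'b', 'c' ], 'd' ],
	sub { push @seen, join(':', @_) });
print "$_\n" for @seen;
```

Each callback gets the leaf, its depth and a dotted index, so nested
parts stay addressable without recursing in Perl.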


* [PATCH 06/13] switch read-only Email::Simple users to Eml
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (4 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 05/13] eml: pure-Perl replacement for Email::MIME Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml Eric Wong
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Since PublicInbox::Eml doesn't parse MIME subparts
up front, it can replace most uses of Email::Simple
without a performance penalty.

This will eventually let us shrink our internal API
footprint, since we no longer need to maintain the
MIME vs Simple distinction.
---
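Not part of the commit: a rough sketch of why a header-only user pays
nothing extra when subparts are not parsed up front.  Sketch::LazyMsg
is a hypothetical stand-in written for this note, not PublicInbox::Eml;
it assumes the only work done in ->new is one split at the first blank
line, leaving the caller's scalar holding just the body (the behavior
the NNTP code relies on).

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT PublicInbox::Eml: it only shows the
# "split header from body once, leave subparts alone" idea.
package Sketch::LazyMsg;

sub new {
	my ($class, $ref) = @_;
	my $hdr;
	if ($$ref =~ /\r?\n\r?\n/s) { # first blank line ends the header
		# cut header and separator out; $$ref keeps only the body
		$hdr = substr($$ref, 0, $+[0], '');
		$hdr =~ s/\r?\n\z//s; # drop the blank separator line
	} else { # no body at all
		$hdr = $$ref;
		$$ref = '';
	}
	bless { hdr => \$hdr, bdy => $ref }, $class;
}

sub header_raw {
	my ($self, $name) = @_;
	my @v;
	# "Name: value", plus any folded continuation lines
	while (${$self->{hdr}} =~
			/^\Q$name\E:[ \t]*([^\n]*(?:\n[ \t]+[^\n]*)*)/gmi) {
		my $v = $1;
		$v =~ s/\r?\n[ \t]+/ /sg; # unfold continuation lines
		$v =~ s/\r\z//; # strip CR left over from CRLF input
		push @v, $v;
	}
	wantarray ? @v : $v[0];
}

package main;

my $raw = "Subject: hi\nMessage-ID: <a\@example.com>\n\nbody text\n";
my $msg = Sketch::LazyMsg->new(\$raw);
my $subject = $msg->header_raw('Subject');
print "subject=$subject\n";
print "remaining: $raw";
```

The body is never scanned for boundaries here, which is the property
the commit message depends on.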
 lib/PublicInbox/Mbox.pm   | 16 +++++-----------
 lib/PublicInbox/MboxGz.pm |  4 ++--
 lib/PublicInbox/NNTP.pm   | 19 ++++++++-----------
 lib/PublicInbox/WWW.pm    |  6 +++---
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 97bec5e7..94e61d4d 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -14,19 +14,13 @@ use PublicInbox::MID qw/mid_escape/;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Smsg;
 use PublicInbox::WwwStream qw(html_oneshot);
-use Email::Simple;
-use Email::MIME::Encode;
+use PublicInbox::Eml;
 
 sub subject_fn ($) {
 	my ($hdr) = @_;
-	my $fn = $hdr->header('Subject');
+	my $fn = $hdr->header_str('Subject');
 	return 'no-subject' if (!defined($fn) || $fn eq '');
 
-	# no need for full Email::MIME, here
-	if ($fn =~ /=\?/) {
-		eval { $fn = Encode::decode('MIME-Header', $fn) };
-		return 'no-subject' if $@;
-	}
 	$fn =~ s/^re:\s+//i;
 	$fn eq '' ? 'no-subject' : to_filename($fn);
 }
@@ -51,7 +45,7 @@ sub getline {
 	my $ibx = $ctx->{-inbox};
 	$next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
 	$mref = $ibx->msg_by_smsg($cur) or return;
-	$hdr = Email::Simple->new($mref)->header_obj;
+	$hdr = PublicInbox::Eml->new($mref)->header_obj;
 	@$more = ($ctx, $id, $prev, $next); # $next may be undef, here
 	msg_hdr($ctx, $hdr) . msg_body($$mref);
 }
@@ -72,7 +66,7 @@ sub emit_raw {
 	} else {
 		$mref = $ibx->msg_by_mid($mid) or return;
 	}
-	my $hdr = Email::Simple->new($mref)->header_obj;
+	my $hdr = PublicInbox::Eml->new($mref)->header_obj;
 	$more = [ $ctx, $id, $prev, $next, $mref, $hdr ]; # for ->getline
 	my $fn = subject_fn($hdr);
 	my @hdr = ('Content-Type');
@@ -114,7 +108,7 @@ sub msg_hdr ($$;$) {
 	for (my $i = 0; $i < @append; $i += 2) {
 		my $k = $append[$i];
 		my $v = $append[$i + 1];
-		my @v = $header_obj->header($k);
+		my @v = $header_obj->header_raw($k);
 		foreach (@v) {
 			if ($v eq $_) {
 				$v = undef;
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index e506de3d..f7fc4afc 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -3,7 +3,7 @@
 package PublicInbox::MboxGz;
 use strict;
 use warnings;
-use Email::Simple;
+use PublicInbox::Eml;
 use PublicInbox::Hval qw/to_filename/;
 use PublicInbox::Mbox;
 use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
@@ -41,7 +41,7 @@ sub getline {
 	my $buf = delete($self->{buf});
 	while (my $smsg = $self->{cb}->($ctx)) {
 		my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
-		my $h = Email::Simple->new($mref)->header_obj;
+		my $h = PublicInbox::Eml->new($mref)->header_obj;
 
 		my $err = $gz->deflate(
 			PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index e9c66cd1..54207500 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -8,7 +8,7 @@ use warnings;
 use base qw(PublicInbox::DS);
 use fields qw(nntpd article ng long_cb);
 use PublicInbox::MID qw(mid_escape $MID_EXTRACT);
-use Email::Simple;
+use PublicInbox::Eml;
 use POSIX qw(strftime);
 use PublicInbox::DS qw(now);
 use Digest::SHA qw(sha1_hex);
@@ -383,7 +383,7 @@ sub cmd_quit ($) {
 
 sub header_append ($$$) {
 	my ($hdr, $k, $v) = @_;
-	my @v = $hdr->header($k);
+	my @v = $hdr->header_raw($k);
 	foreach (@v) {
 		return if $v eq $_;
 	}
@@ -416,11 +416,11 @@ sub set_nntp_headers ($$$$$) {
 	# leafnode (and maybe other NNTP clients) have trouble dealing
 	# with v2 messages which have multiple Message-IDs (either due
 	# to our own content-based dedupe or buggy git-send-email versions).
-	my @mids = $hdr->header('Message-ID');
+	my @mids = $hdr->header_raw('Message-ID');
 	if (scalar(@mids) > 1) {
 		my $mid0 = "<$mid>";
 		$hdr->header_set('Message-ID', $mid0);
-		my @alt = $hdr->header('X-Alt-Message-ID');
+		my @alt = $hdr->header_raw('X-Alt-Message-ID');
 		my %seen = map { $_ => 1 } (@alt, $mid0);
 		push(@alt, grep { !$seen{$_}++ } @mids);
 		$hdr->header_set('X-Alt-Message-ID', @alt);
@@ -478,10 +478,9 @@ found:
 	my $smsg = $ng->over->get_art($n) or return $err;
 	my $msg = $ng->msg_by_smsg($smsg) or return $err;
 
-	# Email::Simple->new will modify $msg in-place as documented
-	# in its manpage, so what's left is the body and we won't need
-	# to call Email::Simple::body(), later
-	my $hdr = Email::Simple->new($msg)->header_obj;
+	# PublicInbox::Eml->new will modify $msg in-place, so what's
+	# left is the body and we won't need to call ->body(), later
+	my $hdr = PublicInbox::Eml->new($msg)->header_obj;
 	set_nntp_headers($self, $hdr, $ng, $n, $mid) if $set_headers;
 	[ $n, $mid, $msg, $hdr ];
 }
@@ -511,9 +510,7 @@ sub msg_hdr_write ($$$) {
 	$hdr =~ s/(?<!\r)\n/\r\n/sg; # Alpine barfs without this
 
 	# for leafnode compatibility, we need to ensure Message-ID headers
-	# are only a single line.  We can't subclass Email::Simple::Header
-	# and override _default_fold_at in here, either; since that won't
-	# affect messages already in the archive.
+	# are only a single line.
 	$hdr =~ s/^(Message-ID:)[ \t]*\r\n[ \t]+([^\r]+)\r\n/$1 $2\r\n/igsm;
 	$hdr .= "\r\n" if $body_follows;
 	$self->msg_more($hdr);
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 275e509f..6c016b03 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -22,6 +22,7 @@ use PublicInbox::MID qw(mid_escape);
 use PublicInbox::GitHTTPBackend;
 use PublicInbox::UserContent;
 use PublicInbox::WwwStatic qw(r path_info_raw);
+use PublicInbox::Eml;
 
 # TODO: consider a routing tree now that we have more endpoints:
 our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
@@ -225,9 +226,8 @@ sub invalid_inbox_mid {
 		my ($x2, $x38) = ($1, $2);
 		# this is horrifically wasteful for legacy URLs:
 		my $str = $ctx->{-inbox}->msg_by_path("$x2/$x38") or return;
-		require Email::Simple;
-		my $s = Email::Simple->new($str);
-		$mid = PublicInbox::MID::mid_clean($s->header('Message-ID'));
+		my $s = PublicInbox::Eml->new($str);
+		$mid = PublicInbox::MID::mid_clean($s->header_raw('Message-ID'));
 		return r301($ctx, $inbox, mid_escape($mid));
 	}
 	undef;
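
Note on subject_fn: the hand-rolled Encode::decode('MIME-Header', ...)
call goes away because ->header_str is expected to decode for us.  A
small sketch of that decode-or-fall-back pattern follows, using only
core modules.  decode_or_raw is a hypothetical name for this note; it
mirrors the shape of mhdr_decode but is not the code from the series.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: decode RFC 2047 encoded-words if present,
# otherwise (or on failure) hand back the raw value untouched.
sub decode_or_raw {
	my ($raw) = @_;
	return $raw unless $raw =~ /=\?/; # nothing encoded, skip the work
	my $dec = eval { Encode::decode('MIME-Header', $raw) };
	defined($dec) ? $dec : $raw;
}

my $octets = "\xc4\x80"; # U+0100 as UTF-8 octets
my $raw = '=?UTF-8?B?' . encode_base64($octets, '') . '?=';
my $subject = decode_or_raw($raw);
printf("raw=%s decoded=U+%04X\n", $raw, ord($subject));
print decode_or_raw('plain ASCII subject'), "\n";
```

Callers that need the undecoded octets (e.g. for Message-ID
comparisons) keep using ->header_raw, as the hunks above do.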


* [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (5 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 06/13] switch read-only Email::Simple users to Eml Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement Eric Wong
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
---
 Documentation/mknews.perl         |  4 ++--
 lib/PublicInbox/Admin.pm          |  2 +-
 lib/PublicInbox/Filter/Vger.pm    |  4 ++--
 lib/PublicInbox/Import.pm         |  3 ++-
 lib/PublicInbox/Inbox.pm          |  4 ++--
 lib/PublicInbox/InboxWritable.pm  |  4 ++--
 lib/PublicInbox/MDA.pm            |  1 -
 lib/PublicInbox/SearchIdx.pm      |  6 +++---
 lib/PublicInbox/SearchIdxShard.pm |  3 ++-
 lib/PublicInbox/TestCommon.pm     |  3 +++
 lib/PublicInbox/V2Writable.pm     | 17 +++++++++--------
 lib/PublicInbox/View.pm           |  2 +-
 lib/PublicInbox/WWW.pm            |  2 +-
 lib/PublicInbox/WatchMaildir.pm   |  4 ++--
 lib/PublicInbox/WwwAttach.pm      |  5 ++---
 script/public-inbox-edit          |  8 ++++----
 script/public-inbox-learn         |  4 ++--
 script/public-inbox-mda           | 16 ++++++++--------
 script/public-inbox-purge         |  4 ++--
 t/filter_rubylang.t               |  8 ++++----
 t/import.t                        |  2 +-
 21 files changed, 55 insertions(+), 51 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index a9dede00..3bdebfce 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -5,7 +5,7 @@
 # this uses unstable internal APIs of public-inbox, and this script
 # needs to be updated if they change.
 use strict;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::View;
 use PublicInbox::MsgTime qw(msg_datestamp);
 use PublicInbox::MID qw(mids mid_escape);
@@ -76,7 +76,7 @@ sub release2mime {
 	my ($release, $mtime_ref) = @_;
 	my $f = "$dir/$release.eml";
 	open(my $fh, '<', $f) or die "open($f): $!";
-	my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
+	my $mime = PublicInbox::Eml->new(\(do { local $/; <$fh> }));
 	# Documentation/include.mk relies on mtimes of each .eml file
 	# to trigger rebuild, so make sure we sync the mtime to the Date:
 	# header in the .eml
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 62ddbe82..2c8d191a 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -122,7 +122,7 @@ EOF
 }
 
 # TODO: make Devel::Peek optional, only used for daemon
-my @base_mod = qw(Email::MIME Devel::Peek);
+my @base_mod = qw(Devel::Peek);
 my @over_mod = qw(DBD::SQLite DBI);
 my %mod_groups = (
 	-index => [ @base_mod, @over_mod ],
diff --git a/lib/PublicInbox/Filter/Vger.pm b/lib/PublicInbox/Filter/Vger.pm
index e746238c..2c73738d 100644
--- a/lib/PublicInbox/Filter/Vger.pm
+++ b/lib/PublicInbox/Filter/Vger.pm
@@ -5,7 +5,7 @@
 package PublicInbox::Filter::Vger;
 use base qw(PublicInbox::Filter::Base);
 use strict;
-use warnings;
+use PublicInbox::Eml;
 
 my $l0 = qr/-+/; # older messages only had one '-'
 my $l1 =
@@ -25,7 +25,7 @@ sub scrub {
 	# so in multipart (e.g. GPG-signed) messages, the list trailer
 	# becomes invisible to MIME-aware email clients.
 	if ($s =~ s/$l0\n$l1\n$l2\n$l3\n($l4\n)?\z//os) {
-		$mime = PublicInbox::MIME->new(\$s);
+		$mime = PublicInbox::Eml->new(\$s);
 	}
 	$self->ACCEPT($mime);
 }
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index de8ff55f..98aa7785 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -15,6 +15,7 @@ use PublicInbox::Address;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use PublicInbox::ContentId qw(content_digest);
 use PublicInbox::MDA;
+use PublicInbox::Eml;
 use POSIX qw(strftime);
 
 sub new {
@@ -137,7 +138,7 @@ sub check_remove_v1 {
 	$info =~ m!\A100644 blob ([a-f0-9]{40})\t!s or die "not blob: $info";
 	my $oid = $1;
 	my $msg = _cat_blob($r, $w, $oid) or die "BUG: cat-blob $1 failed";
-	my $cur = PublicInbox::MIME->new($msg);
+	my $cur = PublicInbox::Eml->new($msg);
 	my $cur_s = $cur->header('Subject');
 	$cur_s = '' unless defined $cur_s;
 	my $cur_m = $mime->header('Subject');
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 186eb420..617b692b 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -7,7 +7,7 @@ use strict;
 use warnings;
 use PublicInbox::Git;
 use PublicInbox::MID qw(mid2path);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 
 # Long-running "git-cat-file --batch" processes won't notice
 # unlinked packs, so we need to restart those processes occasionally.
@@ -328,7 +328,7 @@ sub msg_by_smsg ($$;$) {
 sub smsg_mime {
 	my ($self, $smsg, $ref) = @_;
 	if (my $s = msg_by_smsg($self, $smsg, $ref)) {
-		$smsg->{mime} = PublicInbox::MIME->new($s);
+		$smsg->{mime} = PublicInbox::Eml->new($s);
 		return $smsg;
 	}
 }
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 31aa76c6..3558403b 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -117,7 +117,7 @@ sub mime_from_path ($) {
 		local $/;
 		my $str = <$fh>;
 		$str or return;
-		return PublicInbox::MIME->new(\$str);
+		return PublicInbox::Eml->new(\$str);
 	} elsif ($!{ENOENT}) {
 		# common with Maildir
 		return;
@@ -162,7 +162,7 @@ sub mb_add ($$$$) {
 	} elsif ($variant eq 'mboxo') {
 		$$msg =~ s/^>From /From /gms;
 	}
-	my $mime = PublicInbox::MIME->new($msg);
+	my $mime = PublicInbox::Eml->new($msg);
 	if ($filter) {
 		my $ret = $filter->scrub($mime) or return;
 		return if $ret == REJECT();
diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
index 33696528..57b436b9 100644
--- a/lib/PublicInbox/MDA.pm
+++ b/lib/PublicInbox/MDA.pm
@@ -5,7 +5,6 @@
 package PublicInbox::MDA;
 use strict;
 use warnings;
-use Email::Simple;
 use PublicInbox::MsgTime;
 use constant MAX_SIZE => 1024 * 500; # same as spamc default, should be tunable
 use constant MAX_MID_SIZE => 244; # max term size - 1 in Xapian
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a7e31b71..f054bb6a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -10,7 +10,7 @@ package PublicInbox::SearchIdx;
 use strict;
 use warnings;
 use base qw(PublicInbox::Search PublicInbox::Lock);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
 use PublicInbox::MID qw/mid_clean mid_mime mids_for_index/;
 use PublicInbox::MsgIter;
@@ -365,7 +365,7 @@ sub _msgmap_init ($) {
 }
 
 sub add_message {
-	# mime = Email::MIME object
+	# mime = PublicInbox::Eml or Email::MIME object
 	my ($self, $mime, $smsg) = @_;
 	my $hdr = $mime->header_obj;
 	my $mids = mids_for_index($hdr);
@@ -554,7 +554,7 @@ sub do_cat_mail {
 	my ($git, $blob, $sizeref) = @_;
 	my $str = $git->cat_file($blob, $sizeref) or
 		die "BUG: $blob not found in $git->{git_dir}";
-	PublicInbox::MIME->new($str);
+	PublicInbox::Eml->new($str);
 }
 
 # called by public-inbox-index
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index 06bcd403..e754b038 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -8,6 +8,7 @@ use strict;
 use warnings;
 use base qw(PublicInbox::SearchIdx);
 use IO::Handle (); # autoflush
+use PublicInbox::Eml;
 
 sub new {
 	my ($class, $v2writable, $shard) = @_;
@@ -75,7 +76,7 @@ sub shard_worker_loop ($$$$$) {
 			$self->begin_txn_lazy;
 			my $n = read($r, my $msg, $bytes) or die "read: $!\n";
 			$n == $bytes or die "short read: $n != $bytes\n";
-			my $mime = PublicInbox::MIME->new(\$msg);
+			my $mime = PublicInbox::Eml->new(\$msg);
 			my $smsg = bless {
 				bytes => $bytes,
 				num => $num + 0,
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 600843f0..978c3cd7 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -8,12 +8,15 @@ use parent qw(Exporter);
 use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
 use POSIX qw(dup2);
 use IO::Socket::INET;
+use PublicInbox::MIME; # temporary
 our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
 	run_script start_script key2sub xsys xqx mime_load eml_load);
 
 sub mime_load ($) {
 	my ($path) = @_;
 	open(my $fh, '<', $path) or die "open $path: $!";
+	# test should've called: require_mods('Email::MIME')
+	require PublicInbox::MIME;
 	PublicInbox::MIME->new(\(do { local $/; <$fh> }));
 }
 
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 01b8bed6..f599e0a0 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -9,7 +9,7 @@ use warnings;
 use base qw(PublicInbox::Lock);
 use 5.010_001;
 use PublicInbox::SearchIdxShard;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Git;
 use PublicInbox::Import;
 use PublicInbox::MID qw(mids references);
@@ -357,9 +357,10 @@ sub content_ids ($) {
 	my ($mime) = @_;
 	my @cids = ( content_id($mime) );
 
+	# We still support Email::MIME, here, and
 	# Email::MIME->as_string doesn't always round-trip, so we may
 	# use a second content_id
-	my $rt = content_id(PublicInbox::MIME->new(\($mime->as_string)));
+	my $rt = content_id(PublicInbox::Eml->new(\($mime->as_string)));
 	push @cids, $rt if $cids[0] ne $rt;
 	\@cids;
 }
@@ -405,7 +406,7 @@ sub rewrite_internal ($$;$$$) {
 				next; # continue
 			}
 			my $orig = $$msg;
-			my $cur = PublicInbox::MIME->new($msg);
+			my $cur = PublicInbox::Eml->new($msg);
 			if (content_matches($cids, $cur)) {
 				$gone{$smsg->{num}} = [ $smsg, $cur, \$orig ];
 			}
@@ -842,7 +843,7 @@ sub content_exists ($$$) {
 			warn "broken smsg for $mid\n";
 			next;
 		}
-		my $cur = PublicInbox::MIME->new($msg);
+		my $cur = PublicInbox::Eml->new($msg);
 		return 1 if content_matches($cids, $cur);
 
 		# XXX DEBUG_DIFF is experimental and may be removed
@@ -870,7 +871,7 @@ sub mark_deleted ($$$$) {
 	my ($self, $sync, $git, $oid) = @_;
 	return if PublicInbox::SearchIdx::too_big($self, $git, $oid);
 	my $msgref = $git->cat_file($oid);
-	my $mime = PublicInbox::MIME->new($$msgref);
+	my $mime = PublicInbox::Eml->new($$msgref);
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	foreach my $mid (@$mids) {
@@ -901,7 +902,7 @@ sub reindex_oid_m ($$$$;$) {
 	$self->{current_info} = "multi_mid $oid";
 	my ($num, $mid0, $len);
 	my $msgref = $git->cat_file($oid, \$len);
-	my $mime = PublicInbox::MIME->new($$msgref);
+	my $mime = PublicInbox::Eml->new($$msgref);
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 	die "BUG: reindex_oid_m called for <=1 mids" if scalar(@$mids) <= 1;
@@ -999,7 +1000,7 @@ sub reindex_oid ($$$$) {
 	my ($num, $mid0, $len);
 	my $msgref = $git->cat_file($oid, \$len);
 	return if $len == 0; # purged
-	my $mime = PublicInbox::MIME->new($$msgref);
+	my $mime = PublicInbox::Eml->new($$msgref);
 	my $mids = mids($mime->header_obj);
 	my $cid = content_id($mime);
 
@@ -1193,7 +1194,7 @@ sub unindex_oid ($$$;$) {
 	my ($self, $git, $oid, $unindexed) = @_;
 	my $mm = $self->{mm};
 	my $msgref = $git->cat_file($oid);
-	my $mime = PublicInbox::MIME->new($msgref);
+	my $mime = PublicInbox::Eml->new($msgref);
 	my $mids = mids($mime->header_obj);
 	$mime = $msgref = undef;
 	my $over = $self->{over};
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3328c865..ef5f4b3a 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -56,7 +56,7 @@ sub msg_page {
 	} else {
 		$first = $ibx->msg_by_mid($mid) or return;
 	}
-	my $mime = PublicInbox::MIME->new($first);
+	my $mime = PublicInbox::Eml->new($first);
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	my $hdr = $ctx->{hdr} = $mime->header_obj;
 	$ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx, 0);
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 6c016b03..71fe1f4b 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -146,7 +146,7 @@ sub preload {
 	require PublicInbox::Feed;
 	require PublicInbox::View;
 	require PublicInbox::SearchThread;
-	require PublicInbox::MIME;
+	require PublicInbox::Eml;
 	require PublicInbox::Mbox;
 	require PublicInbox::ViewVCS;
 	require PublicInbox::WwwText;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 71bd84fc..7ca35403 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -6,7 +6,7 @@
 package PublicInbox::WatchMaildir;
 use strict;
 use warnings;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
 use File::Temp 0.19 (); # 0.19 for ->newdir
 use PublicInbox::Filter::Base qw(REJECT);
@@ -282,7 +282,7 @@ sub _spamcheck_cb {
 		my ($mime) = @_;
 		my $tmp = '';
 		if ($sc->spamcheck($mime, \$tmp)) {
-			return PublicInbox::MIME->new(\$tmp);
+			return PublicInbox::Eml->new(\$tmp);
 		}
 		warn $mime->header('Message-ID')." failed spam check\n";
 		undef;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index b1009907..5b2914b3 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -7,8 +7,7 @@ use strict;
 use warnings;
 use bytes (); # only for bytes::length
 use Email::MIME::ContentType qw(parse_content_type);
-use PublicInbox::MIME;
-use PublicInbox::MsgIter;
+use PublicInbox::Eml;
 
 sub get_attach_i { # ->each_part callback
 	my ($part, $depth, $idx) = @{$_[0]};
@@ -38,7 +37,7 @@ sub get_attach ($$$) {
 	my ($ctx, $idx, $fn) = @_;
 	my $res = [ 404, [ 'Content-Type', 'text/plain' ], [ "Not found\n" ] ];
 	my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
-	$mime = PublicInbox::MIME->new($mime);
+	$mime = PublicInbox::Eml->new($mime);
 	$res->[3] = $idx;
 	$mime->each_part(\&get_attach_i, $res, 1);
 	pop @$res; # cleanup before letting PSGI server see it
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index 42f914a8..e895a228 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -12,7 +12,7 @@ use File::Temp 0.19 (); # 0.19 for TMPDIR
 use PublicInbox::ContentId qw(content_id);
 use PublicInbox::MID qw(mid_clean mids);
 PublicInbox::Admin::check_require('-index');
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
 use PublicInbox::Import;
 
@@ -52,7 +52,7 @@ sub find_mid ($$$) {
 		my ($id, $prev);
 		while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
 			my $ref = $ibx->msg_by_smsg($smsg);
-			my $mime = PublicInbox::MIME->new($ref);
+			my $mime = PublicInbox::Eml->new($ref);
 			my $cid = content_id($mime);
 			my $tuple = [ $ibx, $smsg ];
 			push @{$found->{$cid} ||= []}, $tuple
@@ -205,8 +205,8 @@ W: possible message boundary splitting error
 		$new_raw =~ s/^>(>*From )/$1/gm;
 	}
 
-	my $new_mime = PublicInbox::MIME->new(\$new_raw);
-	my $old_mime = PublicInbox::MIME->new($old_raw);
+	my $new_mime = PublicInbox::Eml->new(\$new_raw);
+	my $old_mime = PublicInbox::Eml->new($old_raw);
 
 	# make sure we don't compare unwanted headers, since mutt adds
 	# Content-Length, Status, and Lines headers:
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 4c10b68b..a33d813a 100644
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -9,7 +9,7 @@ use strict;
 use warnings;
 use PublicInbox::Config;
 use PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Address;
 use PublicInbox::Spamcheck::Spamc;
 my $train = shift or die "usage: $usage\n";
@@ -20,7 +20,7 @@ if ($train !~ /\A(?:ham|spam|rm)\z/) {
 my $spamc = PublicInbox::Spamcheck::Spamc->new;
 my $pi_config = PublicInbox::Config->new;
 my $err;
-my $mime = PublicInbox::MIME->new(do{
+my $mime = PublicInbox::Eml->new(do{
 	local $/;
 	my $data = <STDIN>;
 	$data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 54d0af01..42d0e00c 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -15,8 +15,7 @@ my $do_exit = sub {
 	exit $code;
 };
 
-use Email::Simple;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::MDA;
 use PublicInbox::Config;
 use PublicInbox::Emergency;
@@ -32,7 +31,7 @@ $ems = PublicInbox::Emergency->new($emergency);
 my $str = do { local $/; <STDIN> };
 $str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 $ems->prepare(\$str);
-my $simple = Email::Simple->new(\$str);
+my $eml = PublicInbox::Eml->new(\$str);
 my $config = PublicInbox::Config->new;
 my $key = 'publicinboxmda.spamcheck';
 my $default = 'PublicInbox::Spamcheck::Spamc';
@@ -44,7 +43,7 @@ if (defined $recipient) {
 	push @$dests, $ibx if $ibx;
 }
 if (!scalar(@$dests)) {
-	$dests = PublicInbox::MDA->inboxes_for_list_id($config, $simple);
+	$dests = PublicInbox::MDA->inboxes_for_list_id($config, $eml);
 	if (!scalar(@$dests) && !defined($recipient)) {
 		die "ORIGINAL_RECIPIENT not defined in ENV\n";
 	}
@@ -61,7 +60,7 @@ my $err;
 		0;
 	# pre-check, MDA has stricter rules than an importer might;
 	} elsif ($precheck) {
-		!!PublicInbox::MDA->precheck($simple, $ibx->{address});
+		!!PublicInbox::MDA->precheck($eml, $ibx->{address});
 	} else {
 		1;
 	}
@@ -69,7 +68,7 @@ my $err;
 
 $do_exit->(67) if $err && scalar(@$dests) == 0;
 
-$simple = undef;
+$eml = undef;
 my $spam_ok;
 if ($spamc) {
 	$str = '';
@@ -101,9 +100,10 @@ my @rejects;
 for my $ibx (@$dests) {
 	mda_filter_adjust($ibx);
 	my $filter = $ibx->filter;
-	my $mime = PublicInbox::MIME->new($str);
+	my $mime = PublicInbox::Eml->new($str);
 	my $ret = $filter->delivery($mime);
-	if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message
+	if (ref($ret) && ($ret->isa('PublicInbox::Eml') ||
+			$ret->isa('Email::MIME'))) { # filter altered message
 		$mime = $ret;
 	} elsif ($ret == PublicInbox::Filter::Base::IGNORE) {
 		next; # nothing, keep looping
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index 8301b06d..82a63b80 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -10,7 +10,7 @@ use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
 use PublicInbox::AdminEdit;
 PublicInbox::Admin::check_require('-index');
 use PublicInbox::Filter::Base qw(REJECT);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require PublicInbox::V2Writable;
 
 my $usage = "$0 [--all] [INBOX_DIRS] </path/to/message";
@@ -26,7 +26,7 @@ $data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 my $n_purged = 0;
 
 foreach my $ibx (@ibxs) {
-	my $mime = PublicInbox::MIME->new($data);
+	my $mime = PublicInbox::Eml->new($data);
 	my $v2w = PublicInbox::V2Writable->new($ibx, 0);
 
 	my $commits = $v2w->purge($mime) || [];
diff --git a/t/filter_rubylang.t b/t/filter_rubylang.t
index 05e1b324..e6c53f98 100644
--- a/t/filter_rubylang.t
+++ b/t/filter_rubylang.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 use_ok 'PublicInbox::Filter::RubyLang';
 
@@ -17,7 +17,7 @@ keep this
 Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
 <http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
 EOF
-my $mime = PublicInbox::MIME->new($msg);
+my $mime = PublicInbox::Eml->new($msg);
 my $ret = $f->delivery($mime);
 is($ret, $mime, "delivery successful");
 is($mime->body, "keep this\n", 'normal message filtered OK');
@@ -41,7 +41,7 @@ X-Mail-Count: 12
 Message-ID: <a@b>
 
 EOF
-	$mime = PublicInbox::MIME->new($msg);
+	$mime = PublicInbox::Eml->new($msg);
 	$ret = $f->delivery($mime);
 	is($ret, $mime, "delivery successful");
 	my $mm = PublicInbox::Msgmap->new($git_dir);
@@ -53,7 +53,7 @@ Message-ID: <b@b>
 
 EOF
 
-	$mime = PublicInbox::MIME->new($msg);
+	$mime = PublicInbox::Eml->new($msg);
 	$ret = $f->delivery($mime);
 	is($ret, 100, "delivery rejected without X-Mail-Count");
 }
diff --git a/t/import.t b/t/import.t
index d2264102..ba4abd9c 100644
--- a/t/import.t
+++ b/t/import.t
@@ -75,7 +75,7 @@ $im->done;
 is(scalar @revs, 26, '26 revisions exist after mass import');
 my ($mark, $msg) = $im->remove($mime);
 like($mark, qr/\A:\d+\z/, 'got mark');
-is(ref($msg), 'PublicInbox::MIME', 'got old message deleted');
+like(ref($msg), qr/\bPublicInbox::(?:Eml|MIME)\b/, 'got old message deleted');
 
 is(undef, $im->remove($mime), 'remove is idempotent');
 

* [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (6 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 09/13] EmlContentFoo: relax Encode version requirement Eric Wong
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Since we're getting rid of Email::MIME, get rid of
Email::MIME::ContentType, too.  Forking it means we may introduce
speedups specific to our codebase down the line.
---
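Not part of the commit message: a minimal, standalone sketch of how
RFC 2231 parameter continuations are joined and percent-decoded,
roughly mirroring what _process_rfc2231 does in the patch below.
join_rfc2231 is a hypothetical name used only for illustration, and
it skips charset decoding entirely, so treat it as an approximation
under those assumptions rather than the module's exact behavior.

```perl
use strict;
use warnings;

# Hypothetical helper (not in this patch): join RFC 2231 continuations
# ("name*0", "name*1*", ...) and percent-decode extended values.
# Charset decoding is deliberately omitted in this sketch.
sub join_rfc2231 {
	my (%in) = @_;
	my (%out, %parts, %ext);
	for my $k (keys %in) {
		if ($k =~ /\A(.+)\*([0-9])(\*?)\z/) { # name*N or name*N*
			$parts{$1}->[$2] = $in{$k};
			$ext{$1} = 1 if $3;
		} elsif ($k =~ /\A(.+)\*\z/) { # name* (single extended value)
			$parts{$1}->[0] = $in{$k};
			$ext{$1} = 1;
		} else { # ordinary parameter, pass through
			$out{$k} = $in{$k};
		}
	}
	for my $name (keys %parts) {
		my $v = join('', grep { defined } @{$parts{$name}});
		# extended values start with: charset'language'
		if ($ext{$name} && $v =~ /\A[^']*'[^']*'(.*)\z/s) {
			$v = $1;
			$v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
		}
		$out{$name} = $v;
	}
	\%out;
}

my $attr = join_rfc2231(
	'title*0*' => "us-ascii'en'This%20is%20even%20more%20",
	'title*1*' => '%2A%2A%2Afun%2A%2A%2A%20',
	'title*2' => "isn't it!",
);
print $attr->{title}, "\n"; # prints: This is even more ***fun*** isn't it!
```

The input mirrors one of the cases in t/eml_content_type.t; the
sections are joined first and percent-decoded afterwards, which is the
same order the patch uses.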
 MANIFEST                         |   3 +
 lib/PublicInbox/Eml.pm           |   7 +-
 lib/PublicInbox/EmlContentFoo.pm | 294 +++++++++++++++++++++++++++++++
 lib/PublicInbox/WwwAttach.pm     |   2 +-
 t/eml_content_disposition.t      | 102 +++++++++++
 t/eml_content_type.t             | 289 ++++++++++++++++++++++++++++++
 6 files changed, 692 insertions(+), 5 deletions(-)
 create mode 100644 lib/PublicInbox/EmlContentFoo.pm
 create mode 100644 t/eml_content_disposition.t
 create mode 100644 t/eml_content_type.t
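
Also not part of the commit: a small sketch of the nested-comment
stripping idea behind _clean_comments, since RFC 822 comments in
parentheses may nest and may contain backslash escapes.
strip_leading_comments is a hypothetical name; unlike the patch's
version it returns a new string instead of modifying its argument in
place, and it does not report whether anything was erased.

```perl
use strict;
use warnings;

# Hypothetical helper (not in this patch): drop leading whitespace and
# parenthesized comments, which may nest, from the front of a header value.
sub strip_leading_comments {
	my ($s) = @_;
	$s =~ s/\A\s+//;
	while ($s =~ s/\A\(//) {
		my $depth = 1;
		while ($depth && length($s)) {
			my $ch = substr($s, 0, 1, ''); # consume one character
			if ($ch eq '(') {
				$depth++;
			} elsif ($ch eq ')') {
				$depth--;
			} elsif ($ch eq '\\') {
				substr($s, 0, 1, ''); # skip the escaped character
			}
		}
		$s =~ s/\A\s+//;
	}
	$s;
}

print strip_leading_comments(
	'(c (nested ()c)another c)() charset=ISO-8859-1'), "\n";
# prints: charset=ISO-8859-1
```

A depth counter is used rather than a regexp because balanced
parentheses are awkward to match with a plain pattern.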

diff --git a/MANIFEST b/MANIFEST
index 0906448e..055c8c9a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -106,6 +106,7 @@ lib/PublicInbox/DSPoll.pm
 lib/PublicInbox/Daemon.pm
 lib/PublicInbox/Emergency.pm
 lib/PublicInbox/Eml.pm
+lib/PublicInbox/EmlContentFoo.pm
 lib/PublicInbox/ExtMsg.pm
 lib/PublicInbox/Feed.pm
 lib/PublicInbox/Filter/Base.pm
@@ -231,6 +232,8 @@ t/ds-poll.t
 t/edit.t
 t/emergency.t
 t/eml.t
+t/eml_content_disposition.t
+t/eml_content_type.t
 t/epoll.t
 t/fail-bin/spamc
 t/feed.t
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 0c23bed0..1988bdb3 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -33,10 +33,9 @@ use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
 
 my $MIME_Header = find_encoding('MIME-Header');
 
-# TODO remove these dependencies
-use Email::MIME::ContentType;
+use PublicInbox::EmlContentFoo qw(parse_content_type parse_content_disposition);
 use Email::MIME::Encodings;
-$Email::MIME::ContentType::STRICT_PARAMS = 0;
+$PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
 
 our $MAXPARTS = 1000; # same as SpamAssassin
 our $MAXDEPTH = 20; # seems enough, Perl sucks, here
@@ -108,7 +107,7 @@ sub header_raw {
 # pick the first Content-Type header to match Email::MIME behavior.
 # It's usually the right one based on historical archives.
 sub ct ($) {
-	# Email::MIME::ContentType::content_type:
+	# PublicInbox::EmlContentFoo::content_type:
 	$_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
 }
 
diff --git a/lib/PublicInbox/EmlContentFoo.pm b/lib/PublicInbox/EmlContentFoo.pm
new file mode 100644
index 00000000..f507d548
--- /dev/null
+++ b/lib/PublicInbox/EmlContentFoo.pm
@@ -0,0 +1,294 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+#  <https://www.gnu.org/licenses/gpl-1.0.txt>
+#  <https://dev.perl.org/licenses/artistic.html>
+#
+# This license differs from the rest of public-inbox
+#
+# This is a fork of Email::MIME::ContentType 1.022 with
+# minor improvements and incompatibilities; namely changes to
+# quiet warnings with legacy data.
+package PublicInbox::EmlContentFoo;
+use strict;
+use parent qw(Exporter);
+# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+
+use Encode 2.87 qw(find_mime_encoding);
+our @EXPORT_OK = qw(parse_content_type parse_content_disposition);
+
+our $STRICT_PARAMS = 1;
+
+my $ct_default = 'text/plain; charset=us-ascii';
+
+my $re_token = # US-ASCII except SPACE, CTLs and tspecials ()<>@,;:\\"/[]?=
+	qr/[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+/;
+
+my $re_token_non_strict = # allow CTLs and above ASCII
+	qr/([\x00-\x08\x0B\x0C\x0E-\x1F\x7E-\xFF]+|$re_token)/;
+
+my $re_qtext = # US-ASCII except CR, LF, white space, backslash and quote
+	qr/[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7E\x7F]/;
+my $re_quoted_pair = qr/\\[\x00-\x7F]/;
+my $re_quoted_string = qr/"((?:[ \t]*(?:$re_qtext|$re_quoted_pair))*[ \t]*)"/;
+
+my $re_qtext_non_strict = qr/[\x80-\xFF]|$re_qtext/;
+my $re_quoted_pair_non_strict = qr/\\[\x00-\xFF]/;
+my $re_quoted_string_non_strict =
+qr/"((?:[ \t]*(?:$re_qtext_non_strict|$re_quoted_pair_non_strict))*[ \t]*)"/;
+
+my $re_charset = qr/[!"#\$%&'+\-0-9A-Z\\\^_`a-z\{\|\}~]+/;
+my $re_language = qr/[A-Za-z]{1,8}(?:-[0-9A-Za-z]{1,8})*/;
+my $re_exvalue = qr/($re_charset)?'(?:$re_language)?'(.*)/;
+
+sub parse_content_type {
+	my ($ct) = @_;
+
+	# If the header isn't there or is empty, give default answer.
+	$ct = $ct_default unless defined($ct) && length($ct);
+
+	_unfold_lines($ct);
+	_clean_comments($ct);
+
+	# It is also recommend (sic.) that this default be assumed when a
+	# syntactically invalid Content-Type header field is encountered.
+	unless ($ct =~ s/^($re_token)\/($re_token)//) {
+		unless ($STRICT_PARAMS && $ct =~ s/^($re_token_non_strict)\/
+						($re_token_non_strict)//x) {
+			#carp "Invalid Content-Type '$ct'";
+			return parse_content_type($ct_default);
+		}
+	}
+
+	my ($type, $subtype) = (lc $1, lc $2);
+
+	_clean_comments($ct);
+	$ct =~ s/\s+$//;
+
+	my $attributes = {};
+	if ($STRICT_PARAMS && length($ct) && $ct !~ /^;/) {
+		# carp "Missing ';' before first Content-Type parameter '$ct'";
+	} else {
+		$attributes = _process_rfc2231(_parse_attributes($ct));
+	}
+
+	{
+		type	   => $type,
+		subtype	=> $subtype,
+		attributes => $attributes,
+
+		# This is dumb.  Really really dumb.  For backcompat. -- rjbs,
+		# 2013-08-10
+		discrete   => $type,
+		composite  => $subtype,
+	};
+}
+
+my $cd_default = 'attachment';
+
+sub parse_content_disposition {
+	my ($cd) = @_;
+
+	$cd = $cd_default unless defined($cd) && length($cd);
+
+	_unfold_lines($cd);
+	_clean_comments($cd);
+
+	unless ($cd =~ s/^($re_token)//) {
+		unless ($STRICT_PARAMS and $cd =~ s/^($re_token_non_strict)//) {
+			#carp "Invalid Content-Disposition '$cd'";
+			return parse_content_disposition($cd_default);
+		}
+	}
+
+	my $type = lc $1;
+
+	_clean_comments($cd);
+	$cd =~ s/\s+$//;
+
+	my $attributes = {};
+	if ($STRICT_PARAMS && length($cd) && $cd !~ /^;/) {
+# carp "Missing ';' before first Content-Disposition parameter '$cd'";
+	} else {
+		$attributes = _process_rfc2231(_parse_attributes($cd));
+	}
+
+	{
+		type	   => $type,
+		attributes => $attributes,
+	};
+}
+
+sub _unfold_lines {
+	$_[0] =~ s/(?:\r\n|[\r\n])(?=[ \t])//g;
+}
+
+sub _clean_comments {
+	my $ret = ($_[0] =~ s/^\s+//);
+	while (length $_[0]) {
+		last unless $_[0] =~ s/^\(//;
+		my $level = 1;
+		while (length $_[0]) {
+			my $ch = substr $_[0], 0, 1, '';
+			if ($ch eq '(') {
+				$level++;
+			} elsif ($ch eq ')') {
+				$level--;
+				last if $level == 0;
+			} elsif ($ch eq '\\') {
+				substr $_[0], 0, 1, '';
+			}
+		}
+		# carp "Unbalanced comment" if $level != 0 and $STRICT_PARAMS;
+		$ret |= ($_[0] =~ s/^\s+//);
+	}
+	$ret;
+}
+
+sub _process_rfc2231 {
+	my ($attribs) = @_;
+	my %cont;
+	my %encoded;
+	foreach (keys %{$attribs}) {
+		next unless $_ =~ m/^(.*)\*([0-9])\*?$/;
+		my ($attr, $sec) = ($1, $2);
+		$cont{$attr}->[$sec] = $attribs->{$_};
+		$encoded{$attr}->[$sec] = 1 if $_ =~ m/\*$/;
+		delete $attribs->{$_};
+	}
+	foreach (keys %cont) {
+		my $key = $_;
+		$key .= '*' if $encoded{$_};
+		$attribs->{$key} = join '', @{$cont{$_}};
+	}
+	foreach (keys %{$attribs}) {
+		next unless $_ =~ m/^(.*)\*$/;
+		my $key = $1;
+		next unless $attribs->{$_} =~ m/^$re_exvalue$/;
+		my ($charset, $value) = ($1, $2);
+		$value =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
+		if (length $charset) {
+			my $enc = find_mime_encoding($charset);
+			if (defined $enc) {
+				$value = $enc->decode($value);
+			# } else {
+				#carp "Unknown charset '$charset' in
+				#attribute '$key' value";
+			}
+		}
+		$attribs->{$key} = $value;
+		delete $attribs->{$_};
+	}
+	$attribs;
+}
+
+sub _parse_attributes {
+	local $_ = shift;
+	substr($_, 0, 0, '; ') if length $_ and $_ !~ /^;/;
+	my $attribs = {};
+	while (length $_) {
+		s/^;// or $STRICT_PARAMS and do {
+			#carp "Missing semicolon before parameter '$_'";
+			return $attribs;
+		};
+		_clean_comments($_);
+		unless (length $_) {
+			# Some mail software generates a Content-Type like this:
+			# "Content-Type: text/plain;"
+			# RFC 1521 section 3 says a parameter must exist if
+			# there is a semicolon.
+			#carp "Extra semicolon after last parameter" if
+			#$STRICT_PARAMS;
+			return $attribs;
+		}
+		my $attribute;
+		if (s/^($re_token)=//) {
+			$attribute = lc $1;
+		} else {
+			if ($STRICT_PARAMS) {
+				# carp "Illegal parameter '$_'";
+				return $attribs;
+			}
+			if (s/^($re_token_non_strict)=//) {
+				$attribute = lc $1;
+			} else {
+				unless (s/^([^;=\s]+)\s*=//) {
+					#carp "Cannot parse parameter '$_'";
+					return $attribs;
+				}
+				$attribute = lc $1;
+			}
+		}
+		_clean_comments($_);
+		my $value = _extract_attribute_value();
+		$attribs->{$attribute} = $value;
+		_clean_comments($_);
+	}
+	$attribs;
+}
+
+sub _extract_attribute_value { # EXPECTS AND MODIFIES $_
+	my $value;
+	while (length $_) {
+		if (s/^($re_token)//) {
+			$value .= $1;
+		} elsif (s/^$re_quoted_string//) {
+			my $sub = $1;
+			$sub =~ s/\\(.)/$1/g;
+			$value .= $sub;
+		} elsif ($STRICT_PARAMS) {
+			#my $char = substr $_, 0, 1;
+			#carp "Unquoted '$char' not allowed";
+			return;
+		} elsif (s/^($re_token_non_strict)//) {
+			$value .= $1;
+		} elsif (s/^$re_quoted_string_non_strict//) {
+			my $sub = $1;
+			$sub =~ s/\\(.)/$1/g;
+			$value .= $sub;
+		}
+		my $erased = _clean_comments($_);
+		last if !length $_ or /^;/;
+		if ($STRICT_PARAMS) {
+			#my $char = substr $_, 0, 1;
+			#carp "Extra '$char' found after parameter";
+			return;
+		}
+		if ($erased) {
+			# Sometimes semicolon is missing, so check for = char
+			last if m/^$re_token_non_strict=/;
+			$value .= ' ';
+		}
+		$value .= substr $_, 0, 1, '';
+	}
+	$value;
+}
+
+1;
+__END__
+=func parse_content_type
+
+This routine is exported on request (it is in C<@EXPORT_OK>).
+
+This routine parses email content type headers according to section 5.1 of RFC
+2045 and also RFC 2231 (Character Set and Parameter Continuations).  It returns
+a hashref with entries for the C<type>, the C<subtype>, and a hash of
+C<attributes>.
+
+For backward compatibility with a really unfortunate misunderstanding of RFC
+2045 by the early implementors of this module, C<discrete> and C<composite> are
+also present in the returned hashref, with the values of C<type> and C<subtype>
+respectively.
+
+=func parse_content_disposition
+
+This routine is exported on request (it is in C<@EXPORT_OK>).
+
+This routine parses email Content-Disposition headers according to RFC 2183 and
+RFC 2231.  It returns a hashref with entries for the C<type> and a hash
+of C<attributes>.
+
+=cut
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index 5b2914b3..754da13f 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -6,7 +6,7 @@ package PublicInbox::WwwAttach; # internal package
 use strict;
 use warnings;
 use bytes (); # only for bytes::length
-use Email::MIME::ContentType qw(parse_content_type);
+use PublicInbox::EmlContentFoo qw(parse_content_type);
 use PublicInbox::Eml;
 
 sub get_attach_i { # ->each_part callback
diff --git a/t/eml_content_disposition.t b/t/eml_content_disposition.t
new file mode 100644
index 00000000..9bdacc05
--- /dev/null
+++ b/t/eml_content_disposition.t
@@ -0,0 +1,102 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+#  <https://www.gnu.org/licenses/gpl-1.0.txt>
+#  <https://dev.perl.org/licenses/artistic.html>
+use strict;
+use Test::More;
+use PublicInbox::EmlContentFoo qw(parse_content_disposition);
+
+my %cd_tests = (
+	'' => { type => 'attachment', attributes => {} },
+	'inline' => { type => 'inline', attributes => {} },
+	'attachment' => { type => 'attachment', attributes => {} },
+
+	'attachment; filename=genome.jpeg;' .
+	' modification-date="Wed, 12 Feb 1997 16:29:51 -0500"' => {
+		type => 'attachment',
+		attributes => {
+			filename => 'genome.jpeg',
+			'modification-date' => 'Wed, 12 Feb 1997 16:29:51 -0500'
+		}
+	},
+
+	q(attachment; filename*=UTF-8''genome.jpeg;) .
+	q( modification-date="Wed, 12 Feb 1997 16:29:51 -0500") => {
+		type => 'attachment',
+		attributes => {
+			filename => 'genome.jpeg',
+			'modification-date' => 'Wed, 12 Feb 1997 16:29:51 -0500'
+		}
+	},
+
+	q(attachment; filename*0*=us-ascii'en'This%20is%20even%20more%20;) .
+	q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+		type => 'attachment',
+		attributes => {
+			filename => "This is even more ***fun*** isn't it!"
+		}
+	},
+
+	q(attachment; filename*0*='en'This%20is%20even%20more%20;) .
+	q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+		type => 'attachment',
+		attributes => {
+			filename => "This is even more ***fun*** isn't it!"
+		}
+	},
+
+	q(attachment; filename*0*=''This%20is%20even%20more%20;) .
+	q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+		type => 'attachment',
+		attributes => {
+			filename => "This is even more ***fun*** isn't it!"
+		}
+	},
+
+	q(attachment; filename*0*=us-ascii''This%20is%20even%20more%20;).
+	q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+		type => 'attachment',
+		attributes => {
+			filename => "This is even more ***fun*** isn't it!"
+		}
+	},
+);
+
+my %non_strict_cd_tests = (
+	'attachment; filename=genome.jpeg;' .
+	' modification-date="Wed, 12 Feb 1997 16:29:51 -0500";' => {
+		type => 'attachment',
+		attributes => {
+			filename => 'genome.jpeg',
+			'modification-date' =>
+				'Wed, 12 Feb 1997 16:29:51 -0500'
+		}
+	},
+);
+
+sub test {
+	my ($string, $expect, $info) = @_;
+	local $_;
+	$info =~ s/\r/\\r/g;
+	$info =~ s/\n/\\n/g;
+	is_deeply(parse_content_disposition($string), $expect, $info);
+}
+
+for (sort keys %cd_tests) {
+	test($_, $cd_tests{$_}, "Can parse C-D <$_>");
+}
+
+local $PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
+for (sort keys %cd_tests) {
+	test($_, $cd_tests{$_}, "Can parse non-strict C-D <$_>");
+}
+for (sort keys %non_strict_cd_tests) {
+	test($_, $non_strict_cd_tests{$_}, "Can parse non-strict C-D <$_>");
+}
+
+done_testing;
diff --git a/t/eml_content_type.t b/t/eml_content_type.t
new file mode 100644
index 00000000..5fd7d1d9
--- /dev/null
+++ b/t/eml_content_type.t
@@ -0,0 +1,289 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+#  <https://www.gnu.org/licenses/gpl-1.0.txt>
+#  <https://dev.perl.org/licenses/artistic.html>
+use strict;
+use Test::More;
+use PublicInbox::EmlContentFoo qw(parse_content_type);
+
+my %ct_tests = (
+	'' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "us-ascii" }
+	},
+
+	"text/plain" => {
+		type => "text",
+		subtype => "plain",
+		attributes => {}
+	},
+	'text/plain; charset=us-ascii' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "us-ascii" }
+	},
+	'text/plain; charset="us-ascii"' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "us-ascii" }
+	},
+	"text/plain; charset=us-ascii (Plain text)" => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "us-ascii" }
+	},
+
+	'text/plain; charset=ISO-8859-1' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "ISO-8859-1" }
+	},
+	'text/plain; charset="ISO-8859-1"' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "ISO-8859-1" }
+	},
+	'text/plain; charset="ISO-8859-1" (comment)' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "ISO-8859-1" }
+	},
+
+	'(c) text/plain (c); (c) charset=ISO-8859-1 (c)' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "ISO-8859-1" }
+	},
+	'(c \( \\\\) (c) text/plain (c) (c) ; (c) (c) charset=utf-8 (c)' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "utf-8" }
+	},
+	'text/plain; (c (nested ()c)another c)() charset=ISO-8859-1' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "ISO-8859-1" }
+	},
+	'text/plain (c \(!nested ()c\)\)(nested\(c())); charset=utf-8' => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { charset => "utf-8" }
+	},
+
+	"application/foo" => {
+		type       => "application",
+		subtype    => "foo",
+		attributes => {}
+	},
+	"multipart/mixed; boundary=unique-boundary-1" => {
+		type       => "multipart",
+		subtype    => "mixed",
+		attributes => { boundary => "unique-boundary-1" }
+	},
+	'message/external-body; access-type=local-file; name="/u/n/m.jpg"' => {
+		type       => "message",
+		subtype    => "external-body",
+		attributes => {
+			"access-type" => "local-file",
+			"name"        => "/u/n/m.jpg"
+		}
+	},
+	'multipart/mixed; boundary="----------=_1026452699-10321-0" ' => {
+		'type'       => 'multipart',
+		'subtype'    => 'mixed',
+		'attributes' => {
+			'boundary' => '----------=_1026452699-10321-0'
+		}
+	},
+	'multipart/report; boundary= "=_0=73e476c3-cd5a-5ba3-b910-2="' => {
+		'type'       => 'multipart',
+		'subtype'    => 'report',
+		'attributes' => {
+			'boundary' => '=_0=73e476c3-cd5a-5ba3-b910-2='
+		}
+	},
+	'multipart/report; boundary=' . " \t" . '"=_0=7-c-5-b-2="' => {
+		'type'       => 'multipart',
+		'subtype'    => 'report',
+		'attributes' => {
+			'boundary' => '=_0=7-c-5-b-2='
+		}
+	},
+
+	'message/external-body; access-type=URL;' .
+	' URL*0="ftp://";' .
+	' URL*1="example.com/"' => {
+		'type'       => 'message',
+		'subtype'    => 'external-body',
+		'attributes' => {
+			'access-type' => 'URL',
+			'url' => 'ftp://example.com/'
+		}
+	},
+	'message/external-body; access-type=URL; URL="ftp://example.com/"' => {
+		'type'       => 'message',
+		'subtype'    => 'external-body',
+		'attributes' => {
+			'access-type' => 'URL',
+			'url' => 'ftp://example.com/',
+		}
+	},
+
+	"application/x-stuff; title*=us-ascii'en-us'This%20is%20f%2Ad" => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => 'This is f*d'
+		}
+	},
+	"application/x-stuff; title*=us-ascii''This%20is%20f%2Ad" => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => 'This is f*d'
+		}
+	},
+	"application/x-stuff; title*=''This%20is%20f%2Ad" => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => 'This is f*d'
+		}
+	},
+	"application/x-stuff; title*='en-us'This%20is%20f%2Ad" => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => 'This is f*d'
+		}
+	},
+	q(application/x-stuff;) .
+	q( title*0*=us-ascii'en'This%20is%20even%20more%20;) .
+	q(title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => "This is even more ***fun*** isn't it!"
+		}
+	},
+	q(application/x-stuff;) .
+	q( title*0*='en'This%20is%20even%20more%20;) .
+	q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => "This is even more ***fun*** isn't it!"
+		}
+	},
+	q(application/x-stuff;) .
+	q( title*0*=''This%20is%20even%20more%20;) .
+	q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => "This is even more ***fun*** isn't it!"
+		}
+	},
+	q(application/x-stuff;).
+	q( title*0*=us-ascii''This%20is%20even%20more%20;).
+	q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!")
+	  => {
+		'type'       => 'application',
+		'subtype'    => 'x-stuff',
+		'attributes' => {
+			'title' => "This is even more ***fun*** isn't it!"
+		}
+	},
+
+	'text/plain; attribute="v\"v\\\\v\(v\>\<\)\@\,\;\:\/\]\[\?\=v v";' .
+	' charset=us-ascii' => {
+		'type'       => 'text',
+		'subtype'    => 'plain',
+		'attributes' => {
+			'attribute' => 'v"v\\v(v><)@,;:/][?=v v',
+			'charset' => 'us-ascii',
+		},
+	},
+
+	qq(text/plain;\r
+	 charset=us-ascii;\r
+	 attribute="\r value1 \r value2\r\n value3\r\n value4\r\n "\r\n ) => {
+		'type'       => 'text',
+		'subtype'    => 'plain',
+		'attributes' => {
+			'attribute' => ' value1  value2 value3 value4 ',
+			'charset'   => 'us-ascii',
+		},
+	},
+);
+
+my %non_strict_ct_tests = (
+	"text/plain;" => { type => "text", subtype => "plain", attributes => {} },
+	"text/plain; " =>
+	  { type => "text", subtype => "plain", attributes => {} },
+	'image/jpeg;' .
+	' x-mac-type="3F3F3F3F";'.
+	' x-mac-creator="3F3F3F3F" name="file name.jpg";' => {
+		type       => "image",
+		subtype    => "jpeg",
+		attributes => {
+			'x-mac-type'    => "3F3F3F3F",
+			'x-mac-creator' => "3F3F3F3F",
+			'name'          => "file name.jpg"
+		}
+	},
+	"text/plain; key=very long value" => {
+		type       => "text",
+		subtype    => "plain",
+		attributes => { key => "very long value" }
+	},
+	"text/plain; key=very long value key2=value2" => {
+		type    => "text",
+		subtype => "plain",
+		attributes => { key => "very long value", key2 => "value2" }
+	},
+	'multipart/mixed; boundary = "--=_Next_Part_24_Nov_2016_08.09.21"' => {
+		type    => "multipart",
+		subtype => "mixed",
+		attributes => {
+			boundary => "--=_Next_Part_24_Nov_2016_08.09.21"
+		}
+	},
+);
+
+sub test {
+	my ($string, $expect, $info) = @_;
+
+	# So stupid. -- rjbs, 2013-08-10
+	$expect->{discrete}  = $expect->{type};
+	$expect->{composite} = $expect->{subtype};
+
+	local $_;
+	$info =~ s/\r/\\r/g;
+	$info =~ s/\n/\\n/g;
+	is_deeply(parse_content_type($string), $expect, $info);
+}
+
+for (sort keys %ct_tests) {
+	test($_, $ct_tests{$_}, "Can parse C-T <$_>");
+}
+
+local $PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
+for (sort keys %ct_tests) {
+	test($_, $ct_tests{$_}, "Can parse non-strict C-T <$_>");
+}
+for (sort keys %non_strict_ct_tests) {
+	test(
+		$_,
+		$non_strict_ct_tests{$_},
+		"Can parse non-strict C-T <$_>"
+	);
+}
+
+done_testing;

* [PATCH 09/13] EmlContentFoo: relax Encode version requirement
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (7 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings Eric Wong
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

We want to support Perl v5.10.1 out-of-the-box with minimal
download/installation time.  Installing Encode from CPAN
requires a compiler and lengthy build+install time.

So mimic find_mime_encoding() using what Perl v5.10.1 provides
out-of-the-box.
---
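For illustration only, a minimal standalone sketch of the fallback
described above, assuming nothing beyond the Encode shipped with Perl.
lookup_mime_encoding() is a hypothetical name used here; it is not part
of this patch or of Encode.  The sketch builds a table from lowercased
MIME charset names to Encode objects and then resolves names with a
plain hash lookup:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# lowercased MIME charset name => Encode object
my %mime_name_map;
for my $name (Encode->encodings(':all')) {
	my $enc = find_encoding($name) or next; # skip unloadable encodings
	$enc->can('mime_name') or next;
	my $mime = $enc->mime_name;
	next unless defined $mime; # many encodings have no MIME name
	$mime_name_map{lc $mime} = $enc;
}

# extra alias, as in the patch
$mime_name_map{'utf8'} = find_encoding('UTF-8');

# hypothetical stand-in for find_mime_encoding()
sub lookup_mime_encoding { $mime_name_map{lc $_[0]} }

for my $charset (qw(UTF-8 us-ascii ISO-8859-1 no-such-charset)) {
	my $enc = lookup_mime_encoding($charset);
	printf "%-16s => %s\n", $charset, $enc ? $enc->name : '(unknown)';
}
```

When several Encode objects share one MIME name, the last one seen
wins in this sketch, so the object returned may differ from what a
newer Encode's find_mime_encoding() would return.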
 Makefile.PL                      |  2 +-
 lib/PublicInbox/EmlContentFoo.pm | 27 +++++++++++++++++++++++++--
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Makefile.PL b/Makefile.PL
index 27bb112c..59345edb 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -130,7 +130,7 @@ WriteMakefile(
 
 		# libperl$PERL_VERSION or libencode-perl on Debian,
 		# `perl5' on FreeBSD
-		'Encode' => 0,
+		'Encode' => 2.35, # 2.35 shipped with 5.10.1
 
 		# libperl$PERL_VERSION + perl-modules-$PERL_VERSION
 		'Compress::Raw::Zlib' => 0,
diff --git a/lib/PublicInbox/EmlContentFoo.pm b/lib/PublicInbox/EmlContentFoo.pm
index f507d548..7472f8d2 100644
--- a/lib/PublicInbox/EmlContentFoo.pm
+++ b/lib/PublicInbox/EmlContentFoo.pm
@@ -9,15 +9,38 @@
 #
 # This license differs from the rest of public-inbox
 #
+# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+#
 # This is a fork of the Email::MIME::ContentType 1.022 with
 # minor improvements and incompatibilities; namely changes to
 # quiet warnings with legacy data.
 package PublicInbox::EmlContentFoo;
 use strict;
 use parent qw(Exporter);
-# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+use v5.10.1;
+
+# find_mime_encoding() only appeared in Encode 2.87+ (Perl 5.26+),
+# while we support 2.35 shipped with Perl 5.10.1
+use Encode 2.35 qw(find_encoding);
+my %mime_name_map; # $enc->mime_name => $enc object
+BEGIN {
+	eval { Encode->import('find_mime_encoding') };
+	if ($@) {
+		*find_mime_encoding = sub { $mime_name_map{lc($_[0])} };
+		%mime_name_map = map {;
+			my $enc = find_encoding($_);
+			my $m = lc($enc->mime_name // '');
+			$m => $enc;
+		} Encode->encodings(':all');
+
+		# delete fallback for encodings w/o ->mime_name:
+		delete $mime_name_map{''};
+
+		# an extra alias, see Encode::MIME::Name
+		$mime_name_map{'utf8'} = find_encoding('UTF-8');
+	}
+}
 
-use Encode 2.87 qw(find_mime_encoding);
 our @EXPORT_OK = qw(parse_content_type parse_content_disposition);
 
 our $STRICT_PARAMS = 1;

* [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (8 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 09/13] EmlContentFoo: relax Encode version requirement Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

Since Email::MIME usage is going away, Email::MIME::Encodings
might as well go away, too.  We can also use fewer branches than
Email::MIME::Encodings by relying on hash lookups.
---
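A rough standalone sketch of the hash-lookup approach on the decode
side, assuming only MIME::Base64 and MIME::QuotedPrint from the Perl
core.  decode_body() and %DECODERS are hypothetical names for this
note; the patch itself uses %MIME_DEC inside PublicInbox::Eml::body.
Anything without a table entry (7bit, 8bit, binary, unknown or missing
encodings) is returned untouched:

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# decoders keyed by lowercased Content-Transfer-Encoding token
my %DECODERS = (
	'base64' => \&decode_base64,
	'quoted-printable' => sub {
		my $s = decode_qp($_[0]);
		$s =~ s/\n/\r\n/sg; # restore CRLF, as dec_qp in the patch does
		$s;
	},
);

# hypothetical helper: one hash lookup instead of a chain of branches
sub decode_body {
	my ($cte, $raw) = @_;
	defined($cte) or return $raw;
	# take the first token only, ignoring trailing parameters
	my ($token) = ($cte =~ /([a-zA-Z0-9\-]+)/) or return $raw;
	my $dec = $DECODERS{lc $token} or return $raw;
	$dec->($raw);
}

print decode_body('base64', 'aGVsbG8='), "\n";
print decode_body('7bit', 'left alone'), "\n";
```

The encode side can follow the same shape, with an identity entry for
7bit, 8bit and binary so that only genuinely unknown encodings croak.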
 lib/PublicInbox/Eml.pm | 47 ++++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 1988bdb3..1adaff04 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -30,18 +30,24 @@ use v5.10.1;
 use Carp qw(croak);
 use Encode qw(find_encoding decode encode); # stdlib
 use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
+use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2
+use MIME::QuotedPrint 3.05; # ditto
 
 my $MIME_Header = find_encoding('MIME-Header');
 
 use PublicInbox::EmlContentFoo qw(parse_content_type parse_content_disposition);
-use Email::MIME::Encodings;
 $PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
 
 our $MAXPARTS = 1000; # same as SpamAssassin
 our $MAXDEPTH = 20; # seems enough, Perl sucks, here
 our $MAXBOUNDLEN = 2048; # same as postfix
 
-my $NO_ENCODE_RE = qr/\A(?:7bit|8bit|binary)[ \t]*(?:;|$)?/i;
+my %MIME_ENC = (qp => \&enc_qp, base64 => \&encode_base64);
+my %MIME_DEC = (qp => \&dec_qp, base64 => \&decode_base64);
+$MIME_ENC{quotedprint} = $MIME_ENC{'quoted-printable'} = $MIME_ENC{qp};
+$MIME_DEC{quotedprint} = $MIME_DEC{'quoted-printable'} = $MIME_DEC{qp};
+$MIME_ENC{$_} = \&identity_codec for qw(7bit 8bit binary);
+
 my %DECODE_ADDRESS = map { $_ => 1 } qw(From To Cc Sender Reply-To);
 my %DECODE_FULL = (
 	Subject => 1,
@@ -111,13 +117,6 @@ sub ct ($) {
 	$_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
 }
 
-sub body_decode ($$) {
-	my $cte = header_raw($_[0], 'Content-Transfer-Encoding');
-	($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) if $cte; # For S/MIME, etc
-	(!$cte || $cte =~ $NO_ENCODE_RE) ?
-		$_[1] : Email::MIME::Encodings::decode($cte, $_[1], '7bit');
-}
-
 # returns a queue of sub-parts iff it's worth descending into
 # TODO: descend into message/rfc822 parts (Email::MIME didn't)
 sub mp_descend ($$) {
@@ -197,6 +196,22 @@ sub each_part {
 	}
 }
 
+sub enc_qp {
+	# prevent MIME::QuotedPrint from encoding CR as =0D since it's
+	# against RFCs and breaks MUAs
+	$_[0] =~ s/\r\n/\n/sg;
+	encode_qp($_[0], "\r\n");
+}
+
+sub dec_qp {
+	# RFC 2822 requires all lines to end in CRLF, though... :<
+	$_[0] = decode_qp($_[0]);
+	$_[0] =~ s/\n/\r\n/sg;
+	$_[0]
+}
+
+sub identity_codec { $_[0] }
+
 ########### compatibility section for existing Email::MIME uses #########
 
 sub header_obj {
@@ -240,9 +255,9 @@ EOF
 sub body_set {
 	my ($self, $body) = @_;
 	my $bdy = $self->{bdy} = ref($body) ? $body : \$body;
-	my $cte = header_raw($self, 'Content-Transfer-Encoding');
-	if ($cte && $cte !~ $NO_ENCODE_RE) {
-		$$bdy = Email::MIME::Encodings::encode($cte, $$bdy)
+	if (my $cte = header_raw($self, 'Content-Transfer-Encoding')) {
+		my $enc = $MIME_ENC{lc($cte)} or croak("can't encode `$cte'");
+		$$bdy = $enc->($$bdy); # in-place
 	}
 	undef;
 }
@@ -351,7 +366,13 @@ sub header_str {
 
 sub body_raw { ${$_[0]->{bdy} // \''}; }
 
-sub body { body_decode($_[0], body_raw($_[0])) }
+sub body {
+	my $raw = body_raw($_[0]);
+	my $cte = header_raw($_[0], 'Content-Transfer-Encoding') or return $raw;
+	($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) or return $raw; # For S/MIME, etc
+	my $dec = $MIME_DEC{lc($cte)} or return $raw;
+	$dec->($raw);
+}
 
 sub body_str {
 	my ($self) = @_;

* [PATCH 11/13] xt: eml comparison tests
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (9 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-08  4:47   ` Eric Wong
  2020-05-07 21:05 ` [PATCH 12/13] remove most internal Email::MIME usage Eric Wong
  2020-05-07 21:05 ` [PATCH 13/13] eml: drop trailing blank line on missing epilogue Eric Wong
  12 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

While our codebase can still work with either MIME
implementation, add comparison tests to ensure we
handle corner cases in existing archives.
---
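The comparison idea, reduced to a toy that runs without a giant inbox:
feed every input to both implementations, flatten each result with
join("\0", ...), and record the inputs where the strings differ.
fields_ref(), fields_new() and find_diffs() are hypothetical stand-ins
invented for this note; they are not the MIME parsers compared in the
tests below.

```perl
use strict;
use warnings;

# stand-in "reference" implementation: split on commas, then trim
sub fields_ref {
	my @f = split(/,/, $_[0]);
	s/\A\s+//, s/\s+\z// for @f;
	@f;
}

# stand-in "replacement" implementation: regex extraction, then trim
sub fields_new {
	my @f = ($_[0] =~ /([^,]+)/g);
	s/\A\s+//, s/\s+\z// for @f;
	@f;
}

# returns the inputs for which the two callbacks disagree
sub find_diffs {
	my ($inputs, $ref_cb, $new_cb) = @_;
	my @differ;
	for my $in (@$inputs) {
		my $exp = join("\0", $ref_cb->($in));
		my $got = join("\0", $new_cb->($in));
		push @differ, $in if $exp ne $got;
	}
	@differ;
}

my @inputs = ('a, b', ' x ,y', 'a,,b', 'a,b,');
my @differ = find_diffs(\@inputs, \&fields_ref, \&fields_new);
print scalar(@differ), " of ", scalar(@inputs), " inputs differ\n";
print "  differs: <$_>\n" for @differ;
```

Only the empty middle field in 'a,,b' separates the two stand-ins
here, which is the kind of corner case a corpus-wide run is meant to
surface.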
 MANIFEST         |   2 +
 xt/cmp-msgstr.t  | 108 +++++++++++++++++++++++++++++++++++++++++++++++
 xt/cmp-msgview.t |  95 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 205 insertions(+)
 create mode 100644 xt/cmp-msgstr.t
 create mode 100644 xt/cmp-msgview.t

diff --git a/MANIFEST b/MANIFEST
index 055c8c9a..9c804a07 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -331,6 +331,8 @@ t/www_listing.t
 t/www_static.t
 t/x-unknown-alpine.eml
 t/xcpdb-reshard.t
+xt/cmp-msgstr.t
+xt/cmp-msgview.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
 xt/mem-msgview.t
diff --git a/xt/cmp-msgstr.t b/xt/cmp-msgstr.t
new file mode 100644
index 00000000..6bae0f66
--- /dev/null
+++ b/xt/cmp-msgstr.t
@@ -0,0 +1,108 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+use PublicInbox::View;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use Digest::MD5;
+use PublicInbox::MsgIter;
+require_mods(qw(Data::Dumper Email::MIME));
+Data::Dumper->import('Dumper');
+require PublicInbox::MIME;
+require_git(2.19);
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my @cat = qw(cat-file --buffer --batch-check --batch-all-objects --unordered);
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'cmp' });
+my $git = $ibx->git;
+my $fh = $git->popen(@cat);
+vec(my $vec = '', fileno($fh), 1) = 1;
+select($vec, undef, undef, 60) or die "timed out waiting for --batch-check";
+my $n = 0;
+my $m = 0;
+my $dig_cls = 'Digest::MD5';
+sub h ($) {
+	s/\s+\z//s; # E::M leaves trailing white space
+	s/\s+/ /sg;
+	"$_[0]: $_";
+}
+
+my $cmp = sub {
+	my ($p, $cmp_arg) = @_;
+	my $part = shift @$p;
+	push @$cmp_arg, '---'.join(', ', @$p).'---';
+	my $ct = $part->content_type // 'text/plain';
+	$ct =~ s/[ \t]+.*\z//s;
+	my ($s, $err);
+	eval {
+		push @$cmp_arg, map { h 'f' } $part->header('From');
+		push @$cmp_arg, map { h 't' } $part->header('To');
+		push @$cmp_arg, map { h 'cc' } $part->header('Cc');
+		push @$cmp_arg, map { h 'mid' } $part->header('Message-ID');
+		push @$cmp_arg, map { h 'refs' } $part->header('References');
+		push @$cmp_arg, map { h 'irt' } $part->header('In-Reply-To');
+		push @$cmp_arg, map { h 's' } $part->header('Subject');
+		push @$cmp_arg, map { h 'cd' }
+					$part->header('Content-Description');
+		($s, $err) = msg_part_text($part, $ct);
+		if (defined $s) {
+			$s =~ s/\s+\z//s;
+			push @$cmp_arg, "S: ".$s;
+		} else {
+			$part = $part->body;
+			push @$cmp_arg, "T: $ct";
+			if ($part =~ /[^\p{XPosixPrint}\s]/s) { # binary
+				my $dig = $dig_cls->new;
+				$dig->add($part);
+				push @$cmp_arg, "M: ".$dig->hexdigest;
+				push @$cmp_arg, "B: ".bytes::length($part);
+			} else {
+				$part =~ s/\s+\z//s;
+				push @$cmp_arg, "X: ".$part;
+			}
+		}
+	};
+	if ($@) {
+		$err //= '';
+		push @$cmp_arg, "E: $@ ($err)";
+	}
+};
+
+my $ndiff = 0;
+my $git_cb = sub {
+	my ($bref, $oid) = @_;
+	local $SIG{__WARN__} = sub { diag "$inboxdir $oid ", @_ };
+	++$m;
+	PublicInbox::MIME->new($$bref)->each_part($cmp, my $m_ctx = [], 1);
+	PublicInbox::Eml->new($$bref)->each_part($cmp, my $e_ctx = [], 1);
+	if (join("\0", @$e_ctx) ne join("\0", @$m_ctx)) {
+		++$ndiff;
+		open my $fh, '>', "$tmpdir/mime" or die $!;
+		print $fh Dumper($m_ctx) or die $!;
+		close $fh or die $!;
+		open $fh, '>', "$tmpdir/eml" or die $!;
+		print $fh Dumper($e_ctx) or die $!;
+		close $fh or die $!;
+		diag "$inboxdir $oid differ";
+		# using `git diff', diff(1) may not be installed
+		diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]);
+	}
+};
+$git->cat_async_begin;
+my $t = timeit(1, sub {
+	while (<$fh>) {
+		my ($oid, $type) = split / /;
+		next if $type ne 'blob';
+		++$n;
+		$git->cat_async($oid, $git_cb);
+	}
+	$git->cat_async_wait;
+});
+is($m, $n, "$inboxdir rendered all $m <=> $n messages");
+is($ndiff, 0, "$inboxdir $ndiff differences");
+done_testing();
diff --git a/xt/cmp-msgview.t b/xt/cmp-msgview.t
new file mode 100644
index 00000000..66fb467e
--- /dev/null
+++ b/xt/cmp-msgview.t
@@ -0,0 +1,95 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+use PublicInbox::View;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use Digest::MD5;
+require_git(2.19);
+require_mods qw(Data::Dumper Email::MIME Plack::Util);
+Data::Dumper->import('Dumper');
+require PublicInbox::MIME;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my @cat = qw(cat-file --buffer --batch-check --batch-all-objects --unordered);
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'perf' });
+my $git = $ibx->git;
+my $fh = $git->popen(@cat);
+vec(my $vec = '', fileno($fh), 1) = 1;
+select($vec, undef, undef, 60) or die "timed out waiting for --batch-check";
+my $mime_ctx = {
+	env => { HTTP_HOST => 'example.com', 'psgi.url_scheme' => 'https' },
+	-inbox => $ibx,
+	www => Plack::Util::inline_object(style => sub {''}),
+	obuf => \(my $mime_buf = ''),
+	mhref => '../',
+};
+my $eml_ctx = { %$mime_ctx, obuf => \(my $eml_buf = '') };
+my $n = 0;
+my $m = 0;
+my $ndiff_html = 0;
+my $dig_cls = 'Digest::MD5';
+my $digest_attach = sub { # ensure ->body (not ->body_raw) matches
+	my ($p, $cmp_arg) = @_;
+	my $part = shift @$p;
+	my $dig = $cmp_arg->[0] //= $dig_cls->new;
+	$dig->add($part->body_raw);
+	push @$cmp_arg, join(', ', @$p);
+};
+
+my $git_cb = sub {
+	my ($bref, $oid) = @_;
+	local $SIG{__WARN__} = sub { diag "$inboxdir $oid ", @_ };
+	++$m;
+	my $mime = PublicInbox::MIME->new($$bref);
+	PublicInbox::View::multipart_text_as_html($mime, $mime_ctx);
+	my $eml = PublicInbox::Eml->new($$bref);
+	PublicInbox::View::multipart_text_as_html($eml, $eml_ctx);
+	if ($eml_buf ne $mime_buf) {
+		++$ndiff_html;
+		open my $fh, '>', "$tmpdir/mime" or die $!;
+		print $fh $mime_buf or die $!;
+		close $fh or die $!;
+		open $fh, '>', "$tmpdir/eml" or die $!;
+		print $fh $eml_buf or die $!;
+		close $fh or die $!;
+		# using `git diff', diff(1) may not be installed
+		diag "$inboxdir $oid differs";
+		diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]);
+	}
+	$eml_buf = $mime_buf = '';
+
+	# don't tolerate differences in attachment downloads
+	$mime = PublicInbox::MIME->new($$bref);
+	$mime->each_part($digest_attach, my $mime_cmp = [], 1);
+	$eml = PublicInbox::Eml->new($$bref);
+	$eml->each_part($digest_attach, my $eml_cmp = [], 1);
+	$mime_cmp->[0] = $mime_cmp->[0]->hexdigest;
+	$eml_cmp->[0] = $eml_cmp->[0]->hexdigest;
+	# don't have millions of "ok" lines
+	if (join("\0", @$eml_cmp) ne join("\0", @$mime_cmp)) {
+		diag Dumper([ $oid, eml => $eml_cmp, mime =>$mime_cmp ]);
+		is_deeply($eml_cmp, $mime_cmp, "$inboxdir $oid match");
+	}
+};
+$git->cat_async_begin;
+my $t = timeit(1, sub {
+	while (<$fh>) {
+		my ($oid, $type) = split / /;
+		next if $type ne 'blob';
+		++$n;
+		$git->cat_async($oid, $git_cb);
+	}
+	$git->cat_async_wait;
+});
+is($m, $n, 'rendered all messages');
+
+# we'll tolerate minor differences in HTML rendering
+diag "$ndiff_html HTML differences";
+
+done_testing();

* [PATCH 12/13] remove most internal Email::MIME usage
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (10 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  2020-05-07 21:05 ` [PATCH 13/13] eml: drop trailing blank line on missing epilogue Eric Wong
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

We no longer load or use Email::MIME outside of comparison
tests.
---
 INSTALL                       | 26 +++++------
 Makefile.PL                   |  5 ---
 ci/deps.perl                  |  3 --
 lib/PublicInbox/Import.pm     |  8 ++--
 lib/PublicInbox/MIME.pm       |  3 ++
 lib/PublicInbox/MsgTime.pm    |  8 ++--
 lib/PublicInbox/TestCommon.pm |  3 +-
 t/altid.t                     |  4 +-
 t/altid_v2.t                  |  4 +-
 t/cgi.t                       |  8 ++--
 t/content_id.t                |  6 +--
 t/convert-compact.t           |  4 +-
 t/edit.t                      | 20 ++++-----
 t/feed.t                      |  6 +--
 t/filter_base.t               |  4 +-
 t/filter_mirror.t             |  2 +-
 t/filter_subjecttag.t         |  4 +-
 t/filter_vger.t               |  6 +--
 t/html_index.t                |  4 +-
 t/httpd.t                     |  4 +-
 t/import.t                    |  4 +-
 t/indexlevels-mirror.t        |  4 +-
 t/mda.t                       |  4 +-
 t/mda_filter_rubylang.t       |  2 +-
 t/mid.t                       |  4 +-
 t/mime.t                      | 83 +++++++++++++++++++----------------
 t/msg_iter.t                  |  8 ++--
 t/msgtime.t                   |  6 +--
 t/multi-mid.t                 |  6 +--
 t/nntp.t                      |  4 +-
 t/nntpd-tls.t                 |  4 +-
 t/nntpd.t                     |  6 +--
 t/nulsubject.t                |  2 +-
 t/plack.t                     | 10 ++---
 t/precheck.t                  | 10 ++---
 t/psgi_attach.t               |  2 +-
 t/psgi_bad_mids.t             |  4 +-
 t/psgi_mount.t                |  4 +-
 t/psgi_multipart_not.t        |  4 +-
 t/psgi_scan_all.t             |  4 +-
 t/psgi_search.t               |  8 ++--
 t/psgi_text.t                 |  2 +-
 t/psgi_v2.t                   |  6 +--
 t/purge.t                     |  2 +-
 t/replace.t                   | 12 ++---
 t/reply.t                     |  4 +-
 t/search-thr-index.t          |  6 +--
 t/search.t                    | 26 +++++------
 t/solver_git.t                |  4 +-
 t/spamcheck_spamc.t           |  8 ++--
 t/thread-cycle.t              |  3 +-
 t/time.t                      |  4 +-
 t/v1-add-remove-add.t         |  4 +-
 t/v1reindex.t                 |  4 +-
 t/v2-add-remove-add.t         |  4 +-
 t/v2mda.t                     |  4 +-
 t/v2mirror.t                  |  4 +-
 t/v2reindex.t                 |  8 ++--
 t/v2writable.t                |  8 ++--
 t/watch_filter_rubylang.t     |  2 +-
 t/watch_maildir.t             |  2 +-
 t/watch_maildir_v2.t          |  2 +-
 t/www_altid.t                 |  2 +-
 t/xcpdb-reshard.t             |  4 +-
 xt/msgtime_cmp.t              | 12 ++---
 xt/perf-msgview.t             |  2 +-
 66 files changed, 228 insertions(+), 226 deletions(-)

diff --git a/INSTALL b/INSTALL
index 2dd7dcff..80cee753 100644
--- a/INSTALL
+++ b/INSTALL
@@ -36,15 +36,18 @@ Beyond that, there is a long list of Perl modules required, starting with:
 * Digest::SHA                      typically installed with Perl
                                    rpm: perl-Digest-SHA
 
-* Email::MIME                      deb: libemail-mime-perl
-                                   pkg: p5-Email-MIME
-                                   rpm: perl-Email-MIME
-
 * URI::Escape                      deb: liburi-perl
                                    pkg: p5-URI
                                    rpm: perl-URI
                                    (for HTML/Atom generation)
 
+Email::MIME will be optional as of public-inbox v1.5.0;
+it may still be used in maintainer comparison tests:
+
+* Email::MIME                      deb: libemail-mime-perl
+                                   pkg: p5-Email-MIME
+                                   rpm: perl-Email-MIME
+
 Plack and Date::Parse are optional as of public-inbox v1.3.0,
 but required for older releases:
 
@@ -86,6 +89,11 @@ Numerous optional modules are likely to be useful as well:
                                    (speeds up process spawning on Linux,
                                     see public-inbox-daemon(8))
 
+- Email::Address::XS               deb: libemail-address-xs-perl
+                                   pkg: p5-Email-Address-XS
+                                   (correct parsing of tricky email
+                                    addresses, phrases and comments)
+
 - Plack::Middleware::ReverseProxy  deb: libplack-middleware-reverseproxy-perl
                                    pkg: p5-Plack-Middleware-ReverseProxy
                                    rpm: perl-Plack-Middleware-ReverseProxy
@@ -108,16 +116,6 @@ Numerous optional modules are likely to be useful as well:
 The following modules are typically pulled in by dependencies listed
 above, so there is no need to explicitly install them:
 
-- Email::MIME::ContentType         deb: libemail-mime-contenttype-perl
-                                   pkg: p5-Email-MIME-ContentType
-                                   rpm: perl-Email-MIME-ContentType
-                                   (pulled in by Email::MIME)
-
-- Email::Simple                    deb: libemail-simple-perl
-                                   pkg: p5-Email-Simple
-                                   rpm: perl-Email-Simple
-                                   (pulled in by Email::MIME)
-
 * Encode                           deb: libperl5.$MINOR (or libencode-perl)
                                    pkg: perl5
                                    rpm: perl-Encode
diff --git a/Makefile.PL b/Makefile.PL
index 59345edb..efbb59cb 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -122,11 +122,6 @@ WriteMakefile(
 		# `perl5' on FreeBSD
 		# perl-Digest-SHA on RH-based
 		'Digest::SHA' => 0,
-		'Email::MIME' => 0,
-
-		# the following should be pulled in by Email::MIME:
-		'Email::MIME::ContentType' => 0,
-		'Email::Simple' => 0,
 
 		# libperl$PERL_VERSION or libencode-perl on Debian,
 		# `perl5' on FreeBSD
diff --git a/ci/deps.perl b/ci/deps.perl
index 06b4fbe0..48aaa9e4 100755
--- a/ci/deps.perl
+++ b/ci/deps.perl
@@ -20,9 +20,6 @@ my $profiles = {
 		perl
 		Devel::Peek
 		Digest::SHA
-		Email::Simple
-		Email::MIME
-		Email::MIME::ContentType
 		Encode
 		ExtUtils::MakeMaker
 		IO::Compress::Gzip
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 98aa7785..07d18599 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -213,13 +213,13 @@ sub get_mark {
 }
 
 # returns undef on non-existent
-# ('MISMATCH', Email::MIME) on mismatch
-# (:MARK, Email::MIME) on success
+# ('MISMATCH', PublicInbox::Eml) on mismatch
+# (:MARK, PublicInbox::Eml) on success
 #
 # v2 callers should check with Xapian before calling this as
 # it is not idempotent.
 sub remove {
-	my ($self, $mime, $msg) = @_; # mime = Email::MIME
+	my ($self, $mime, $msg) = @_; # mime = PublicInbox::Eml or Email::MIME
 
 	my $path_type = $self->{path_type};
 	my ($path, $err, $cur, $blob);
@@ -375,7 +375,7 @@ sub clean_tree_v2 ($$$) {
 # returns undef on duplicate
 # returns the :MARK of the most recent commit
 sub add {
-	my ($self, $mime, $check_cb, $smsg) = @_; # mime = Email::MIME
+	my ($self, $mime, $check_cb, $smsg) = @_;
 
 	my ($name, $email, $at, $ct, $subject) = extract_cmt_info($mime, $smsg);
 	my $path_type = $self->{path_type};
diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm
index b795b93b..9077386a 100644
--- a/lib/PublicInbox/MIME.pm
+++ b/lib/PublicInbox/MIME.pm
@@ -3,6 +3,9 @@
 #
 # The license for this file differs from the rest of public-inbox.
 #
+# We no longer load this in any of our code outside of maintainer
+# tests for compatibility.
+#
 # It monkey patches the "parts_multipart" subroutine with patches
 # from Matthew Horsfall <wolfsage@gmail.com> at:
 #
diff --git a/lib/PublicInbox/MsgTime.pm b/lib/PublicInbox/MsgTime.pm
index 920e8f8a..8596f01c 100644
--- a/lib/PublicInbox/MsgTime.pm
+++ b/lib/PublicInbox/MsgTime.pm
@@ -138,7 +138,7 @@ sub time_response ($) {
 }
 
 sub msg_received_at ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my @recvd = $hdr->header_raw('Received');
 	my ($ts);
 	foreach my $r (@recvd) {
@@ -153,7 +153,7 @@ sub msg_received_at ($) {
 }
 
 sub msg_date_only ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my @date = $hdr->header_raw('Date');
 	my ($ts);
 	foreach my $d (@date) {
@@ -168,7 +168,7 @@ sub msg_date_only ($) {
 
 # Favors Received header for sorting globally
 sub msg_timestamp ($;$) {
-	my ($hdr, $fallback) = @_; # Email::MIME::Header
+	my ($hdr, $fallback) = @_; # PublicInbox::Eml
 	my $ret;
 	$ret = msg_received_at($hdr) and return time_response($ret);
 	$ret = msg_date_only($hdr) and return time_response($ret);
@@ -177,7 +177,7 @@ sub msg_timestamp ($;$) {
 
 # Favors the Date: header for display and sorting within a thread
 sub msg_datestamp ($;$) {
-	my ($hdr, $fallback) = @_; # Email::MIME::Header
+	my ($hdr, $fallback) = @_; # PublicInbox::Eml
 	my $ret;
 	$ret = msg_date_only($hdr) and return time_response($ret);
 	$ret = msg_received_at($hdr) and return time_response($ret);
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 978c3cd7..d952ee6d 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -8,7 +8,6 @@ use parent qw(Exporter);
 use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
 use POSIX qw(dup2);
 use IO::Socket::INET;
-use PublicInbox::MIME; # temporary
 our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
 	run_script start_script key2sub xsys xqx mime_load eml_load);
 
@@ -23,7 +22,7 @@ sub mime_load ($) {
 sub eml_load ($) {
 	my ($path, $cb) = @_;
 	open(my $fh, '<', $path) or die "open $path: $!";
-	binmode $fh;
+	require PublicInbox::Eml;
 	PublicInbox::Eml->new(\(do { local $/; <$fh> }));
 }
 
diff --git a/t/altid.t b/t/altid.t
index c7a3601a..670a3963 100644
--- a/t/altid.t
+++ b/t/altid.t
@@ -4,7 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_mods(qw(DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::Msgmap';
 use_ok 'PublicInbox::SearchIdx';
@@ -27,7 +27,7 @@ my $ibx;
 	my $git = PublicInbox::Git->new($git_dir);
 	my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
 	$im->init_bare;
-	$im->add(PublicInbox::MIME->new(<<'EOF'));
+	$im->add(PublicInbox::Eml->new(<<'EOF'));
 From: a@example.com
 To: b@example.com
 Subject: boo!
diff --git a/t/altid_v2.t b/t/altid_v2.t
index 3ac294f0..28a047d9 100644
--- a/t/altid_v2.t
+++ b/t/altid_v2.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 require_git(2.6);
 require_mods(qw(DBD::SQLite Search::Xapian));
@@ -31,7 +31,7 @@ my $ibx = {
 };
 $ibx = PublicInbox::Inbox->new($ibx);
 my $v2w = PublicInbox::V2Writable->new($ibx, 1);
-$v2w->add(PublicInbox::MIME->new(<<'EOF'));
+$v2w->add(PublicInbox::Eml->new(<<'EOF'));
 From: a@example.com
 To: b@example.com
 Subject: boo!
diff --git a/t/cgi.t b/t/cgi.t
index 42a343d3..d1f97150 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -5,7 +5,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 use PublicInbox::Import;
 require_mods(qw(Plack::Handler::CGI Plack::Util));
@@ -45,7 +45,7 @@ my $im = PublicInbox::InboxWritable->new($ibx)->importer;
 	local $ENV{HOME} = $home;
 
 	# inject some messages:
-	my $mime = PublicInbox::MIME->new(<<EOF);
+	my $mime = PublicInbox::Eml->new(<<EOF);
 From: Me <me\@example.com>
 To: You <you\@example.com>
 Cc: $addr
@@ -62,7 +62,7 @@ EOF
 	ok($im->add($mime), 'added big message');
 
 	# deliver a reply, too
-	$mime = PublicInbox::MIME->new(<<EOF);
+	$mime = PublicInbox::Eml->new(<<EOF);
 From: You <you\@example.com>
 To: Me <me\@example.com>
 Cc: $addr
@@ -79,7 +79,7 @@ EOF
 	ok($im->add($mime), 'added reply');
 
 	my $slashy_mid = 'slashy/asdf@example.com';
-	my $slashy = PublicInbox::MIME->new(<<EOF);
+	my $slashy = PublicInbox::Eml->new(<<EOF);
 From: You <you\@example.com>
 To: Me <me\@example.com>
 Cc: $addr
diff --git a/t/content_id.t b/t/content_id.t
index 0325164d..9df81aa8 100644
--- a/t/content_id.t
+++ b/t/content_id.t
@@ -4,9 +4,9 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::ContentId qw(content_id);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: b@example.com
 Subject: this is a subject
@@ -17,7 +17,7 @@ hello world
 EOF
 
 my $orig = content_id($mime);
-my $reload = content_id(PublicInbox::MIME->new($mime->as_string));
+my $reload = content_id(PublicInbox::Eml->new($mime->as_string));
 is($orig, $reload, 'content_id matches after serialization');
 
 foreach my $h (qw(From To Cc)) {
diff --git a/t/convert-compact.t b/t/convert-compact.t
index 1627e019..80efc19c 100644
--- a/t/convert-compact.t
+++ b/t/convert-compact.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Spawn qw(which);
 use PublicInbox::TestCommon;
 require_git(2.6);
@@ -26,7 +26,7 @@ ok(PublicInbox::Import::run_die([qw(git) , "--git-dir=$ibx->{inboxdir}",
 	qw(config core.sharedRepository 0644)]), 'set sharedRepository');
 $ibx = PublicInbox::Inbox->new($ibx);
 my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: b@example.com
 Subject: this is a subject
diff --git a/t/edit.t b/t/edit.t
index d8833f9c..1a5698f6 100644
--- a/t/edit.t
+++ b/t/edit.t
@@ -28,7 +28,7 @@ my $file = 't/data/0001.patch';
 open my $fh, '<', $file or die "open: $!";
 my $raw = do { local $/; <$fh> };
 my $im = $ibx->importer(0);
-my $mime = PublicInbox::MIME->new($raw);
+my $mime = PublicInbox::Eml->new($raw);
 my $mid = mid_clean($mime->header('Message-Id'));
 ok($im->add($mime), 'add message to be edited');
 $im->done;
@@ -41,7 +41,7 @@ $t = '-F FILE'; {
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean prefix/bool pfx/'";
 	$cmd = [ '-edit', "-F$file", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t edit OK");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
 	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
 }
@@ -51,7 +51,7 @@ $t = '-m MESSAGE_ID'; {
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t edit OK");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->header('Subject'), qr/boolean prefix/, "$t message edited");
 	like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
 }
@@ -63,7 +63,7 @@ $t = 'no-op -m MESSAGE_ID'; {
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds");
 	my $prev = $cur;
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	is_deeply($cur, $prev, "$t makes no change");
 	like($cur->header('Subject'), qr/boolean prefix/,
 		"$t does not change message");
@@ -79,7 +79,7 @@ $t = 'no-op -m MESSAGE_ID w/Status: header'; { # because mutt does it
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds");
 	my $prev = $cur;
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	is_deeply($cur, $prev, "$t makes no change");
 	like($cur->header('Subject'), qr/boolean prefix/,
 		"$t does not change message");
@@ -94,7 +94,7 @@ $t = '-m MESSAGE_ID can change Received: headers'; {
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^Subject:.*/Received: x\\n\$&/'";
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->header('Subject'), qr/boolean prefix/,
 		"$t does not change Subject");
 	is($cur->header('Received'), 'x', 'added Received header');
@@ -127,7 +127,7 @@ $t = 'mailEditor set in config'; {
 	local $ENV{GIT_EDITOR} = 'echo should not run';
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t edited message");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
 	unlike($out, qr/should not run/, 'did not run GIT_EDITOR');
 }
@@ -137,20 +137,20 @@ $t = '--raw and mbox escaping'; {
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^\$/\\nFrom not mbox\\n/'";
 	$cmd = [ '-edit', "-m$mid", '--raw', $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->body, qr/^From not mbox/sm, 'put "From " line into body');
 
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^>From not/\$& an/'";
 	$cmd = [ '-edit', "-m$mid", $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds with mbox escaping");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	like($cur->body, qr/^From not an mbox/sm,
 		'changed "From " line unescaped');
 
 	local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^From not an mbox\\n//s'";
 	$cmd = [ '-edit', "-m$mid", '--raw', $inboxdir ];
 	ok(run_script($cmd, undef, $opt), "$t succeeds again");
-	$cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+	$cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
 	unlike($cur->body, qr/^From not an mbox/sm, "$t restored body");
 }
 
diff --git a/t/feed.t b/t/feed.t
index 373a1de8..5ad90a07 100644
--- a/t/feed.t
+++ b/t/feed.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Feed;
 use PublicInbox::Import;
 use PublicInbox::Inbox;
@@ -36,7 +36,7 @@ my $im = PublicInbox::Import->new($git, $ibx->{name}, 'test@example');
 {
 	$im->init_bare;
 	foreach my $i (1..6) {
-		my $mime = PublicInbox::MIME->new(<<EOF);
+		my $mime = PublicInbox::Eml->new(<<EOF);
 From: ME <me\@example.com>
 To: U <u\@example.com>
 Message-Id: <$i\@example.com>
@@ -95,7 +95,7 @@ EOF
 	# add a new spam message
 	my $spam;
 	{
-		$spam = PublicInbox::MIME->new(<<EOF);
+		$spam = PublicInbox::Eml->new(<<EOF);
 From: SPAMMER <spammer\@example.com>
 To: U <u\@example.com>
 Message-Id: <this-is-spam\@example.com>
diff --git a/t/filter_base.t b/t/filter_base.t
index bbd64189..47d0220f 100644
--- a/t/filter_base.t
+++ b/t/filter_base.t
@@ -21,13 +21,13 @@ use_ok 'PublicInbox::Filter::Base';
 
 {
 	my $f = PublicInbox::Filter::Base->new;
-	my $email = mime_load 't/filter_base-xhtml.eml';
+	my $email = eml_load 't/filter_base-xhtml.eml';
 	is($f->delivery($email), 100, "xhtml rejected");
 }
 
 {
 	my $f = PublicInbox::Filter::Base->new;
-	my $email = mime_load 't/filter_base-junk.eml';
+	my $email = eml_load 't/filter_base-junk.eml';
 	is($f->delivery($email), 100, 'proprietary format rejected on glob');
 }
 
diff --git a/t/filter_mirror.t b/t/filter_mirror.t
index 0e641a03..5bc7f3f4 100644
--- a/t/filter_mirror.t
+++ b/t/filter_mirror.t
@@ -9,7 +9,7 @@ use_ok 'PublicInbox::Filter::Mirror';
 my $f = PublicInbox::Filter::Mirror->new;
 ok($f, 'created PublicInbox::Filter::Mirror object');
 {
-	my $email = mime_load 't/mda-mime.eml';
+	my $email = eml_load 't/mda-mime.eml';
 	is($f->ACCEPT, $f->delivery($email), 'accept any trash that comes');
 }
 
diff --git a/t/filter_subjecttag.t b/t/filter_subjecttag.t
index 9b397b8c..75effa27 100644
--- a/t/filter_subjecttag.t
+++ b/t/filter_subjecttag.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use_ok 'PublicInbox::Filter::SubjectTag';
 
 my $f = eval { PublicInbox::Filter::SubjectTag->new };
@@ -11,7 +11,7 @@ like($@, qr/tag not defined/, 'error without args');
 $f = PublicInbox::Filter::SubjectTag->new('-tag', '[foo]');
 is(ref $f, 'PublicInbox::Filter::SubjectTag', 'new object created');
 
-my $mime = PublicInbox::MIME->new(<<EOF);
+my $mime = PublicInbox::Eml->new(<<EOF);
 To: you <you\@example.com>
 Subject: =?UTF-8?B?UmU6IFtmb29dIEVsw4PCqWFub3I=?=
 
diff --git a/t/filter_vger.t b/t/filter_vger.t
index 9d71f16d..ca5a6ca7 100644
--- a/t/filter_vger.t
+++ b/t/filter_vger.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use_ok 'PublicInbox::Filter::Vger';
 
 my $f = PublicInbox::Filter::Vger->new;
@@ -21,7 +21,7 @@ More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 EOF
 
-	my $mime = PublicInbox::MIME->new($lkml);
+	my $mime = PublicInbox::Eml->new($lkml);
 	$mime = $f->delivery($mime);
 	is("keep this\n", $mime->body, 'normal message filtered OK');
 }
@@ -37,7 +37,7 @@ the body of a message to majordomo@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 EOF
 
-	my $mime = PublicInbox::MIME->new($no_nl);
+	my $mime = PublicInbox::Eml->new($no_nl);
 	$mime = $f->delivery($mime);
 	is('OSX users :P', $mime->body, 'missing trailing LF in original OK');
 }
diff --git a/t/html_index.t b/t/html_index.t
index 51897532..80f81577 100644
--- a/t/html_index.t
+++ b/t/html_index.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Feed;
 use PublicInbox::Git;
 use PublicInbox::Import;
@@ -32,7 +32,7 @@ my $im = PublicInbox::Import->new($git, 'tester', 'test@example');
 			$mid_line .= "In-Reply-To: $prev";
 		}
 		$prev = $mid;
-		my $mime = PublicInbox::MIME->new(<<EOF);
+		my $mime = PublicInbox::Eml->new(<<EOF);
 From: ME <me\@example.com>
 To: U <u\@example.com>
 $mid_line
diff --git a/t/httpd.t b/t/httpd.t
index f4fbd533..7404eb8b 100644
--- a/t/httpd.t
+++ b/t/httpd.t
@@ -4,7 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use Socket qw(IPPROTO_TCP SOL_SOCKET);
 require_mods(qw(Plack::Util Plack::Builder HTTP::Date HTTP::Status));
 
@@ -28,7 +28,7 @@ use_ok 'PublicInbox::Import';
 
 	# ensure successful message delivery
 	{
-		my $mime = PublicInbox::MIME->new(<<EOF);
+		my $mime = PublicInbox::Eml->new(<<EOF);
 From: Me <me\@example.com>
 To: You <you\@example.com>
 Cc: $addr
diff --git a/t/import.t b/t/import.t
index ba4abd9c..3f308299 100644
--- a/t/import.t
+++ b/t/import.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Git;
 use PublicInbox::Import;
 use PublicInbox::Spawn qw(spawn);
@@ -15,7 +15,7 @@ my ($dir, $for_destroy) = tmpdir();
 my $git = PublicInbox::Git->new($dir);
 my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
 $im->init_bare;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: b@example.com
 Subject: this is a subject
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index dcd5dc39..704f7e11 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Inbox;
 use PublicInbox::InboxWritable;
 require PublicInbox::Admin;
@@ -12,7 +12,7 @@ my $PI_TEST_VERSION = $ENV{PI_TEST_VERSION} || 2;
 require_git('2.6') if $PI_TEST_VERSION == 2;
 require_mods(qw(DBD::SQLite));
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/t/mda.t b/t/mda.t
index 03cc4bc3..759c0b02 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -62,7 +62,7 @@ local $ENV{GIT_COMMITTER_NAME} = eval {
 	use PublicInbox::MDA;
 	use PublicInbox::Address;
 	use Encode qw/encode/;
-	my $msg = mime_load 't/utf8.eml';
+	my $msg = eml_load 't/utf8.eml';
 	my $from = $msg->header('From');
 	my ($author) = PublicInbox::Address::names($from);
 	my ($email) = PublicInbox::Address::emails($from);
@@ -229,7 +229,7 @@ EOF
 		"learned ham idempotently ");
 
 	# ensure trained email is filtered, too
-	my $mime = mime_load 't/mda-mime.eml';
+	my $mime = eml_load 't/mda-mime.eml';
 	($mid) = ($mime->header_raw('message-id') =~ /<([^>]+)>/);
 	{
 		$in = $mime->as_string;
diff --git a/t/mda_filter_rubylang.t b/t/mda_filter_rubylang.t
index f2cbe9d5..483fcb85 100644
--- a/t/mda_filter_rubylang.t
+++ b/t/mda_filter_rubylang.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 require_git(2.6);
diff --git a/t/mid.t b/t/mid.t
index 0ad81d7d..3b8f4108 100644
--- a/t/mid.t
+++ b/t/mid.t
@@ -2,7 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::MID qw(mid_escape mids references mids_for_index id_compress);
 
 is(mid_escape('foo!@(bar)'), 'foo!@(bar)');
@@ -16,7 +16,7 @@ like(id_compress('foo%bar@wtf'), qr/\A[a-f0-9]{40}\z/,
 is(id_compress('foobar-wtf'), 'foobar-wtf', 'regular ID not compressed');
 
 {
-	my $mime = PublicInbox::MIME->new("Message-ID: <mid-1\@a>\n\n");
+	my $mime = PublicInbox::Eml->new("Message-ID: <mid-1\@a>\n\n");
 	$mime->header_set('X-Alt-Message-ID', '<alt-id-for-nntp>');
 	is_deeply(['mid-1@a'], mids($mime->header_obj), 'mids in common case');
 	$mime->header_set('Message-Id', '<mid-1@a>', '<mid-2@b>');
diff --git a/t/mime.t b/t/mime.t
index b9a4d66b..d17ec58e 100644
--- a/t/mime.t
+++ b/t/mime.t
@@ -1,16 +1,23 @@
+#!perl -w
 # Copyright (C) 2017-2020 all contributors <meta@public-inbox.org>
 # This library is free software; you can redistribute it and/or modify
 # it under the same terms as Perl itself.
 # Artistic or GPL-1+ <https://www.gnu.org/licenses/gpl-1.0.txt>
 use strict;
-use warnings;
 use Test::More;
-use_ok 'PublicInbox::MIME';
+use PublicInbox::TestCommon;
 use PublicInbox::MsgIter;
-
-local $SIG{__WARN__} = sub {};
-my $msg = PublicInbox::MIME->new(
-'From:   Richard Hansen <hansenr@google.com>
+my @classes = qw(PublicInbox::Eml);
+SKIP: {
+	require_mods('Email::MIME', 1);
+	push @classes, 'PublicInbox::MIME';
+};
+use_ok $_ for @classes;
+local $SIG{__WARN__} = sub {}; # needed for old Email::Simple (used by E::M)
+
+for my $cls (@classes) {
+	my $msg = $cls->new(<<'EOF');
+From:   Richard Hansen <hansenr@google.com>
 To:     git@vger.kernel.org
 Cc:     Richard Hansen <hansenr@google.com>
 Subject: [PATCH 0/2] minor diff orderfile documentation improvements
@@ -40,10 +47,11 @@ Content-Description: (truncated) S/MIME Cryptographic Signature
 dkTlB69771K2eXK4LcHSH/2LqX+VYa3K44vrx1ruzjXdNWzIpKBy0weFNiwnJCGofvCysM2RCSI1
 --94eb2c0bc864b76ba30545b2bca9--
 
-');
+EOF
 
-my @parts = $msg->parts;
-my $exp = 'Richard Hansen (2):
+	my @parts = $msg->parts;
+	my $exp = <<EOF;
+Richard Hansen (2):
   diff: document behavior of relative diff.orderFile
   diff: document the pattern format for diff.orderFile
 
@@ -51,13 +59,12 @@ my $exp = 'Richard Hansen (2):
  Documentation/diff-options.txt | 3 ++-
  2 files changed, 6 insertions(+), 2 deletions(-)
 
-';
-
-ok($msg->isa('Email::MIME'), 'compatible with Email::MIME');
-is($parts[0]->body, $exp, 'body matches expected');
+EOF
 
+	is($parts[0]->body, $exp, 'body matches expected');
 
-my $raw = q^Date:   Wed, 18 Jan 2017 13:28:32 -0500
+	my $raw = <<'EOF';
+Date:   Wed, 18 Jan 2017 13:28:32 -0500
 From:   Santiago Torres <santiago@nyu.edu>
 To:     Junio C Hamano <gitster@pobox.com>
 Cc:     git@vger.kernel.org, peff@peff.net, sunshine@sunshineco.com,
@@ -92,28 +99,30 @@ Content-Type: application/pgp-signature; name="signature.asc"
 
 --r24xguofrazenjwe--
 
-^;
-
-$msg = PublicInbox::MIME->new($raw);
-my $nr = 0;
-msg_iter($msg, sub {
-	my ($part, $level, @ex) = @{$_[0]};
-	is($level, 1, 'at expected level');
-	if (join('fail if $#ex > 0', @ex) eq '1') {
-		is($part->body_str, "your tree directly? \r\n", 'body OK');
-	} elsif (join('fail if $#ex > 0', @ex) eq '2') {
-		is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
-				"=7wIb\n" .
-				"-----END PGP SIGNATURE-----\n",
-			'sig "matches"');
-	} else {
-		fail "unexpected part\n";
-	}
-	$nr++;
-});
-
-is($nr, 2, 'got 2 parts');
-is($msg->as_string, $raw,
-	'stringified sufficiently close to original');
+EOF
+
+	$msg = $cls->new($raw);
+	my $nr = 0;
+	msg_iter($msg, sub {
+		my ($part, $level, @ex) = @{$_[0]};
+		is($level, 1, 'at expected level');
+		if (join('fail if $#ex > 0', @ex) eq '1') {
+			is($part->body_str, "your tree directly? \r\n",
+			'body OK');
+		} elsif (join('fail if $#ex > 0', @ex) eq '2') {
+			is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
+					"=7wIb\n" .
+					"-----END PGP SIGNATURE-----\n",
+				'sig "matches"');
+		} else {
+			fail "unexpected part\n";
+		}
+		$nr++;
+	});
+
+	is($nr, 2, 'got 2 parts');
+	is($msg->as_string, $raw,
+		'stringified sufficiently close to original');
+}
 
 done_testing();
diff --git a/t/msg_iter.t b/t/msg_iter.t
index e8115e25..4ee3a201 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -8,7 +8,7 @@ use PublicInbox::Hval qw(ascii_html);
 use_ok('PublicInbox::MsgIter');
 
 {
-	my $mime = mime_load 't/msg_iter-order.eml';
+	my $mime = eml_load 't/msg_iter-order.eml';
 	my @parts;
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
@@ -20,7 +20,7 @@ use_ok('PublicInbox::MsgIter');
 }
 
 {
-	my $mime = mime_load 't/msg_iter-nested.eml';
+	my $mime = eml_load 't/msg_iter-nested.eml';
 	my @parts;
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
@@ -33,7 +33,7 @@ use_ok('PublicInbox::MsgIter');
 }
 
 {
-	my $mime = mime_load 't/iso-2202-jp.eml';
+	my $mime = eml_load 't/iso-2202-jp.eml';
 	my $raw = '';
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
@@ -46,7 +46,7 @@ use_ok('PublicInbox::MsgIter');
 }
 
 {
-	my $mime = mime_load 't/x-unknown-alpine.eml';
+	my $mime = eml_load 't/x-unknown-alpine.eml';
 	my $raw = '';
 	msg_iter($mime, sub {
 		my ($part, $level, @ex) = @{$_[0]};
diff --git a/t/msgtime.t b/t/msgtime.t
index d9f8e641..89fd9e37 100644
--- a/t/msgtime.t
+++ b/t/msgtime.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::MsgTime;
 use PublicInbox::TestCommon;
 
@@ -11,7 +11,7 @@ our $received_date = 'Mon, 22 Jan 2007 13:16:24 -0500';
 sub datestamp ($) {
 	my ($date) = @_;
 	local $SIG{__WARN__} = sub {};  # Suppress warnings
-	my $mime = PublicInbox::MIME->new(<<"EOF");
+	my $mime = PublicInbox::Eml->new(<<"EOF");
 From: a\@example.com
 To: b\@example.com
 Subject: this is a subject
@@ -30,7 +30,7 @@ EOF
 sub timestamp ($) {
 	my ($received) = @_;
 	local $SIG{__WARN__} = sub {};  # Suppress warnings
-	my $mime = PublicInbox::MIME->new(<<"EOF");
+	my $mime = PublicInbox::Eml->new(<<"EOF");
 From: a\@example.com
 To: b\@example.com
 Subject: this is a subject
diff --git a/t/multi-mid.t b/t/multi-mid.t
index 5afb9693..91c8597e 100644
--- a/t/multi-mid.t
+++ b/t/multi-mid.t
@@ -2,7 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 use PublicInbox::InboxWritable;
 require_git(2.6);
@@ -11,7 +11,7 @@ require PublicInbox::SearchIdx;
 my $delay = $ENV{TEST_DELAY_CONVERT};
 
 my $addr = 'test@example.com';
-my $bad = PublicInbox::MIME->new(<<EOF);
+my $bad = PublicInbox::Eml->new(<<EOF);
 Message-ID: <a\@example.com>
 Message-ID: <b\@example.com>
 From: a\@example.com
@@ -20,7 +20,7 @@ Subject: bad
 
 EOF
 
-my $good = PublicInbox::MIME->new(<<EOF);
+my $good = PublicInbox::Eml->new(<<EOF);
 Message-ID: <b\@example.com>
 From: b\@example.com
 To: $addr
diff --git a/t/nntp.t b/t/nntp.t
index 35fb55b4..2a9f3a4f 100644
--- a/t/nntp.t
+++ b/t/nntp.t
@@ -4,7 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_mods(qw(DBD::SQLite Data::Dumper));
 use_ok 'PublicInbox::NNTP';
 use_ok 'PublicInbox::Inbox';
@@ -107,7 +107,7 @@ use_ok 'PublicInbox::Inbox';
 					url => [ '//example.com/a' ]});
 	is($ng->base_url, $u, 'URL expanded');
 	my $mid = 'a@b';
-	my $mime = PublicInbox::MIME->new("Message-ID: <$mid>\r\n\r\n");
+	my $mime = PublicInbox::Eml->new("Message-ID: <$mid>\r\n\r\n");
 	my $hdr = $mime->header_obj;
 	my $mock_self = { nntpd => { grouplist => [], 
 				     servername => 'example.com' } };
diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t
index 0ad29be0..3de219f1 100644
--- a/t/nntpd-tls.t
+++ b/t/nntpd-tls.t
@@ -23,7 +23,7 @@ unless (-r $key && -r $cert) {
 use_ok 'PublicInbox::TLS';
 use_ok 'IO::Socket::SSL';
 require PublicInbox::InboxWritable;
-require PublicInbox::MIME;
+require PublicInbox::Eml;
 require PublicInbox::SearchIdx;
 our $need_zlib;
 eval { require Compress::Raw::Zlib } or
@@ -63,7 +63,7 @@ EOF
 
 {
 	my $im = $ibx->importer(0);
-	my $mime = mime_load 't/data/0001.patch';
+	my $mime = eml_load 't/data/0001.patch';
 	ok($im->add($mime), 'message added');
 	$im->done;
 	if ($version == 1) {
diff --git a/t/nntpd.t b/t/nntpd.t
index 4993b29f..69f72ce1 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -7,7 +7,7 @@ use PublicInbox::TestCommon;
 use PublicInbox::Spawn qw(which);
 require_mods(qw(DBD::SQLite));
 require PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use IO::Socket;
 use Socket qw(IPPROTO_TCP TCP_NODELAY);
 use Net::NNTP;
@@ -57,7 +57,7 @@ $ibx = PublicInbox::Inbox->new($ibx);
 
 	# ensure successful message delivery
 	{
-		my $mime = PublicInbox::MIME->new(<<EOF);
+		my $mime = PublicInbox::Eml->new(<<EOF);
 To: =?utf-8?Q?El=C3=A9anor?= <you\@example.com>
 From: =?utf-8?Q?El=C3=A9anor?= <me\@example.com>
 Cc: $addr
@@ -241,7 +241,7 @@ EOF
 		ok($date <= $t1, 'valid date before stop');
 	}
 	if ('leafnode interop') {
-		my $for_leafnode = PublicInbox::MIME->new(<<"");
+		my $for_leafnode = PublicInbox::Eml->new(<<"");
 From: longheader\@example.com
 To: $addr
 Subject: none
diff --git a/t/nulsubject.t b/t/nulsubject.t
index 03b1ee80..ccb60d52 100644
--- a/t/nulsubject.t
+++ b/t/nulsubject.t
@@ -14,7 +14,7 @@ my $git_dir = "$tmpdir/a.git";
 	my $git = PublicInbox::Git->new($git_dir);
 	my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
 	$im->init_bare;
-	$im->add(PublicInbox::MIME->new(<<'EOF'));
+	$im->add(PublicInbox::Eml->new(<<'EOF'));
 From: a@example.com
 To: b@example.com
 Subject: A subject line with a null =?iso-8859-1?q?=00?= see!
diff --git a/t/plack.t b/t/plack.t
index 4fff9773..37a6b394 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -31,7 +31,7 @@ my $git = PublicInbox::Git->new($inboxdir);
 my $im = PublicInbox::Import->new($git, 'test', $addr);
 # ensure successful message delivery
 {
-	my $mime = PublicInbox::MIME->new(<<EOF);
+	my $mime = PublicInbox::Eml->new(<<EOF);
 From: Me <me\@example.com>
 To: You <you\@example.com>
 Cc: $addr
@@ -50,15 +50,15 @@ EOF
 	chomp @ls;
 
 	# multipart with two text bodies
-	$mime = mime_load 't/plack-2-txt-bodies.eml';
+	$mime = eml_load 't/plack-2-txt-bodies.eml';
 	$im->add($mime);
 
 	# multipart with attached patch + filename
-	$mime = mime_load 't/plack-attached-patch.eml';
+	$mime = eml_load 't/plack-attached-patch.eml';
 	$im->add($mime);
 
 	# multipart collapsed to single quoted-printable text/plain
-	$mime = mime_load 't/plack-qp.eml';
+	$mime = eml_load 't/plack-qp.eml';
 	like($mime->body_raw, qr/hi =3D bye=/, 'our test used QP correctly');
 	$im->add($mime);
 
@@ -77,7 +77,7 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
 :(
 EOF
 	$crlf =~ s/\n/\r\n/sg;
-	$im->add(PublicInbox::MIME->new($crlf));
+	$im->add(PublicInbox::Eml->new($crlf));
 
 	$im->done;
 }
diff --git a/t/precheck.t b/t/precheck.t
index a8fd31b1..11193e38 100644
--- a/t/precheck.t
+++ b/t/precheck.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use Email::Simple;
+use PublicInbox::Eml;
 use PublicInbox::MDA;
 
 sub do_checks {
@@ -27,7 +27,7 @@ sub do_checks {
 }
 
 {
-	my $s = Email::Simple->new(<<'EOF');
+	my $s = PublicInbox::Eml->new(<<'EOF');
 From: abc@example.com
 To: abc@example.com
 Cc: c@example.com, another-list@example.com
@@ -43,7 +43,7 @@ EOF
 }
 
 {
-	do_checks(Email::Simple->new(<<'EOF'));
+	do_checks(PublicInbox::Eml->new(<<'EOF'));
 From: a@example.com
 To: b@example.com
 Cc: c@example.com
@@ -57,7 +57,7 @@ EOF
 }
 
 {
-	do_checks(Email::Simple->new(<<'EOF'));
+	do_checks(PublicInbox::Eml->new(<<'EOF'));
 From: a@example.com
 To: b+plus@example.com
 Cc: John Doe <c@example.com>
@@ -72,7 +72,7 @@ EOF
 
 {
 	my $recipient = 'b@example.com';
-	my $s = Email::Simple->new(<<'EOF');
+	my $s = PublicInbox::Eml->new(<<'EOF');
 To: b@example.com
 Cc: c@example.com
 Content-Type: text/plain
diff --git a/t/psgi_attach.t b/t/psgi_attach.t
index af0fbdd3..9a2b2411 100644
--- a/t/psgi_attach.t
+++ b/t/psgi_attach.t
@@ -29,7 +29,7 @@ $im->init_bare;
 	my $b64 = "b64\xde\xad\xbe\xef\n";
 	my $txt = "plain\ntext\npass\nthrough\n";
 	my $dot = "dotfile\n";
-	$im->add(mime_load('t/psgi_attach.eml'));
+	$im->add(eml_load('t/psgi_attach.eml'));
 	$im->done;
 
 	my $www = PublicInbox::WWW->new($config);
diff --git a/t/psgi_bad_mids.t b/t/psgi_bad_mids.t
index 43025a4d..81bd9356 100644
--- a/t/psgi_bad_mids.t
+++ b/t/psgi_bad_mids.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 my @mods = qw(DBD::SQLite HTTP::Request::Common Plack::Test
@@ -45,7 +45,7 @@ To: b\@example.com
 Date: Fri, 02 Oct 1993 00:00:0$i +0000
 
 
-	my $mime = PublicInbox::MIME->new(\$data);
+	my $mime = PublicInbox::Eml->new(\$data);
 	ok($im->add($mime), "added $mid");
 	$i++
 }
diff --git a/t/psgi_mount.t b/t/psgi_mount.t
index bd492dcb..b4de8274 100644
--- a/t/psgi_mount.t
+++ b/t/psgi_mount.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $maindir = "$tmpdir/main.git";
@@ -25,7 +25,7 @@ my $git = PublicInbox::Git->new($maindir);
 my $im = PublicInbox::Import->new($git, 'test', $addr);
 $im->init_bare;
 {
-	my $mime = PublicInbox::MIME->new(<<EOF);
+	my $mime = PublicInbox::Eml->new(<<EOF);
 From: Me <me\@example.com>
 To: You <you\@example.com>
 Cc: $addr
diff --git a/t/psgi_multipart_not.t b/t/psgi_multipart_not.t
index ef86c015..e36820f4 100644
--- a/t/psgi_multipart_not.t
+++ b/t/psgi_multipart_not.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 my @mods = qw(DBD::SQLite Search::Xapian HTTP::Request::Common
@@ -22,7 +22,7 @@ my $ibx = PublicInbox::Inbox->new({
 my $im = PublicInbox::V2Writable->new($ibx, 1);
 $im->{parallel} = 0;
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 Message-Id: <200308111450.h7BEoOu20077@mail.osdl.org>
 To: linux-kernel@vger.kernel.org
 Subject: [OSDL] linux-2.6.0-test3 reaim results
diff --git a/t/psgi_scan_all.t b/t/psgi_scan_all.t
index 93603a33..46eb489f 100644
--- a/t/psgi_scan_all.t
+++ b/t/psgi_scan_all.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
 my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape DBD::SQLite);
@@ -31,7 +31,7 @@ foreach my $i (1..2) {
 	my $im = PublicInbox::V2Writable->new($ibx, 1);
 	$im->{parallel} = 0;
 	$im->init_inbox(0);
-	my $mime = PublicInbox::MIME->new(<<EOF);
+	my $mime = PublicInbox::Eml->new(<<EOF);
 From: a\@example.com
 To: $addr
 Subject: s$i
diff --git a/t/psgi_search.t b/t/psgi_search.t
index 3c515d19..64f8b1ac 100644
--- a/t/psgi_search.t
+++ b/t/psgi_search.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::Inbox;
 use PublicInbox::InboxWritable;
@@ -27,7 +27,7 @@ my $im = $ibx->importer(0);
 my $digits = '10010260936330';
 my $ua = 'Pine.LNX.4.10';
 my $mid = "$ua.$digits.2460-100000\@penguin.transmeta.com";
-my $mime = PublicInbox::MIME->new(<<EOF);
+my $mime = PublicInbox::Eml->new(<<EOF);
 Subject: test
 Message-ID: <$mid>
 From: Ævar Arnfjörð Bjarmason <avarab\@example>
@@ -36,7 +36,7 @@ To: git\@vger.kernel.org
 EOF
 $im->add($mime);
 
-$mime = PublicInbox::MIME->new(<<'EOF');
+$mime = PublicInbox::Eml->new(<<'EOF');
 Subject:
 Message-ID: <blank-subject@example.com>
 From: blank subject <blank-subject@example.com>
@@ -45,7 +45,7 @@ To: git@vger.kernel.org
 EOF
 $im->add($mime);
 
-$mime = PublicInbox::MIME->new(<<'EOF');
+$mime = PublicInbox::Eml->new(<<'EOF');
 Message-ID: <no-subject-at-all@example.com>
 From: no subject at all <no-subject-at-all@example.com>
 To: git@vger.kernel.org
diff --git a/t/psgi_text.t b/t/psgi_text.t
index b7b5b2d4..833bcaba 100644
--- a/t/psgi_text.t
+++ b/t/psgi_text.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $maindir = "$tmpdir/main.git";
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 9c19b041..8f75a3fb 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -5,7 +5,7 @@ use warnings;
 use Test::More;
 use PublicInbox::TestCommon;
 require_git(2.6);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 use PublicInbox::MID qw(mids);
 require_mods(qw(DBD::SQLite Search::Xapian HTTP::Request::Common Plack::Test
@@ -26,7 +26,7 @@ my $new_mid;
 my $im = PublicInbox::V2Writable->new($ibx, 1);
 $im->{parallel} = 0;
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From oldbug-pre-a0c07cba0e5d8b6a Fri Oct  2 00:00:00 1993
 From: a@example.com
 To: test@example.com
@@ -225,7 +225,7 @@ test_psgi(sub { $www->call(@_) }, sub {
 
 	# ensure conflicted attachments can be resolved
 	foreach my $body (qw(old new)) {
-		$mime = mime_load "t/psgi_v2-$body.eml";
+		$mime = eml_load "t/psgi_v2-$body.eml";
 		ok($im->add($mime), "added attachment $body");
 	}
 	$im->done;
diff --git a/t/purge.t b/t/purge.t
index dcc44039..2ca9edca 100644
--- a/t/purge.t
+++ b/t/purge.t
@@ -36,7 +36,7 @@ local $ENV{PI_CONFIG} = $cfgfile;
 open my $cfg_fh, '>', $cfgfile or die "open: $!";
 
 my $v2w = PublicInbox::V2Writable->new($ibx, {nproc => 1});
-my $mime = PublicInbox::MIME->new($raw);
+my $mime = PublicInbox::Eml->new($raw);
 ok($v2w->add($mime), 'add message to be purged');
 $v2w->done;
 
diff --git a/t/replace.t b/t/replace.t
index 2efa25f1..cef4e7aa 100644
--- a/t/replace.t
+++ b/t/replace.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
 use PublicInbox::TestCommon;
 use Cwd qw(abs_path);
@@ -24,7 +24,7 @@ sub test_replace ($$$) {
 		indexlevel => $level,
 	});
 
-	my $orig = PublicInbox::MIME->new(<<'EOF');
+	my $orig = PublicInbox::Eml->new(<<'EOF');
 From: Barbra Streisand <effect@example.com>
 To: test@example.com
 Subject: confidential
@@ -49,7 +49,7 @@ EOF
 	my $thread_a = $ibx->over->get_thread('replace@example.com');
 
 	my %before = map {; delete($_->{blob}) => $_ } @{$ibx->recent};
-	my $reject = PublicInbox::MIME->new($orig->as_string);
+	my $reject = PublicInbox::Eml->new($orig->as_string);
 	foreach my $mid (['<replace@example.com>', '<extra@example.com>'],
 				[], ['<replaced@example.com>']) {
 		$reject->header_set('Message-ID', @$mid);
@@ -61,7 +61,7 @@ EOF
 
 	# prepare the replacement
 	my $expect = "Move along, nothing to see here\n";
-	my $repl = PublicInbox::MIME->new($orig->as_string);
+	my $repl = PublicInbox::Eml->new($orig->as_string);
 	$repl->header_set('From', '<redactor@example.com>');
 	$repl->header_set('Subject', 'redacted');
 	$repl->header_set('Date', 'Sat, 02 Oct 2010 00:00:00 +0000');
@@ -80,7 +80,7 @@ EOF
 	is($changed_epochs, 1, 'only one epoch changed');
 
 	$im->done;
-	my $m = PublicInbox::MIME->new($ibx->msg_by_mid('replace@example.com'));
+	my $m = PublicInbox::Eml->new($ibx->msg_by_mid('replace@example.com'));
 	is($m->body, $expect, 'replaced message');
 	is_deeply(\@warn, [], 'no warnings on noop');
 
@@ -159,7 +159,7 @@ sub pad_msgs {
 			($i, $irt) = each %$i;
 		}
 		my $sec = sprintf('%0d', $i);
-		my $mime = PublicInbox::MIME->new(<<EOF);
+		my $mime = PublicInbox::Eml->new(<<EOF);
 From: foo\@example.com
 To: test\@example.com
 Message-ID: <$i\@example.com>
diff --git a/t/reply.t b/t/reply.t
index a6c38cfa..53162df5 100644
--- a/t/reply.t
+++ b/t/reply.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use_ok 'PublicInbox::Reply';
 
 my @q = (
@@ -19,7 +19,7 @@ while (@q) {
 	is($res, $expect, "quote $input => $res");
 }
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: from <from@example.com>
 To: to <to@example.com>
 Cc: cc@example.com
diff --git a/t/search-thr-index.t b/t/search-thr-index.t
index 1bea59fd..914807a8 100644
--- a/t/search-thr-index.t
+++ b/t/search-thr-index.t
@@ -6,7 +6,7 @@ use bytes (); # only for bytes::length
 use Test::More;
 use PublicInbox::TestCommon;
 use PublicInbox::MID qw(mids);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_mods(qw(DBD::SQLite Search::Xapian));
 require PublicInbox::SearchIdx;
 require PublicInbox::Smsg;
@@ -42,7 +42,7 @@ my @mids;
 
 foreach (reverse split(/\n\n/, $data)) {
 	$_ .= "\n";
-	my $mime = PublicInbox::MIME->new(\$_);
+	my $mime = PublicInbox::Eml->new(\$_);
 	$mime->header_set('From' => 'bw@g');
 	$mime->header_set('To' => 'git@vger.kernel.org');
 	my $bytes = bytes::length($mime->as_string);
@@ -78,7 +78,7 @@ $rw->commit_txn_lazy;
 
 $xdb = $rw->begin_txn_lazy;
 {
-	my $mime = PublicInbox::MIME->new(<<'');
+	my $mime = PublicInbox::Eml->new(<<'');
 Subject: [RFC 00/14]
 Message-Id: <1-bw@g>
 From: bw@g
diff --git a/t/search.t b/t/search.t
index 83986837..6cd938dd 100644
--- a/t/search.t
+++ b/t/search.t
@@ -8,7 +8,7 @@ require_mods(qw(DBD::SQLite Search::Xapian));
 require PublicInbox::SearchIdx;
 require PublicInbox::Inbox;
 require PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $git_dir = "$tmpdir/a.git";
 my $ibx = PublicInbox::Inbox->new({ inboxdir => $git_dir });
@@ -60,7 +60,7 @@ sub oct_is ($$$) {
 }
 
 $ibx->with_umask(sub {
-	my $root = PublicInbox::MIME->new(<<'EOF');
+	my $root = PublicInbox::Eml->new(<<'EOF');
 Date: Fri, 02 Oct 1993 00:00:00 +0000
 Subject: Hello world
 Message-ID: <root@s>
@@ -69,7 +69,7 @@ To: list@example.com
 
 \m/
 EOF
-	my $last = PublicInbox::MIME->new(<<'EOF');
+	my $last = PublicInbox::Eml->new(<<'EOF');
 Date: Sat, 02 Oct 2010 00:00:00 +0000
 Subject: Re: Hello world
 In-Reply-To: <root@s>
@@ -126,7 +126,7 @@ sub filter_mids {
 $ibx->with_umask(sub {
 	$rw_commit->();
 	my $rmid = '<ghost-message@s>';
-	my $reply_to_ghost = PublicInbox::MIME->new(<<"EOF");
+	my $reply_to_ghost = PublicInbox::Eml->new(<<"EOF");
 Date: Sat, 02 Oct 2010 00:00:00 +0000
 Subject: Re: ghosts
 Message-ID: <ghost-reply\@s>
@@ -140,7 +140,7 @@ EOF
 	my $reply_id = $rw->add_message($reply_to_ghost);
 	is($reply_id, int($reply_id), "reply_id is an integer: $reply_id");
 
-	my $was_ghost = PublicInbox::MIME->new(<<"EOF");
+	my $was_ghost = PublicInbox::Eml->new(<<"EOF");
 Date: Sat, 02 Oct 2010 00:00:01 +0000
 Subject: ghosts
 Message-ID: $rmid
@@ -189,7 +189,7 @@ $ibx->with_umask(sub {
 	$rw_commit->();
 	$ro->reopen;
 	my $long_mid = 'last' . ('x' x 60). '@s';
-	my $long = PublicInbox::MIME->new(<<EOF);
+	my $long = PublicInbox::Eml->new(<<EOF);
 Date: Sat, 02 Oct 2010 00:00:00 +0000
 Subject: long message ID
 References: <root\@s> <last\@s>
@@ -209,7 +209,7 @@ EOF
 	my @res;
 
 	my $long_reply_mid = 'reply-to-long@1';
-	my $long_reply = PublicInbox::MIME->new(<<EOF);
+	my $long_reply = PublicInbox::Eml->new(<<EOF);
 Subject: I break references
 Date: Sat, 02 Oct 2010 00:00:00 +0000
 Message-ID: <$long_reply_mid>
@@ -233,7 +233,7 @@ EOF
 # quote prioritization
 $ibx->with_umask(sub {
 	$rw_commit->();
-	$rw->add_message(PublicInbox::MIME->new(<<'EOF'));
+	$rw->add_message(PublicInbox::Eml->new(<<'EOF'));
 Date: Sat, 02 Oct 2010 00:00:01 +0000
 Subject: Hello
 Message-ID: <quote@a>
@@ -243,7 +243,7 @@ To: list@example.com
 > theatre illusions
 fade
 EOF
-	$rw->add_message(PublicInbox::MIME->new(<<'EOF'));
+	$rw->add_message(PublicInbox::Eml->new(<<'EOF'));
 Date: Sat, 02 Oct 2010 00:00:02 +0000
 Subject: Hello
 Message-ID: <nquote@a>
@@ -267,7 +267,7 @@ EOF
 # circular references
 $ibx->with_umask(sub {
 	my $s = 'foo://'. ('Circle' x 15).'/foo';
-	my $doc_id = $rw->add_message(PublicInbox::MIME->new(<<EOF));
+	my $doc_id = $rw->add_message(PublicInbox::Eml->new(<<EOF));
 Subject: $s
 Date: Sat, 02 Oct 2010 00:00:01 +0000
 Message-ID: <circle\@a>
@@ -286,7 +286,7 @@ EOF
 });
 
 $ibx->with_umask(sub {
-	my $mime = mime_load 't/utf8.eml';
+	my $mime = eml_load 't/utf8.eml';
 	my $doc_id = $rw->add_message($mime);
 	ok($doc_id > 0, 'message indexed doc_id with UTF-8');
 	my $msg = $rw->query('m:testmessage@example.com', {limit => 1})->[0];
@@ -369,7 +369,7 @@ $ibx->with_umask(sub {
 }
 
 $ibx->with_umask(sub {
-	my $amsg = mime_load 't/search-amsg.eml';
+	my $amsg = eml_load 't/search-amsg.eml';
 	ok($rw->add_message($amsg), 'added attachment');
 	$rw_commit->();
 	$ro->reopen;
@@ -427,7 +427,7 @@ $ibx->with_umask(sub {
 	my $mid = "$ua.$digits.2460-100000\@penguin.transmeta.com";
 	is($ro->reopen->query("m:$digits", { mset => 1})->size, 0,
 		'no results yet');
-	my $pine = PublicInbox::MIME->new(<<EOF);
+	my $pine = PublicInbox::Eml->new(<<EOF);
 Subject: blah
 Message-ID: <$mid>
 From: torvalds\@transmeta
diff --git a/t/solver_git.t b/t/solver_git.t
index c483aba1..78cc0edd 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -15,7 +15,7 @@ chomp $git_dir;
 # needed for alternates, and --absolute-git-dir is only in git 2.13+
 $git_dir = abs_path($git_dir);
 
-use_ok "PublicInbox::$_" for (qw(Inbox V2Writable MIME Git SolverGit WWW));
+use_ok "PublicInbox::$_" for (qw(Inbox V2Writable Git SolverGit WWW));
 
 my ($inboxdir, $for_destroy) = tmpdir();
 my $opts = {
@@ -29,7 +29,7 @@ my $im = PublicInbox::V2Writable->new($ibx, 1);
 $im->{parallel} = 0;
 
 my $deliver_patch = sub ($) {
-	$im->add(mime_load($_[0]));
+	$im->add(eml_load($_[0]));
 	$im->done;
 };
 
diff --git a/t/spamcheck_spamc.t b/t/spamcheck_spamc.t
index edfacc62..2d9da631 100644
--- a/t/spamcheck_spamc.t
+++ b/t/spamcheck_spamc.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use Email::Simple;
+use PublicInbox::Eml;
 use IO::File;
 use Fcntl qw(:DEFAULT SEEK_SET);
 use PublicInbox::TestCommon;
@@ -28,19 +28,19 @@ Subject: test
 Message-ID: <testmessage@example.com>
 
 EOF
-ok($spamc->spamcheck(Email::Simple->new($src), \$dst), 'Email::Simple works');
+ok($spamc->spamcheck(PublicInbox::Eml->new($src), \$dst), 'PublicInbox::Eml works');
 is($dst, $src, 'input == output');
 
 $dst = '';
 $spamc->{checkcmd} = ['sh', '-c', 'cat; false'];
-ok(!$spamc->spamcheck(Email::Simple->new($src), \$dst), 'Failed check works');
+ok(!$spamc->spamcheck(PublicInbox::Eml->new($src), \$dst), 'Failed check works');
 is($dst, $src, 'input == output for spammy example');
 
 for my $l (qw(ham spam)) {
 	my $file = "$tmpdir/$l.out";
 	$spamc->{$l.'cmd'} = ['tee', $file ];
 	my $method = $l.'learn';
-	ok($spamc->$method(Email::Simple->new($src)), "$method OK");
+	ok($spamc->$method(PublicInbox::Eml->new($src)), "$method OK");
 	open my $fh, '<', $file or die "failed to open $file: $!";
 	is(eval { local $/, <$fh> }, $src, "$l command ran alright");
 }
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index d6545c6d..484ea443 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -3,8 +3,9 @@
 use strict;
 use warnings;
 use Test::More;
+use PublicInbox::TestCommon;
+require_mods 'Email::Simple';
 use_ok('PublicInbox::SearchThread');
-use Email::Simple;
 my $mt = eval {
 	require Mail::Thread;
 	no warnings 'once';
diff --git a/t/time.t b/t/time.t
index 71600b93..b491711d 100644
--- a/t/time.t
+++ b/t/time.t
@@ -4,9 +4,9 @@ use strict;
 use warnings;
 use Test::More;
 use POSIX qw(strftime);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::MsgTime qw(msg_datestamp);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: b@example.com
 Subject: this is a subject
diff --git a/t/v1-add-remove-add.t b/t/v1-add-remove-add.t
index 23f4fb11..2cd45f60 100644
--- a/t/v1-add-remove-add.t
+++ b/t/v1-add-remove-add.t
@@ -5,7 +5,7 @@ use warnings;
 use Test::More;
 use PublicInbox::Import;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_mods(qw(DBD::SQLite Search::Xapian));
 require PublicInbox::SearchIdx;
 my ($inboxdir, $for_destroy) = tmpdir();
@@ -15,7 +15,7 @@ my $ibx = {
 	-primary_address => 'test@example.com',
 };
 $ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/t/v1reindex.t b/t/v1reindex.t
index e473fe7c..13605f8b 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -6,7 +6,7 @@ use Test::More;
 use PublicInbox::ContentId qw(content_digest);
 use File::Path qw(remove_tree);
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_git(2.6);
 require_mods(qw(DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::SearchIdx';
@@ -18,7 +18,7 @@ my $ibx_config = {
 	-primary_address => 'test@example.com',
 	indexlevel => 'full',
 };
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/t/v2-add-remove-add.t b/t/v2-add-remove-add.t
index 60a869ee..cfdc8cf1 100644
--- a/t/v2-add-remove-add.t
+++ b/t/v2-add-remove-add.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::TestCommon;
 require_git(2.6);
 require_mods(qw(DBD::SQLite Search::Xapian));
@@ -16,7 +16,7 @@ my $ibx = {
 	-primary_address => 'test@example.com',
 };
 $ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/t/v2mda.t b/t/v2mda.t
index 4d3ec30d..36f43ff0 100644
--- a/t/v2mda.t
+++ b/t/v2mda.t
@@ -6,7 +6,7 @@ use Test::More;
 use Fcntl qw(SEEK_SET);
 use Cwd;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 require_git(2.6);
 
 my $V = 2;
@@ -18,7 +18,7 @@ my $ibx = {
 	name => 'test-v2writable',
 	address => [ 'test@example.com' ],
 };
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/t/v2mirror.t b/t/v2mirror.t
index ecf96891..d588808d 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -15,7 +15,7 @@ use IO::Socket;
 use POSIX qw(dup2);
 use_ok 'PublicInbox::V2Writable';
 use PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 # FIXME: too much setup
 my ($tmpdir, $for_destroy) = tmpdir();
@@ -38,7 +38,7 @@ $ibx->{version} = 2;
 my $v2w = PublicInbox::V2Writable->new($ibx, 1);
 ok $v2w, 'v2w loaded';
 $v2w->{parallel} = 0;
-my $mime = PublicInbox::MIME->new(<<'');
+my $mime = PublicInbox::Eml->new(<<'');
 From: Me <me@example.com>
 To: You <you@example.com>
 Subject: a
diff --git a/t/v2reindex.t b/t/v2reindex.t
index b97c6498..f16a0b0d 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::ContentId qw(content_digest);
 use File::Path qw(remove_tree);
 use PublicInbox::TestCommon;
@@ -24,7 +24,7 @@ my $agpl = do {
 	<$fh>;
 };
 my $phrase = q("defending all users' freedom");
-my $mime = PublicInbox::MIME->new(<<'EOF'.$agpl);
+my $mime = PublicInbox::Eml->new(<<'EOF'.$agpl);
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
@@ -434,7 +434,7 @@ ok(!-d $xap, 'Xapian directories removed again');
 	$config{indexlevel} = 'medium';
 	my $ibx = PublicInbox::Inbox->new(\%config);
 	my $im = PublicInbox::V2Writable->new($ibx);
-	my $m3 = PublicInbox::MIME->new(<<'EOF');
+	my $m3 = PublicInbox::Eml->new(<<'EOF');
 Date: Tue, 24 May 2016 14:34:22 -0700 (PDT)
 Message-Id: <20160524.143422.552507610109476444.d@example.com>
 To: t@example.com
@@ -465,7 +465,7 @@ Somehow we got a message with 3 sets of headers into one
 message, could've been something broken on the archiver side.
 EOF
 
-	my $m1 = PublicInbox::MIME->new(<<'EOF');
+	my $m1 = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: t@example.com
 Subject: [PATCH 12/13]
diff --git a/t/v2writable.t b/t/v2writable.t
index 07687052..e5a565ce 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::ContentId qw(content_digest content_id);
 use PublicInbox::TestCommon;
 use Cwd qw(abs_path);
@@ -20,7 +20,7 @@ my $ibx = {
 	-primary_address => 'test@example.com',
 };
 $ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
@@ -63,7 +63,7 @@ if ('ensure git configs are correct') {
 	@warn = ();
 	$mime->header_set('Message-Id', '<a-mid@b>', '<c@d>');
 	is($im->add($mime), undef, 'secondary MID ignored if first matches');
-	my $sec = PublicInbox::MIME->new($mime->as_string);
+	my $sec = PublicInbox::Eml->new($mime->as_string);
 	$sec->header_set('Date');
 	$sec->header_set('Message-Id', '<a-mid@b>', '<c@d>');
 	ok($im->add($sec), 'secondary MID used if data is different');
@@ -90,7 +90,7 @@ if ('ensure git configs are correct') {
 	my $hdr = $mime->header_obj;
 	my $gen = PublicInbox::Import::digest2mid(content_digest($mime), $hdr);
 	unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
-	my $fake = PublicInbox::MIME->new($mime->as_string);
+	my $fake = PublicInbox::Eml->new($mime->as_string);
 	$fake->header_set('Message-Id', "<$gen>");
 	ok($im->add($fake), 'fake added easily');
 	is_deeply(\@warn, [], 'no warnings from a faker');
diff --git a/t/watch_filter_rubylang.t b/t/watch_filter_rubylang.t
index 09217d94..2e7d402e 100644
--- a/t/watch_filter_rubylang.t
+++ b/t/watch_filter_rubylang.t
@@ -4,7 +4,7 @@ use strict;
 use warnings;
 use PublicInbox::TestCommon;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Config;
 require_mods(qw(Filesys::Notify::Simple DBD::SQLite Search::Xapian));
 use_ok 'PublicInbox::WatchMaildir';
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index c34d15f7..66955072 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -2,7 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use Cwd;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index dd5030ea..19a2da77 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -2,7 +2,7 @@
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
 use strict;
 use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use Cwd;
 use PublicInbox::Config;
 use PublicInbox::TestCommon;
diff --git a/t/www_altid.t b/t/www_altid.t
index a885c389..337303d9 100644
--- a/t/www_altid.t
+++ b/t/www_altid.t
@@ -26,7 +26,7 @@ if ('setup') {
 	my $ibx = PublicInbox::Inbox->new($opts);
 	$ibx = PublicInbox::InboxWritable->new($ibx, 1);
 	my $im = $ibx->importer(0);
-	my $mime = PublicInbox::MIME->new(<<'EOF');
+	my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 Message-Id: <a@example.com>
 
diff --git a/t/xcpdb-reshard.t b/t/xcpdb-reshard.t
index 0e1fea52..70012cc6 100644
--- a/t/xcpdb-reshard.t
+++ b/t/xcpdb-reshard.t
@@ -6,11 +6,11 @@ use Test::More;
 use PublicInbox::TestCommon;
 require_mods(qw(DBD::SQLite Search::Xapian));
 require_git('2.6');
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::InboxWritable;
 require PublicInbox::Search;
 
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
 From: a@example.com
 To: test@example.com
 Subject: this is a subject
diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t
index 4ebf5b2c..95d7c64b 100644
--- a/xt/msgtime_cmp.t
+++ b/xt/msgtime_cmp.t
@@ -4,7 +4,7 @@
 use strict;
 use Test::More;
 use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
 use PublicInbox::Inbox;
 use PublicInbox::Git;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
@@ -48,7 +48,7 @@ sub quiet_is_deeply ($$$$$) {
 sub compare {
 	my ($bref, $oid, $type, $size) = @_;
 	local $SIG{__WARN__} = sub { diag "$oid: ", @_ };
-	my $mime = PublicInbox::MIME->new($$bref);
+	my $mime = PublicInbox::Eml->new($$bref);
 	my $hdr = $mime->header_obj;
 	my @cur = msg_datestamp($hdr);
 	my @old = Old::msg_datestamp($hdr);
@@ -116,7 +116,7 @@ sub time_response ($) {
 }
 
 sub msg_received_at ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my @recvd = $hdr->header_raw('Received');
 	my ($ts);
 	foreach my $r (@recvd) {
@@ -131,7 +131,7 @@ sub msg_received_at ($) {
 }
 
 sub msg_date_only ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my @date = $hdr->header_raw('Date');
 	my ($ts);
 	foreach my $d (@date) {
@@ -149,7 +149,7 @@ sub msg_date_only ($) {
 
 # Favors Received header for sorting globally
 sub msg_timestamp ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my $ret;
 	$ret = msg_received_at($hdr) and return time_response($ret);
 	$ret = msg_date_only($hdr) and return time_response($ret);
@@ -158,7 +158,7 @@ sub msg_timestamp ($) {
 
 # Favors the Date: header for display and sorting within a thread
 sub msg_datestamp ($) {
-	my ($hdr) = @_; # Email::MIME::Header
+	my ($hdr) = @_; # PublicInbox::Eml
 	my $ret;
 	$ret = msg_date_only($hdr) and return time_response($ret);
 	$ret = msg_received_at($hdr) and return time_response($ret);
diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index a4445959..30fc07dc 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -38,7 +38,7 @@ my $obuf = '';
 my $m = 0;
 
 my $cb = sub {
-	$mime = PublicInbox::MIME->new(shift);
+	$mime = PublicInbox::Eml->new(shift);
 	PublicInbox::View::multipart_text_as_html($mime, $ctx);
 	++$m;
 	$obuf = '';


* [PATCH 13/13] eml: drop trailing blank line on missing epilogue
  2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
                   ` (11 preceding siblings ...)
  2020-05-07 21:05 ` [PATCH 12/13] remove most internal Email::MIME usage Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
  12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
  To: meta

This improves Email::MIME compatibility when running
xt/cmp-msgview.t on some GPG-signed messages.

Its usefulness is dubious in the long term and this patch
may be reverted down the line.
---
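Note for readers, not part of the commit: a minimal sketch of the
effect this change has on the last part, assuming a body that never
reaches its closing boundary.  The names trim_last_part, $body and the
boundary "b1" are made up for illustration, and the split is heavily
simplified; only the s/\n\r?\n\z/\n/s substitution is taken from the
diff below.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PublicInbox::Eml: apply the same
# substitution the patch adds, but only when the epilogue is missing.
sub trim_last_part {
	my ($parts, $epilogue_missing) = @_;
	# collapse a trailing blank line (LF LF or LF CR LF) into one LF
	$parts->[-1] =~ s/\n\r?\n\z/\n/s if $epilogue_missing && @$parts;
	return $parts;
}

my $bnd = 'b1'; # made-up boundary
# body with an opening boundary but no closing "--b1--" line
my $body = "--$bnd\nContent-Type: text/plain\n\nhello\n\n";
my $epilogue_missing = ($body !~ /^--\Q$bnd\E--[ \t]*\r?$/m) ? 1 : 0;

# simplified split on the opening boundary; the real code differs
my @parts = grep { length } split(/^--\Q$bnd\E[ \t]*\r?\n/m, $body);
trim_last_part(\@parts, $epilogue_missing);
print $parts[-1] eq "Content-Type: text/plain\n\nhello\n" ?
	"trailing blank line dropped\n" : "unchanged\n";
```

The \z anchor keeps the substitution away from the blank line that
separates the part's headers from its body; only the very end of the
last part is touched.
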
 lib/PublicInbox/Eml.pm | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 1adaff04..4508bd84 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -131,8 +131,12 @@ sub mp_descend ($$) {
 	# Cut at the the first epilogue, not subsequent ones.
 	# *sigh* just the regexp match alone seems to bump RSS by
 	# length($$bdy) on a ~30M string:
-	$$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm and
+	my $epilogue_missing;
+	if ($$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm) {
 		substr($$bdy, pos($$bdy) - length($1)) = '';
+	} else {
+		$epilogue_missing = 1;
+	}
 
 	# *Sigh* split() doesn't work in-place and return CoW strings
 	# because Perl wants to "\0"-terminate strings.  So split()
@@ -150,6 +154,10 @@ sub mp_descend ($$) {
 
 	if (@parts) { # the usual path if we got this far:
 		undef $bdy; # release memory ASAP if $nr > 0
+
+		# compatibility with Email::MIME
+		$parts[-1] =~ s/\n\r?\n\z/\n/s if $epilogue_missing;
+
 		@parts = grep /[^ \t\r\n]/s, @parts; # ignore empty parts
 
 		# Keep "From: someone..." from preamble in old,


* Re: [PATCH 11/13] xt: eml comparison tests
  2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
@ 2020-05-08  4:47   ` Eric Wong
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-08  4:47 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
>  xt/cmp-msgstr.t  | 108 +++++++++++++++++++++++++++++++++++++++++++++++
>  xt/cmp-msgview.t |  95 +++++++++++++++++++++++++++++++++++++++++

Btw, I run these in parallel on inboxes I have:

N=$(nproc)
find ~/v2/*/git/ -type d -name '*.git' -print0 | xargs -0 -P$N -n1 sh -c \
	'GIANT_INBOX_DIR=$1 perl -I lib -w xt/cmp-msgview.t' --
find ~/v1/ -type d -name '*.git' -print0 | xargs -0 -P$N -n1 sh -c \
	'GIANT_INBOX_DIR=$1 perl -I lib -w xt/cmp-msgstr.t' --

And the main differences I see are minor:

* trailing whitespace may still be different for broken messages
  missing epilogues (MIMEDefang, or some old gnus + GPG)

* trailing whitespace differences for header extraction
  (Eml strips all trailing spaces, not just LF/CRLF)

* empty parts of multipart messages are skipped for efficiency
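
The last two differences can be sketched with plain regexps.  This is
an illustration under assumptions, not code from Eml.pm or Email::MIME:
the sample strings and variable names are made up, and the two
substitutions only approximate "strip LF/CRLF" versus "strip all
trailing whitespace".  The grep pattern is the one visible in the
mp_descend context of patch 13.

```perl
use strict;
use warnings;

# made-up header value with trailing space, tab and CRLF
my $raw = "a subject \t\r\n";

# strip only the line terminator (LF or CRLF)
(my $eol_only = $raw) =~ s/\r?\n\z//;

# strip all trailing whitespace, terminator included
(my $all_ws = $raw) =~ s/\s+\z//;

# parts consisting only of whitespace (or nothing) are dropped
my @parts = ("text\n", " \t\r\n", "", "more\n");
my @kept = grep /[^ \t\r\n]/s, @parts;

printf "eol_only=%d bytes, all_ws=%d bytes, kept=%d of %d parts\n",
	length($eol_only), length($all_ws), scalar(@kept), scalar(@parts);
```

A comparison test that wants to ignore these differences could
normalize both sides with the stricter of the two substitutions before
calling is().
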



Thread overview: 15+ messages
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
2020-05-07 21:05 ` [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME Eric Wong
2020-05-07 21:05 ` [PATCH 02/13] msg_iter: pass $idx as a scalar, not array Eric Wong
2020-05-07 21:05 ` [PATCH 03/13] filter/rubylang: avoid recursing subparts to strip trailers Eric Wong
2020-05-07 21:05 ` [PATCH 04/13] smsg: use capitalization for header retrieval Eric Wong
2020-05-07 21:05 ` [PATCH 05/13] eml: pure-Perl replacement for Email::MIME Eric Wong
2020-05-07 21:05 ` [PATCH 06/13] switch read-only Email::Simple users to Eml Eric Wong
2020-05-07 21:05 ` [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml Eric Wong
2020-05-07 21:05 ` [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement Eric Wong
2020-05-07 21:05 ` [PATCH 09/13] EmlContentFoo: relax Encode version requirement Eric Wong
2020-05-07 21:05 ` [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings Eric Wong
2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
2020-05-08  4:47   ` Eric Wong
2020-05-07 21:05 ` [PATCH 12/13] remove most internal Email::MIME usage Eric Wong
2020-05-07 21:05 ` [PATCH 13/13] eml: drop trailing blank line on missing epilogue Eric Wong
