* [PATCH 00/13] eml: pure-Perl replacement for Email::MIME
@ 2020-05-07 21:05 Eric Wong
2020-05-07 21:05 ` [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME Eric Wong
` (12 more replies)
0 siblings, 13 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Eric Wong (13):
msg_iter: make ->each_part method for PublicInbox::MIME
msg_iter: pass $idx as a scalar, not array
filter/rubylang: avoid recursing subparts to strip trailers
smsg: use capitalization for header retrieval
eml: pure-Perl replacement for Email::MIME
switch read-only Email::Simple users to Eml
replace most uses of PublicInbox::MIME with Eml
EmlContentFoo: Email::MIME::ContentType replacement
EmlContentFoo: relax Encode version requirement
eml: remove dependency on Email::MIME::Encodings
xt: eml comparison tests
remove most internal Email::MIME usage
eml: drop trailing blank line on missing epilogue
Documentation/mknews.perl | 4 +-
INSTALL | 26 +-
MANIFEST | 7 +
Makefile.PL | 7 +-
ci/deps.perl | 3 -
lib/PublicInbox/Admin.pm | 2 +-
lib/PublicInbox/Eml.pm | 421 +++++++++++++++++++++++++++++
lib/PublicInbox/EmlContentFoo.pm | 317 ++++++++++++++++++++++
lib/PublicInbox/Filter/RubyLang.pm | 32 ++-
lib/PublicInbox/Filter/Vger.pm | 4 +-
lib/PublicInbox/Import.pm | 11 +-
lib/PublicInbox/Inbox.pm | 4 +-
lib/PublicInbox/InboxWritable.pm | 4 +-
lib/PublicInbox/MDA.pm | 1 -
lib/PublicInbox/MIME.pm | 6 +
lib/PublicInbox/Mbox.pm | 16 +-
lib/PublicInbox/MboxGz.pm | 4 +-
lib/PublicInbox/MsgIter.pm | 21 +-
lib/PublicInbox/MsgTime.pm | 8 +-
lib/PublicInbox/NNTP.pm | 19 +-
lib/PublicInbox/SearchIdx.pm | 8 +-
lib/PublicInbox/SearchIdxShard.pm | 3 +-
lib/PublicInbox/Smsg.pm | 24 +-
lib/PublicInbox/SolverGit.pm | 4 +-
lib/PublicInbox/TestCommon.pm | 11 +-
lib/PublicInbox/V2Writable.pm | 17 +-
lib/PublicInbox/View.pm | 28 +-
lib/PublicInbox/WWW.pm | 8 +-
lib/PublicInbox/WatchMaildir.pm | 4 +-
lib/PublicInbox/WwwAttach.pm | 15 +-
script/public-inbox-edit | 8 +-
script/public-inbox-learn | 4 +-
script/public-inbox-mda | 16 +-
script/public-inbox-purge | 4 +-
t/altid.t | 4 +-
t/altid_v2.t | 4 +-
t/cgi.t | 8 +-
t/content_id.t | 6 +-
t/convert-compact.t | 4 +-
t/edit.t | 20 +-
t/eml.t | 363 +++++++++++++++++++++++++
t/eml_content_disposition.t | 102 +++++++
t/eml_content_type.t | 289 ++++++++++++++++++++
t/feed.t | 6 +-
t/filter_base.t | 4 +-
t/filter_mirror.t | 2 +-
t/filter_rubylang.t | 8 +-
t/filter_subjecttag.t | 4 +-
t/filter_vger.t | 6 +-
t/html_index.t | 4 +-
t/httpd.t | 4 +-
t/import.t | 6 +-
t/indexlevels-mirror.t | 4 +-
t/mda.t | 4 +-
t/mda_filter_rubylang.t | 2 +-
t/mid.t | 4 +-
t/mime.t | 82 +++---
t/msg_iter.t | 10 +-
t/msgtime.t | 6 +-
t/multi-mid.t | 6 +-
t/nntp.t | 4 +-
t/nntpd-tls.t | 4 +-
t/nntpd.t | 6 +-
t/nulsubject.t | 2 +-
t/plack.t | 10 +-
t/precheck.t | 10 +-
t/psgi_attach.t | 2 +-
t/psgi_bad_mids.t | 4 +-
t/psgi_mount.t | 4 +-
t/psgi_multipart_not.t | 4 +-
t/psgi_scan_all.t | 4 +-
t/psgi_search.t | 8 +-
t/psgi_text.t | 2 +-
t/psgi_v2.t | 6 +-
t/purge.t | 2 +-
t/replace.t | 12 +-
t/reply.t | 4 +-
t/search-thr-index.t | 6 +-
t/search.t | 26 +-
t/solver_git.t | 4 +-
t/spamcheck_spamc.t | 8 +-
t/thread-cycle.t | 3 +-
t/time.t | 4 +-
t/v1-add-remove-add.t | 4 +-
t/v1reindex.t | 4 +-
t/v2-add-remove-add.t | 4 +-
t/v2mda.t | 4 +-
t/v2mirror.t | 4 +-
t/v2reindex.t | 8 +-
t/v2writable.t | 8 +-
t/watch_filter_rubylang.t | 2 +-
t/watch_maildir.t | 2 +-
t/watch_maildir_v2.t | 2 +-
t/www_altid.t | 2 +-
t/xcpdb-reshard.t | 4 +-
xt/cmp-msgstr.t | 108 ++++++++
xt/cmp-msgview.t | 95 +++++++
xt/msgtime_cmp.t | 12 +-
xt/perf-msgview.t | 2 +-
99 files changed, 2084 insertions(+), 353 deletions(-)
create mode 100644 lib/PublicInbox/Eml.pm
create mode 100644 lib/PublicInbox/EmlContentFoo.pm
create mode 100644 t/eml.t
create mode 100644 t/eml_content_disposition.t
create mode 100644 t/eml_content_type.t
create mode 100644 xt/cmp-msgstr.t
create mode 100644 xt/cmp-msgview.t
* [PATCH 01/13] msg_iter: make ->each_part method for PublicInbox::MIME
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Relying on Email::MIME->subparts is a tad inefficient for the
work-in-progress module meant to replace Email::MIME. So move
towards using ->each_part as a class-specific iterator which can
take advantage of more class-specific optimizations in the
yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime
classes.
The msg_iter() sub remains for compatibility with existing
3rd-party scripts/modules which use our small public Perl API
and Email::MIME.
---
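The dispatch described above can be sketched in isolation. This is
only an illustration: Demo::HasEachPart, Demo::Plain,
fallback_each_part and demo_iter are hypothetical names standing in
for PublicInbox::MIME, a plain Email::MIME object, em_each_part and
msg_iter, and the callback argument layout is simplified.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, not the real PublicInbox or Email::MIME classes
{
	package Demo::HasEachPart;
	sub new { bless {}, shift }
	sub each_part { # class-specific iterator
		my ($self, $cb, $arg) = @_;
		$cb->([ $self, 0, 0 ], $arg);
		return 'method';
	}
}
{
	package Demo::Plain; # no ->each_part, like a plain Email::MIME object
	sub new { bless {}, shift }
}

# generic fallback, standing in for em_each_part
sub fallback_each_part {
	my ($obj, $cb, $arg) = @_;
	$cb->([ $obj, 0, 0 ], $arg);
	return 'fallback';
}

# prefer the object's own ->each_part when it has one
sub demo_iter {
	my (undef, $cb, $arg) = @_;
	if (my $ep = $_[0]->can('each_part')) {
		return $ep->($_[0], $cb, $arg);
	}
	return fallback_each_part($_[0], $cb, $arg);
}
```

Passing a Demo::HasEachPart object takes the method path, while a
Demo::Plain object falls through to the generic walker, which is
how existing third-party callers would keep working.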
lib/PublicInbox/MIME.pm | 3 +++
lib/PublicInbox/MsgIter.pm | 15 +++++++++++++--
lib/PublicInbox/SolverGit.pm | 4 ++--
lib/PublicInbox/View.pm | 10 +++++-----
lib/PublicInbox/WwwAttach.pm | 4 ++--
5 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm
index 456eed64..b795b93b 100644
--- a/lib/PublicInbox/MIME.pm
+++ b/lib/PublicInbox/MIME.pm
@@ -24,6 +24,7 @@ use strict;
use warnings;
use base qw(Email::MIME);
use Email::MIME::ContentType;
+use PublicInbox::MsgIter ();
$Email::MIME::ContentType::STRICT_PARAMS = 0;
if ($Email::MIME::VERSION <= 1.937) {
@@ -101,4 +102,6 @@ sub parts_multipart {
}
}
+no warnings 'once';
+*each_part = \&PublicInbox::MsgIter::em_each_part;
1;
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index fa25564a..cd5a5d99 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -7,12 +7,12 @@ use strict;
use warnings;
use base qw(Exporter);
our @EXPORT = qw(msg_iter msg_part_text);
-use PublicInbox::MIME;
+# This becomes PublicInbox::MIME->each_part:
# Like Email::MIME::walk_parts, but this is:
# * non-recursive
# * passes depth and indices to the iterator callback
-sub msg_iter ($$;$$) {
+sub em_each_part ($$;$$) {
my ($mime, $cb, $cb_arg, $do_undef) = @_;
my @parts = $mime->subparts;
if (@parts) {
@@ -36,6 +36,17 @@ sub msg_iter ($$;$$) {
}
}
+# Use this when we may accept Email::MIME from user scripts
+# (not just PublicInbox::MIME)
+sub msg_iter ($$;$$) { # $_[0] = PublicInbox::MIME/Email::MIME-like obj
+ my (undef, $cb, $cb_arg, $once) = @_;
+ if (my $ep = $_[0]->can('each_part')) { # PublicInbox::{MIME,*}
+ $ep->($_[0], $cb, $cb_arg, $once);
+ } else { # for compatibility with existing Email::MIME users:
+ em_each_part($_[0], $cb, $cb_arg, $once);
+ }
+}
+
sub msg_part_text ($$) {
my ($part, $ct) = @_;
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index c32a5bae..f718e28c 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -14,7 +14,7 @@ use 5.010_001;
use File::Temp 0.19 (); # 0.19 for ->newdir
use Fcntl qw(SEEK_SET);
use PublicInbox::Git qw(git_unquote git_quote);
-use PublicInbox::MsgIter qw(msg_iter msg_part_text);
+use PublicInbox::MsgIter qw(msg_part_text);
use PublicInbox::Qspawn;
use PublicInbox::Tmpfile;
use URI::Escape qw(uri_escape_utf8);
@@ -234,7 +234,7 @@ sub find_extract_diffs ($$$) {
my $diffs = [];
foreach my $smsg (@$msgs) {
$ibx->smsg_mime($smsg) or next;
- msg_iter(delete $smsg->{mime}, \&extract_diff,
+ delete($smsg->{mime})->each_part(\&extract_diff,
[$self, $diffs, $pre, $post, $ibx, $smsg], 1);
}
@$diffs ? $diffs : undef;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index f7a8ae32..e42fb362 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -243,7 +243,7 @@ sub index_entry {
# scan through all parts, looking for displayable text
$ctx->{mhref} = $mhref;
$ctx->{obuf} = \$rv;
- msg_iter($mime, \&add_text_body, $ctx, 1);
+ $mime->each_part(\&add_text_body, $ctx, 1);
delete $ctx->{obuf};
# add the footer
@@ -474,10 +474,10 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
}
sub multipart_text_as_html {
- # ($mime, $ctx) = @_; # msg_iter will do "$_[0] = undef"
+ # ($mime, $ctx) = @_; # each_part may do "$_[0] = undef"
# scan through all parts, looking for displayable text
- msg_iter($_[0], \&add_text_body, $_[1], 1);
+ $_[0]->each_part(\&add_text_body, $_[1], 1);
}
sub attach_link ($$$$;$) {
@@ -515,11 +515,11 @@ EOF
undef;
}
-sub add_text_body { # callback for msg_iter
+sub add_text_body { # callback for each_part
my ($p, $ctx) = @_;
my $upfx = $ctx->{mhref};
my $ibx = $ctx->{-inbox};
- # $p - from msg_iter: [ Email::MIME, depth, @idx ]
+ # $p - from each_part: [ Email::MIME-like, depth, @idx ]
my ($part, $depth, @idx) = @$p;
my $ct = $part->content_type || 'text/plain';
my $fn = $part->filename;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index f795618e..774b38ae 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -10,7 +10,7 @@ use Email::MIME::ContentType qw(parse_content_type);
use PublicInbox::MIME;
use PublicInbox::MsgIter;
-sub get_attach_i { # msg_iter callback
+sub get_attach_i { # ->each_part callback
my ($part, $depth, @idx) = @{$_[0]};
my $res = $_[1];
return if join('.', @idx) ne $res->[3]; # $idx
@@ -40,7 +40,7 @@ sub get_attach ($$$) {
my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
$mime = PublicInbox::MIME->new($mime);
$res->[3] = $idx;
- msg_iter($mime, \&get_attach_i, $res, 1);
+ $mime->each_part(\&get_attach_i, $res, 1);
pop @$res; # cleanup before letting PSGI server see it
$res
}
* [PATCH 02/13] msg_iter: pass $idx as a scalar, not array
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
This makes no difference for single-part messages or for
most multipart messages. However, it starts saving
space once parts start nesting.
It also slightly simplifies callers.
---
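A minimal sketch of the non-recursive walk with a scalar index,
assuming a toy tree where each node is [ name, @children ] instead
of a real MIME object. walk_leaves is a hypothetical name; the
queue handling mirrors em_each_part, and the (0, 0) depth/index for
a message without subparts is taken from the single-part case.

```perl
use strict;
use warnings;

# Toy tree (an assumption): each node is [ name, @children ];
# a node without children is a leaf.
sub walk_leaves {
	my ($root, $cb) = @_;
	my (undef, @kids) = @$root;
	return $cb->($root->[0], 0, 0) unless @kids; # single part
	my $i = 0;
	my @queue = map { [ $_, 1, ++$i ] } @kids;
	while (my $p = shift @queue) {
		my ($node, $depth, $idx) = @$p;
		my (undef, @sub) = @$node;
		if (@sub) {
			$depth++;
			$i = 0;
			# the index stays one scalar per queue entry,
			# no matter how deep the nesting gets
			@sub = map {
				[ $_, $depth, "$idx." . (++$i) ]
			} @sub;
			@queue = (@sub, @queue);
		} else {
			$cb->($node->[0], $depth, $idx);
		}
	}
}
```

For a tree shaped like the one in t/msg_iter.t (a nested part
holding "a" and "b", followed by "sig"), the callback sees the
indices "1.1", "1.2" and 2.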
lib/PublicInbox/MsgIter.pm | 6 ++++--
lib/PublicInbox/SearchIdx.pm | 2 +-
lib/PublicInbox/View.pm | 18 +++++++++---------
lib/PublicInbox/WwwAttach.pm | 4 ++--
t/mime.t | 5 +++--
t/msg_iter.t | 2 +-
6 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index cd5a5d99..7c28d019 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -20,12 +20,14 @@ sub em_each_part ($$;$$) {
my $i = 0;
@parts = map { [ $_, 1, ++$i ] } @parts;
while (my $p = shift @parts) {
- my ($part, $depth, @idx) = @$p;
+ my ($part, $depth, $idx) = @$p;
my @sub = $part->subparts;
if (@sub) {
$depth++;
$i = 0;
- @sub = map { [ $_, $depth, @idx, ++$i ] } @sub;
+ @sub = map {
+ [ $_, $depth, "$idx.".(++$i) ]
+ } @sub;
@parts = (@sub, @parts);
} else {
$cb->($p, $cb_arg);
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 25118f43..a7e31b71 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -277,7 +277,7 @@ sub index_diff ($$$) {
}
sub index_xapian { # msg_iter callback
- my $part = $_[0]->[0]; # ignore $depth and @idx
+ my $part = $_[0]->[0]; # ignore $depth and $idx
my ($self, $doc) = @{$_[1]};
my $ct = $part->content_type || 'text/plain';
my $fn = $part->filename;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index e42fb362..3328c865 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -482,9 +482,8 @@ sub multipart_text_as_html {
sub attach_link ($$$$;$) {
my ($ctx, $ct, $p, $fn, $err) = @_;
- my ($part, $depth, @idx) = @$p;
- my $nl = $idx[-1] > 1 ? "\n" : '';
- my $idx = join('.', @idx);
+ my ($part, $depth, $idx) = @$p;
+ my $nl = substr($idx, -2) eq '.1' ? '' : "\n"; # like join("\n", ...)
my $size = bytes::length($part->body);
# hide attributes normally, unless we want to aid users in
@@ -519,8 +518,8 @@ sub add_text_body { # callback for each_part
my ($p, $ctx) = @_;
my $upfx = $ctx->{mhref};
my $ibx = $ctx->{-inbox};
- # $p - from each_part: [ Email::MIME-like, depth, @idx ]
- my ($part, $depth, @idx) = @$p;
+ # $p - from each_part: [ Email::MIME-like, depth, $idx ]
+ my ($part, $depth, $idx) = @$p;
my $ct = $part->content_type || 'text/plain';
my $fn = $part->filename;
my ($s, $err) = msg_part_text($part, $ct);
@@ -537,13 +536,14 @@ sub add_text_body { # callback for each_part
# headers for solver unless some coderepo are configured:
my $diff;
if ($s =~ /^--- [^\n]+\n\+{3} [^\n]+\n@@ /ms) {
- # diffstat anchors do not link across attachments or messages:
- $idx[0] = $upfx . $idx[0] if $upfx ne '';
- $ctx->{-apfx} = join('/', @idx);
+ # diffstat anchors do not link across attachments or messages,
+ # -apfx is just a stable prefix for making diffstat anchors
+ # linkable to the first diff hunk w/o crossing attachments
+ $idx =~ tr!.!/!; # compatibility with previous versions
+ $ctx->{-apfx} = $upfx . $idx;
# do attr => filename mappings for diffstats in git diffs:
$ctx->{-anchors} = {} if $s =~ /^diff --git /sm;
-
$diff = 1;
delete $ctx->{-long_path};
my $spfx;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index 774b38ae..b1009907 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -11,9 +11,9 @@ use PublicInbox::MIME;
use PublicInbox::MsgIter;
sub get_attach_i { # ->each_part callback
- my ($part, $depth, @idx) = @{$_[0]};
+ my ($part, $depth, $idx) = @{$_[0]};
my $res = $_[1];
- return if join('.', @idx) ne $res->[3]; # $idx
+ return if $idx ne $res->[3]; # [0-9]+(?:\.[0-9]+)+
$res->[0] = 200;
my $ct = $part->content_type;
$ct = parse_content_type($ct) if $ct;
diff --git a/t/mime.t b/t/mime.t
index 0d478ace..b9a4d66b 100644
--- a/t/mime.t
+++ b/t/mime.t
@@ -98,9 +98,10 @@ $msg = PublicInbox::MIME->new($raw);
my $nr = 0;
msg_iter($msg, sub {
my ($part, $level, @ex) = @{$_[0]};
- if ($ex[0] == 1) {
+ is($level, 1, 'at expected level');
+ if (join('fail if $#ex > 0', @ex) eq '1') {
is($part->body_str, "your tree directly? \r\n", 'body OK');
- } elsif ($ex[0] == 2) {
+ } elsif (join('fail if $#ex > 0', @ex) eq '2') {
is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
"=7wIb\n" .
"-----END PGP SIGNATURE-----\n",
diff --git a/t/msg_iter.t b/t/msg_iter.t
index 5c57e043..e8115e25 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -28,7 +28,7 @@ use_ok('PublicInbox::MsgIter');
$s =~ s/\s+//s;
push @parts, [ $s, $level, @ex ];
});
- is_deeply(\@parts, [ [qw(a 2 1 1)], [qw(b 2 1 2)], [qw(sig 1 2)] ],
+ is_deeply(\@parts, [ [qw(a 2 1.1)], [qw(b 2 1.2)], [qw(sig 1 2)] ],
'nested part shows up properly');
}
* [PATCH 03/13] filter/rubylang: avoid recursing subparts to strip trailers
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Mailman seems to add trailers (or signatures) as attachments
only at the top level of MIME messages. So don't bother
recursing with ->walk_parts, since ->walk_parts is non-trivial to
recreate in the Email::MIME replacement I'm working on.
---
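The "top-level only" approach can be sketched without any MIME
objects. Everything here is an assumption for illustration: the
trailer text, strip_trailer and scrub_top_level are hypothetical,
and plain strings stand in for the bodies of top-level parts.

```perl
use strict;
use warnings;

# Hypothetical trailer; the real filter matches $l1/$l2 instead
my $trailer = "-- \nlist footer\n";

# returns 1 if $$ref was modified, 0 otherwise
sub strip_trailer {
	my ($ref) = @_;
	return $$ref =~ s/\n?\Q$trailer\E\z//s ? 1 : 0;
}

# \@bodies holds top-level part bodies only; no recursion
sub scrub_top_level {
	my ($bodies) = @_;
	my $changed = 0;
	$changed |= strip_trailer(\$_) for @$bodies;
	return $changed; # caller only rewrites the message if true
}
```

The accumulated flag is what lets the caller skip the comparatively
expensive rewrite of the multipart body when nothing was stripped.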
lib/PublicInbox/Filter/RubyLang.pm | 32 ++++++++++++++++++++----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/lib/PublicInbox/Filter/RubyLang.pm b/lib/PublicInbox/Filter/RubyLang.pm
index a65a5971..06e4ea75 100644
--- a/lib/PublicInbox/Filter/RubyLang.pm
+++ b/lib/PublicInbox/Filter/RubyLang.pm
@@ -28,19 +28,29 @@ sub new {
$self;
}
+sub scrub_part ($) {
+ my ($part) = @_;
+ my $ct = $part->content_type;
+ if (!$ct || $ct =~ m{\btext/plain\b}i) {
+ my $s = eval { $part->body_str };
+ if (defined $s && $s =~ s/\n?$l1\n$l2\n\z//os) {
+ $part->body_str_set($s);
+ return 1;
+ }
+ }
+ 0;
+}
+
sub scrub {
my ($self, $mime, $for_remove) = @_;
- # no msg_iter here, that is only for read-only access
- $mime->walk_parts(sub {
- my ($part) = $_[0];
- my $ct = $part->content_type;
- if (!$ct || $ct =~ m{\btext/plain\b}i) {
- my $s = eval { $part->body_str };
- if (defined $s && $s =~ s/\n?$l1\n$l2\n\z//os) {
- $part->body_str_set($s);
- }
- }
- });
+ # no msg_iter here, msg_iter is only for read-only access
+ if (my @sub = $mime->subparts) {
+ my $changed = 0;
+ $changed |= scrub_part($_) for @sub;
+ $mime->parts_set(\@sub) if $changed;
+ } else {
+ scrub_part($mime);
+ }
my $altid = $self->{-altid};
if ($altid && !$for_remove) {
my $hdr = $mime->header_obj;
* [PATCH 04/13] smsg: use capitalization for header retrieval
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
PublicInbox::Eml will have case-sensitive memoization to
avoid the need to call `lc' to retrieve common headers,
so ensure we call $mime->header() with the common
capitalization.
Unfortunately, we need to continue using lowercase for field
names for smsg, since NNTP requires case-insensitivity when
matching headers and method dispatch is expensive.
---
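A small sketch of the caching pattern, assuming a plain hash in
place of $mime->header(). %raw, $lookups and hdr_cached are
hypothetical names: the header is looked up with its usual
capitalization, while the cached field is keyed in lowercase.

```perl
use strict;
use warnings;

# Hypothetical header source standing in for $mime->header($field)
my %raw = (
	Subject => [ "hello\r\n\tworld" ],
	To => [ 'a@example.com', 'b@example.com' ],
);
my $lookups = 0; # counts how often we hit the "expensive" path

sub hdr_cached {
	my ($cache, $field) = @_;
	# lookup uses the capitalized name, the cache key is lowercase
	$cache->{lc($field)} //= do {
		my $vals = $raw{$field} or return;
		$lookups++;
		my $val = join(', ', @$vals);
		$val =~ tr/\r//d;
		$val =~ tr/\t\n/ /; # both map to a single space each
		$val;
	};
}
```

A second call for the same field returns the cached value without
touching the header source again.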
lib/PublicInbox/Smsg.pm | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 7c90b92d..7a2766d8 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -106,20 +106,18 @@ sub lines ($) { $_[0]->{lines} }
sub __hdr ($$) {
my ($self, $field) = @_;
- my $val = $self->{$field};
- return $val if defined $val;
-
- my $mime = $self->{mime} or return;
- my @raw = $mime->header($field);
- $val = join(', ', @raw);
- $val =~ tr/\t\n/ /;
- $val =~ tr/\r//d;
- $self->{$field} = $val;
+ $self->{lc($field)} //= do {
+ my $mime = $self->{mime} or return;
+ my $val = join(', ', $mime->header($field));
+ $val =~ tr/\r//d;
+ $val =~ tr/\t\n/ /;
+ $val;
+ };
}
-sub subject ($) { __hdr($_[0], 'subject') }
-sub to ($) { __hdr($_[0], 'to') }
-sub cc ($) { __hdr($_[0], 'cc') }
+sub subject ($) { __hdr($_[0], 'Subject') }
+sub to ($) { __hdr($_[0], 'To') }
+sub cc ($) { __hdr($_[0], 'Cc') }
# no strftime, that is locale-dependent and not for RFC822
my @DoW = qw(Sun Mon Tue Wed Thu Fri Sat);
@@ -137,7 +135,7 @@ sub date ($) {
sub from ($) {
my ($self) = @_;
- my $from = __hdr($self, 'from');
+ my $from = __hdr($self, 'From');
if (defined $from && !defined $self->{from_name}) {
my @n = PublicInbox::Address::names($from);
$self->{from_name} = join(', ', @n);
* [PATCH 05/13] eml: pure-Perl replacement for Email::MIME
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Email::MIME eats memory, wastes time parsing out all the
headers, and has some problems which can't be fixed without
breaking compatibility for other projects which depend on it.
Informal benchmarks show a ~2x improvement in general
stats gathering scripts and ~10% improvement in HTML
view rendering.
We also don't need the ability to create MIME messages, just
parse them and maybe drop an attachment.
While this isn't the zero-copy or streaming MIME parser of my
dreams, it's still an improvement in that it doesn't keep a
scalar copy of the raw body around along with subparts. It also
doesn't parse subparts up front, so it can also replace our uses
of Email::Simple.
---
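A standalone sketch of the two ideas above, in simplified form:
splitting the header off the raw message in place, and extracting a
single header on demand rather than parsing all of them up front.
split_msg and raw_header are hypothetical helpers written for this
note; they are not the code in Eml.pm and skip its edge cases.

```perl
use strict;
use warnings;

# Split the header off in place; afterwards $$ref holds only the body.
sub split_msg {
	my ($ref) = @_;
	$$ref =~ /\r?\n(\r?\n)/gs or return; # no blank line found
	my $crlf = $1;
	# 4-arg substr removes the header (and blank line) from $$ref
	my $hdr = substr($$ref, 0, pos($$ref), '');
	substr($hdr, -(length($crlf))) = ''; # drop the blank line
	return ($hdr, $crlf);
}

# Extract one header from the raw header string, unfolding
# continuation lines; nothing else gets parsed.
sub raw_header {
	my ($hdr, $name) = @_;
	my @v = ($hdr =~ /^\Q$name\E:[ \t]*([^\n]*\r?\n
			(?:[ \t]+[^\n]*\r?\n)*)/imxg);
	for (@v) {
		s/\s+\z//s;
		s/\r?\n[ \t]*/ /gs;
	}
	return wantarray ? @v : $v[0];
}
```

Only the header string is copied; the body stays in the caller's
scalar, which matters when the message is large.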
MANIFEST | 2 +
lib/PublicInbox/Eml.pm | 393 ++++++++++++++++++++++++++++++++++
lib/PublicInbox/TestCommon.pm | 9 +-
t/eml.t | 363 +++++++++++++++++++++++++++++++
4 files changed, 766 insertions(+), 1 deletion(-)
create mode 100644 lib/PublicInbox/Eml.pm
create mode 100644 t/eml.t
diff --git a/MANIFEST b/MANIFEST
index 90a05d33..0906448e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -105,6 +105,7 @@ lib/PublicInbox/DSKQXS.pm
lib/PublicInbox/DSPoll.pm
lib/PublicInbox/Daemon.pm
lib/PublicInbox/Emergency.pm
+lib/PublicInbox/Eml.pm
lib/PublicInbox/ExtMsg.pm
lib/PublicInbox/Feed.pm
lib/PublicInbox/Filter/Base.pm
@@ -229,6 +230,7 @@ t/ds-leak.t
t/ds-poll.t
t/edit.t
t/emergency.t
+t/eml.t
t/epoll.t
t/fail-bin/spamc
t/feed.t
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
new file mode 100644
index 00000000..0c23bed0
--- /dev/null
+++ b/lib/PublicInbox/Eml.pm
@@ -0,0 +1,393 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+#
+# Lazy MIME parser, it still slurps the full message but keeps short
+# lifetimes. Unlike Email::MIME, it doesn't pre-split multipart
+# messages or do any up-front parsing of headers besides splitting
+# the header string from the body.
+#
+# Contains ideas and code from Email::Simple and Email::MIME
+# (Perl Artistic License, GPL-1+)
+#
+# This aims to replace Email::MIME for our purposes with a similar API,
+# but internal field names differ if they're not 100%-compatible.
+#
+# Includes some proposed fixes for Email::MIME:
+# - header-less sub parts - https://github.com/rjbs/Email-MIME/issues/14
+# - "0" as boundary - https://github.com/rjbs/Email-MIME/issues/63
+#
+# $self = {
+# bdy => scalar ref for body (may be undef),
+# hdr => scalar ref for header,
+# crlf => "\n" or "\r\n" (scalar, not a ref),
+#
+# # filled in during ->each_part
+# ct => hash ref returned by parse_content_type
+# }
+package PublicInbox::Eml;
+use strict;
+use v5.10.1;
+use Carp qw(croak);
+use Encode qw(find_encoding decode encode); # stdlib
+use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
+
+my $MIME_Header = find_encoding('MIME-Header');
+
+# TODO remove these dependencies
+use Email::MIME::ContentType;
+use Email::MIME::Encodings;
+$Email::MIME::ContentType::STRICT_PARAMS = 0;
+
+our $MAXPARTS = 1000; # same as SpamAssassin
+our $MAXDEPTH = 20; # seems enough, Perl sucks, here
+our $MAXBOUNDLEN = 2048; # same as postfix
+
+my $NO_ENCODE_RE = qr/\A(?:7bit|8bit|binary)[ \t]*(?:;|$)?/i;
+my %DECODE_ADDRESS = map { $_ => 1 } qw(From To Cc Sender Reply-To);
+my %DECODE_FULL = (
+ Subject => 1,
+ 'Content-Description' => 1,
+ 'Content-Type' => 1, # not correct, but needed, oh well
+);
+our %STR_TYPE = (text => 1);
+our %STR_SUBTYPE = (plain => 1, html => 1);
+
+my %re_memo;
+sub re_memo ($) {
+ my ($k) = @_;
+ # Do not normalize $k with lc/uc; instead strive to keep
+ # capitalization in our codebase consistent.
+ $re_memo{$k} ||= qr/^\Q$k\E:[ \t]*([^\n]*\r?\n # 1st line
+ # continuation lines:
+ (?:[^:\n]*?[ \t]+[^\n]*\r?\n)*)
+ /ismx
+}
+
+# compatible with our uses of Email::MIME
+sub new {
+ my $ref = ref($_[1]) ? $_[1] : \(my $cpy = $_[1]);
+ if ($$ref =~ /(?:\r?\n(\r?\n))/gs) { # likely
+ # This modifies $$ref in-place to avoid memcpy/memmove
+ # on a potentially large $$ref. It does need to make a
+ # copy for $hdr, though. Idea stolen from Email::Simple
+ my $hdr = substr($$ref, 0, pos($$ref), ''); # sv_chop on $$ref
+ substr($hdr, -(length($1))) = ''; # lower SvCUR
+ bless { hdr => \$hdr, crlf => $1, bdy => $ref }, __PACKAGE__;
+ } elsif ($$ref =~ /^[a-z0-9-]+[ \t]*:/ims && $$ref =~ /(\r?\n)\z/s) {
+ # body is optional :P
+ bless { hdr => \($$ref), crlf => $1 }, __PACKAGE__;
+ } else { # nothing useful
+ my $hdr = $$ref = '';
+ bless { hdr => \$hdr, crlf => "\n" }, __PACKAGE__;
+ }
+}
+
+sub new_sub {
+ my (undef, $ref) = @_;
+ # special case for messages like <85k5su9k59.fsf_-_@lola.goethe.zz>
+ $$ref =~ /\A(?:(\r?\n))/gs or goto &new;
+ my $hdr = substr($$ref, 0, pos($$ref), ''); # sv_chop on $$ref
+ bless { hdr => \$hdr, crlf => $1, bdy => $ref }, __PACKAGE__;
+}
+
+# same output as Email::Simple::Header::header_raw, but we extract
+# headers on-demand instead of parsing them into a list which
+# requires O(n) lookups anyways
+sub header_raw {
+ my $re = re_memo($_[1]);
+ my @v = (${ $_[0]->{hdr} } =~ /$re/g);
+ for (@v) {
+ # for compatibility w/ Email::Simple::Header,
+ s/\s+\z//s;
+ s/\A\s+//s;
+ s/\r?\n[ \t]*/ /gs;
+ }
+ wantarray ? @v : $v[0];
+}
+
+# pick the first Content-Type header to match Email::MIME behavior.
+# It's usually the right one based on historical archives.
+sub ct ($) {
+ # Email::MIME::ContentType::content_type:
+ $_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
+}
+
+sub body_decode ($$) {
+ my $cte = header_raw($_[0], 'Content-Transfer-Encoding');
+ ($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) if $cte; # For S/MIME, etc
+ (!$cte || $cte =~ $NO_ENCODE_RE) ?
+ $_[1] : Email::MIME::Encodings::decode($cte, $_[1], '7bit');
+}
+
+# returns a queue of sub-parts iff it's worth descending into
+# TODO: descend into message/rfc822 parts (Email::MIME didn't)
+sub mp_descend ($$) {
+ my ($self, $nr) = @_; # or $once for top-level
+ my $bnd = ct($self)->{attributes}->{boundary} // return; # single-part
+ return if $bnd eq '' || length($bnd) >= $MAXBOUNDLEN;
+ $bnd = quotemeta($bnd);
+
+ # "multipart" messages can exist w/o a body
+ my $bdy = ($nr ? delete($self->{bdy}) : \(body_raw($self))) or return;
+
+ # Cut at the first epilogue, not subsequent ones.
+ # *sigh* just the regexp match alone seems to bump RSS by
+ # length($$bdy) on a ~30M string:
+ $$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm and
+ substr($$bdy, pos($$bdy) - length($1)) = '';
+
+ # *Sigh* split() doesn't work in-place and return CoW strings
+ # because Perl wants to "\0"-terminate strings. So split()
+ # again bumps RSS by length($$bdy)
+
+ # Quiet warning for "Complex regular subexpression recursion limit"
+ # in case we get many empty parts, it's harmless in this case
+ no warnings 'regexp';
+ my ($pre, @parts) = split(/(?:\r?\n)?(?:^--$bnd[ \t]*\r?\n)+/ms,
+ $$bdy,
+ # + 3 since we don't want the last part
+ # processed to include any other excluded
+ # parts ($nr starts at 1, and I suck at math)
+ $MAXPARTS + 3 - $nr);
+
+ if (@parts) { # the usual path if we got this far:
+ undef $bdy; # release memory ASAP if $nr > 0
+ @parts = grep /[^ \t\r\n]/s, @parts; # ignore empty parts
+
+ # Keep "From: someone..." from preamble in old,
+ # buggy versions of git-send-email, otherwise drop it
+ # There's also a case where quoted text showed up in the
+ # preamble
+ # <20060515162817.65F0F1BBAE@citi.umich.edu>
+ unshift(@parts, $pre) if $pre =~ /:/s;
+ return \@parts;
+ }
+ # "multipart", but no boundary found, treat as single part
+ $self->{bdy} //= $bdy;
+ undef;
+}
+
+# $p = [ \@parts, $depth, $idx ]
+# $idx[0] grows as $depth grows, $idx[1] == $p->[-1] == current part
+# (callers need to be updated)
+# \@parts is a queue which empties when we're done with a parent part
+
+# same usage as PublicInbox::MsgIter::msg_iter
+# $cb - user-supplied callback sub
+# $arg - user-supplied arg (think pthread_create)
+# $once - unref body scalar during iteration
+sub each_part {
+ my ($self, $cb, $arg, $once) = @_;
+ my $p = mp_descend($self, $once // 0) or
+ return $cb->([$self, 0, 0], $arg);
+ $p = [ $p, 0 ];
+ my @s; # our virtual stack
+ my $nr = 0;
+ while ((scalar(@{$p->[0]}) || ($p = pop @s)) && ++$nr <= $MAXPARTS) {
+ ++$p->[-1]; # bump index
+ my (undef, @idx) = @$p;
+ @idx = (join('.', @idx));
+ my $depth = ($idx[0] =~ tr/././) + 1;
+ my $sub = new_sub(undef, \(shift @{$p->[0]}));
+ if ($depth < $MAXDEPTH && (my $nxt = mp_descend($sub, $nr))) {
+ push(@s, $p) if scalar @{$p->[0]};
+ $p = [ $nxt, @idx, 0 ];
+ } else { # a leaf node
+ $cb->([$sub, $depth, @idx], $arg);
+ }
+ }
+}
+
+########### compatibility section for existing Email::MIME uses #########
+
+sub header_obj {
+ bless { hdr => $_[0]->{hdr}, crlf => $_[0]->{crlf} }, __PACKAGE__;
+}
+
+sub subparts {
+ my ($self) = @_;
+ my $parts = mp_descend($self, 0) or return ();
+ my $bnd = ct($self)->{attributes}->{boundary} // die 'BUG: no boundary';
+ my $bdy = $self->{bdy};
+ if ($$bdy =~ /\A(.*?)(?:\r?\n)?^--\Q$bnd\E[ \t]*\r?$/sm) {
+ $self->{preamble} = $1;
+ }
+ if ($$bdy =~ /^--\Q$bnd\E--[ \t]*\r?\n(.+)\z/sm) {
+ $self->{epilogue} = $1;
+ }
+ map { new_sub(undef, \$_) } @$parts;
+}
+
+sub parts_set {
+ my ($self, $parts) = @_;
+
+ # we can't fully support what Email::MIME does,
+ # just what our filter code needs:
+ my $bnd = ct($self)->{attributes}->{boundary} // die <<EOF;
+->parts_set not supported for single-part messages
+EOF
+ my $crlf = $self->{crlf};
+ my $fin_bnd = "$crlf--$bnd--$crlf";
+ $bnd = "$crlf--$bnd$crlf";
+ ${$self->{bdy}} = join($bnd,
+ delete($self->{preamble}) // '',
+ map { $_->as_string } @$parts
+ ) .
+ $fin_bnd .
+ (delete($self->{epilogue}) // '');
+ undef;
+}
+
+sub body_set {
+ my ($self, $body) = @_;
+ my $bdy = $self->{bdy} = ref($body) ? $body : \$body;
+ my $cte = header_raw($self, 'Content-Transfer-Encoding');
+ if ($cte && $cte !~ $NO_ENCODE_RE) {
+ $$bdy = Email::MIME::Encodings::encode($cte, $$bdy)
+ }
+ undef;
+}
+
+sub body_str_set {
+ my ($self, $body_str) = @_;
+ my $charset = ct($self)->{attributes}->{charset} or
+ Carp::confess('body_str was given, but no charset is defined');
+ body_set($self, \(encode($charset, $body_str, Encode::FB_CROAK)));
+}
+
+sub content_type { scalar header($_[0], 'Content-Type') }
+
+# we only support raw header_set
+sub header_set {
+ my ($self, $pfx, @vals) = @_;
+ my $re = re_memo($pfx);
+ my $hdr = $self->{hdr};
+ return $$hdr =~ s!$re!!g if !@vals;
+ $pfx .= ': ';
+ my $len = 78 - length($pfx);
+ @vals = map {;
+ # folding differs from Email::Simple::Header,
+ # we favor tabs for visibility (and space savings :P)
+ if (length($_) >= $len && (/\n[^ \t]/s || !/\n/s)) {
+ local $Text::Wrap::columns = $len;
+ local $Text::Wrap::huge = 'overflow';
+ $pfx . wrap('', "\t", $_) . $self->{crlf};
+ } else {
+ $pfx . $_ . $self->{crlf};
+ }
+ } @vals;
+ $$hdr =~ s!$re!shift(@vals) // ''!ge; # replace current headers, first
+ $$hdr .= join('', @vals); # append any leftovers not replaced
+ # wantarray ? @_[2..$#_] : $_[2]; # Email::Simple::Header compat
+ undef; # we don't care for the return value
+}
+
+# note: we only call this method on Subject
+sub header_str_set {
+ my ($self, $name, @vals) = @_;
+ for (@vals) {
+ next unless /[^\x20-\x7e]/;
+ utf8::encode($_); # to octets
+ # 39: int((75 - length("Subject: =?UTF-8?B?".'?=') ) / 4) * 3;
+ s/(.{1,39})/'=?UTF-8?B?'.encode_base64($1, '').'?='/ges;
+ }
+ header_set($self, $name, @vals);
+}
+
+sub mhdr_decode ($) { eval { $MIME_Header->decode($_[0]) } // $_[0] }
+
+sub filename {
+ my $dis = header_raw($_[0], 'Content-Disposition');
+ my $attrs = parse_content_disposition($dis)->{attributes};
+ my $fn = $attrs->{filename};
+ $fn = ct($_[0])->{attributes}->{name} if !defined($fn) || $fn eq '';
+ (defined($fn) && $fn =~ /=\?/) ? mhdr_decode($fn) : $fn;
+}
+
+sub xs_addr_str { # helper for ->header / ->header_str
+ for (@_) { # array from header_raw()
+ next unless /=\?/;
+ my @g = parse_email_groups($_); # [ foo => [ E::A::X, ... ]
+ for (my $i = 0; $i < @g; $i += 2) {
+ if (defined($g[$i]) && $g[$i] =~ /=\?/) {
+ $g[$i] = mhdr_decode($g[$i]);
+ }
+ my $addrs = $g[$i + 1];
+ for my $eax (@$addrs) {
+ for my $m (qw(phrase comment)) {
+ my $v = $eax->$m;
+ $eax->$m(mhdr_decode($v)) if
+ $v && $v =~ /=\?/;
+ }
+ }
+ }
+ $_ = format_email_groups(@g);
+ }
+}
+
+eval {
+ require Email::Address::XS;
+ Email::Address::XS->import(qw(parse_email_groups format_email_groups));
+ 1;
+} or do {
+ # fallback to just decoding everything, because parsing
+ # email addresses correctly w/o C/XS is slow
+ %DECODE_FULL = (%DECODE_FULL, %DECODE_ADDRESS);
+ %DECODE_ADDRESS = ();
+};
+
+*header = \&header_str;
+sub header_str {
+ my ($self, $name) = @_;
+ my @v = header_raw($self, $name);
+ if ($DECODE_ADDRESS{$name}) {
+ xs_addr_str(@v);
+ } elsif ($DECODE_FULL{$name}) {
+ for (@v) {
+ $_ = mhdr_decode($_) if /=\?/;
+ }
+ }
+ wantarray ? @v : $v[0];
+}
+
+sub body_raw { ${$_[0]->{bdy} // \''}; }
+
+sub body { body_decode($_[0], body_raw($_[0])) }
+
+sub body_str {
+ my ($self) = @_;
+ my $ct = ct($self);
+ my $charset = $ct->{attributes}->{charset};
+ if (!$charset) {
+ if ($STR_TYPE{$ct->{type}} && $STR_SUBTYPE{$ct->{subtype}}) {
+ return body($self);
+ }
+ Carp::confess("can't get body as a string for ",
+ join("\n\t", header_raw($self, 'Content-Type')));
+ }
+ decode($charset, body($self), Encode::FB_CROAK);
+}
+
+sub as_string {
+ my ($self) = @_;
+ my $ret = ${ $self->{hdr} };
+ return $ret unless defined($self->{bdy});
+ $ret .= $self->{crlf};
+ $ret .= ${$self->{bdy}};
+}
+
+# Unlike Email::MIME::charset_set, this only changes the parsed
+# representation of charset used for search indexing and HTML display.
+# This does NOT affect what ->as_string returns.
+sub charset_set {
+ ct($_[0])->{attributes}->{charset} = $_[1];
+}
+
+sub crlf { $_[0]->{crlf} // "\n" }
+
+sub willneed { re_memo($_) for @_ }
+
+willneed(qw(From To Cc Date Subject Content-Type In-Reply-To References
+ Message-ID X-Alt-Message-ID));
+
+1;
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index cd73b5b6..600843f0 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -9,7 +9,7 @@ use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
use POSIX qw(dup2);
use IO::Socket::INET;
our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
- run_script start_script key2sub xsys xqx mime_load);
+ run_script start_script key2sub xsys xqx mime_load eml_load);
sub mime_load ($) {
my ($path) = @_;
@@ -17,6 +17,13 @@ sub mime_load ($) {
PublicInbox::MIME->new(\(do { local $/; <$fh> }));
}
+sub eml_load ($) {
+ my ($path) = @_;
+ open(my $fh, '<', $path) or die "open $path: $!";
+ binmode $fh;
+ PublicInbox::Eml->new(\(do { local $/; <$fh> }));
+}
+
sub tmpdir (;$) {
my ($base) = @_;
require File::Temp;
diff --git a/t/eml.t b/t/eml.t
new file mode 100644
index 00000000..43c735e7
--- /dev/null
+++ b/t/eml.t
@@ -0,0 +1,363 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::MsgIter qw(msg_part_text);
+my @classes = qw(PublicInbox::Eml);
+SKIP: {
+ require_mods('Email::MIME', 1);
+ push @classes, 'PublicInbox::MIME';
+};
+use_ok $_ for @classes;
+
+{
+ my $eml = PublicInbox::Eml->new(\(my $str = "a: b\n\nhi\n"));
+ is($str, "hi\n", '->new modified body like Email::Simple');
+ is($eml->body, "hi\n", '->body works');
+ is($eml->as_string, "a: b\n\nhi\n", '->as_string');
+}
+
+for my $cls (@classes) {
+ my $mime = $cls->new(my $orig = "From: x\n\nb");
+ is($mime->as_string, $orig, '->as_string works');
+ is($mime->header_obj->as_string, "From: x\n",
+ 'header ->as_string works');
+
+ # headers
+ is($mime->header_raw('From'), 'x', 'header_raw scalar context');
+ $mime = $cls->new("R:\n\tx\nR:\n 1\n");
+ is_deeply([$mime->header_raw('r')], [ 'x', '1' ], 'multi-value');
+ $mime = $cls->new("R:x\nR: 1\n");
+ is_deeply([$mime->header_raw('r')], [ 'x', '1' ], 'multi-value header');
+ $mime = $cls->new("R:x\n R: 1\nR:\n f\n");
+ is_deeply([$mime->header_raw('r')], [ 'x R: 1', 'f' ],
+ 'multi-line, multi-value header');
+
+ $mime->header_set('r');
+ is_deeply([$mime->header_raw('r')], [], 'header_set clears');
+ $mime->header_set('r');
+ is_deeply([$mime->header_raw('r')], [], 'header_set clears idempotent');
+ $mime->header_set('r', 'h');
+ is_deeply([$mime->header_raw('r')], ['h'], 'header_set');
+ $mime->header_set('r', 'h', 'i');
+ is_deeply([$mime->header_raw('r')], ['h', 'i'], 'header_set ary');
+ $mime->header_set('rr', 'b');
+ is_deeply([$mime->header_raw('r')], ['h', 'i'],
+ "header_set `rr' did not clobber `r'");
+ is($mime->header_raw('rr'), 'b', 'got set scalar');
+ $mime->header_set('rr', 'b'x100);
+ is($mime->header_raw('rr'), 'b'x100, 'got long set scalar');
+ if ($cls eq 'PublicInbox::Eml') {
+ like($mime->as_string, qr/^rr: b{100}\n(?:\n|\z)/sm,
+ 'single token not wrapped');
+ }
+ $mime->header_set('rr', ('b'x100) . ' wrap me');
+ if ($cls eq 'PublicInbox::Eml') {
+ like($mime->as_string, qr/^rr: b{100}\n\twrap me\n/sm,
+ 'wrapped after long token');
+ }
+ my $exp = "pre\tformatted\n with\n breaks";
+ $mime->header_set('r', $exp);
+ like($mime->as_string, qr/^r: \Q$exp\E/sm, 'preformatted preserved');
+} # for @classes
+
+for my $cls (@classes) { # make sure we don't add quotes if not needed
+ my $eml = $cls->new("From: John Smith <j\@example.com>\n\n");
+ is($eml->header('From'), 'John Smith <j@example.com>',
+ "name not unnecessarily quoted $cls");
+}
+
+for my $cls (@classes) {
+ my $eml = $cls->new("Subject: foo\n\n");
+ $eml->header_str_set('Subject', "\x{100}");
+ like($eml->header_raw('Subject'), qr/utf-8\?B\?/i,
+ 'MIME-B encoded UTF-8 Subject');
+ is_deeply([$eml->header_str('Subject')], [ "\x{100}" ],
+ 'got wide character back');
+}
+
+# linux-mips apparently got some messages injected w/o Message-ID
+# and long Subject: lines w/o leading whitespace.
+# What appears in the blobs was generated by V2Writable.
+for my $cls (@classes) {
+ my $eml = $cls->new(<<'EOF');
+Message-ID: <20101130193431@z>
+Subject: something really long
+and really wrong
+From: linux-mips archive injection
+Object-Id: 8c56b7abdd551b1264e6522ededbbed9890cccd0
+EOF
+ is_deeply([ $eml->header('Subject') ],
+ [ 'something really long and really wrong' ],
+ 'continued long line w/o leading spaces '.$cls);
+ is_deeply([ $eml->header('From') ],
+ [ 'linux-mips archive injection' ],
+ 'subsequent line not corrupted');
+ is_deeply([ $eml->header('Message-ID') ],
+ ['<20101130193431@z>'],
+ 'preceding line readable');
+} # for @classes
+
+{
+ my $eml = eml_load 't/msg_iter-order.eml';
+ my @parts;
+ my $orig = $eml->as_string;
+ $eml->each_part(sub {
+ my ($part, $level, @ex) = @{$_[0]};
+ my $s = $part->body_str;
+ $s =~ s/\s+//sg;
+ push @parts, [ $s, $level, @ex ];
+ });
+ is_deeply(\@parts, [ [ qw(a 1 1) ], [ qw(b 1 2) ] ], 'order is fine');
+ is($eml->as_string, $orig, 'unchanged by ->each_part');
+ $eml->each_part(sub {}, undef, 1);
+ is(defined($eml) ? $eml->body_raw : '', # old msg_iter clobbers $eml
+ '', 'each_part can clobber body');
+}
+
+# body-less, boundary-less
+for my $cls (@classes) {
+ my $call = 0;
+ $cls->new(<<'EOF')->each_part(sub { $call++ }, 0, 1);
+Content-Type: multipart/mixed; boundary="body-less"
+
+EOF
+ is($call, 1, 'called on bodyless multipart');
+
+ my @tmp;
+ $cls->new(<<'EOF')->each_part(sub { push @tmp, \@_; }, 0, 1);
+Content-Type: multipart/mixed; boundary="boundary-less"
+
+hello world
+EOF
+ is(scalar(@tmp), 1, 'got one part even w/o boundary');
+ is($tmp[0]->[0]->[0]->body, "hello world\n", 'body preserved');
+ is($tmp[0]->[0]->[1], 0, '$depth is zero');
+ is($tmp[0]->[0]->[2], 0, '@idx is zero');
+}
+
+# I guess the following only worked in PI::M because of a happy accident
+# involving inheritance:
+for my $cls (@classes) {
+ my @tmp;
+ my $header_less = <<'EOF';
+Archived-At: <85k5su9k59.fsf_-_@lola.goethe.zz>
+Content-Type: multipart/mixed; boundary="header-less"
+
+--header-less
+
+this is the body
+
+--header-less
+i-haz: header
+
+something else
+
+--header-less--
+EOF
+ my $expect = "this is the body\n";
+ $cls->new($header_less)->each_part(sub { push @tmp, \@_ }, 0, 1);
+ my $body = $tmp[0]->[0]->[0]->body;
+ if ($cls eq 'PublicInbox::Eml') {
+ is($body, $expect, 'body-only subpart in '.$cls);
+ } elsif ($body ne $expect) {
+ diag "W: $cls `$body' != `$expect'";
+ }
+ is($tmp[1]->[0]->[0]->body, "something else\n");
+ is(scalar(@tmp), 2, 'two parts');
+}
+
+if ('one newline before headers') {
+ my $eml = PublicInbox::Eml->new("\nNewline: no Header \n");
+ my @v = $eml->header_raw('Newline');
+ is_deeply(\@v, ['no Header'], 'no header');
+ is($eml->crlf, "\n", 'got CRLF as "\n"');
+ is($eml->body, "");
+}
+
+for my $cls (@classes) { # XXX: matching E::M, but not sure about this
+ my $s = <<EOF;
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+header: only
+--b--
+EOF
+ my $eml = $cls->new(\$s);
+ my $nr = 0;
+ my @v;
+ $eml->each_part(sub {
+ @v = $_[0]->[0]->header_raw('Header');
+ $nr++;
+ });
+ is($nr, 1, 'only one part');
+ is_deeply(\@v, [], "nothing w/o body $cls");
+}
+
+for my $cls (@classes) {
+ my $s = <<EOF; # double epilogue, double the fun
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+should: appear
+
+yes
+
+--b--
+
+--b
+should: not appear
+
+nope
+--b--
+EOF
+ my $eml = $cls->new(\$s);
+ my $nr = 0;
+ $eml->each_part(sub {
+ my $part = $_[0]->[0];
+ is_deeply([$part->header_raw('should')], ['appear'],
+ 'only got one header');
+ is($part->body, "yes\n", 'got expected body');
+ $nr++;
+ });
+ is($nr, 1, 'only one part');
+}
+
+for my $cls (@classes) {
+ my $s = <<EOF; # buggy git-send-email versions, again?
+Content-Type: text/plain; =?ISO-8859-1?Q?=20charset=3D=1BOF?=
+Content-Transfer-Encoding: 8bit
+Object-Id: ab0440d8cd6d843bee9a27709a459ce3b2bdb94d (lore/kvm)
+
+\xc4\x80
+EOF
+ my $eml = $cls->new(\$s);
+ my ($str, $err) = msg_part_text($eml, $eml->content_type);
+ is($str, "\x{100}\n", "got wide character by assuming utf-8");
+}
+
+if ('we differ from Email::MIME with final "\n" on missing epilogue') {
+ my $s = <<EOF;
+Content-Type: multipart/mixed; boundary="b"
+
+--b
+header: but
+
+no epilogue
+EOF
+ my $eml = PublicInbox::Eml->new(\$s);
+ is(($eml->subparts)[-1]->body, "no epilogue\n",
+ 'final "\n" preserved on missing epilogue');
+}
+
+if ('maxparts is a feature unique to us') {
+ my $eml = eml_load 't/psgi_attach.eml';
+ my @orig;
+ $eml->each_part(sub { push @orig, $_[0]->[0] });
+
+ local $PublicInbox::Eml::MAXPARTS = scalar(@orig);
+ my $i = 0;
+ $eml->each_part(sub {
+ my $cur = $_[0]->[0];
+ my $prv = $orig[$i++];
+ is($cur->body_raw, $prv->body_raw, "part #$i matches");
+ });
+ is($i, scalar(@orig), 'maxparts honored');
+ $PublicInbox::Eml::MAXPARTS--;
+ my @ltd;
+ $eml->each_part(sub { push @ltd, $_[0]->[0] });
+ for ($i = 0; $i <= $#ltd; $i++) {
+ is($ltd[$i]->body_raw, $orig[$i]->body_raw,
+ "part[$i] matches");
+ }
+ is(scalar(@ltd), scalar(@orig) - 1, 'maxparts honored');
+}
+
+SKIP: {
+ require_mods('PublicInbox::MIME', 1);
+ my $eml = eml_load 't/utf8.eml';
+ my $mime = mime_load 't/utf8.eml';
+ for my $h (qw(Subject From To)) {
+ my $v = $eml->header($h);
+ my $m = $mime->header($h);
+ is($v, $m, "decoded UTF-8 $h matches Email::MIME");
+ ok(utf8::is_utf8($v), "$h is UTF-8");
+ ok(utf8::valid($v), "UTF-8 valid $h");
+ }
+ my $s = $eml->body_str;
+ ok(utf8::is_utf8($s), 'body_str is UTF-8');
+ ok(utf8::valid($s), 'UTF-8 valid body_str');
+ my $ref = \(my $x = 'ref');
+ for my $msg ($eml, $mime) {
+ $msg->body_str_set($s .= "\nHI\n");
+ ok(!utf8::is_utf8($msg->body_raw),
+ 'raw octets after body_str_set');
+ $s = $msg->body_str;
+ ok(utf8::is_utf8($s), 'body_str is UTF-8 after set');
+ ok(utf8::valid($s), 'UTF-8 valid body_str after set');
+ $msg->body_set($ref);
+ is($msg->body_raw, $$ref, 'body_set worked on scalar ref');
+ $msg->body_set($$ref);
+ is($msg->body_raw, $$ref, 'body_set worked on scalar');
+ }
+ $eml = eml_load 't/iso-2202-jp.eml';
+ $mime = mime_load 't/iso-2202-jp.eml';
+ $s = $eml->body_str;
+ is($s, $mime->body_str, 'ISO-2022-JP body_str');
+ ok(utf8::is_utf8($s), 'ISO-2022-JP => UTF-8 body_str');
+ ok(utf8::valid($s), 'UTF-8 valid body_str');
+
+ $eml = eml_load 't/psgi_attach.eml';
+ $mime = mime_load 't/psgi_attach.eml';
+ is_deeply([ map { $_->body_raw } $eml->subparts ],
+ [ map { $_->body_raw } $mime->subparts ],
+ 'raw ->subparts match deeply');
+ is_deeply([ map { $_->body } $eml->subparts ],
+ [ map { $_->body } $mime->subparts ],
+ '->subparts match deeply');
+ for my $msg ($eml, $mime) {
+ my @old = $msg->subparts;
+ $msg->parts_set([]);
+ is_deeply([$msg->subparts], [], 'parts_set can clear');
+ $msg->parts_set([$old[-1]]);
+ is(scalar $msg->subparts, 1, 'only last remains');
+ }
+ is($eml->as_string, $mime->as_string,
+ 'as_string matches after parts_set');
+}
+
+for my $cls (@classes) {
+ my $s = <<'EOF';
+Content-Type: text/x-patch; name="=?utf-8?q?vtpm-fakefile.patch?="
+Content-Disposition: attachment; filename="=?utf-8?q?vtpm-makefile.patch?="
+
+EOF
+ is($cls->new($s)->filename, 'vtpm-makefile.patch', 'filename decoded');
+ $s =~ s/^Content-Disposition:.*$//sm;
+ is($cls->new($s)->filename, 'vtpm-fakefile.patch', 'filename fallback');
+ is($cls->new($s)->content_type,
+ 'text/x-patch; name="vtpm-fakefile.patch"',
+ 'matches Email::MIME output, "correct" or not');
+
+ $s = <<'EOF';
+Content-Type: multipart/foo; boundary=b
+
+--b
+Content-Disposition: attachment; filename="=?utf-8?q?vtpm-makefile.patch?="
+
+a
+--b
+Content-Type: text/x-patch; name="=?utf-8?q?vtpm-fakefile.patch?="
+
+b
+--b--
+EOF
+ my @tmp;
+ $cls->new($s)->each_part(sub { push @tmp, $_[0]->[0]->filename });
+ is_deeply(['vtpm-makefile.patch', 'vtpm-fakefile.patch'], \@tmp,
+ 'got filename for both attachments');
+}
+
+done_testing;
* [PATCH 06/13] switch read-only Email::Simple users to Eml
From: Eric Wong @ 2020-05-07 21:05 UTC
To: meta

Since PublicInbox::Eml doesn't parse MIME subparts
up front, it can replace most uses of Email::Simple
without a performance penalty.

This will eventually let us shrink our internal API
footprint, since we no longer have to maintain the
MIME vs. Simple distinction.
---
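Note: the point about not parsing subparts up front can be illustrated
with a rough, core-Perl-only sketch. TinyEml below is a hypothetical
stand-in written for this note, not PublicInbox::Eml itself; it assumes
the only work done in ->new is splitting the header from the body in
place, which is the behavior t/eml.t checks with
"->new modified body like Email::Simple".

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT PublicInbox::Eml: split the header from
# the body once in ->new and leave any MIME subparts unparsed.
package TinyEml;

sub new {
	my ($class, $ref) = @_;
	# accept either a scalar or a scalar ref
	$ref = \(my $copy = $ref) unless ref($ref);
	my $hdr;
	if ($$ref =~ s/\A(.*?\n)\r?\n//s) {
		$hdr = $1; # $$ref now holds only the body
	} else { # no blank line: treat everything as header
		$hdr = $$ref;
		$$ref = '';
	}
	bless { hdr => \$hdr, bdy => $ref }, $class;
}

# case-insensitive lookup of raw (undecoded) header values
sub header_raw {
	my ($self, $name) = @_;
	my @v = (${$self->{hdr}} =~
		/^\Q$name\E:[ \t]*(.*(?:\n[ \t]+.*)*)$/gim);
	s/\r?\n[ \t]+/ /g for @v; # unfold continuation lines
	wantarray ? @v : $v[0];
}

sub body_raw { ${$_[0]->{bdy}} }

package main;

my $raw = "Message-ID: <a\@example.com>\nSubject: hi\n\nbody text\n";
my $eml = TinyEml->new(\$raw);
print $eml->header_raw('Subject'), "\n"; # header value only
print $raw; # the caller's buffer was trimmed down to the body
```

Because the constructor does nothing beyond one substitution, callers
which only want ->header_obj pay roughly what they paid with
Email::Simple; anything multipart-related would only run when a caller
asks for it.
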
lib/PublicInbox/Mbox.pm | 16 +++++-----------
lib/PublicInbox/MboxGz.pm | 4 ++--
lib/PublicInbox/NNTP.pm | 19 ++++++++-----------
lib/PublicInbox/WWW.pm | 6 +++---
4 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 97bec5e7..94e61d4d 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -14,19 +14,13 @@ use PublicInbox::MID qw/mid_escape/;
use PublicInbox::Hval qw/to_filename/;
use PublicInbox::Smsg;
use PublicInbox::WwwStream qw(html_oneshot);
-use Email::Simple;
-use Email::MIME::Encode;
+use PublicInbox::Eml;
sub subject_fn ($) {
my ($hdr) = @_;
- my $fn = $hdr->header('Subject');
+ my $fn = $hdr->header_str('Subject');
return 'no-subject' if (!defined($fn) || $fn eq '');
- # no need for full Email::MIME, here
- if ($fn =~ /=\?/) {
- eval { $fn = Encode::decode('MIME-Header', $fn) };
- return 'no-subject' if $@;
- }
$fn =~ s/^re:\s+//i;
$fn eq '' ? 'no-subject' : to_filename($fn);
}
@@ -51,7 +45,7 @@ sub getline {
my $ibx = $ctx->{-inbox};
$next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
$mref = $ibx->msg_by_smsg($cur) or return;
- $hdr = Email::Simple->new($mref)->header_obj;
+ $hdr = PublicInbox::Eml->new($mref)->header_obj;
@$more = ($ctx, $id, $prev, $next); # $next may be undef, here
msg_hdr($ctx, $hdr) . msg_body($$mref);
}
@@ -72,7 +66,7 @@ sub emit_raw {
} else {
$mref = $ibx->msg_by_mid($mid) or return;
}
- my $hdr = Email::Simple->new($mref)->header_obj;
+ my $hdr = PublicInbox::Eml->new($mref)->header_obj;
$more = [ $ctx, $id, $prev, $next, $mref, $hdr ]; # for ->getline
my $fn = subject_fn($hdr);
my @hdr = ('Content-Type');
@@ -114,7 +108,7 @@ sub msg_hdr ($$;$) {
for (my $i = 0; $i < @append; $i += 2) {
my $k = $append[$i];
my $v = $append[$i + 1];
- my @v = $header_obj->header($k);
+ my @v = $header_obj->header_raw($k);
foreach (@v) {
if ($v eq $_) {
$v = undef;
diff --git a/lib/PublicInbox/MboxGz.pm b/lib/PublicInbox/MboxGz.pm
index e506de3d..f7fc4afc 100644
--- a/lib/PublicInbox/MboxGz.pm
+++ b/lib/PublicInbox/MboxGz.pm
@@ -3,7 +3,7 @@
package PublicInbox::MboxGz;
use strict;
use warnings;
-use Email::Simple;
+use PublicInbox::Eml;
use PublicInbox::Hval qw/to_filename/;
use PublicInbox::Mbox;
use Compress::Raw::Zlib qw(Z_FINISH Z_OK);
@@ -41,7 +41,7 @@ sub getline {
my $buf = delete($self->{buf});
while (my $smsg = $self->{cb}->($ctx)) {
my $mref = $ctx->{-inbox}->msg_by_smsg($smsg) or next;
- my $h = Email::Simple->new($mref)->header_obj;
+ my $h = PublicInbox::Eml->new($mref)->header_obj;
my $err = $gz->deflate(
PublicInbox::Mbox::msg_hdr($ctx, $h, $smsg->{mid}),
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index e9c66cd1..54207500 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -8,7 +8,7 @@ use warnings;
use base qw(PublicInbox::DS);
use fields qw(nntpd article ng long_cb);
use PublicInbox::MID qw(mid_escape $MID_EXTRACT);
-use Email::Simple;
+use PublicInbox::Eml;
use POSIX qw(strftime);
use PublicInbox::DS qw(now);
use Digest::SHA qw(sha1_hex);
@@ -383,7 +383,7 @@ sub cmd_quit ($) {
sub header_append ($$$) {
my ($hdr, $k, $v) = @_;
- my @v = $hdr->header($k);
+ my @v = $hdr->header_raw($k);
foreach (@v) {
return if $v eq $_;
}
@@ -416,11 +416,11 @@ sub set_nntp_headers ($$$$$) {
# leafnode (and maybe other NNTP clients) have trouble dealing
# with v2 messages which have multiple Message-IDs (either due
# to our own content-based dedupe or buggy git-send-email versions).
- my @mids = $hdr->header('Message-ID');
+ my @mids = $hdr->header_raw('Message-ID');
if (scalar(@mids) > 1) {
my $mid0 = "<$mid>";
$hdr->header_set('Message-ID', $mid0);
- my @alt = $hdr->header('X-Alt-Message-ID');
+ my @alt = $hdr->header_raw('X-Alt-Message-ID');
my %seen = map { $_ => 1 } (@alt, $mid0);
push(@alt, grep { !$seen{$_}++ } @mids);
$hdr->header_set('X-Alt-Message-ID', @alt);
@@ -478,10 +478,9 @@ found:
my $smsg = $ng->over->get_art($n) or return $err;
my $msg = $ng->msg_by_smsg($smsg) or return $err;
- # Email::Simple->new will modify $msg in-place as documented
- # in its manpage, so what's left is the body and we won't need
- # to call Email::Simple::body(), later
- my $hdr = Email::Simple->new($msg)->header_obj;
+ # PublicInbox::Eml->new will modify $msg in-place, so what's
+ # left is the body and we won't need to call ->body(), later
+ my $hdr = PublicInbox::Eml->new($msg)->header_obj;
set_nntp_headers($self, $hdr, $ng, $n, $mid) if $set_headers;
[ $n, $mid, $msg, $hdr ];
}
@@ -511,9 +510,7 @@ sub msg_hdr_write ($$$) {
$hdr =~ s/(?<!\r)\n/\r\n/sg; # Alpine barfs without this
# for leafnode compatibility, we need to ensure Message-ID headers
- # are only a single line. We can't subclass Email::Simple::Header
- # and override _default_fold_at in here, either; since that won't
- # affect messages already in the archive.
+ # are only a single line.
$hdr =~ s/^(Message-ID:)[ \t]*\r\n[ \t]+([^\r]+)\r\n/$1 $2\r\n/igsm;
$hdr .= "\r\n" if $body_follows;
$self->msg_more($hdr);
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 275e509f..6c016b03 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -22,6 +22,7 @@ use PublicInbox::MID qw(mid_escape);
use PublicInbox::GitHTTPBackend;
use PublicInbox::UserContent;
use PublicInbox::WwwStatic qw(r path_info_raw);
+use PublicInbox::Eml;
# TODO: consider a routing tree now that we have more endpoints:
our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
@@ -225,9 +226,8 @@ sub invalid_inbox_mid {
my ($x2, $x38) = ($1, $2);
# this is horrifically wasteful for legacy URLs:
my $str = $ctx->{-inbox}->msg_by_path("$x2/$x38") or return;
- require Email::Simple;
- my $s = Email::Simple->new($str);
- $mid = PublicInbox::MID::mid_clean($s->header('Message-ID'));
+ my $s = PublicInbox::Eml->new($str);
+ $mid = PublicInbox::MID::mid_clean($s->header_raw('Message-ID'));
return r301($ctx, $inbox, mid_escape($mid));
}
undef;
* [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml
From: Eric Wong @ 2020-05-07 21:05 UTC
To: meta
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
---
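Note: several call sites in this patch hand ->new a scalar reference
built by slurping a file handle (see release2mime in mknews.perl and
eml_load in TestCommon.pm). A minimal core-Perl sketch of that slurp
step follows; slurp_ref is a hypothetical helper name used only for
this illustration and is not part of public-inbox.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: read a whole file and return a reference to the
# buffer, so it can be passed on without making another copy.
sub slurp_ref {
	my ($path) = @_;
	open(my $fh, '<', $path) or die "open $path: $!";
	binmode $fh; # messages are octets, not characters
	my $buf = do { local $/; <$fh> }; # undef $/ reads to EOF
	\$buf;
}

# demo with a temporary file standing in for an .eml file
my $tmp = File::Temp->new;
print $tmp "Subject: hi\n\nbody\n";
close $tmp or die "close: $!";
my $ref = slurp_ref($tmp->filename);
print length($$ref), " bytes\n";
```

The reference returned here is the kind of argument the converted call
sites pass, e.g. PublicInbox::Eml->new(\$str).
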
Documentation/mknews.perl | 4 ++--
lib/PublicInbox/Admin.pm | 2 +-
lib/PublicInbox/Filter/Vger.pm | 4 ++--
lib/PublicInbox/Import.pm | 3 ++-
lib/PublicInbox/Inbox.pm | 4 ++--
lib/PublicInbox/InboxWritable.pm | 4 ++--
lib/PublicInbox/MDA.pm | 1 -
lib/PublicInbox/SearchIdx.pm | 6 +++---
lib/PublicInbox/SearchIdxShard.pm | 3 ++-
lib/PublicInbox/TestCommon.pm | 3 +++
lib/PublicInbox/V2Writable.pm | 17 +++++++++--------
lib/PublicInbox/View.pm | 2 +-
lib/PublicInbox/WWW.pm | 2 +-
lib/PublicInbox/WatchMaildir.pm | 4 ++--
lib/PublicInbox/WwwAttach.pm | 5 ++---
script/public-inbox-edit | 8 ++++----
script/public-inbox-learn | 4 ++--
script/public-inbox-mda | 16 ++++++++--------
script/public-inbox-purge | 4 ++--
t/filter_rubylang.t | 8 ++++----
t/import.t | 2 +-
21 files changed, 55 insertions(+), 51 deletions(-)
diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index a9dede00..3bdebfce 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -5,7 +5,7 @@
# this uses unstable internal APIs of public-inbox, and this script
# needs to be updated if they change.
use strict;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::View;
use PublicInbox::MsgTime qw(msg_datestamp);
use PublicInbox::MID qw(mids mid_escape);
@@ -76,7 +76,7 @@ sub release2mime {
my ($release, $mtime_ref) = @_;
my $f = "$dir/$release.eml";
open(my $fh, '<', $f) or die "open($f): $!";
- my $mime = PublicInbox::MIME->new(do { local $/; <$fh> });
+ my $mime = PublicInbox::Eml->new(\(do { local $/; <$fh> }));
# Documentation/include.mk relies on mtimes of each .eml file
# to trigger rebuild, so make sure we sync the mtime to the Date:
# header in the .eml
diff --git a/lib/PublicInbox/Admin.pm b/lib/PublicInbox/Admin.pm
index 62ddbe82..2c8d191a 100644
--- a/lib/PublicInbox/Admin.pm
+++ b/lib/PublicInbox/Admin.pm
@@ -122,7 +122,7 @@ EOF
}
# TODO: make Devel::Peek optional, only used for daemon
-my @base_mod = qw(Email::MIME Devel::Peek);
+my @base_mod = qw(Devel::Peek);
my @over_mod = qw(DBD::SQLite DBI);
my %mod_groups = (
-index => [ @base_mod, @over_mod ],
diff --git a/lib/PublicInbox/Filter/Vger.pm b/lib/PublicInbox/Filter/Vger.pm
index e746238c..2c73738d 100644
--- a/lib/PublicInbox/Filter/Vger.pm
+++ b/lib/PublicInbox/Filter/Vger.pm
@@ -5,7 +5,7 @@
package PublicInbox::Filter::Vger;
use base qw(PublicInbox::Filter::Base);
use strict;
-use warnings;
+use PublicInbox::Eml;
my $l0 = qr/-+/; # older messages only had one '-'
my $l1 =
@@ -25,7 +25,7 @@ sub scrub {
# so in multipart (e.g. GPG-signed) messages, the list trailer
# becomes invisible to MIME-aware email clients.
if ($s =~ s/$l0\n$l1\n$l2\n$l3\n($l4\n)?\z//os) {
- $mime = PublicInbox::MIME->new(\$s);
+ $mime = PublicInbox::Eml->new(\$s);
}
$self->ACCEPT($mime);
}
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index de8ff55f..98aa7785 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -15,6 +15,7 @@ use PublicInbox::Address;
use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
use PublicInbox::ContentId qw(content_digest);
use PublicInbox::MDA;
+use PublicInbox::Eml;
use POSIX qw(strftime);
sub new {
@@ -137,7 +138,7 @@ sub check_remove_v1 {
$info =~ m!\A100644 blob ([a-f0-9]{40})\t!s or die "not blob: $info";
my $oid = $1;
my $msg = _cat_blob($r, $w, $oid) or die "BUG: cat-blob $1 failed";
- my $cur = PublicInbox::MIME->new($msg);
+ my $cur = PublicInbox::Eml->new($msg);
my $cur_s = $cur->header('Subject');
$cur_s = '' unless defined $cur_s;
my $cur_m = $mime->header('Subject');
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 186eb420..617b692b 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -7,7 +7,7 @@ use strict;
use warnings;
use PublicInbox::Git;
use PublicInbox::MID qw(mid2path);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
# Long-running "git-cat-file --batch" processes won't notice
# unlinked packs, so we need to restart those processes occasionally.
@@ -328,7 +328,7 @@ sub msg_by_smsg ($$;$) {
sub smsg_mime {
my ($self, $smsg, $ref) = @_;
if (my $s = msg_by_smsg($self, $smsg, $ref)) {
- $smsg->{mime} = PublicInbox::MIME->new($s);
+ $smsg->{mime} = PublicInbox::Eml->new($s);
return $smsg;
}
}
diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index 31aa76c6..3558403b 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -117,7 +117,7 @@ sub mime_from_path ($) {
local $/;
my $str = <$fh>;
$str or return;
- return PublicInbox::MIME->new(\$str);
+ return PublicInbox::Eml->new(\$str);
} elsif ($!{ENOENT}) {
# common with Maildir
return;
@@ -162,7 +162,7 @@ sub mb_add ($$$$) {
} elsif ($variant eq 'mboxo') {
$$msg =~ s/^>From /From /gms;
}
- my $mime = PublicInbox::MIME->new($msg);
+ my $mime = PublicInbox::Eml->new($msg);
if ($filter) {
my $ret = $filter->scrub($mime) or return;
return if $ret == REJECT();
diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
index 33696528..57b436b9 100644
--- a/lib/PublicInbox/MDA.pm
+++ b/lib/PublicInbox/MDA.pm
@@ -5,7 +5,6 @@
package PublicInbox::MDA;
use strict;
use warnings;
-use Email::Simple;
use PublicInbox::MsgTime;
use constant MAX_SIZE => 1024 * 500; # same as spamc default, should be tunable
use constant MAX_MID_SIZE => 244; # max term size - 1 in Xapian
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index a7e31b71..f054bb6a 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -10,7 +10,7 @@ package PublicInbox::SearchIdx;
use strict;
use warnings;
use base qw(PublicInbox::Search PublicInbox::Lock);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::InboxWritable;
use PublicInbox::MID qw/mid_clean mid_mime mids_for_index/;
use PublicInbox::MsgIter;
@@ -365,7 +365,7 @@ sub _msgmap_init ($) {
}
sub add_message {
- # mime = Email::MIME object
+ # mime = PublicInbox::Eml or Email::MIME object
my ($self, $mime, $smsg) = @_;
my $hdr = $mime->header_obj;
my $mids = mids_for_index($hdr);
@@ -554,7 +554,7 @@ sub do_cat_mail {
my ($git, $blob, $sizeref) = @_;
my $str = $git->cat_file($blob, $sizeref) or
die "BUG: $blob not found in $git->{git_dir}";
- PublicInbox::MIME->new($str);
+ PublicInbox::Eml->new($str);
}
# called by public-inbox-index
diff --git a/lib/PublicInbox/SearchIdxShard.pm b/lib/PublicInbox/SearchIdxShard.pm
index 06bcd403..e754b038 100644
--- a/lib/PublicInbox/SearchIdxShard.pm
+++ b/lib/PublicInbox/SearchIdxShard.pm
@@ -8,6 +8,7 @@ use strict;
use warnings;
use base qw(PublicInbox::SearchIdx);
use IO::Handle (); # autoflush
+use PublicInbox::Eml;
sub new {
my ($class, $v2writable, $shard) = @_;
@@ -75,7 +76,7 @@ sub shard_worker_loop ($$$$$) {
$self->begin_txn_lazy;
my $n = read($r, my $msg, $bytes) or die "read: $!\n";
$n == $bytes or die "short read: $n != $bytes\n";
- my $mime = PublicInbox::MIME->new(\$msg);
+ my $mime = PublicInbox::Eml->new(\$msg);
my $smsg = bless {
bytes => $bytes,
num => $num + 0,
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 600843f0..978c3cd7 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -8,12 +8,15 @@ use parent qw(Exporter);
use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
use POSIX qw(dup2);
use IO::Socket::INET;
+use PublicInbox::MIME; # temporary
our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
run_script start_script key2sub xsys xqx mime_load eml_load);
sub mime_load ($) {
my ($path) = @_;
open(my $fh, '<', $path) or die "open $path: $!";
+ # test should've called: require_mods('Email::MIME')
+ require PublicInbox::MIME;
PublicInbox::MIME->new(\(do { local $/; <$fh> }));
}
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 01b8bed6..f599e0a0 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -9,7 +9,7 @@ use warnings;
use base qw(PublicInbox::Lock);
use 5.010_001;
use PublicInbox::SearchIdxShard;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Git;
use PublicInbox::Import;
use PublicInbox::MID qw(mids references);
@@ -357,9 +357,10 @@ sub content_ids ($) {
my ($mime) = @_;
my @cids = ( content_id($mime) );
+ # We still support Email::MIME, here, and
# Email::MIME->as_string doesn't always round-trip, so we may
# use a second content_id
- my $rt = content_id(PublicInbox::MIME->new(\($mime->as_string)));
+ my $rt = content_id(PublicInbox::Eml->new(\($mime->as_string)));
push @cids, $rt if $cids[0] ne $rt;
\@cids;
}
@@ -405,7 +406,7 @@ sub rewrite_internal ($$;$$$) {
next; # continue
}
my $orig = $$msg;
- my $cur = PublicInbox::MIME->new($msg);
+ my $cur = PublicInbox::Eml->new($msg);
if (content_matches($cids, $cur)) {
$gone{$smsg->{num}} = [ $smsg, $cur, \$orig ];
}
@@ -842,7 +843,7 @@ sub content_exists ($$$) {
warn "broken smsg for $mid\n";
next;
}
- my $cur = PublicInbox::MIME->new($msg);
+ my $cur = PublicInbox::Eml->new($msg);
return 1 if content_matches($cids, $cur);
# XXX DEBUG_DIFF is experimental and may be removed
@@ -870,7 +871,7 @@ sub mark_deleted ($$$$) {
my ($self, $sync, $git, $oid) = @_;
return if PublicInbox::SearchIdx::too_big($self, $git, $oid);
my $msgref = $git->cat_file($oid);
- my $mime = PublicInbox::MIME->new($$msgref);
+ my $mime = PublicInbox::Eml->new($$msgref);
my $mids = mids($mime->header_obj);
my $cid = content_id($mime);
foreach my $mid (@$mids) {
@@ -901,7 +902,7 @@ sub reindex_oid_m ($$$$;$) {
$self->{current_info} = "multi_mid $oid";
my ($num, $mid0, $len);
my $msgref = $git->cat_file($oid, \$len);
- my $mime = PublicInbox::MIME->new($$msgref);
+ my $mime = PublicInbox::Eml->new($$msgref);
my $mids = mids($mime->header_obj);
my $cid = content_id($mime);
die "BUG: reindex_oid_m called for <=1 mids" if scalar(@$mids) <= 1;
@@ -999,7 +1000,7 @@ sub reindex_oid ($$$$) {
my ($num, $mid0, $len);
my $msgref = $git->cat_file($oid, \$len);
return if $len == 0; # purged
- my $mime = PublicInbox::MIME->new($$msgref);
+ my $mime = PublicInbox::Eml->new($$msgref);
my $mids = mids($mime->header_obj);
my $cid = content_id($mime);
@@ -1193,7 +1194,7 @@ sub unindex_oid ($$$;$) {
my ($self, $git, $oid, $unindexed) = @_;
my $mm = $self->{mm};
my $msgref = $git->cat_file($oid);
- my $mime = PublicInbox::MIME->new($msgref);
+ my $mime = PublicInbox::Eml->new($msgref);
my $mids = mids($mime->header_obj);
$mime = $msgref = undef;
my $over = $self->{over};
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3328c865..ef5f4b3a 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -56,7 +56,7 @@ sub msg_page {
} else {
$first = $ibx->msg_by_mid($mid) or return;
}
- my $mime = PublicInbox::MIME->new($first);
+ my $mime = PublicInbox::Eml->new($first);
$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
my $hdr = $ctx->{hdr} = $mime->header_obj;
$ctx->{obuf} = _msg_page_prepare_obuf($hdr, $ctx, 0);
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index 6c016b03..71fe1f4b 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -146,7 +146,7 @@ sub preload {
require PublicInbox::Feed;
require PublicInbox::View;
require PublicInbox::SearchThread;
- require PublicInbox::MIME;
+ require PublicInbox::Eml;
require PublicInbox::Mbox;
require PublicInbox::ViewVCS;
require PublicInbox::WwwText;
diff --git a/lib/PublicInbox/WatchMaildir.pm b/lib/PublicInbox/WatchMaildir.pm
index 71bd84fc..7ca35403 100644
--- a/lib/PublicInbox/WatchMaildir.pm
+++ b/lib/PublicInbox/WatchMaildir.pm
@@ -6,7 +6,7 @@
package PublicInbox::WatchMaildir;
use strict;
use warnings;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::InboxWritable;
use File::Temp 0.19 (); # 0.19 for ->newdir
use PublicInbox::Filter::Base qw(REJECT);
@@ -282,7 +282,7 @@ sub _spamcheck_cb {
my ($mime) = @_;
my $tmp = '';
if ($sc->spamcheck($mime, \$tmp)) {
- return PublicInbox::MIME->new(\$tmp);
+ return PublicInbox::Eml->new(\$tmp);
}
warn $mime->header('Message-ID')." failed spam check\n";
undef;
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index b1009907..5b2914b3 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -7,8 +7,7 @@ use strict;
use warnings;
use bytes (); # only for bytes::length
use Email::MIME::ContentType qw(parse_content_type);
-use PublicInbox::MIME;
-use PublicInbox::MsgIter;
+use PublicInbox::Eml;
sub get_attach_i { # ->each_part callback
my ($part, $depth, $idx) = @{$_[0]};
@@ -38,7 +37,7 @@ sub get_attach ($$$) {
my ($ctx, $idx, $fn) = @_;
my $res = [ 404, [ 'Content-Type', 'text/plain' ], [ "Not found\n" ] ];
my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
- $mime = PublicInbox::MIME->new($mime);
+ $mime = PublicInbox::Eml->new($mime);
$res->[3] = $idx;
$mime->each_part(\&get_attach_i, $res, 1);
pop @$res; # cleanup before letting PSGI server see it
diff --git a/script/public-inbox-edit b/script/public-inbox-edit
index 42f914a8..e895a228 100755
--- a/script/public-inbox-edit
+++ b/script/public-inbox-edit
@@ -12,7 +12,7 @@ use File::Temp 0.19 (); # 0.19 for TMPDIR
use PublicInbox::ContentId qw(content_id);
use PublicInbox::MID qw(mid_clean mids);
PublicInbox::Admin::check_require('-index');
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::InboxWritable;
use PublicInbox::Import;
@@ -52,7 +52,7 @@ sub find_mid ($$$) {
my ($id, $prev);
while (my $smsg = $over->next_by_mid($mid, \$id, \$prev)) {
my $ref = $ibx->msg_by_smsg($smsg);
- my $mime = PublicInbox::MIME->new($ref);
+ my $mime = PublicInbox::Eml->new($ref);
my $cid = content_id($mime);
my $tuple = [ $ibx, $smsg ];
push @{$found->{$cid} ||= []}, $tuple
@@ -205,8 +205,8 @@ W: possible message boundary splitting error
$new_raw =~ s/^>(>*From )/$1/gm;
}
- my $new_mime = PublicInbox::MIME->new(\$new_raw);
- my $old_mime = PublicInbox::MIME->new($old_raw);
+ my $new_mime = PublicInbox::Eml->new(\$new_raw);
+ my $old_mime = PublicInbox::Eml->new($old_raw);
# make sure we don't compare unwanted headers, since mutt adds
# Content-Length, Status, and Lines headers:
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 4c10b68b..a33d813a 100644
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -9,7 +9,7 @@ use strict;
use warnings;
use PublicInbox::Config;
use PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Address;
use PublicInbox::Spamcheck::Spamc;
my $train = shift or die "usage: $usage\n";
@@ -20,7 +20,7 @@ if ($train !~ /\A(?:ham|spam|rm)\z/) {
my $spamc = PublicInbox::Spamcheck::Spamc->new;
my $pi_config = PublicInbox::Config->new;
my $err;
-my $mime = PublicInbox::MIME->new(do{
+my $mime = PublicInbox::Eml->new(do{
local $/;
my $data = <STDIN>;
$data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 54d0af01..42d0e00c 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -15,8 +15,7 @@ my $do_exit = sub {
exit $code;
};
-use Email::Simple;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::MDA;
use PublicInbox::Config;
use PublicInbox::Emergency;
@@ -32,7 +31,7 @@ $ems = PublicInbox::Emergency->new($emergency);
my $str = do { local $/; <STDIN> };
$str =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
$ems->prepare(\$str);
-my $simple = Email::Simple->new(\$str);
+my $eml = PublicInbox::Eml->new(\$str);
my $config = PublicInbox::Config->new;
my $key = 'publicinboxmda.spamcheck';
my $default = 'PublicInbox::Spamcheck::Spamc';
@@ -44,7 +43,7 @@ if (defined $recipient) {
push @$dests, $ibx if $ibx;
}
if (!scalar(@$dests)) {
- $dests = PublicInbox::MDA->inboxes_for_list_id($config, $simple);
+ $dests = PublicInbox::MDA->inboxes_for_list_id($config, $eml);
if (!scalar(@$dests) && !defined($recipient)) {
die "ORIGINAL_RECIPIENT not defined in ENV\n";
}
@@ -61,7 +60,7 @@ my $err;
0;
# pre-check, MDA has stricter rules than an importer might;
} elsif ($precheck) {
- !!PublicInbox::MDA->precheck($simple, $ibx->{address});
+ !!PublicInbox::MDA->precheck($eml, $ibx->{address});
} else {
1;
}
@@ -69,7 +68,7 @@ my $err;
$do_exit->(67) if $err && scalar(@$dests) == 0;
-$simple = undef;
+$eml = undef;
my $spam_ok;
if ($spamc) {
$str = '';
@@ -101,9 +100,10 @@ my @rejects;
for my $ibx (@$dests) {
mda_filter_adjust($ibx);
my $filter = $ibx->filter;
- my $mime = PublicInbox::MIME->new($str);
+ my $mime = PublicInbox::Eml->new($str);
my $ret = $filter->delivery($mime);
- if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message
+ if (ref($ret) && ($ret->isa('PublicInbox::Eml') ||
+ $ret->isa('Email::MIME'))) { # filter altered message
$mime = $ret;
} elsif ($ret == PublicInbox::Filter::Base::IGNORE) {
next; # nothing, keep looping
diff --git a/script/public-inbox-purge b/script/public-inbox-purge
index 8301b06d..82a63b80 100755
--- a/script/public-inbox-purge
+++ b/script/public-inbox-purge
@@ -10,7 +10,7 @@ use Getopt::Long qw(:config gnu_getopt no_ignore_case auto_abbrev);
use PublicInbox::AdminEdit;
PublicInbox::Admin::check_require('-index');
use PublicInbox::Filter::Base qw(REJECT);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require PublicInbox::V2Writable;
my $usage = "$0 [--all] [INBOX_DIRS] </path/to/message";
@@ -26,7 +26,7 @@ $data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
my $n_purged = 0;
foreach my $ibx (@ibxs) {
- my $mime = PublicInbox::MIME->new($data);
+ my $mime = PublicInbox::Eml->new($data);
my $v2w = PublicInbox::V2Writable->new($ibx, 0);
my $commits = $v2w->purge($mime) || [];
diff --git a/t/filter_rubylang.t b/t/filter_rubylang.t
index 05e1b324..e6c53f98 100644
--- a/t/filter_rubylang.t
+++ b/t/filter_rubylang.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
use_ok 'PublicInbox::Filter::RubyLang';
@@ -17,7 +17,7 @@ keep this
Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
EOF
-my $mime = PublicInbox::MIME->new($msg);
+my $mime = PublicInbox::Eml->new($msg);
my $ret = $f->delivery($mime);
is($ret, $mime, "delivery successful");
is($mime->body, "keep this\n", 'normal message filtered OK');
@@ -41,7 +41,7 @@ X-Mail-Count: 12
Message-ID: <a@b>
EOF
- $mime = PublicInbox::MIME->new($msg);
+ $mime = PublicInbox::Eml->new($msg);
$ret = $f->delivery($mime);
is($ret, $mime, "delivery successful");
my $mm = PublicInbox::Msgmap->new($git_dir);
@@ -53,7 +53,7 @@ Message-ID: <b@b>
EOF
- $mime = PublicInbox::MIME->new($msg);
+ $mime = PublicInbox::Eml->new($msg);
$ret = $f->delivery($mime);
is($ret, 100, "delivery rejected without X-Mail-Count");
}
diff --git a/t/import.t b/t/import.t
index d2264102..ba4abd9c 100644
--- a/t/import.t
+++ b/t/import.t
@@ -75,7 +75,7 @@ $im->done;
is(scalar @revs, 26, '26 revisions exist after mass import');
my ($mark, $msg) = $im->remove($mime);
like($mark, qr/\A:\d+\z/, 'got mark');
-is(ref($msg), 'PublicInbox::MIME', 'got old message deleted');
+like(ref($msg), qr/\bPublicInbox::(?:Eml|MIME)\b/, 'got old message deleted');
is(undef, $im->remove($mime), 'remove is idempotent');
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (6 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 07/13] replace most uses of PublicInbox::MIME with Eml Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
2020-05-07 21:05 ` [PATCH 09/13] EmlContentFoo: relax Encode version requirement Eric Wong
` (4 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Since we're getting rid of Email::MIME, get rid of
Email::MIME::ContentType, too. Carrying our own fork also
lets us introduce speedups specific to our codebase down the line.
---
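Note for readers (not part of the patch): the RFC 2231 handling
that the forked parser inherits can be hard to picture from the
regexps alone. Below is a minimal standalone sketch of the idea,
checked against the expectations in t/eml_content_type.t. The
helper names decode_ext_value() and join_continuations() are
hypothetical and do not exist in PublicInbox::EmlContentFoo; the
sketch also uses find_encoding() rather than find_mime_encoding()
so it runs on older Encode releases, and it skips the strictness
checks the real code performs.

```perl
#!perl -w
use strict;
use Encode qw(find_encoding);

# Hypothetical helper (not in the patch): decode one RFC 2231
# extended value of the form  charset'language'percent-encoded-text
sub decode_ext_value {
	my ($raw) = @_;
	# charset and language may be empty, the two quotes may not
	my ($charset, $value) = ($raw =~ /\A([^']*)'[^']*'(.*)\z/s)
		or return $raw; # not an extended value, leave as-is
	$value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
	if (length($charset)) {
		my $enc = find_encoding($charset);
		$value = $enc->decode($value) if $enc;
	}
	$value;
}

# Hypothetical helper: join numbered continuations such as
# title*0*, title*1*, title*2 into a single "title" attribute
sub join_continuations {
	my (%attr) = @_;
	my (%parts, %encoded);
	for my $k (keys %attr) {
		my ($name, $idx, $star) = ($k =~ /\A(.*)\*([0-9])(\*?)\z/)
			or next;
		$parts{$name}->[$idx] = delete $attr{$k};
		$encoded{$name} = 1 if $star; # any section was encoded
	}
	for my $name (keys %parts) {
		my $joined = join('', @{$parts{$name}});
		$attr{$name} = $encoded{$name} ?
				decode_ext_value($joined) : $joined;
	}
	\%attr;
}

my $attr = join_continuations(
	'title*0*' => "us-ascii'en'This%20is%20even%20more%20",
	'title*1*' => '%2A%2A%2Afun%2A%2A%2A%20',
	'title*2' => "isn't it!",
);
print $attr->{title}, "\n"; # This is even more ***fun*** isn't it!
```

As in the real code, the sections are joined first and decoded
once, so the charset only needs to appear in section 0.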
MANIFEST | 3 +
lib/PublicInbox/Eml.pm | 7 +-
lib/PublicInbox/EmlContentFoo.pm | 294 +++++++++++++++++++++++++++++++
lib/PublicInbox/WwwAttach.pm | 2 +-
t/eml_content_disposition.t | 102 +++++++++++
t/eml_content_type.t | 289 ++++++++++++++++++++++++++++++
6 files changed, 692 insertions(+), 5 deletions(-)
create mode 100644 lib/PublicInbox/EmlContentFoo.pm
create mode 100644 t/eml_content_disposition.t
create mode 100644 t/eml_content_type.t
diff --git a/MANIFEST b/MANIFEST
index 0906448e..055c8c9a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -106,6 +106,7 @@ lib/PublicInbox/DSPoll.pm
lib/PublicInbox/Daemon.pm
lib/PublicInbox/Emergency.pm
lib/PublicInbox/Eml.pm
+lib/PublicInbox/EmlContentFoo.pm
lib/PublicInbox/ExtMsg.pm
lib/PublicInbox/Feed.pm
lib/PublicInbox/Filter/Base.pm
@@ -231,6 +232,8 @@ t/ds-poll.t
t/edit.t
t/emergency.t
t/eml.t
+t/eml_content_disposition.t
+t/eml_content_type.t
t/epoll.t
t/fail-bin/spamc
t/feed.t
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 0c23bed0..1988bdb3 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -33,10 +33,9 @@ use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
my $MIME_Header = find_encoding('MIME-Header');
-# TODO remove these dependencies
-use Email::MIME::ContentType;
+use PublicInbox::EmlContentFoo qw(parse_content_type parse_content_disposition);
use Email::MIME::Encodings;
-$Email::MIME::ContentType::STRICT_PARAMS = 0;
+$PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
our $MAXPARTS = 1000; # same as SpamAssassin
our $MAXDEPTH = 20; # seems enough, Perl sucks, here
@@ -108,7 +107,7 @@ sub header_raw {
# pick the first Content-Type header to match Email::MIME behavior.
# It's usually the right one based on historical archives.
sub ct ($) {
- # Email::MIME::ContentType::content_type:
+ # PublicInbox::EmlContentFoo::content_type:
$_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
}
diff --git a/lib/PublicInbox/EmlContentFoo.pm b/lib/PublicInbox/EmlContentFoo.pm
new file mode 100644
index 00000000..f507d548
--- /dev/null
+++ b/lib/PublicInbox/EmlContentFoo.pm
@@ -0,0 +1,294 @@
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+# <https://www.gnu.org/licenses/gpl-1.0.txt>
+# <https://dev.perl.org/licenses/artistic.html>
+#
+# This license differs from the rest of public-inbox
+#
+# This is a fork of the Email::MIME::ContentType 1.022 with
+# minor improvements and incompatibilities; namely changes to
+# quiet warnings with legacy data.
+package PublicInbox::EmlContentFoo;
+use strict;
+use parent qw(Exporter);
+# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+
+use Encode 2.87 qw(find_mime_encoding);
+our @EXPORT_OK = qw(parse_content_type parse_content_disposition);
+
+our $STRICT_PARAMS = 1;
+
+my $ct_default = 'text/plain; charset=us-ascii';
+
+my $re_token = # US-ASCII except SPACE, CTLs and tspecials ()<>@,;:\\"/[]?=
+ qr/[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+/;
+
+my $re_token_non_strict = # allow CTLs and above ASCII
+ qr/([\x00-\x08\x0B\x0C\x0E-\x1F\x7E-\xFF]+|$re_token)/;
+
+my $re_qtext = # US-ASCII except CR, LF, white space, backslash and quote
+ qr/[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7E\x7F]/;
+my $re_quoted_pair = qr/\\[\x00-\x7F]/;
+my $re_quoted_string = qr/"((?:[ \t]*(?:$re_qtext|$re_quoted_pair))*[ \t]*)"/;
+
+my $re_qtext_non_strict = qr/[\x80-\xFF]|$re_qtext/;
+my $re_quoted_pair_non_strict = qr/\\[\x00-\xFF]/;
+my $re_quoted_string_non_strict =
+qr/"((?:[ \t]*(?:$re_qtext_non_strict|$re_quoted_pair_non_strict))*[ \t]*)"/;
+
+my $re_charset = qr/[!"#\$%&'+\-0-9A-Z\\\^_`a-z\{\|\}~]+/;
+my $re_language = qr/[A-Za-z]{1,8}(?:-[0-9A-Za-z]{1,8})*/;
+my $re_exvalue = qr/($re_charset)?'(?:$re_language)?'(.*)/;
+
+sub parse_content_type {
+ my ($ct) = @_;
+
+ # If the header isn't there or is empty, give default answer.
+ $ct = $ct_default unless defined($ct) && length($ct);
+
+ _unfold_lines($ct);
+ _clean_comments($ct);
+
+ # It is also recommend (sic.) that this default be assumed when a
+ # syntactically invalid Content-Type header field is encountered.
+ unless ($ct =~ s/^($re_token)\/($re_token)//) {
+ unless ($STRICT_PARAMS && $ct =~ s/^($re_token_non_strict)\/
+ ($re_token_non_strict)//x) {
+ #carp "Invalid Content-Type '$ct'";
+ return parse_content_type($ct_default);
+ }
+ }
+
+ my ($type, $subtype) = (lc $1, lc $2);
+
+ _clean_comments($ct);
+ $ct =~ s/\s+$//;
+
+ my $attributes = {};
+ if ($STRICT_PARAMS && length($ct) && $ct !~ /^;/) {
+ # carp "Missing ';' before first Content-Type parameter '$ct'";
+ } else {
+ $attributes = _process_rfc2231(_parse_attributes($ct));
+ }
+
+ {
+ type => $type,
+ subtype => $subtype,
+ attributes => $attributes,
+
+ # This is dumb. Really really dumb. For backcompat. -- rjbs,
+ # 2013-08-10
+ discrete => $type,
+ composite => $subtype,
+ };
+}
+
+my $cd_default = 'attachment';
+
+sub parse_content_disposition {
+ my ($cd) = @_;
+
+ $cd = $cd_default unless defined($cd) && length($cd);
+
+ _unfold_lines($cd);
+ _clean_comments($cd);
+
+ unless ($cd =~ s/^($re_token)//) {
+ unless ($STRICT_PARAMS and $cd =~ s/^($re_token_non_strict)//) {
+ #carp "Invalid Content-Disposition '$cd'";
+ return parse_content_disposition($cd_default);
+ }
+ }
+
+ my $type = lc $1;
+
+ _clean_comments($cd);
+ $cd =~ s/\s+$//;
+
+ my $attributes = {};
+ if ($STRICT_PARAMS && length($cd) && $cd !~ /^;/) {
+# carp "Missing ';' before first Content-Disposition parameter '$cd'";
+ } else {
+ $attributes = _process_rfc2231(_parse_attributes($cd));
+ }
+
+ {
+ type => $type,
+ attributes => $attributes,
+ };
+}
+
+sub _unfold_lines {
+ $_[0] =~ s/(?:\r\n|[\r\n])(?=[ \t])//g;
+}
+
+sub _clean_comments {
+ my $ret = ($_[0] =~ s/^\s+//);
+ while (length $_[0]) {
+ last unless $_[0] =~ s/^\(//;
+ my $level = 1;
+ while (length $_[0]) {
+ my $ch = substr $_[0], 0, 1, '';
+ if ($ch eq '(') {
+ $level++;
+ } elsif ($ch eq ')') {
+ $level--;
+ last if $level == 0;
+ } elsif ($ch eq '\\') {
+ substr $_[0], 0, 1, '';
+ }
+ }
+ # carp "Unbalanced comment" if $level != 0 and $STRICT_PARAMS;
+ $ret |= ($_[0] =~ s/^\s+//);
+ }
+ $ret;
+}
+
+sub _process_rfc2231 {
+ my ($attribs) = @_;
+ my %cont;
+ my %encoded;
+ foreach (keys %{$attribs}) {
+ next unless $_ =~ m/^(.*)\*([0-9])\*?$/;
+ my ($attr, $sec) = ($1, $2);
+ $cont{$attr}->[$sec] = $attribs->{$_};
+ $encoded{$attr}->[$sec] = 1 if $_ =~ m/\*$/;
+ delete $attribs->{$_};
+ }
+ foreach (keys %cont) {
+ my $key = $_;
+ $key .= '*' if $encoded{$_};
+ $attribs->{$key} = join '', @{$cont{$_}};
+ }
+ foreach (keys %{$attribs}) {
+ next unless $_ =~ m/^(.*)\*$/;
+ my $key = $1;
+ next unless $attribs->{$_} =~ m/^$re_exvalue$/;
+ my ($charset, $value) = ($1, $2);
+ $value =~ s/%([0-9A-Fa-f]{2})/pack('C', hex($1))/eg;
+ if (length $charset) {
+ my $enc = find_mime_encoding($charset);
+ if (defined $enc) {
+ $value = $enc->decode($value);
+ # } else {
+ #carp "Unknown charset '$charset' in
+ #attribute '$key' value";
+ }
+ }
+ $attribs->{$key} = $value;
+ delete $attribs->{$_};
+ }
+ $attribs;
+}
+
+sub _parse_attributes {
+ local $_ = shift;
+ substr($_, 0, 0, '; ') if length $_ and $_ !~ /^;/;
+ my $attribs = {};
+ while (length $_) {
+ s/^;// or $STRICT_PARAMS and do {
+ #carp "Missing semicolon before parameter '$_'";
+ return $attribs;
+ };
+ _clean_comments($_);
+ unless (length $_) {
+ # Some mail software generates a Content-Type like this:
+ # "Content-Type: text/plain;"
+ # RFC 1521 section 3 says a parameter must exist if
+ # there is a semicolon.
+ #carp "Extra semicolon after last parameter" if
+ #$STRICT_PARAMS;
+ return $attribs;
+ }
+ my $attribute;
+ if (s/^($re_token)=//) {
+ $attribute = lc $1;
+ } else {
+ if ($STRICT_PARAMS) {
+ # carp "Illegal parameter '$_'";
+ return $attribs;
+ }
+ if (s/^($re_token_non_strict)=//) {
+ $attribute = lc $1;
+ } else {
+ unless (s/^([^;=\s]+)\s*=//) {
+ #carp "Cannot parse parameter '$_'";
+ return $attribs;
+ }
+ $attribute = lc $1;
+ }
+ }
+ _clean_comments($_);
+ my $value = _extract_attribute_value();
+ $attribs->{$attribute} = $value;
+ _clean_comments($_);
+ }
+ $attribs;
+}
+
+sub _extract_attribute_value { # EXPECTS AND MODIFIES $_
+ my $value;
+ while (length $_) {
+ if (s/^($re_token)//) {
+ $value .= $1;
+ } elsif (s/^$re_quoted_string//) {
+ my $sub = $1;
+ $sub =~ s/\\(.)/$1/g;
+ $value .= $sub;
+ } elsif ($STRICT_PARAMS) {
+ #my $char = substr $_, 0, 1;
+ #carp "Unquoted '$char' not allowed";
+ return;
+ } elsif (s/^($re_token_non_strict)//) {
+ $value .= $1;
+ } elsif (s/^$re_quoted_string_non_strict//) {
+ my $sub = $1;
+ $sub =~ s/\\(.)/$1/g;
+ $value .= $sub;
+ }
+ my $erased = _clean_comments($_);
+ last if !length $_ or /^;/;
+ if ($STRICT_PARAMS) {
+ #my $char = substr $_, 0, 1;
+ #carp "Extra '$char' found after parameter";
+ return;
+ }
+ if ($erased) {
+ # Sometimes semicolon is missing, so check for = char
+ last if m/^$re_token_non_strict=/;
+ $value .= ' ';
+ }
+ $value .= substr $_, 0, 1, '';
+ }
+ $value;
+}
+
+1;
+__END__
+=func parse_content_type
+
+This routine is exported on request (it is in C<@EXPORT_OK>).
+
+This routine parses email content type headers according to section 5.1 of RFC
+2045 and also RFC 2231 (Character Set and Parameter Continuations). It returns
+a hashref with entries for the C<type>, the C<subtype>, and a hash of
+C<attributes>.
+
+For backward compatibility with a really unfortunate misunderstanding of RFC
+2045 by the early implementors of this module, C<discrete> and C<composite> are
+also present in the returned hashref, with the values of C<type> and C<subtype>
+respectively.
+
+=func parse_content_disposition
+
+This routine is exported on request (it is in C<@EXPORT_OK>).
+
+This routine parses email Content-Disposition headers according to RFC 2183 and
+RFC 2231. It returns a hashref with entries for the C<type>, and a hash
+of C<attributes>.
+
+=cut
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index 5b2914b3..754da13f 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -6,7 +6,7 @@ package PublicInbox::WwwAttach; # internal package
use strict;
use warnings;
use bytes (); # only for bytes::length
-use Email::MIME::ContentType qw(parse_content_type);
+use PublicInbox::EmlContentFoo qw(parse_content_type);
use PublicInbox::Eml;
sub get_attach_i { # ->each_part callback
diff --git a/t/eml_content_disposition.t b/t/eml_content_disposition.t
new file mode 100644
index 00000000..9bdacc05
--- /dev/null
+++ b/t/eml_content_disposition.t
@@ -0,0 +1,102 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+# <https://www.gnu.org/licenses/gpl-1.0.txt>
+# <https://dev.perl.org/licenses/artistic.html>
+use strict;
+use Test::More;
+use PublicInbox::EmlContentFoo qw(parse_content_disposition);
+
+my %cd_tests = (
+ '' => { type => 'attachment', attributes => {} },
+ 'inline' => { type => 'inline', attributes => {} },
+ 'attachment' => { type => 'attachment', attributes => {} },
+
+ 'attachment; filename=genome.jpeg;' .
+ ' modification-date="Wed, 12 Feb 1997 16:29:51 -0500"' => {
+ type => 'attachment',
+ attributes => {
+ filename => 'genome.jpeg',
+ 'modification-date' => 'Wed, 12 Feb 1997 16:29:51 -0500'
+ }
+ },
+
+ q(attachment; filename*=UTF-8''genome.jpeg;) .
+ q( modification-date="Wed, 12 Feb 1997 16:29:51 -0500") => {
+ type => 'attachment',
+ attributes => {
+ filename => 'genome.jpeg',
+ 'modification-date' => 'Wed, 12 Feb 1997 16:29:51 -0500'
+ }
+ },
+
+ q(attachment; filename*0*=us-ascii'en'This%20is%20even%20more%20;) .
+ q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+ type => 'attachment',
+ attributes => {
+ filename => "This is even more ***fun*** isn't it!"
+ }
+ },
+
+ q(attachment; filename*0*='en'This%20is%20even%20more%20;) .
+ q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+ type => 'attachment',
+ attributes => {
+ filename => "This is even more ***fun*** isn't it!"
+ }
+ },
+
+ q(attachment; filename*0*=''This%20is%20even%20more%20;) .
+ q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+ type => 'attachment',
+ attributes => {
+ filename => "This is even more ***fun*** isn't it!"
+ }
+ },
+
+ q(attachment; filename*0*=us-ascii''This%20is%20even%20more%20;).
+ q( filename*1*=%2A%2A%2Afun%2A%2A%2A%20; filename*2="isn't it!") => {
+ type => 'attachment',
+ attributes => {
+ filename => "This is even more ***fun*** isn't it!"
+ }
+ },
+);
+
+my %non_strict_cd_tests = (
+ 'attachment; filename=genome.jpeg;' .
+ ' modification-date="Wed, 12 Feb 1997 16:29:51 -0500";' => {
+ type => 'attachment',
+ attributes => {
+ filename => 'genome.jpeg',
+ 'modification-date' =>
+ 'Wed, 12 Feb 1997 16:29:51 -0500'
+ }
+ },
+);
+
+sub test {
+ my ($string, $expect, $info) = @_;
+ local $_;
+ $info =~ s/\r/\\r/g;
+ $info =~ s/\n/\\n/g;
+ is_deeply(parse_content_disposition($string), $expect, $info);
+}
+
+for (sort keys %cd_tests) {
+ test($_, $cd_tests{$_}, "Can parse C-D <$_>");
+}
+
+local $PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
+for (sort keys %cd_tests) {
+ test($_, $cd_tests{$_}, "Can parse non-strict C-D <$_>");
+}
+for (sort keys %non_strict_cd_tests) {
+ test($_, $non_strict_cd_tests{$_}, "Can parse non-strict C-D <$_>");
+}
+
+done_testing;
diff --git a/t/eml_content_type.t b/t/eml_content_type.t
new file mode 100644
index 00000000..5fd7d1d9
--- /dev/null
+++ b/t/eml_content_type.t
@@ -0,0 +1,289 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# Copyright (C) 2004- Simon Cozens, Casey West, Ricardo SIGNES
+# This library is free software; you can redistribute it and/or modify
+# it under the same terms as Perl itself.
+#
+# License: GPL-1.0+ or Artistic-1.0-Perl
+# <https://www.gnu.org/licenses/gpl-1.0.txt>
+# <https://dev.perl.org/licenses/artistic.html>
+use strict;
+use Test::More;
+use PublicInbox::EmlContentFoo qw(parse_content_type);
+
+my %ct_tests = (
+ '' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "us-ascii" }
+ },
+
+ "text/plain" => {
+ type => "text",
+ subtype => "plain",
+ attributes => {}
+ },
+ 'text/plain; charset=us-ascii' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "us-ascii" }
+ },
+ 'text/plain; charset="us-ascii"' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "us-ascii" }
+ },
+ "text/plain; charset=us-ascii (Plain text)" => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "us-ascii" }
+ },
+
+ 'text/plain; charset=ISO-8859-1' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "ISO-8859-1" }
+ },
+ 'text/plain; charset="ISO-8859-1"' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "ISO-8859-1" }
+ },
+ 'text/plain; charset="ISO-8859-1" (comment)' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "ISO-8859-1" }
+ },
+
+ '(c) text/plain (c); (c) charset=ISO-8859-1 (c)' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "ISO-8859-1" }
+ },
+ '(c \( \\\\) (c) text/plain (c) (c) ; (c) (c) charset=utf-8 (c)' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "utf-8" }
+ },
+ 'text/plain; (c (nested ()c)another c)() charset=ISO-8859-1' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "ISO-8859-1" }
+ },
+ 'text/plain (c \(!nested ()c\)\)(nested\(c())); charset=utf-8' => {
+ type => "text",
+ subtype => "plain",
+ attributes => { charset => "utf-8" }
+ },
+
+ "application/foo" => {
+ type => "application",
+ subtype => "foo",
+ attributes => {}
+ },
+ "multipart/mixed; boundary=unique-boundary-1" => {
+ type => "multipart",
+ subtype => "mixed",
+ attributes => { boundary => "unique-boundary-1" }
+ },
+ 'message/external-body; access-type=local-file; name="/u/n/m.jpg"' => {
+ type => "message",
+ subtype => "external-body",
+ attributes => {
+ "access-type" => "local-file",
+ "name" => "/u/n/m.jpg"
+ }
+ },
+ 'multipart/mixed; boundary="----------=_1026452699-10321-0" ' => {
+ 'type' => 'multipart',
+ 'subtype' => 'mixed',
+ 'attributes' => {
+ 'boundary' => '----------=_1026452699-10321-0'
+ }
+ },
+ 'multipart/report; boundary= "=_0=73e476c3-cd5a-5ba3-b910-2="' => {
+ 'type' => 'multipart',
+ 'subtype' => 'report',
+ 'attributes' => {
+ 'boundary' => '=_0=73e476c3-cd5a-5ba3-b910-2='
+ }
+ },
+ 'multipart/report; boundary=' . " \t" . '"=_0=7-c-5-b-2="' => {
+ 'type' => 'multipart',
+ 'subtype' => 'report',
+ 'attributes' => {
+ 'boundary' => '=_0=7-c-5-b-2='
+ }
+ },
+
+ 'message/external-body; access-type=URL;' .
+ ' URL*0="ftp://";' .
+ ' URL*1="example.com/"' => {
+ 'type' => 'message',
+ 'subtype' => 'external-body',
+ 'attributes' => {
+ 'access-type' => 'URL',
+ 'url' => 'ftp://example.com/'
+ }
+ },
+ 'message/external-body; access-type=URL; URL="ftp://example.com/"' => {
+ 'type' => 'message',
+ 'subtype' => 'external-body',
+ 'attributes' => {
+ 'access-type' => 'URL',
+ 'url' => 'ftp://example.com/',
+ }
+ },
+
+ "application/x-stuff; title*=us-ascii'en-us'This%20is%20f%2Ad" => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => 'This is f*d'
+ }
+ },
+ "application/x-stuff; title*=us-ascii''This%20is%20f%2Ad" => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => 'This is f*d'
+ }
+ },
+ "application/x-stuff; title*=''This%20is%20f%2Ad" => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => 'This is f*d'
+ }
+ },
+ "application/x-stuff; title*='en-us'This%20is%20f%2Ad" => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => 'This is f*d'
+ }
+ },
+ q(application/x-stuff;) .
+ q( title*0*=us-ascii'en'This%20is%20even%20more%20;) .
+ q(title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => "This is even more ***fun*** isn't it!"
+ }
+ },
+ q(application/x-stuff;) .
+ q( title*0*='en'This%20is%20even%20more%20;) .
+ q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => "This is even more ***fun*** isn't it!"
+ }
+ },
+ q(application/x-stuff;) .
+ q( title*0*=''This%20is%20even%20more%20;) .
+ q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!") => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => "This is even more ***fun*** isn't it!"
+ }
+ },
+ q(application/x-stuff;).
+ q( title*0*=us-ascii''This%20is%20even%20more%20;).
+ q( title*1*=%2A%2A%2Afun%2A%2A%2A%20; title*2="isn't it!")
+ => {
+ 'type' => 'application',
+ 'subtype' => 'x-stuff',
+ 'attributes' => {
+ 'title' => "This is even more ***fun*** isn't it!"
+ }
+ },
+
+ 'text/plain; attribute="v\"v\\\\v\(v\>\<\)\@\,\;\:\/\]\[\?\=v v";' .
+ ' charset=us-ascii' => {
+ 'type' => 'text',
+ 'subtype' => 'plain',
+ 'attributes' => {
+ 'attribute' => 'v"v\\v(v><)@,;:/][?=v v',
+ 'charset' => 'us-ascii',
+ },
+ },
+
+ qq(text/plain;\r
+ charset=us-ascii;\r
+ attribute="\r value1 \r value2\r\n value3\r\n value4\r\n "\r\n ) => {
+ 'type' => 'text',
+ 'subtype' => 'plain',
+ 'attributes' => {
+ 'attribute' => ' value1 value2 value3 value4 ',
+ 'charset' => 'us-ascii',
+ },
+ },
+);
+
+my %non_strict_ct_tests = (
+ "text/plain;" => { type => "text", subtype => "plain", attributes => {} },
+ "text/plain; " =>
+ { type => "text", subtype => "plain", attributes => {} },
+ 'image/jpeg;' .
+ ' x-mac-type="3F3F3F3F";'.
+ ' x-mac-creator="3F3F3F3F" name="file name.jpg";' => {
+ type => "image",
+ subtype => "jpeg",
+ attributes => {
+ 'x-mac-type' => "3F3F3F3F",
+ 'x-mac-creator' => "3F3F3F3F",
+ 'name' => "file name.jpg"
+ }
+ },
+ "text/plain; key=very long value" => {
+ type => "text",
+ subtype => "plain",
+ attributes => { key => "very long value" }
+ },
+ "text/plain; key=very long value key2=value2" => {
+ type => "text",
+ subtype => "plain",
+ attributes => { key => "very long value", key2 => "value2" }
+ },
+ 'multipart/mixed; boundary = "--=_Next_Part_24_Nov_2016_08.09.21"' => {
+ type => "multipart",
+ subtype => "mixed",
+ attributes => {
+ boundary => "--=_Next_Part_24_Nov_2016_08.09.21"
+ }
+ },
+);
+
+sub test {
+ my ($string, $expect, $info) = @_;
+
+ # So stupid. -- rjbs, 2013-08-10
+ $expect->{discrete} = $expect->{type};
+ $expect->{composite} = $expect->{subtype};
+
+ local $_;
+ $info =~ s/\r/\\r/g;
+ $info =~ s/\n/\\n/g;
+ is_deeply(parse_content_type($string), $expect, $info);
+}
+
+for (sort keys %ct_tests) {
+ test($_, $ct_tests{$_}, "Can parse C-T <$_>");
+}
+
+local $PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
+for (sort keys %ct_tests) {
+ test($_, $ct_tests{$_}, "Can parse non-strict C-T <$_>");
+}
+for (sort keys %non_strict_ct_tests) {
+ test(
+ $_,
+ $non_strict_ct_tests{$_},
+ "Can parse non-strict C-T <$_>"
+ );
+}
+
+done_testing;
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 09/13] EmlContentFoo: relax Encode version requirement
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (7 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 08/13] EmlContentFoo: Email::MIME::ContentType replacement Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
2020-05-07 21:05 ` [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings Eric Wong
` (3 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
We want to support Perl v5.10.1 out-of-the-box with minimal
download/installation time. Installing Encode from CPAN
requires a compiler and lengthy build+install time.
So mimic find_mime_encoding() using what Perl v5.10.1 provides
out-of-the-box.
---
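Note for readers (not part of the patch): the fallback can be
tried outside of public-inbox. The following is a rough
standalone sketch of the same technique using only the core
Encode module. The %by_mime_name hash and lookup_mime_charset()
are hypothetical names for illustration, and the extra guards
(skipping encodings that fail to load or lack ->mime_name) are
assumptions added for safety rather than something the patch
does.

```perl
#!perl -w
use strict;
use Encode qw(find_encoding);

# Map lowercased MIME charset names to Encode objects, roughly
# what find_mime_encoding() offers in Encode 2.87+.
my %by_mime_name;
for my $name (Encode->encodings(':all')) {
	my $enc = find_encoding($name) or next;
	my $mime = $enc->can('mime_name') ? $enc->mime_name : undef;
	next unless defined($mime) && length($mime);
	$by_mime_name{lc $mime} = $enc;
}

# extra alias, as in the patch
$by_mime_name{'utf8'} = find_encoding('UTF-8');

# Hypothetical lookup helper: returns an Encode object or undef
sub lookup_mime_charset { $by_mime_name{lc $_[0]} }

for my $cs (qw(ISO-8859-1 utf8 x-no-such-charset)) {
	my $enc = lookup_mime_charset($cs);
	printf "%-18s => %s\n", $cs, $enc ? $enc->name : '(unknown)';
}
```

When several Encode objects share one MIME name, the one that
ends up in the map depends on iteration order, so callers should
rely only on the decoding behavior and not on ->name.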
Makefile.PL | 2 +-
lib/PublicInbox/EmlContentFoo.pm | 27 +++++++++++++++++++++++++--
2 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/Makefile.PL b/Makefile.PL
index 27bb112c..59345edb 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -130,7 +130,7 @@ WriteMakefile(
# libperl$PERL_VERSION or libencode-perl on Debian,
# `perl5' on FreeBSD
- 'Encode' => 0,
+ 'Encode' => 2.35, # 2.35 shipped with 5.10.1
# libperl$PERL_VERSION + perl-modules-$PERL_VERSION
'Compress::Raw::Zlib' => 0,
diff --git a/lib/PublicInbox/EmlContentFoo.pm b/lib/PublicInbox/EmlContentFoo.pm
index f507d548..7472f8d2 100644
--- a/lib/PublicInbox/EmlContentFoo.pm
+++ b/lib/PublicInbox/EmlContentFoo.pm
@@ -9,15 +9,38 @@
#
# This license differs from the rest of public-inbox
#
+# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+#
# This is a fork of the Email::MIME::ContentType 1.022 with
# minor improvements and incompatibilities; namely changes to
# quiet warnings with legacy data.
package PublicInbox::EmlContentFoo;
use strict;
use parent qw(Exporter);
-# ABSTRACT: Parse a MIME Content-Type or Content-Disposition Header
+use v5.10.1;
+
+# find_mime_encoding() only appeared in Encode 2.87+ (Perl 5.26+),
+# while we support 2.35 shipped with Perl 5.10.1
+use Encode 2.35 qw(find_encoding);
+my %mime_name_map; # $enc->mime_name => $enc object
+BEGIN {
+ eval { Encode->import('find_mime_encoding') };
+ if ($@) {
+ *find_mime_encoding = sub { $mime_name_map{lc($_[0])} };
+ %mime_name_map = map {;
+ my $enc = find_encoding($_);
+ my $m = lc($enc->mime_name // '');
+ $m => $enc;
+ } Encode->encodings(':all');
+
+ # delete fallback for encodings w/o ->mime_name:
+ delete $mime_name_map{''};
+
+ # an extra alias, see Encode::MIME::Name
+ $mime_name_map{'utf8'} = find_encoding('UTF-8');
+ }
+}
-use Encode 2.87 qw(find_mime_encoding);
our @EXPORT_OK = qw(parse_content_type parse_content_disposition);
our $STRICT_PARAMS = 1;
* [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (8 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 09/13] EmlContentFoo: relax Encode version requirement Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
` (2 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
Since Email::MIME usage is going away, Email::MIME::Encodings
might as well go away, too. We can also use fewer branches
and just rely on hash lookups, unlike E::M::E.
---
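Not part of the commit message: a rough sketch of the hash-lookup
dispatch described above, using only MIME::Base64 and
MIME::QuotedPrint.  The %DECODER table and decode_body helper are
hypothetical names for illustration and are simpler than what the
patch adds to Eml.pm (no CRLF handling, decode side only).

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# One hash lookup replaces a chain of if/elsif on the encoding name.
my %DECODER = (
	'base64' => \&decode_base64,
	'quoted-printable' => \&decode_qp,
);

# hypothetical helper: returns $raw untouched for 7bit/8bit/binary,
# unknown encodings, or a missing Content-Transfer-Encoding
sub decode_body {
	my ($cte, $raw) = @_;
	my ($token) = (($cte // '') =~ /([a-zA-Z0-9\-]+)/) or return $raw;
	my $dec = $DECODER{lc $token} or return $raw;
	$dec->($raw);
}

print "$_\n" for (decode_body('Base64', 'aGVsbG8='),
		decode_body('7bit', 'plain text'));
```

Unknown encodings fall through to the raw body here, which matches
the permissive read side in the patch; the write side croaks instead.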
lib/PublicInbox/Eml.pm | 47 ++++++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 13 deletions(-)
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 1988bdb3..1adaff04 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -30,18 +30,24 @@ use v5.10.1;
use Carp qw(croak);
use Encode qw(find_encoding decode encode); # stdlib
use Text::Wrap qw(wrap); # stdlib, we need Perl 5.6+ for $huge
+use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2
+use MIME::QuotedPrint 3.05; # ditto
my $MIME_Header = find_encoding('MIME-Header');
use PublicInbox::EmlContentFoo qw(parse_content_type parse_content_disposition);
-use Email::MIME::Encodings;
$PublicInbox::EmlContentFoo::STRICT_PARAMS = 0;
our $MAXPARTS = 1000; # same as SpamAssassin
our $MAXDEPTH = 20; # seems enough, Perl sucks, here
our $MAXBOUNDLEN = 2048; # same as postfix
-my $NO_ENCODE_RE = qr/\A(?:7bit|8bit|binary)[ \t]*(?:;|$)?/i;
+my %MIME_ENC = (qp => \&enc_qp, base64 => \&encode_base64);
+my %MIME_DEC = (qp => \&dec_qp, base64 => \&decode_base64);
+$MIME_ENC{quotedprint} = $MIME_ENC{'quoted-printable'} = $MIME_ENC{qp};
+$MIME_DEC{quotedprint} = $MIME_DEC{'quoted-printable'} = $MIME_DEC{qp};
+$MIME_ENC{$_} = \&identity_codec for qw(7bit 8bit binary);
+
my %DECODE_ADDRESS = map { $_ => 1 } qw(From To Cc Sender Reply-To);
my %DECODE_FULL = (
Subject => 1,
@@ -111,13 +117,6 @@ sub ct ($) {
$_[0]->{ct} //= parse_content_type(header($_[0], 'Content-Type'));
}
-sub body_decode ($$) {
- my $cte = header_raw($_[0], 'Content-Transfer-Encoding');
- ($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) if $cte; # For S/MIME, etc
- (!$cte || $cte =~ $NO_ENCODE_RE) ?
- $_[1] : Email::MIME::Encodings::decode($cte, $_[1], '7bit');
-}
-
# returns a queue of sub-parts iff it's worth descending into
# TODO: descend into message/rfc822 parts (Email::MIME didn't)
sub mp_descend ($$) {
@@ -197,6 +196,22 @@ sub each_part {
}
}
+sub enc_qp {
+ # prevent MIME::QuotedPrint from encoding CR as =0D since it's
+ # against RFCs and breaks MUAs
+ $_[0] =~ s/\r\n/\n/sg;
+ encode_qp($_[0], "\r\n");
+}
+
+sub dec_qp {
+ # RFC 2822 requires all lines to end in CRLF, though... :<
+ $_[0] = decode_qp($_[0]);
+ $_[0] =~ s/\n/\r\n/sg;
+ $_[0]
+}
+
+sub identity_codec { $_[0] }
+
########### compatibility section for existing Email::MIME uses #########
sub header_obj {
@@ -240,9 +255,9 @@ EOF
sub body_set {
my ($self, $body) = @_;
my $bdy = $self->{bdy} = ref($body) ? $body : \$body;
- my $cte = header_raw($self, 'Content-Transfer-Encoding');
- if ($cte && $cte !~ $NO_ENCODE_RE) {
- $$bdy = Email::MIME::Encodings::encode($cte, $$bdy)
+ if (my $cte = header_raw($self, 'Content-Transfer-Encoding')) {
+ my $enc = $MIME_ENC{lc($cte)} or croak("can't encode `$cte'");
+ $$bdy = $enc->($$bdy); # in-place
}
undef;
}
@@ -351,7 +366,13 @@ sub header_str {
sub body_raw { ${$_[0]->{bdy} // \''}; }
-sub body { body_decode($_[0], body_raw($_[0])) }
+sub body {
+ my $raw = body_raw($_[0]);
+ my $cte = header_raw($_[0], 'Content-Transfer-Encoding') or return $raw;
+ ($cte) = ($cte =~ /([a-zA-Z0-9\-]+)/) or return $raw; # For S/MIME, etc
+ my $dec = $MIME_DEC{lc($cte)} or return $raw;
+ $dec->($raw);
+}
sub body_str {
my ($self) = @_;
* [PATCH 11/13] xt: eml comparison tests
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (9 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 10/13] eml: remove dependency on Email::MIME::Encodings Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
2020-05-08 4:47 ` Eric Wong
2020-05-07 21:05 ` [PATCH 12/13] remove most internal Email::MIME usage Eric Wong
2020-05-07 21:05 ` [PATCH 13/13] eml: drop trailing blank line on missing epilogue Eric Wong
12 siblings, 1 reply; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
While our codebase can still work with either MIME
implementation, add comparison tests to ensure we
handle corner cases in existing archives.
---
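Not part of the commit message: a small sketch of the comparison
approach, where each implementation's view of a message is flattened
into one string and the two strings are compared.  The summarize
helper and the [content_type, body] pairs are stand-ins invented for
illustration; the real tests walk actual message objects via
->each_part.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# hypothetical helper: flatten [content_type, body] pairs into a string
sub summarize {
	my @out;
	for my $part (@_) {
		my ($ct, $body) = @$part;
		push @out, "T: $ct";
		if ($body =~ /[^\p{XPosixPrint}\s]/s) { # looks binary
			push @out, 'M: '.md5_hex($body), 'B: '.length($body);
		} else { # text: ignore trailing whitespace differences
			(my $txt = $body) =~ s/\s+\z//s;
			push @out, "X: $txt";
		}
	}
	join("\0", @out);
}

# stand-ins for what two parsers might return for the same message
my @impl_a = (['text/plain', "hello\n"], ['image/png', "\x89PNG\x00"]);
my @impl_b = (['text/plain', "hello\n\n"], ['image/png', "\x89PNG\x00"]);
my $same = summarize(@impl_a) eq summarize(@impl_b);
my $verdict = $same ? 'same' : 'differ';
print "$verdict\n";
```

Digesting binary parts keeps the dumped diffs readable when a
mismatch does show up.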
MANIFEST | 2 +
xt/cmp-msgstr.t | 108 +++++++++++++++++++++++++++++++++++++++++++++++
xt/cmp-msgview.t | 95 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 205 insertions(+)
create mode 100644 xt/cmp-msgstr.t
create mode 100644 xt/cmp-msgview.t
diff --git a/MANIFEST b/MANIFEST
index 055c8c9a..9c804a07 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -331,6 +331,8 @@ t/www_listing.t
t/www_static.t
t/x-unknown-alpine.eml
t/xcpdb-reshard.t
+xt/cmp-msgstr.t
+xt/cmp-msgview.t
xt/git-http-backend.t
xt/git_async_cmp.t
xt/mem-msgview.t
diff --git a/xt/cmp-msgstr.t b/xt/cmp-msgstr.t
new file mode 100644
index 00000000..6bae0f66
--- /dev/null
+++ b/xt/cmp-msgstr.t
@@ -0,0 +1,108 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+use PublicInbox::View;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use Digest::MD5;
+use PublicInbox::MsgIter;
+require_mods(qw(Data::Dumper Email::MIME));
+Data::Dumper->import('Dumper');
+require PublicInbox::MIME;
+require_git(2.19);
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my @cat = qw(cat-file --buffer --batch-check --batch-all-objects --unordered);
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'cmp' });
+my $git = $ibx->git;
+my $fh = $git->popen(@cat);
+vec(my $vec = '', fileno($fh), 1) = 1;
+select($vec, undef, undef, 60) or die "timed out waiting for --batch-check";
+my $n = 0;
+my $m = 0;
+my $dig_cls = 'Digest::MD5';
+sub h ($) {
+ s/\s+\z//s; # E::M leaves trailing white space
+ s/\s+/ /sg;
+ "$_[0]: $_";
+}
+
+my $cmp = sub {
+ my ($p, $cmp_arg) = @_;
+ my $part = shift @$p;
+ push @$cmp_arg, '---'.join(', ', @$p).'---';
+ my $ct = $part->content_type // 'text/plain';
+ $ct =~ s/[ \t]+.*\z//s;
+ my ($s, $err);
+ eval {
+ push @$cmp_arg, map { h 'f' } $part->header('From');
+ push @$cmp_arg, map { h 't' } $part->header('To');
+ push @$cmp_arg, map { h 'cc' } $part->header('Cc');
+ push @$cmp_arg, map { h 'mid' } $part->header('Message-ID');
+ push @$cmp_arg, map { h 'refs' } $part->header('References');
+ push @$cmp_arg, map { h 'irt' } $part->header('In-Reply-To');
+ push @$cmp_arg, map { h 's' } $part->header('Subject');
+ push @$cmp_arg, map { h 'cd' }
+ $part->header('Content-Description');
+ ($s, $err) = msg_part_text($part, $ct);
+ if (defined $s) {
+ $s =~ s/\s+\z//s;
+ push @$cmp_arg, "S: ".$s;
+ } else {
+ $part = $part->body;
+ push @$cmp_arg, "T: $ct";
+ if ($part =~ /[^\p{XPosixPrint}\s]/s) { # binary
+ my $dig = $dig_cls->new;
+ $dig->add($part);
+ push @$cmp_arg, "M: ".$dig->hexdigest;
+ push @$cmp_arg, "B: ".bytes::length($part);
+ } else {
+ $part =~ s/\s+\z//s;
+ push @$cmp_arg, "X: ".$part;
+ }
+ }
+ };
+ if ($@) {
+ $err //= '';
+ push @$cmp_arg, "E: $@ ($err)";
+ }
+};
+
+my $ndiff = 0;
+my $git_cb = sub {
+ my ($bref, $oid) = @_;
+ local $SIG{__WARN__} = sub { diag "$inboxdir $oid ", @_ };
+ ++$m;
+ PublicInbox::MIME->new($$bref)->each_part($cmp, my $m_ctx = [], 1);
+ PublicInbox::Eml->new($$bref)->each_part($cmp, my $e_ctx = [], 1);
+ if (join("\0", @$e_ctx) ne join("\0", @$m_ctx)) {
+ ++$ndiff;
+ open my $fh, '>', "$tmpdir/mime" or die $!;
+ print $fh Dumper($m_ctx) or die $!;
+ close $fh or die $!;
+ open $fh, '>', "$tmpdir/eml" or die $!;
+ print $fh Dumper($e_ctx) or die $!;
+ close $fh or die $!;
+ diag "$inboxdir $oid differs";
+ # using `git diff', diff(1) may not be installed
+ diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]);
+ }
+};
+$git->cat_async_begin;
+my $t = timeit(1, sub {
+ while (<$fh>) {
+ my ($oid, $type) = split / /;
+ next if $type ne 'blob';
+ ++$n;
+ $git->cat_async($oid, $git_cb);
+ }
+ $git->cat_async_wait;
+});
+is($m, $n, "$inboxdir rendered all $m <=> $n messages");
+is($ndiff, 0, "$inboxdir $ndiff differences");
+done_testing();
diff --git a/xt/cmp-msgview.t b/xt/cmp-msgview.t
new file mode 100644
index 00000000..66fb467e
--- /dev/null
+++ b/xt/cmp-msgview.t
@@ -0,0 +1,95 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use Benchmark qw(:all);
+use PublicInbox::Inbox;
+use PublicInbox::View;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use Digest::MD5;
+require_git(2.19);
+require_mods qw(Data::Dumper Email::MIME Plack::Util);
+Data::Dumper->import('Dumper');
+require PublicInbox::MIME;
+my ($tmpdir, $for_destroy) = tmpdir();
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my @cat = qw(cat-file --buffer --batch-check --batch-all-objects --unordered);
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'perf' });
+my $git = $ibx->git;
+my $fh = $git->popen(@cat);
+vec(my $vec = '', fileno($fh), 1) = 1;
+select($vec, undef, undef, 60) or die "timed out waiting for --batch-check";
+my $mime_ctx = {
+ env => { HTTP_HOST => 'example.com', 'psgi.url_scheme' => 'https' },
+ -inbox => $ibx,
+ www => Plack::Util::inline_object(style => sub {''}),
+ obuf => \(my $mime_buf = ''),
+ mhref => '../',
+};
+my $eml_ctx = { %$mime_ctx, obuf => \(my $eml_buf = '') };
+my $n = 0;
+my $m = 0;
+my $ndiff_html = 0;
+my $dig_cls = 'Digest::MD5';
+my $digest_attach = sub { # ensure ->body (not ->body_raw) matches
+ my ($p, $cmp_arg) = @_;
+ my $part = shift @$p;
+ my $dig = $cmp_arg->[0] //= $dig_cls->new;
+ $dig->add($part->body_raw);
+ push @$cmp_arg, join(', ', @$p);
+};
+
+my $git_cb = sub {
+ my ($bref, $oid) = @_;
+ local $SIG{__WARN__} = sub { diag "$inboxdir $oid ", @_ };
+ ++$m;
+ my $mime = PublicInbox::MIME->new($$bref);
+ PublicInbox::View::multipart_text_as_html($mime, $mime_ctx);
+ my $eml = PublicInbox::Eml->new($$bref);
+ PublicInbox::View::multipart_text_as_html($eml, $eml_ctx);
+ if ($eml_buf ne $mime_buf) {
+ ++$ndiff_html;
+ open my $fh, '>', "$tmpdir/mime" or die $!;
+ print $fh $mime_buf or die $!;
+ close $fh or die $!;
+ open $fh, '>', "$tmpdir/eml" or die $!;
+ print $fh $eml_buf or die $!;
+ close $fh or die $!;
+ # using `git diff', diff(1) may not be installed
+ diag "$inboxdir $oid differs";
+ diag xqx([qw(git diff), "$tmpdir/mime", "$tmpdir/eml"]);
+ }
+ $eml_buf = $mime_buf = '';
+
+ # don't tolerate differences in attachment downloads
+ $mime = PublicInbox::MIME->new($$bref);
+ $mime->each_part($digest_attach, my $mime_cmp = [], 1);
+ $eml = PublicInbox::Eml->new($$bref);
+ $eml->each_part($digest_attach, my $eml_cmp = [], 1);
+ $mime_cmp->[0] = $mime_cmp->[0]->hexdigest;
+ $eml_cmp->[0] = $eml_cmp->[0]->hexdigest;
+ # don't have millions of "ok" lines
+ if (join("\0", @$eml_cmp) ne join("\0", @$mime_cmp)) {
+ diag Dumper([ $oid, eml => $eml_cmp, mime =>$mime_cmp ]);
+ is_deeply($eml_cmp, $mime_cmp, "$inboxdir $oid match");
+ }
+};
+$git->cat_async_begin;
+my $t = timeit(1, sub {
+ while (<$fh>) {
+ my ($oid, $type) = split / /;
+ next if $type ne 'blob';
+ ++$n;
+ $git->cat_async($oid, $git_cb);
+ }
+ $git->cat_async_wait;
+});
+is($m, $n, 'rendered all messages');
+
+# we'll tolerate minor differences in HTML rendering
+diag "$ndiff_html HTML differences";
+
+done_testing();
* [PATCH 12/13] remove most internal Email::MIME usage
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (10 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
2020-05-07 21:05 ` [PATCH 13/13] eml: drop trailing blank line on missing epilogue Eric Wong
12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
We no longer load or use Email::MIME outside of comparison
tests.
---
INSTALL | 26 +++++------
Makefile.PL | 5 ---
ci/deps.perl | 3 --
lib/PublicInbox/Import.pm | 8 ++--
lib/PublicInbox/MIME.pm | 3 ++
lib/PublicInbox/MsgTime.pm | 8 ++--
lib/PublicInbox/TestCommon.pm | 3 +-
t/altid.t | 4 +-
t/altid_v2.t | 4 +-
t/cgi.t | 8 ++--
t/content_id.t | 6 +--
t/convert-compact.t | 4 +-
t/edit.t | 20 ++++-----
t/feed.t | 6 +--
t/filter_base.t | 4 +-
t/filter_mirror.t | 2 +-
t/filter_subjecttag.t | 4 +-
t/filter_vger.t | 6 +--
t/html_index.t | 4 +-
t/httpd.t | 4 +-
t/import.t | 4 +-
t/indexlevels-mirror.t | 4 +-
t/mda.t | 4 +-
t/mda_filter_rubylang.t | 2 +-
t/mid.t | 4 +-
t/mime.t | 83 +++++++++++++++++++----------------
t/msg_iter.t | 8 ++--
t/msgtime.t | 6 +--
t/multi-mid.t | 6 +--
t/nntp.t | 4 +-
t/nntpd-tls.t | 4 +-
t/nntpd.t | 6 +--
t/nulsubject.t | 2 +-
t/plack.t | 10 ++---
t/precheck.t | 10 ++---
t/psgi_attach.t | 2 +-
t/psgi_bad_mids.t | 4 +-
t/psgi_mount.t | 4 +-
t/psgi_multipart_not.t | 4 +-
t/psgi_scan_all.t | 4 +-
t/psgi_search.t | 8 ++--
t/psgi_text.t | 2 +-
t/psgi_v2.t | 6 +--
t/purge.t | 2 +-
t/replace.t | 12 ++---
t/reply.t | 4 +-
t/search-thr-index.t | 6 +--
t/search.t | 26 +++++------
t/solver_git.t | 4 +-
t/spamcheck_spamc.t | 8 ++--
t/thread-cycle.t | 3 +-
t/time.t | 4 +-
t/v1-add-remove-add.t | 4 +-
t/v1reindex.t | 4 +-
t/v2-add-remove-add.t | 4 +-
t/v2mda.t | 4 +-
t/v2mirror.t | 4 +-
t/v2reindex.t | 8 ++--
t/v2writable.t | 8 ++--
t/watch_filter_rubylang.t | 2 +-
t/watch_maildir.t | 2 +-
t/watch_maildir_v2.t | 2 +-
t/www_altid.t | 2 +-
t/xcpdb-reshard.t | 4 +-
xt/msgtime_cmp.t | 12 ++---
xt/perf-msgview.t | 2 +-
66 files changed, 228 insertions(+), 226 deletions(-)
diff --git a/INSTALL b/INSTALL
index 2dd7dcff..80cee753 100644
--- a/INSTALL
+++ b/INSTALL
@@ -36,15 +36,18 @@ Beyond that, there is a long list of Perl modules required, starting with:
* Digest::SHA typically installed with Perl
rpm: perl-Digest-SHA
-* Email::MIME deb: libemail-mime-perl
- pkg: p5-Email-MIME
- rpm: perl-Email-MIME
-
* URI::Escape deb: liburi-perl
pkg: p5-URI
rpm: perl-URI
(for HTML/Atom generation)
+Email::MIME will be optional as of public-inbox v1.5.0;
+it may still be used in maintainer comparison tests:
+
+* Email::MIME deb: libemail-mime-perl
+ pkg: p5-Email-MIME
+ rpm: perl-Email-MIME
+
Plack and Date::Parse are optional as of public-inbox v1.3.0,
but required for older releases:
@@ -86,6 +89,11 @@ Numerous optional modules are likely to be useful as well:
(speeds up process spawning on Linux,
see public-inbox-daemon(8))
+- Email::Address::XS deb: libemail-address-xs-perl
+ pkg: p5-Email-Address-XS
+ (correct parsing of tricky email
+ addresses, phrases and comments)
+
- Plack::Middleware::ReverseProxy deb: libplack-middleware-reverseproxy-perl
pkg: p5-Plack-Middleware-ReverseProxy
rpm: perl-Plack-Middleware-ReverseProxy
@@ -108,16 +116,6 @@ Numerous optional modules are likely to be useful as well:
The following modules are typically pulled in by dependencies listed
above, so there is no need to explicitly install them:
-- Email::MIME::ContentType deb: libemail-mime-contenttype-perl
- pkg: p5-Email-MIME-ContentType
- rpm: perl-Email-MIME-ContentType
- (pulled in by Email::MIME)
-
-- Email::Simple deb: libemail-simple-perl
- pkg: p5-Email-Simple
- rpm: perl-Email-Simple
- (pulled in by Email::MIME)
-
* Encode deb: libperl5.$MINOR (or libencode-perl)
pkg: perl5
rpm: perl-Encode
diff --git a/Makefile.PL b/Makefile.PL
index 59345edb..efbb59cb 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -122,11 +122,6 @@ WriteMakefile(
# `perl5' on FreeBSD
# perl-Digest-SHA on RH-based
'Digest::SHA' => 0,
- 'Email::MIME' => 0,
-
- # the following should be pulled in by Email::MIME:
- 'Email::MIME::ContentType' => 0,
- 'Email::Simple' => 0,
# libperl$PERL_VERSION or libencode-perl on Debian,
# `perl5' on FreeBSD
diff --git a/ci/deps.perl b/ci/deps.perl
index 06b4fbe0..48aaa9e4 100755
--- a/ci/deps.perl
+++ b/ci/deps.perl
@@ -20,9 +20,6 @@ my $profiles = {
perl
Devel::Peek
Digest::SHA
- Email::Simple
- Email::MIME
- Email::MIME::ContentType
Encode
ExtUtils::MakeMaker
IO::Compress::Gzip
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 98aa7785..07d18599 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -213,13 +213,13 @@ sub get_mark {
}
# returns undef on non-existent
-# ('MISMATCH', Email::MIME) on mismatch
-# (:MARK, Email::MIME) on success
+# ('MISMATCH', PublicInbox::Eml) on mismatch
+# (:MARK, PublicInbox::Eml) on success
#
# v2 callers should check with Xapian before calling this as
# it is not idempotent.
sub remove {
- my ($self, $mime, $msg) = @_; # mime = Email::MIME
+ my ($self, $mime, $msg) = @_; # mime = PublicInbox::Eml or Email::MIME
my $path_type = $self->{path_type};
my ($path, $err, $cur, $blob);
@@ -375,7 +375,7 @@ sub clean_tree_v2 ($$$) {
# returns undef on duplicate
# returns the :MARK of the most recent commit
sub add {
- my ($self, $mime, $check_cb, $smsg) = @_; # mime = Email::MIME
+ my ($self, $mime, $check_cb, $smsg) = @_;
my ($name, $email, $at, $ct, $subject) = extract_cmt_info($mime, $smsg);
my $path_type = $self->{path_type};
diff --git a/lib/PublicInbox/MIME.pm b/lib/PublicInbox/MIME.pm
index b795b93b..9077386a 100644
--- a/lib/PublicInbox/MIME.pm
+++ b/lib/PublicInbox/MIME.pm
@@ -3,6 +3,9 @@
#
# The license for this file differs from the rest of public-inbox.
#
+# We no longer load this in any of our code outside of maintainer
+# tests for compatibility.
+#
# It monkey patches the "parts_multipart" subroutine with patches
# from Matthew Horsfall <wolfsage@gmail.com> at:
#
diff --git a/lib/PublicInbox/MsgTime.pm b/lib/PublicInbox/MsgTime.pm
index 920e8f8a..8596f01c 100644
--- a/lib/PublicInbox/MsgTime.pm
+++ b/lib/PublicInbox/MsgTime.pm
@@ -138,7 +138,7 @@ sub time_response ($) {
}
sub msg_received_at ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my @recvd = $hdr->header_raw('Received');
my ($ts);
foreach my $r (@recvd) {
@@ -153,7 +153,7 @@ sub msg_received_at ($) {
}
sub msg_date_only ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my @date = $hdr->header_raw('Date');
my ($ts);
foreach my $d (@date) {
@@ -168,7 +168,7 @@ sub msg_date_only ($) {
# Favors Received header for sorting globally
sub msg_timestamp ($;$) {
- my ($hdr, $fallback) = @_; # Email::MIME::Header
+ my ($hdr, $fallback) = @_; # PublicInbox::Eml
my $ret;
$ret = msg_received_at($hdr) and return time_response($ret);
$ret = msg_date_only($hdr) and return time_response($ret);
@@ -177,7 +177,7 @@ sub msg_timestamp ($;$) {
# Favors the Date: header for display and sorting within a thread
sub msg_datestamp ($;$) {
- my ($hdr, $fallback) = @_; # Email::MIME::Header
+ my ($hdr, $fallback) = @_; # PublicInbox::Eml
my $ret;
$ret = msg_date_only($hdr) and return time_response($ret);
$ret = msg_received_at($hdr) and return time_response($ret);
diff --git a/lib/PublicInbox/TestCommon.pm b/lib/PublicInbox/TestCommon.pm
index 978c3cd7..d952ee6d 100644
--- a/lib/PublicInbox/TestCommon.pm
+++ b/lib/PublicInbox/TestCommon.pm
@@ -8,7 +8,6 @@ use parent qw(Exporter);
use Fcntl qw(FD_CLOEXEC F_SETFD F_GETFD :seek);
use POSIX qw(dup2);
use IO::Socket::INET;
-use PublicInbox::MIME; # temporary
our @EXPORT = qw(tmpdir tcp_server tcp_connect require_git require_mods
run_script start_script key2sub xsys xqx mime_load eml_load);
@@ -23,7 +22,7 @@ sub mime_load ($) {
sub eml_load ($) {
my ($path, $cb) = @_;
open(my $fh, '<', $path) or die "open $path: $!";
- binmode $fh;
+ require PublicInbox::Eml;
PublicInbox::Eml->new(\(do { local $/; <$fh> }));
}
diff --git a/t/altid.t b/t/altid.t
index c7a3601a..670a3963 100644
--- a/t/altid.t
+++ b/t/altid.t
@@ -4,7 +4,7 @@ use strict;
use warnings;
use Test::More;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_mods(qw(DBD::SQLite Search::Xapian));
use_ok 'PublicInbox::Msgmap';
use_ok 'PublicInbox::SearchIdx';
@@ -27,7 +27,7 @@ my $ibx;
my $git = PublicInbox::Git->new($git_dir);
my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
$im->init_bare;
- $im->add(PublicInbox::MIME->new(<<'EOF'));
+ $im->add(PublicInbox::Eml->new(<<'EOF'));
From: a@example.com
To: b@example.com
Subject: boo!
diff --git a/t/altid_v2.t b/t/altid_v2.t
index 3ac294f0..28a047d9 100644
--- a/t/altid_v2.t
+++ b/t/altid_v2.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
require_git(2.6);
require_mods(qw(DBD::SQLite Search::Xapian));
@@ -31,7 +31,7 @@ my $ibx = {
};
$ibx = PublicInbox::Inbox->new($ibx);
my $v2w = PublicInbox::V2Writable->new($ibx, 1);
-$v2w->add(PublicInbox::MIME->new(<<'EOF'));
+$v2w->add(PublicInbox::Eml->new(<<'EOF'));
From: a@example.com
To: b@example.com
Subject: boo!
diff --git a/t/cgi.t b/t/cgi.t
index 42a343d3..d1f97150 100644
--- a/t/cgi.t
+++ b/t/cgi.t
@@ -5,7 +5,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
use PublicInbox::Import;
require_mods(qw(Plack::Handler::CGI Plack::Util));
@@ -45,7 +45,7 @@ my $im = PublicInbox::InboxWritable->new($ibx)->importer;
local $ENV{HOME} = $home;
# inject some messages:
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: Me <me\@example.com>
To: You <you\@example.com>
Cc: $addr
@@ -62,7 +62,7 @@ EOF
ok($im->add($mime), 'added big message');
# deliver a reply, too
- $mime = PublicInbox::MIME->new(<<EOF);
+ $mime = PublicInbox::Eml->new(<<EOF);
From: You <you\@example.com>
To: Me <me\@example.com>
Cc: $addr
@@ -79,7 +79,7 @@ EOF
ok($im->add($mime), 'added reply');
my $slashy_mid = 'slashy/asdf@example.com';
- my $slashy = PublicInbox::MIME->new(<<EOF);
+ my $slashy = PublicInbox::Eml->new(<<EOF);
From: You <you\@example.com>
To: Me <me\@example.com>
Cc: $addr
diff --git a/t/content_id.t b/t/content_id.t
index 0325164d..9df81aa8 100644
--- a/t/content_id.t
+++ b/t/content_id.t
@@ -4,9 +4,9 @@ use strict;
use warnings;
use Test::More;
use PublicInbox::ContentId qw(content_id);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: b@example.com
Subject: this is a subject
@@ -17,7 +17,7 @@ hello world
EOF
my $orig = content_id($mime);
-my $reload = content_id(PublicInbox::MIME->new($mime->as_string));
+my $reload = content_id(PublicInbox::Eml->new($mime->as_string));
is($orig, $reload, 'content_id matches after serialization');
foreach my $h (qw(From To Cc)) {
diff --git a/t/convert-compact.t b/t/convert-compact.t
index 1627e019..80efc19c 100644
--- a/t/convert-compact.t
+++ b/t/convert-compact.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Spawn qw(which);
use PublicInbox::TestCommon;
require_git(2.6);
@@ -26,7 +26,7 @@ ok(PublicInbox::Import::run_die([qw(git) , "--git-dir=$ibx->{inboxdir}",
qw(config core.sharedRepository 0644)]), 'set sharedRepository');
$ibx = PublicInbox::Inbox->new($ibx);
my $im = PublicInbox::Import->new($ibx->git, undef, undef, $ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: b@example.com
Subject: this is a subject
diff --git a/t/edit.t b/t/edit.t
index d8833f9c..1a5698f6 100644
--- a/t/edit.t
+++ b/t/edit.t
@@ -28,7 +28,7 @@ my $file = 't/data/0001.patch';
open my $fh, '<', $file or die "open: $!";
my $raw = do { local $/; <$fh> };
my $im = $ibx->importer(0);
-my $mime = PublicInbox::MIME->new($raw);
+my $mime = PublicInbox::Eml->new($raw);
my $mid = mid_clean($mime->header('Message-Id'));
ok($im->add($mime), 'add message to be edited');
$im->done;
@@ -41,7 +41,7 @@ $t = '-F FILE'; {
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/boolean prefix/bool pfx/'";
$cmd = [ '-edit', "-F$file", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t edit OK");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
}
@@ -51,7 +51,7 @@ $t = '-m MESSAGE_ID'; {
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/bool pfx/boolean prefix/'";
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t edit OK");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->header('Subject'), qr/boolean prefix/, "$t message edited");
like($out, qr/[a-f0-9]{40}/, "$t shows commit on success");
}
@@ -63,7 +63,7 @@ $t = 'no-op -m MESSAGE_ID'; {
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds");
my $prev = $cur;
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
is_deeply($cur, $prev, "$t makes no change");
like($cur->header('Subject'), qr/boolean prefix/,
"$t does not change message");
@@ -79,7 +79,7 @@ $t = 'no-op -m MESSAGE_ID w/Status: header'; { # because mutt does it
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds");
my $prev = $cur;
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
is_deeply($cur, $prev, "$t makes no change");
like($cur->header('Subject'), qr/boolean prefix/,
"$t does not change message");
@@ -94,7 +94,7 @@ $t = '-m MESSAGE_ID can change Received: headers'; {
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^Subject:.*/Received: x\\n\$&/'";
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->header('Subject'), qr/boolean prefix/,
"$t does not change Subject");
is($cur->header('Received'), 'x', 'added Received header');
@@ -127,7 +127,7 @@ $t = 'mailEditor set in config'; {
local $ENV{GIT_EDITOR} = 'echo should not run';
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t edited message");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->header('Subject'), qr/bool pfx/, "$t message edited");
unlike($out, qr/should not run/, 'did not run GIT_EDITOR');
}
@@ -137,20 +137,20 @@ $t = '--raw and mbox escaping'; {
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^\$/\\nFrom not mbox\\n/'";
$cmd = [ '-edit', "-m$mid", '--raw', $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->body, qr/^From not mbox/sm, 'put "From " line into body');
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^>From not/\$& an/'";
$cmd = [ '-edit', "-m$mid", $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds with mbox escaping");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
like($cur->body, qr/^From not an mbox/sm,
'changed "From " line unescaped');
local $ENV{MAIL_EDITOR} = "$^X -i -p -e 's/^From not an mbox\\n//s'";
$cmd = [ '-edit', "-m$mid", '--raw', $inboxdir ];
ok(run_script($cmd, undef, $opt), "$t succeeds again");
- $cur = PublicInbox::MIME->new($ibx->msg_by_mid($mid));
+ $cur = PublicInbox::Eml->new($ibx->msg_by_mid($mid));
unlike($cur->body, qr/^From not an mbox/sm, "$t restored body");
}
diff --git a/t/feed.t b/t/feed.t
index 373a1de8..5ad90a07 100644
--- a/t/feed.t
+++ b/t/feed.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Feed;
use PublicInbox::Import;
use PublicInbox::Inbox;
@@ -36,7 +36,7 @@ my $im = PublicInbox::Import->new($git, $ibx->{name}, 'test@example');
{
$im->init_bare;
foreach my $i (1..6) {
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: ME <me\@example.com>
To: U <u\@example.com>
Message-Id: <$i\@example.com>
@@ -95,7 +95,7 @@ EOF
# add a new spam message
my $spam;
{
- $spam = PublicInbox::MIME->new(<<EOF);
+ $spam = PublicInbox::Eml->new(<<EOF);
From: SPAMMER <spammer\@example.com>
To: U <u\@example.com>
Message-Id: <this-is-spam\@example.com>
diff --git a/t/filter_base.t b/t/filter_base.t
index bbd64189..47d0220f 100644
--- a/t/filter_base.t
+++ b/t/filter_base.t
@@ -21,13 +21,13 @@ use_ok 'PublicInbox::Filter::Base';
{
my $f = PublicInbox::Filter::Base->new;
- my $email = mime_load 't/filter_base-xhtml.eml';
+ my $email = eml_load 't/filter_base-xhtml.eml';
is($f->delivery($email), 100, "xhtml rejected");
}
{
my $f = PublicInbox::Filter::Base->new;
- my $email = mime_load 't/filter_base-junk.eml';
+ my $email = eml_load 't/filter_base-junk.eml';
is($f->delivery($email), 100, 'proprietary format rejected on glob');
}
diff --git a/t/filter_mirror.t b/t/filter_mirror.t
index 0e641a03..5bc7f3f4 100644
--- a/t/filter_mirror.t
+++ b/t/filter_mirror.t
@@ -9,7 +9,7 @@ use_ok 'PublicInbox::Filter::Mirror';
my $f = PublicInbox::Filter::Mirror->new;
ok($f, 'created PublicInbox::Filter::Mirror object');
{
- my $email = mime_load 't/mda-mime.eml';
+ my $email = eml_load 't/mda-mime.eml';
is($f->ACCEPT, $f->delivery($email), 'accept any trash that comes');
}
diff --git a/t/filter_subjecttag.t b/t/filter_subjecttag.t
index 9b397b8c..75effa27 100644
--- a/t/filter_subjecttag.t
+++ b/t/filter_subjecttag.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use_ok 'PublicInbox::Filter::SubjectTag';
my $f = eval { PublicInbox::Filter::SubjectTag->new };
@@ -11,7 +11,7 @@ like($@, qr/tag not defined/, 'error without args');
$f = PublicInbox::Filter::SubjectTag->new('-tag', '[foo]');
is(ref $f, 'PublicInbox::Filter::SubjectTag', 'new object created');
-my $mime = PublicInbox::MIME->new(<<EOF);
+my $mime = PublicInbox::Eml->new(<<EOF);
To: you <you\@example.com>
Subject: =?UTF-8?B?UmU6IFtmb29dIEVsw4PCqWFub3I=?=
diff --git a/t/filter_vger.t b/t/filter_vger.t
index 9d71f16d..ca5a6ca7 100644
--- a/t/filter_vger.t
+++ b/t/filter_vger.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use_ok 'PublicInbox::Filter::Vger';
my $f = PublicInbox::Filter::Vger->new;
@@ -21,7 +21,7 @@ More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
EOF
- my $mime = PublicInbox::MIME->new($lkml);
+ my $mime = PublicInbox::Eml->new($lkml);
$mime = $f->delivery($mime);
is("keep this\n", $mime->body, 'normal message filtered OK');
}
@@ -37,7 +37,7 @@ the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
EOF
- my $mime = PublicInbox::MIME->new($no_nl);
+ my $mime = PublicInbox::Eml->new($no_nl);
$mime = $f->delivery($mime);
is('OSX users :P', $mime->body, 'missing trailing LF in original OK');
}
diff --git a/t/html_index.t b/t/html_index.t
index 51897532..80f81577 100644
--- a/t/html_index.t
+++ b/t/html_index.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Feed;
use PublicInbox::Git;
use PublicInbox::Import;
@@ -32,7 +32,7 @@ my $im = PublicInbox::Import->new($git, 'tester', 'test@example');
$mid_line .= "In-Reply-To: $prev";
}
$prev = $mid;
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: ME <me\@example.com>
To: U <u\@example.com>
$mid_line
diff --git a/t/httpd.t b/t/httpd.t
index f4fbd533..7404eb8b 100644
--- a/t/httpd.t
+++ b/t/httpd.t
@@ -4,7 +4,7 @@ use strict;
use warnings;
use Test::More;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use Socket qw(IPPROTO_TCP SOL_SOCKET);
require_mods(qw(Plack::Util Plack::Builder HTTP::Date HTTP::Status));
@@ -28,7 +28,7 @@ use_ok 'PublicInbox::Import';
# ensure successful message delivery
{
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: Me <me\@example.com>
To: You <you\@example.com>
Cc: $addr
diff --git a/t/import.t b/t/import.t
index ba4abd9c..3f308299 100644
--- a/t/import.t
+++ b/t/import.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Git;
use PublicInbox::Import;
use PublicInbox::Spawn qw(spawn);
@@ -15,7 +15,7 @@ my ($dir, $for_destroy) = tmpdir();
my $git = PublicInbox::Git->new($dir);
my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
$im->init_bare;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: b@example.com
Subject: this is a subject
diff --git a/t/indexlevels-mirror.t b/t/indexlevels-mirror.t
index dcd5dc39..704f7e11 100644
--- a/t/indexlevels-mirror.t
+++ b/t/indexlevels-mirror.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Inbox;
use PublicInbox::InboxWritable;
require PublicInbox::Admin;
@@ -12,7 +12,7 @@ my $PI_TEST_VERSION = $ENV{PI_TEST_VERSION} || 2;
require_git('2.6') if $PI_TEST_VERSION == 2;
require_mods(qw(DBD::SQLite));
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/t/mda.t b/t/mda.t
index 03cc4bc3..759c0b02 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -62,7 +62,7 @@ local $ENV{GIT_COMMITTER_NAME} = eval {
use PublicInbox::MDA;
use PublicInbox::Address;
use Encode qw/encode/;
- my $msg = mime_load 't/utf8.eml';
+ my $msg = eml_load 't/utf8.eml';
my $from = $msg->header('From');
my ($author) = PublicInbox::Address::names($from);
my ($email) = PublicInbox::Address::emails($from);
@@ -229,7 +229,7 @@ EOF
"learned ham idempotently ");
# ensure trained email is filtered, too
- my $mime = mime_load 't/mda-mime.eml';
+ my $mime = eml_load 't/mda-mime.eml';
($mid) = ($mime->header_raw('message-id') =~ /<([^>]+)>/);
{
$in = $mime->as_string;
diff --git a/t/mda_filter_rubylang.t b/t/mda_filter_rubylang.t
index f2cbe9d5..483fcb85 100644
--- a/t/mda_filter_rubylang.t
+++ b/t/mda_filter_rubylang.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::TestCommon;
require_git(2.6);
diff --git a/t/mid.t b/t/mid.t
index 0ad81d7d..3b8f4108 100644
--- a/t/mid.t
+++ b/t/mid.t
@@ -2,7 +2,7 @@
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
use strict;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::MID qw(mid_escape mids references mids_for_index id_compress);
is(mid_escape('foo!@(bar)'), 'foo!@(bar)');
@@ -16,7 +16,7 @@ like(id_compress('foo%bar@wtf'), qr/\A[a-f0-9]{40}\z/,
is(id_compress('foobar-wtf'), 'foobar-wtf', 'regular ID not compressed');
{
- my $mime = PublicInbox::MIME->new("Message-ID: <mid-1\@a>\n\n");
+ my $mime = PublicInbox::Eml->new("Message-ID: <mid-1\@a>\n\n");
$mime->header_set('X-Alt-Message-ID', '<alt-id-for-nntp>');
is_deeply(['mid-1@a'], mids($mime->header_obj), 'mids in common case');
$mime->header_set('Message-Id', '<mid-1@a>', '<mid-2@b>');
diff --git a/t/mime.t b/t/mime.t
index b9a4d66b..d17ec58e 100644
--- a/t/mime.t
+++ b/t/mime.t
@@ -1,16 +1,23 @@
+#!perl -w
# Copyright (C) 2017-2020 all contributors <meta@public-inbox.org>
# This library is free software; you can redistribute it and/or modify
# it under the same terms as Perl itself.
# Artistic or GPL-1+ <https://www.gnu.org/licenses/gpl-1.0.txt>
use strict;
-use warnings;
use Test::More;
-use_ok 'PublicInbox::MIME';
+use PublicInbox::TestCommon;
use PublicInbox::MsgIter;
-
-local $SIG{__WARN__} = sub {};
-my $msg = PublicInbox::MIME->new(
-'From: Richard Hansen <hansenr@google.com>
+my @classes = qw(PublicInbox::Eml);
+SKIP: {
+ require_mods('Email::MIME', 1);
+ push @classes, 'PublicInbox::MIME';
+};
+use_ok $_ for @classes;
+local $SIG{__WARN__} = sub {}; # needed for old Email::Simple (used by E::M)
+
+for my $cls (@classes) {
+ my $msg = $cls->new(<<'EOF');
+From: Richard Hansen <hansenr@google.com>
To: git@vger.kernel.org
Cc: Richard Hansen <hansenr@google.com>
Subject: [PATCH 0/2] minor diff orderfile documentation improvements
@@ -40,10 +47,11 @@ Content-Description: (truncated) S/MIME Cryptographic Signature
dkTlB69771K2eXK4LcHSH/2LqX+VYa3K44vrx1ruzjXdNWzIpKBy0weFNiwnJCGofvCysM2RCSI1
--94eb2c0bc864b76ba30545b2bca9--
-');
+EOF
-my @parts = $msg->parts;
-my $exp = 'Richard Hansen (2):
+ my @parts = $msg->parts;
+ my $exp = <<EOF;
+Richard Hansen (2):
diff: document behavior of relative diff.orderFile
diff: document the pattern format for diff.orderFile
@@ -51,13 +59,12 @@ my $exp = 'Richard Hansen (2):
Documentation/diff-options.txt | 3 ++-
2 files changed, 6 insertions(+), 2 deletions(-)
-';
-
-ok($msg->isa('Email::MIME'), 'compatible with Email::MIME');
-is($parts[0]->body, $exp, 'body matches expected');
+EOF
+ is($parts[0]->body, $exp, 'body matches expected');
-my $raw = q^Date: Wed, 18 Jan 2017 13:28:32 -0500
+ my $raw = <<'EOF';
+Date: Wed, 18 Jan 2017 13:28:32 -0500
From: Santiago Torres <santiago@nyu.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, peff@peff.net, sunshine@sunshineco.com,
@@ -92,28 +99,30 @@ Content-Type: application/pgp-signature; name="signature.asc"
--r24xguofrazenjwe--
-^;
-
-$msg = PublicInbox::MIME->new($raw);
-my $nr = 0;
-msg_iter($msg, sub {
- my ($part, $level, @ex) = @{$_[0]};
- is($level, 1, 'at expected level');
- if (join('fail if $#ex > 0', @ex) eq '1') {
- is($part->body_str, "your tree directly? \r\n", 'body OK');
- } elsif (join('fail if $#ex > 0', @ex) eq '2') {
- is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
- "=7wIb\n" .
- "-----END PGP SIGNATURE-----\n",
- 'sig "matches"');
- } else {
- fail "unexpected part\n";
- }
- $nr++;
-});
-
-is($nr, 2, 'got 2 parts');
-is($msg->as_string, $raw,
- 'stringified sufficiently close to original');
+EOF
+
+ $msg = $cls->new($raw);
+ my $nr = 0;
+ msg_iter($msg, sub {
+ my ($part, $level, @ex) = @{$_[0]};
+ is($level, 1, 'at expected level');
+ if (join('fail if $#ex > 0', @ex) eq '1') {
+ is($part->body_str, "your tree directly? \r\n",
+ 'body OK');
+ } elsif (join('fail if $#ex > 0', @ex) eq '2') {
+ is($part->body, "-----BEGIN PGP SIGNATURE-----\n\n" .
+ "=7wIb\n" .
+ "-----END PGP SIGNATURE-----\n",
+ 'sig "matches"');
+ } else {
+ fail "unexpected part\n";
+ }
+ $nr++;
+ });
+
+ is($nr, 2, 'got 2 parts');
+ is($msg->as_string, $raw,
+ 'stringified sufficiently close to original');
+}
done_testing();
diff --git a/t/msg_iter.t b/t/msg_iter.t
index e8115e25..4ee3a201 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -8,7 +8,7 @@ use PublicInbox::Hval qw(ascii_html);
use_ok('PublicInbox::MsgIter');
{
- my $mime = mime_load 't/msg_iter-order.eml';
+ my $mime = eml_load 't/msg_iter-order.eml';
my @parts;
msg_iter($mime, sub {
my ($part, $level, @ex) = @{$_[0]};
@@ -20,7 +20,7 @@ use_ok('PublicInbox::MsgIter');
}
{
- my $mime = mime_load 't/msg_iter-nested.eml';
+ my $mime = eml_load 't/msg_iter-nested.eml';
my @parts;
msg_iter($mime, sub {
my ($part, $level, @ex) = @{$_[0]};
@@ -33,7 +33,7 @@ use_ok('PublicInbox::MsgIter');
}
{
- my $mime = mime_load 't/iso-2202-jp.eml';
+ my $mime = eml_load 't/iso-2202-jp.eml';
my $raw = '';
msg_iter($mime, sub {
my ($part, $level, @ex) = @{$_[0]};
@@ -46,7 +46,7 @@ use_ok('PublicInbox::MsgIter');
}
{
- my $mime = mime_load 't/x-unknown-alpine.eml';
+ my $mime = eml_load 't/x-unknown-alpine.eml';
my $raw = '';
msg_iter($mime, sub {
my ($part, $level, @ex) = @{$_[0]};
diff --git a/t/msgtime.t b/t/msgtime.t
index d9f8e641..89fd9e37 100644
--- a/t/msgtime.t
+++ b/t/msgtime.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::MsgTime;
use PublicInbox::TestCommon;
@@ -11,7 +11,7 @@ our $received_date = 'Mon, 22 Jan 2007 13:16:24 -0500';
sub datestamp ($) {
my ($date) = @_;
local $SIG{__WARN__} = sub {}; # Suppress warnings
- my $mime = PublicInbox::MIME->new(<<"EOF");
+ my $mime = PublicInbox::Eml->new(<<"EOF");
From: a\@example.com
To: b\@example.com
Subject: this is a subject
@@ -30,7 +30,7 @@ EOF
sub timestamp ($) {
my ($received) = @_;
local $SIG{__WARN__} = sub {}; # Suppress warnings
- my $mime = PublicInbox::MIME->new(<<"EOF");
+ my $mime = PublicInbox::Eml->new(<<"EOF");
From: a\@example.com
To: b\@example.com
Subject: this is a subject
diff --git a/t/multi-mid.t b/t/multi-mid.t
index 5afb9693..91c8597e 100644
--- a/t/multi-mid.t
+++ b/t/multi-mid.t
@@ -2,7 +2,7 @@
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
use strict;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
use PublicInbox::InboxWritable;
require_git(2.6);
@@ -11,7 +11,7 @@ require PublicInbox::SearchIdx;
my $delay = $ENV{TEST_DELAY_CONVERT};
my $addr = 'test@example.com';
-my $bad = PublicInbox::MIME->new(<<EOF);
+my $bad = PublicInbox::Eml->new(<<EOF);
Message-ID: <a\@example.com>
Message-ID: <b\@example.com>
From: a\@example.com
@@ -20,7 +20,7 @@ Subject: bad
EOF
-my $good = PublicInbox::MIME->new(<<EOF);
+my $good = PublicInbox::Eml->new(<<EOF);
Message-ID: <b\@example.com>
From: b\@example.com
To: $addr
diff --git a/t/nntp.t b/t/nntp.t
index 35fb55b4..2a9f3a4f 100644
--- a/t/nntp.t
+++ b/t/nntp.t
@@ -4,7 +4,7 @@ use strict;
use warnings;
use Test::More;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_mods(qw(DBD::SQLite Data::Dumper));
use_ok 'PublicInbox::NNTP';
use_ok 'PublicInbox::Inbox';
@@ -107,7 +107,7 @@ use_ok 'PublicInbox::Inbox';
url => [ '//example.com/a' ]});
is($ng->base_url, $u, 'URL expanded');
my $mid = 'a@b';
- my $mime = PublicInbox::MIME->new("Message-ID: <$mid>\r\n\r\n");
+ my $mime = PublicInbox::Eml->new("Message-ID: <$mid>\r\n\r\n");
my $hdr = $mime->header_obj;
my $mock_self = { nntpd => { grouplist => [],
servername => 'example.com' } };
diff --git a/t/nntpd-tls.t b/t/nntpd-tls.t
index 0ad29be0..3de219f1 100644
--- a/t/nntpd-tls.t
+++ b/t/nntpd-tls.t
@@ -23,7 +23,7 @@ unless (-r $key && -r $cert) {
use_ok 'PublicInbox::TLS';
use_ok 'IO::Socket::SSL';
require PublicInbox::InboxWritable;
-require PublicInbox::MIME;
+require PublicInbox::Eml;
require PublicInbox::SearchIdx;
our $need_zlib;
eval { require Compress::Raw::Zlib } or
@@ -63,7 +63,7 @@ EOF
{
my $im = $ibx->importer(0);
- my $mime = mime_load 't/data/0001.patch';
+ my $mime = eml_load 't/data/0001.patch';
ok($im->add($mime), 'message added');
$im->done;
if ($version == 1) {
diff --git a/t/nntpd.t b/t/nntpd.t
index 4993b29f..69f72ce1 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -7,7 +7,7 @@ use PublicInbox::TestCommon;
use PublicInbox::Spawn qw(which);
require_mods(qw(DBD::SQLite));
require PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use IO::Socket;
use Socket qw(IPPROTO_TCP TCP_NODELAY);
use Net::NNTP;
@@ -57,7 +57,7 @@ $ibx = PublicInbox::Inbox->new($ibx);
# ensure successful message delivery
{
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
To: =?utf-8?Q?El=C3=A9anor?= <you\@example.com>
From: =?utf-8?Q?El=C3=A9anor?= <me\@example.com>
Cc: $addr
@@ -241,7 +241,7 @@ EOF
ok($date <= $t1, 'valid date before stop');
}
if ('leafnode interop') {
- my $for_leafnode = PublicInbox::MIME->new(<<"");
+ my $for_leafnode = PublicInbox::Eml->new(<<"");
From: longheader\@example.com
To: $addr
Subject: none
diff --git a/t/nulsubject.t b/t/nulsubject.t
index 03b1ee80..ccb60d52 100644
--- a/t/nulsubject.t
+++ b/t/nulsubject.t
@@ -14,7 +14,7 @@ my $git_dir = "$tmpdir/a.git";
my $git = PublicInbox::Git->new($git_dir);
my $im = PublicInbox::Import->new($git, 'testbox', 'test@example');
$im->init_bare;
- $im->add(PublicInbox::MIME->new(<<'EOF'));
+ $im->add(PublicInbox::Eml->new(<<'EOF'));
From: a@example.com
To: b@example.com
Subject: A subject line with a null =?iso-8859-1?q?=00?= see!
diff --git a/t/plack.t b/t/plack.t
index 4fff9773..37a6b394 100644
--- a/t/plack.t
+++ b/t/plack.t
@@ -31,7 +31,7 @@ my $git = PublicInbox::Git->new($inboxdir);
my $im = PublicInbox::Import->new($git, 'test', $addr);
# ensure successful message delivery
{
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: Me <me\@example.com>
To: You <you\@example.com>
Cc: $addr
@@ -50,15 +50,15 @@ EOF
chomp @ls;
# multipart with two text bodies
- $mime = mime_load 't/plack-2-txt-bodies.eml';
+ $mime = eml_load 't/plack-2-txt-bodies.eml';
$im->add($mime);
# multipart with attached patch + filename
- $mime = mime_load 't/plack-attached-patch.eml';
+ $mime = eml_load 't/plack-attached-patch.eml';
$im->add($mime);
# multipart collapsed to single quoted-printable text/plain
- $mime = mime_load 't/plack-qp.eml';
+ $mime = eml_load 't/plack-qp.eml';
like($mime->body_raw, qr/hi =3D bye=/, 'our test used QP correctly');
$im->add($mime);
@@ -77,7 +77,7 @@ Date: Fri, 02 Oct 1993 00:00:00 +0000
:(
EOF
$crlf =~ s/\n/\r\n/sg;
- $im->add(PublicInbox::MIME->new($crlf));
+ $im->add(PublicInbox::Eml->new($crlf));
$im->done;
}
diff --git a/t/precheck.t b/t/precheck.t
index a8fd31b1..11193e38 100644
--- a/t/precheck.t
+++ b/t/precheck.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use Email::Simple;
+use PublicInbox::Eml;
use PublicInbox::MDA;
sub do_checks {
@@ -27,7 +27,7 @@ sub do_checks {
}
{
- my $s = Email::Simple->new(<<'EOF');
+ my $s = PublicInbox::Eml->new(<<'EOF');
From: abc@example.com
To: abc@example.com
Cc: c@example.com, another-list@example.com
@@ -43,7 +43,7 @@ EOF
}
{
- do_checks(Email::Simple->new(<<'EOF'));
+ do_checks(PublicInbox::Eml->new(<<'EOF'));
From: a@example.com
To: b@example.com
Cc: c@example.com
@@ -57,7 +57,7 @@ EOF
}
{
- do_checks(Email::Simple->new(<<'EOF'));
+ do_checks(PublicInbox::Eml->new(<<'EOF'));
From: a@example.com
To: b+plus@example.com
Cc: John Doe <c@example.com>
@@ -72,7 +72,7 @@ EOF
{
my $recipient = 'b@example.com';
- my $s = Email::Simple->new(<<'EOF');
+ my $s = PublicInbox::Eml->new(<<'EOF');
To: b@example.com
Cc: c@example.com
Content-Type: text/plain
diff --git a/t/psgi_attach.t b/t/psgi_attach.t
index af0fbdd3..9a2b2411 100644
--- a/t/psgi_attach.t
+++ b/t/psgi_attach.t
@@ -29,7 +29,7 @@ $im->init_bare;
my $b64 = "b64\xde\xad\xbe\xef\n";
my $txt = "plain\ntext\npass\nthrough\n";
my $dot = "dotfile\n";
- $im->add(mime_load('t/psgi_attach.eml'));
+ $im->add(eml_load('t/psgi_attach.eml'));
$im->done;
my $www = PublicInbox::WWW->new($config);
diff --git a/t/psgi_bad_mids.t b/t/psgi_bad_mids.t
index 43025a4d..81bd9356 100644
--- a/t/psgi_bad_mids.t
+++ b/t/psgi_bad_mids.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::TestCommon;
my @mods = qw(DBD::SQLite HTTP::Request::Common Plack::Test
@@ -45,7 +45,7 @@ To: b\@example.com
Date: Fri, 02 Oct 1993 00:00:0$i +0000
- my $mime = PublicInbox::MIME->new(\$data);
+ my $mime = PublicInbox::Eml->new(\$data);
ok($im->add($mime), "added $mid");
$i++
}
diff --git a/t/psgi_mount.t b/t/psgi_mount.t
index bd492dcb..b4de8274 100644
--- a/t/psgi_mount.t
+++ b/t/psgi_mount.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
my ($tmpdir, $for_destroy) = tmpdir();
my $maindir = "$tmpdir/main.git";
@@ -25,7 +25,7 @@ my $git = PublicInbox::Git->new($maindir);
my $im = PublicInbox::Import->new($git, 'test', $addr);
$im->init_bare;
{
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: Me <me\@example.com>
To: You <you\@example.com>
Cc: $addr
diff --git a/t/psgi_multipart_not.t b/t/psgi_multipart_not.t
index ef86c015..e36820f4 100644
--- a/t/psgi_multipart_not.t
+++ b/t/psgi_multipart_not.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::TestCommon;
my @mods = qw(DBD::SQLite Search::Xapian HTTP::Request::Common
@@ -22,7 +22,7 @@ my $ibx = PublicInbox::Inbox->new({
my $im = PublicInbox::V2Writable->new($ibx, 1);
$im->{parallel} = 0;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
Message-Id: <200308111450.h7BEoOu20077@mail.osdl.org>
To: linux-kernel@vger.kernel.org
Subject: [OSDL] linux-2.6.0-test3 reaim results
diff --git a/t/psgi_scan_all.t b/t/psgi_scan_all.t
index 93603a33..46eb489f 100644
--- a/t/psgi_scan_all.t
+++ b/t/psgi_scan_all.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::TestCommon;
my @mods = qw(HTTP::Request::Common Plack::Test URI::Escape DBD::SQLite);
@@ -31,7 +31,7 @@ foreach my $i (1..2) {
my $im = PublicInbox::V2Writable->new($ibx, 1);
$im->{parallel} = 0;
$im->init_inbox(0);
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: a\@example.com
To: $addr
Subject: s$i
diff --git a/t/psgi_search.t b/t/psgi_search.t
index 3c515d19..64f8b1ac 100644
--- a/t/psgi_search.t
+++ b/t/psgi_search.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::Inbox;
use PublicInbox::InboxWritable;
@@ -27,7 +27,7 @@ my $im = $ibx->importer(0);
my $digits = '10010260936330';
my $ua = 'Pine.LNX.4.10';
my $mid = "$ua.$digits.2460-100000\@penguin.transmeta.com";
-my $mime = PublicInbox::MIME->new(<<EOF);
+my $mime = PublicInbox::Eml->new(<<EOF);
Subject: test
Message-ID: <$mid>
From: Ævar Arnfjörð Bjarmason <avarab\@example>
@@ -36,7 +36,7 @@ To: git\@vger.kernel.org
EOF
$im->add($mime);
-$mime = PublicInbox::MIME->new(<<'EOF');
+$mime = PublicInbox::Eml->new(<<'EOF');
Subject:
Message-ID: <blank-subject@example.com>
From: blank subject <blank-subject@example.com>
@@ -45,7 +45,7 @@ To: git@vger.kernel.org
EOF
$im->add($mime);
-$mime = PublicInbox::MIME->new(<<'EOF');
+$mime = PublicInbox::Eml->new(<<'EOF');
Message-ID: <no-subject-at-all@example.com>
From: no subject at all <no-subject-at-all@example.com>
To: git@vger.kernel.org
diff --git a/t/psgi_text.t b/t/psgi_text.t
index b7b5b2d4..833bcaba 100644
--- a/t/psgi_text.t
+++ b/t/psgi_text.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
my ($tmpdir, $for_destroy) = tmpdir();
my $maindir = "$tmpdir/main.git";
diff --git a/t/psgi_v2.t b/t/psgi_v2.t
index 9c19b041..8f75a3fb 100644
--- a/t/psgi_v2.t
+++ b/t/psgi_v2.t
@@ -5,7 +5,7 @@ use warnings;
use Test::More;
use PublicInbox::TestCommon;
require_git(2.6);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
use PublicInbox::MID qw(mids);
require_mods(qw(DBD::SQLite Search::Xapian HTTP::Request::Common Plack::Test
@@ -26,7 +26,7 @@ my $new_mid;
my $im = PublicInbox::V2Writable->new($ibx, 1);
$im->{parallel} = 0;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From oldbug-pre-a0c07cba0e5d8b6a Fri Oct 2 00:00:00 1993
From: a@example.com
To: test@example.com
@@ -225,7 +225,7 @@ test_psgi(sub { $www->call(@_) }, sub {
# ensure conflicted attachments can be resolved
foreach my $body (qw(old new)) {
- $mime = mime_load "t/psgi_v2-$body.eml";
+ $mime = eml_load "t/psgi_v2-$body.eml";
ok($im->add($mime), "added attachment $body");
}
$im->done;
diff --git a/t/purge.t b/t/purge.t
index dcc44039..2ca9edca 100644
--- a/t/purge.t
+++ b/t/purge.t
@@ -36,7 +36,7 @@ local $ENV{PI_CONFIG} = $cfgfile;
open my $cfg_fh, '>', $cfgfile or die "open: $!";
my $v2w = PublicInbox::V2Writable->new($ibx, {nproc => 1});
-my $mime = PublicInbox::MIME->new($raw);
+my $mime = PublicInbox::Eml->new($raw);
ok($v2w->add($mime), 'add message to be purged');
$v2w->done;
diff --git a/t/replace.t b/t/replace.t
index 2efa25f1..cef4e7aa 100644
--- a/t/replace.t
+++ b/t/replace.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::InboxWritable;
use PublicInbox::TestCommon;
use Cwd qw(abs_path);
@@ -24,7 +24,7 @@ sub test_replace ($$$) {
indexlevel => $level,
});
- my $orig = PublicInbox::MIME->new(<<'EOF');
+ my $orig = PublicInbox::Eml->new(<<'EOF');
From: Barbra Streisand <effect@example.com>
To: test@example.com
Subject: confidential
@@ -49,7 +49,7 @@ EOF
my $thread_a = $ibx->over->get_thread('replace@example.com');
my %before = map {; delete($_->{blob}) => $_ } @{$ibx->recent};
- my $reject = PublicInbox::MIME->new($orig->as_string);
+ my $reject = PublicInbox::Eml->new($orig->as_string);
foreach my $mid (['<replace@example.com>', '<extra@example.com>'],
[], ['<replaced@example.com>']) {
$reject->header_set('Message-ID', @$mid);
@@ -61,7 +61,7 @@ EOF
# prepare the replacement
my $expect = "Move along, nothing to see here\n";
- my $repl = PublicInbox::MIME->new($orig->as_string);
+ my $repl = PublicInbox::Eml->new($orig->as_string);
$repl->header_set('From', '<redactor@example.com>');
$repl->header_set('Subject', 'redacted');
$repl->header_set('Date', 'Sat, 02 Oct 2010 00:00:00 +0000');
@@ -80,7 +80,7 @@ EOF
is($changed_epochs, 1, 'only one epoch changed');
$im->done;
- my $m = PublicInbox::MIME->new($ibx->msg_by_mid('replace@example.com'));
+ my $m = PublicInbox::Eml->new($ibx->msg_by_mid('replace@example.com'));
is($m->body, $expect, 'replaced message');
is_deeply(\@warn, [], 'no warnings on noop');
@@ -159,7 +159,7 @@ sub pad_msgs {
($i, $irt) = each %$i;
}
my $sec = sprintf('%0d', $i);
- my $mime = PublicInbox::MIME->new(<<EOF);
+ my $mime = PublicInbox::Eml->new(<<EOF);
From: foo\@example.com
To: test\@example.com
Message-ID: <$i\@example.com>
diff --git a/t/reply.t b/t/reply.t
index a6c38cfa..53162df5 100644
--- a/t/reply.t
+++ b/t/reply.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use_ok 'PublicInbox::Reply';
my @q = (
@@ -19,7 +19,7 @@ while (@q) {
is($res, $expect, "quote $input => $res");
}
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: from <from@example.com>
To: to <to@example.com>
Cc: cc@example.com
diff --git a/t/search-thr-index.t b/t/search-thr-index.t
index 1bea59fd..914807a8 100644
--- a/t/search-thr-index.t
+++ b/t/search-thr-index.t
@@ -6,7 +6,7 @@ use bytes (); # only for bytes::length
use Test::More;
use PublicInbox::TestCommon;
use PublicInbox::MID qw(mids);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_mods(qw(DBD::SQLite Search::Xapian));
require PublicInbox::SearchIdx;
require PublicInbox::Smsg;
@@ -42,7 +42,7 @@ my @mids;
foreach (reverse split(/\n\n/, $data)) {
$_ .= "\n";
- my $mime = PublicInbox::MIME->new(\$_);
+ my $mime = PublicInbox::Eml->new(\$_);
$mime->header_set('From' => 'bw@g');
$mime->header_set('To' => 'git@vger.kernel.org');
my $bytes = bytes::length($mime->as_string);
@@ -78,7 +78,7 @@ $rw->commit_txn_lazy;
$xdb = $rw->begin_txn_lazy;
{
- my $mime = PublicInbox::MIME->new(<<'');
+ my $mime = PublicInbox::Eml->new(<<'');
Subject: [RFC 00/14]
Message-Id: <1-bw@g>
From: bw@g
diff --git a/t/search.t b/t/search.t
index 83986837..6cd938dd 100644
--- a/t/search.t
+++ b/t/search.t
@@ -8,7 +8,7 @@ require_mods(qw(DBD::SQLite Search::Xapian));
require PublicInbox::SearchIdx;
require PublicInbox::Inbox;
require PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
my ($tmpdir, $for_destroy) = tmpdir();
my $git_dir = "$tmpdir/a.git";
my $ibx = PublicInbox::Inbox->new({ inboxdir => $git_dir });
@@ -60,7 +60,7 @@ sub oct_is ($$$) {
}
$ibx->with_umask(sub {
- my $root = PublicInbox::MIME->new(<<'EOF');
+ my $root = PublicInbox::Eml->new(<<'EOF');
Date: Fri, 02 Oct 1993 00:00:00 +0000
Subject: Hello world
Message-ID: <root@s>
@@ -69,7 +69,7 @@ To: list@example.com
\m/
EOF
- my $last = PublicInbox::MIME->new(<<'EOF');
+ my $last = PublicInbox::Eml->new(<<'EOF');
Date: Sat, 02 Oct 2010 00:00:00 +0000
Subject: Re: Hello world
In-Reply-To: <root@s>
@@ -126,7 +126,7 @@ sub filter_mids {
$ibx->with_umask(sub {
$rw_commit->();
my $rmid = '<ghost-message@s>';
- my $reply_to_ghost = PublicInbox::MIME->new(<<"EOF");
+ my $reply_to_ghost = PublicInbox::Eml->new(<<"EOF");
Date: Sat, 02 Oct 2010 00:00:00 +0000
Subject: Re: ghosts
Message-ID: <ghost-reply\@s>
@@ -140,7 +140,7 @@ EOF
my $reply_id = $rw->add_message($reply_to_ghost);
is($reply_id, int($reply_id), "reply_id is an integer: $reply_id");
- my $was_ghost = PublicInbox::MIME->new(<<"EOF");
+ my $was_ghost = PublicInbox::Eml->new(<<"EOF");
Date: Sat, 02 Oct 2010 00:00:01 +0000
Subject: ghosts
Message-ID: $rmid
@@ -189,7 +189,7 @@ $ibx->with_umask(sub {
$rw_commit->();
$ro->reopen;
my $long_mid = 'last' . ('x' x 60). '@s';
- my $long = PublicInbox::MIME->new(<<EOF);
+ my $long = PublicInbox::Eml->new(<<EOF);
Date: Sat, 02 Oct 2010 00:00:00 +0000
Subject: long message ID
References: <root\@s> <last\@s>
@@ -209,7 +209,7 @@ EOF
my @res;
my $long_reply_mid = 'reply-to-long@1';
- my $long_reply = PublicInbox::MIME->new(<<EOF);
+ my $long_reply = PublicInbox::Eml->new(<<EOF);
Subject: I break references
Date: Sat, 02 Oct 2010 00:00:00 +0000
Message-ID: <$long_reply_mid>
@@ -233,7 +233,7 @@ EOF
# quote prioritization
$ibx->with_umask(sub {
$rw_commit->();
- $rw->add_message(PublicInbox::MIME->new(<<'EOF'));
+ $rw->add_message(PublicInbox::Eml->new(<<'EOF'));
Date: Sat, 02 Oct 2010 00:00:01 +0000
Subject: Hello
Message-ID: <quote@a>
@@ -243,7 +243,7 @@ To: list@example.com
> theatre illusions
fade
EOF
- $rw->add_message(PublicInbox::MIME->new(<<'EOF'));
+ $rw->add_message(PublicInbox::Eml->new(<<'EOF'));
Date: Sat, 02 Oct 2010 00:00:02 +0000
Subject: Hello
Message-ID: <nquote@a>
@@ -267,7 +267,7 @@ EOF
# circular references
$ibx->with_umask(sub {
my $s = 'foo://'. ('Circle' x 15).'/foo';
- my $doc_id = $rw->add_message(PublicInbox::MIME->new(<<EOF));
+ my $doc_id = $rw->add_message(PublicInbox::Eml->new(<<EOF));
Subject: $s
Date: Sat, 02 Oct 2010 00:00:01 +0000
Message-ID: <circle\@a>
@@ -286,7 +286,7 @@ EOF
});
$ibx->with_umask(sub {
- my $mime = mime_load 't/utf8.eml';
+ my $mime = eml_load 't/utf8.eml';
my $doc_id = $rw->add_message($mime);
ok($doc_id > 0, 'message indexed doc_id with UTF-8');
my $msg = $rw->query('m:testmessage@example.com', {limit => 1})->[0];
@@ -369,7 +369,7 @@ $ibx->with_umask(sub {
}
$ibx->with_umask(sub {
- my $amsg = mime_load 't/search-amsg.eml';
+ my $amsg = eml_load 't/search-amsg.eml';
ok($rw->add_message($amsg), 'added attachment');
$rw_commit->();
$ro->reopen;
@@ -427,7 +427,7 @@ $ibx->with_umask(sub {
my $mid = "$ua.$digits.2460-100000\@penguin.transmeta.com";
is($ro->reopen->query("m:$digits", { mset => 1})->size, 0,
'no results yet');
- my $pine = PublicInbox::MIME->new(<<EOF);
+ my $pine = PublicInbox::Eml->new(<<EOF);
Subject: blah
Message-ID: <$mid>
From: torvalds\@transmeta
diff --git a/t/solver_git.t b/t/solver_git.t
index c483aba1..78cc0edd 100644
--- a/t/solver_git.t
+++ b/t/solver_git.t
@@ -15,7 +15,7 @@ chomp $git_dir;
# needed for alternates, and --absolute-git-dir is only in git 2.13+
$git_dir = abs_path($git_dir);
-use_ok "PublicInbox::$_" for (qw(Inbox V2Writable MIME Git SolverGit WWW));
+use_ok "PublicInbox::$_" for (qw(Inbox V2Writable Git SolverGit WWW));
my ($inboxdir, $for_destroy) = tmpdir();
my $opts = {
@@ -29,7 +29,7 @@ my $im = PublicInbox::V2Writable->new($ibx, 1);
$im->{parallel} = 0;
my $deliver_patch = sub ($) {
- $im->add(mime_load($_[0]));
+ $im->add(eml_load($_[0]));
$im->done;
};
diff --git a/t/spamcheck_spamc.t b/t/spamcheck_spamc.t
index edfacc62..2d9da631 100644
--- a/t/spamcheck_spamc.t
+++ b/t/spamcheck_spamc.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use Email::Simple;
+use PublicInbox::Eml;
use IO::File;
use Fcntl qw(:DEFAULT SEEK_SET);
use PublicInbox::TestCommon;
@@ -28,19 +28,19 @@ Subject: test
Message-ID: <testmessage@example.com>
EOF
-ok($spamc->spamcheck(Email::Simple->new($src), \$dst), 'Email::Simple works');
+ok($spamc->spamcheck(PublicInbox::Eml->new($src), \$dst), 'PublicInbox::Eml works');
is($dst, $src, 'input == output');
$dst = '';
$spamc->{checkcmd} = ['sh', '-c', 'cat; false'];
-ok(!$spamc->spamcheck(Email::Simple->new($src), \$dst), 'Failed check works');
+ok(!$spamc->spamcheck(PublicInbox::Eml->new($src), \$dst), 'Failed check works');
is($dst, $src, 'input == output for spammy example');
for my $l (qw(ham spam)) {
my $file = "$tmpdir/$l.out";
$spamc->{$l.'cmd'} = ['tee', $file ];
my $method = $l.'learn';
- ok($spamc->$method(Email::Simple->new($src)), "$method OK");
+ ok($spamc->$method(PublicInbox::Eml->new($src)), "$method OK");
open my $fh, '<', $file or die "failed to open $file: $!";
is(eval { local $/, <$fh> }, $src, "$l command ran alright");
}
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index d6545c6d..484ea443 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -3,8 +3,9 @@
use strict;
use warnings;
use Test::More;
+use PublicInbox::TestCommon;
+require_mods 'Email::Simple';
use_ok('PublicInbox::SearchThread');
-use Email::Simple;
my $mt = eval {
require Mail::Thread;
no warnings 'once';
diff --git a/t/time.t b/t/time.t
index 71600b93..b491711d 100644
--- a/t/time.t
+++ b/t/time.t
@@ -4,9 +4,9 @@ use strict;
use warnings;
use Test::More;
use POSIX qw(strftime);
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::MsgTime qw(msg_datestamp);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: b@example.com
Subject: this is a subject
diff --git a/t/v1-add-remove-add.t b/t/v1-add-remove-add.t
index 23f4fb11..2cd45f60 100644
--- a/t/v1-add-remove-add.t
+++ b/t/v1-add-remove-add.t
@@ -5,7 +5,7 @@ use warnings;
use Test::More;
use PublicInbox::Import;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_mods(qw(DBD::SQLite Search::Xapian));
require PublicInbox::SearchIdx;
my ($inboxdir, $for_destroy) = tmpdir();
@@ -15,7 +15,7 @@ my $ibx = {
-primary_address => 'test@example.com',
};
$ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/t/v1reindex.t b/t/v1reindex.t
index e473fe7c..13605f8b 100644
--- a/t/v1reindex.t
+++ b/t/v1reindex.t
@@ -6,7 +6,7 @@ use Test::More;
use PublicInbox::ContentId qw(content_digest);
use File::Path qw(remove_tree);
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_git(2.6);
require_mods(qw(DBD::SQLite Search::Xapian));
use_ok 'PublicInbox::SearchIdx';
@@ -18,7 +18,7 @@ my $ibx_config = {
-primary_address => 'test@example.com',
indexlevel => 'full',
};
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/t/v2-add-remove-add.t b/t/v2-add-remove-add.t
index 60a869ee..cfdc8cf1 100644
--- a/t/v2-add-remove-add.t
+++ b/t/v2-add-remove-add.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::TestCommon;
require_git(2.6);
require_mods(qw(DBD::SQLite Search::Xapian));
@@ -16,7 +16,7 @@ my $ibx = {
-primary_address => 'test@example.com',
};
$ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/t/v2mda.t b/t/v2mda.t
index 4d3ec30d..36f43ff0 100644
--- a/t/v2mda.t
+++ b/t/v2mda.t
@@ -6,7 +6,7 @@ use Test::More;
use Fcntl qw(SEEK_SET);
use Cwd;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
require_git(2.6);
my $V = 2;
@@ -18,7 +18,7 @@ my $ibx = {
name => 'test-v2writable',
address => [ 'test@example.com' ],
};
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/t/v2mirror.t b/t/v2mirror.t
index ecf96891..d588808d 100644
--- a/t/v2mirror.t
+++ b/t/v2mirror.t
@@ -15,7 +15,7 @@ use IO::Socket;
use POSIX qw(dup2);
use_ok 'PublicInbox::V2Writable';
use PublicInbox::InboxWritable;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
# FIXME: too much setup
my ($tmpdir, $for_destroy) = tmpdir();
@@ -38,7 +38,7 @@ $ibx->{version} = 2;
my $v2w = PublicInbox::V2Writable->new($ibx, 1);
ok $v2w, 'v2w loaded';
$v2w->{parallel} = 0;
-my $mime = PublicInbox::MIME->new(<<'');
+my $mime = PublicInbox::Eml->new(<<'');
From: Me <me@example.com>
To: You <you@example.com>
Subject: a
diff --git a/t/v2reindex.t b/t/v2reindex.t
index b97c6498..f16a0b0d 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::ContentId qw(content_digest);
use File::Path qw(remove_tree);
use PublicInbox::TestCommon;
@@ -24,7 +24,7 @@ my $agpl = do {
<$fh>;
};
my $phrase = q("defending all users' freedom");
-my $mime = PublicInbox::MIME->new(<<'EOF'.$agpl);
+my $mime = PublicInbox::Eml->new(<<'EOF'.$agpl);
From: a@example.com
To: test@example.com
Subject: this is a subject
@@ -434,7 +434,7 @@ ok(!-d $xap, 'Xapian directories removed again');
$config{indexlevel} = 'medium';
my $ibx = PublicInbox::Inbox->new(\%config);
my $im = PublicInbox::V2Writable->new($ibx);
- my $m3 = PublicInbox::MIME->new(<<'EOF');
+ my $m3 = PublicInbox::Eml->new(<<'EOF');
Date: Tue, 24 May 2016 14:34:22 -0700 (PDT)
Message-Id: <20160524.143422.552507610109476444.d@example.com>
To: t@example.com
@@ -465,7 +465,7 @@ Somehow we got a message with 3 sets of headers into one
message, could've been something broken on the archiver side.
EOF
- my $m1 = PublicInbox::MIME->new(<<'EOF');
+ my $m1 = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: t@example.com
Subject: [PATCH 12/13]
diff --git a/t/v2writable.t b/t/v2writable.t
index 07687052..e5a565ce 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -3,7 +3,7 @@
use strict;
use warnings;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::ContentId qw(content_digest content_id);
use PublicInbox::TestCommon;
use Cwd qw(abs_path);
@@ -20,7 +20,7 @@ my $ibx = {
-primary_address => 'test@example.com',
};
$ibx = PublicInbox::Inbox->new($ibx);
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
@@ -63,7 +63,7 @@ if ('ensure git configs are correct') {
@warn = ();
$mime->header_set('Message-Id', '<a-mid@b>', '<c@d>');
is($im->add($mime), undef, 'secondary MID ignored if first matches');
- my $sec = PublicInbox::MIME->new($mime->as_string);
+ my $sec = PublicInbox::Eml->new($mime->as_string);
$sec->header_set('Date');
$sec->header_set('Message-Id', '<a-mid@b>', '<c@d>');
ok($im->add($sec), 'secondary MID used if data is different');
@@ -90,7 +90,7 @@ if ('ensure git configs are correct') {
my $hdr = $mime->header_obj;
my $gen = PublicInbox::Import::digest2mid(content_digest($mime), $hdr);
unlike($gen, qr![\+/=]!, 'no URL-unfriendly chars in Message-Id');
- my $fake = PublicInbox::MIME->new($mime->as_string);
+ my $fake = PublicInbox::Eml->new($mime->as_string);
$fake->header_set('Message-Id', "<$gen>");
ok($im->add($fake), 'fake added easily');
is_deeply(\@warn, [], 'no warnings from a faker');
diff --git a/t/watch_filter_rubylang.t b/t/watch_filter_rubylang.t
index 09217d94..2e7d402e 100644
--- a/t/watch_filter_rubylang.t
+++ b/t/watch_filter_rubylang.t
@@ -4,7 +4,7 @@ use strict;
use warnings;
use PublicInbox::TestCommon;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Config;
require_mods(qw(Filesys::Notify::Simple DBD::SQLite Search::Xapian));
use_ok 'PublicInbox::WatchMaildir';
diff --git a/t/watch_maildir.t b/t/watch_maildir.t
index c34d15f7..66955072 100644
--- a/t/watch_maildir.t
+++ b/t/watch_maildir.t
@@ -2,7 +2,7 @@
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
use strict;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use Cwd;
use PublicInbox::Config;
use PublicInbox::TestCommon;
diff --git a/t/watch_maildir_v2.t b/t/watch_maildir_v2.t
index dd5030ea..19a2da77 100644
--- a/t/watch_maildir_v2.t
+++ b/t/watch_maildir_v2.t
@@ -2,7 +2,7 @@
# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
use strict;
use Test::More;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use Cwd;
use PublicInbox::Config;
use PublicInbox::TestCommon;
diff --git a/t/www_altid.t b/t/www_altid.t
index a885c389..337303d9 100644
--- a/t/www_altid.t
+++ b/t/www_altid.t
@@ -26,7 +26,7 @@ if ('setup') {
my $ibx = PublicInbox::Inbox->new($opts);
$ibx = PublicInbox::InboxWritable->new($ibx, 1);
my $im = $ibx->importer(0);
- my $mime = PublicInbox::MIME->new(<<'EOF');
+ my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
Message-Id: <a@example.com>
diff --git a/t/xcpdb-reshard.t b/t/xcpdb-reshard.t
index 0e1fea52..70012cc6 100644
--- a/t/xcpdb-reshard.t
+++ b/t/xcpdb-reshard.t
@@ -6,11 +6,11 @@ use Test::More;
use PublicInbox::TestCommon;
require_mods(qw(DBD::SQLite Search::Xapian));
require_git('2.6');
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::InboxWritable;
require PublicInbox::Search;
-my $mime = PublicInbox::MIME->new(<<'EOF');
+my $mime = PublicInbox::Eml->new(<<'EOF');
From: a@example.com
To: test@example.com
Subject: this is a subject
diff --git a/xt/msgtime_cmp.t b/xt/msgtime_cmp.t
index 4ebf5b2c..95d7c64b 100644
--- a/xt/msgtime_cmp.t
+++ b/xt/msgtime_cmp.t
@@ -4,7 +4,7 @@
use strict;
use Test::More;
use PublicInbox::TestCommon;
-use PublicInbox::MIME;
+use PublicInbox::Eml;
use PublicInbox::Inbox;
use PublicInbox::Git;
use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
@@ -48,7 +48,7 @@ sub quiet_is_deeply ($$$$$) {
sub compare {
my ($bref, $oid, $type, $size) = @_;
local $SIG{__WARN__} = sub { diag "$oid: ", @_ };
- my $mime = PublicInbox::MIME->new($$bref);
+ my $mime = PublicInbox::Eml->new($$bref);
my $hdr = $mime->header_obj;
my @cur = msg_datestamp($hdr);
my @old = Old::msg_datestamp($hdr);
@@ -116,7 +116,7 @@ sub time_response ($) {
}
sub msg_received_at ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my @recvd = $hdr->header_raw('Received');
my ($ts);
foreach my $r (@recvd) {
@@ -131,7 +131,7 @@ sub msg_received_at ($) {
}
sub msg_date_only ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my @date = $hdr->header_raw('Date');
my ($ts);
foreach my $d (@date) {
@@ -149,7 +149,7 @@ sub msg_date_only ($) {
# Favors Received header for sorting globally
sub msg_timestamp ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my $ret;
$ret = msg_received_at($hdr) and return time_response($ret);
$ret = msg_date_only($hdr) and return time_response($ret);
@@ -158,7 +158,7 @@ sub msg_timestamp ($) {
# Favors the Date: header for display and sorting within a thread
sub msg_datestamp ($) {
- my ($hdr) = @_; # Email::MIME::Header
+ my ($hdr) = @_; # PublicInbox::Eml
my $ret;
$ret = msg_date_only($hdr) and return time_response($ret);
$ret = msg_received_at($hdr) and return time_response($ret);
diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index a4445959..30fc07dc 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -38,7 +38,7 @@ my $obuf = '';
my $m = 0;
my $cb = sub {
- $mime = PublicInbox::MIME->new(shift);
+ $mime = PublicInbox::Eml->new(shift);
PublicInbox::View::multipart_text_as_html($mime, $ctx);
++$m;
$obuf = '';
* [PATCH 13/13] eml: drop trailing blank line on missing epilogue
2020-05-07 21:05 [PATCH 00/13] eml: pure-Perl replacement for Email::MIME Eric Wong
` (11 preceding siblings ...)
2020-05-07 21:05 ` [PATCH 12/13] remove most internal Email::MIME usage Eric Wong
@ 2020-05-07 21:05 ` Eric Wong
12 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-07 21:05 UTC (permalink / raw)
To: meta
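When a multipart body lacks its closing boundary delimiter (and thus
has no epilogue), drop the trailing blank line from the last part.
This improves Email::MIME compatibility when running
xt/cmp-msgview.t on some GPG-signed messages.

The long-term usefulness of this change is dubious, and this patch
may be reverted down the line.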
---
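A minimal standalone sketch of the two steps in the diff below, for
readers who want to try the regexps outside of mp_descend.  The sub
names, the boundary "b" and the sample bodies are hypothetical and not
part of PublicInbox::Eml; only the two regexps are copied from the
patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines of mp_descend touched by this
# patch.  $bnd is assumed to be regexp-safe (e.g. via quotemeta).
# Returns true if no closing "--$bnd--" delimiter was found.
sub cut_at_close_delim {
	my ($bdy, $bnd) = @_; # $bdy is a scalar ref, modified in place
	if ($$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm) {
		# drop the closing delimiter and everything after it
		substr($$bdy, pos($$bdy) - length($1)) = '';
		return 0;
	}
	1;
}

# Same substitution the patch applies to $parts[-1]
sub trim_last_part {
	my ($part, $epilogue_missing) = @_;
	$part =~ s/\n\r?\n\z/\n/s if $epilogue_missing;
	$part;
}

my $bnd = quotemeta('b');
my $ok = "--b\nA: 1\n\nfirst\n--b--\nepilogue\n";
my $broken = "--b\nA: 1\n\nfirst\n\n"; # no closing delimiter
for my $bdy ($ok, $broken) {
	my $missing = cut_at_close_delim(\$bdy, $bnd);
	print(($missing ? 'missing' : 'found'), "\n");
}
```

Only one blank line is removed from the end of the last part, and
nothing is touched when the closing delimiter is present.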
lib/PublicInbox/Eml.pm | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Eml.pm b/lib/PublicInbox/Eml.pm
index 1adaff04..4508bd84 100644
--- a/lib/PublicInbox/Eml.pm
+++ b/lib/PublicInbox/Eml.pm
@@ -131,8 +131,12 @@ sub mp_descend ($$) {
# Cut at the the first epilogue, not subsequent ones.
# *sigh* just the regexp match alone seems to bump RSS by
# length($$bdy) on a ~30M string:
- $$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm and
+ my $epilogue_missing;
+ if ($$bdy =~ /((?:\r?\n)?^--$bnd--[ \t]*\r?$)/gsm) {
substr($$bdy, pos($$bdy) - length($1)) = '';
+ } else {
+ $epilogue_missing = 1;
+ }
# *Sigh* split() doesn't work in-place and return CoW strings
# because Perl wants to "\0"-terminate strings. So split()
@@ -150,6 +154,10 @@ sub mp_descend ($$) {
if (@parts) { # the usual path if we got this far:
undef $bdy; # release memory ASAP if $nr > 0
+
+ # compatibility with Email::MIME
+ $parts[-1] =~ s/\n\r?\n\z/\n/s if $epilogue_missing;
+
@parts = grep /[^ \t\r\n]/s, @parts; # ignore empty parts
# Keep "From: someone..." from preamble in old,
* Re: [PATCH 11/13] xt: eml comparison tests
2020-05-07 21:05 ` [PATCH 11/13] xt: eml comparison tests Eric Wong
@ 2020-05-08 4:47 ` Eric Wong
0 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-05-08 4:47 UTC (permalink / raw)
To: meta
Eric Wong <e@yhbt.net> wrote:
> xt/cmp-msgstr.t | 108 +++++++++++++++++++++++++++++++++++++++++++++++
> xt/cmp-msgview.t | 95 +++++++++++++++++++++++++++++++++++++++++
Btw, I run these in parallel on inboxes I have:

	N=$(nproc)
	find ~/v2/*/git/ -type d -name '*.git' -print0 | xargs -0 -P$N -n1 sh -c \
		'GIANT_INBOX_DIR=$1 perl -I lib -w xt/cmp-msgview.t' --
	find ~/v1/ -type d -name '*.git' -print0 | xargs -0 -P$N -n1 sh -c \
		'GIANT_INBOX_DIR=$1 perl -I lib -w xt/cmp-msgstr.t' --

And the main differences I see are minor:
* trailing whitespace may still be different for broken messages
  missing epilogues (MIMEDefang, or some old gnus + GPG)
* trailing whitespace differences for header extraction
  (Eml strips all trailing spaces, not just LF/CRLF)
* empty parts of multipart messages are skipped for efficiency
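
To make the last two differences concrete, here is a rough
illustration.  The two strip_* subs are hypothetical and are not code
from Email::MIME or PublicInbox::Eml; they only mimic the behaviours
described in the list above.  The grep is the one visible in
mp_descend in patch 13/13.

```perl
use strict;
use warnings;

# Illustration only: hypothetical helpers, not taken from either library.
sub strip_eol_only { my ($v) = @_; $v =~ s/\r?\n\z//; $v }
sub strip_all_trailing { my ($v) = @_; $v =~ s/[ \t\r\n]+\z//; $v }

my $raw = "value  \r\n";
# the first keeps the two trailing spaces, the second drops them
printf("%d vs %d bytes\n", length(strip_eol_only($raw)),
	length(strip_all_trailing($raw)));

# As in mp_descend: parts made up solely of spaces, tabs, CR or LF
# (or nothing at all) are ignored.
my @parts = ("A: 1\n\nfirst\n", " \t\r\n", '', "A: 2\n\nsecond\n");
my @kept = grep /[^ \t\r\n]/s, @parts;
print scalar(@kept), " parts kept\n";
```

A comparison test that wants to ignore these differences could
normalize both sides the same way before calling is() or is_deeply().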