unofficial mirror of meta@public-inbox.org
* [PATCH] xt: add eml ->as_string round trip checker
@ 2020-09-24 20:51 Eric Wong
  2020-10-17  8:17 ` [REVERT?] " Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: Eric Wong @ 2020-09-24 20:51 UTC
  To: meta

Unlike Email::MIME, PublicInbox::Eml::as_string should be able
to round trip from the Perl object to a raw scalar and back
without changes.
---
 MANIFEST                 |  1 +
 xt/eml_check_roundtrip.t | 43 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)
 create mode 100644 xt/eml_check_roundtrip.t

diff --git a/MANIFEST b/MANIFEST
index b6a681e9..65fa8736 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -385,6 +385,7 @@ t/xcpdb-reshard.t
 xt/cmp-msgstr.t
 xt/cmp-msgview.t
 xt/eml_check_limits.t
+xt/eml_check_roundtrip.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
 xt/httpd-async-stream.t
diff --git a/xt/eml_check_roundtrip.t b/xt/eml_check_roundtrip.t
new file mode 100644
index 00000000..9b216c53
--- /dev/null
+++ b/xt/eml_check_roundtrip.t
@@ -0,0 +1,43 @@
+#!perl -w
+# Copyright (C) 2020 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use Test::More;
+use PublicInbox::TestCommon;
+use PublicInbox::Eml;
+use PublicInbox::Inbox;
+use List::Util qw(max);
+use Benchmark qw(:all :hireswallclock);
+use PublicInbox::Spawn qw(popen_rd);
+use Carp ();
+require_git(2.19); # for --unordered
+my $inboxdir = $ENV{GIANT_INBOX_DIR};
+plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'x' });
+my $git = $ibx->git;
+my @cat = qw(cat-file --buffer --batch-check --batch-all-objects --unordered);
+my $fh = $git->popen(@cat);
+my $cat_cb = sub {
+	my ($bref, $oid, $type, $size, $check) = @_;
+	my $orig = $$bref;
+	my $copy = PublicInbox::Eml->new($bref)->as_string;
+	++$check->[$orig eq $copy ? 0 : 1];
+};
+
+my $n = 0;
+my $check = [ 0, 0 ]; # [ eql, neq ]
+my $t = timeit(1, sub {
+	my ($blob, $type);
+	while (<$fh>) {
+		($blob, $type) = split / /;
+		next if $type ne 'blob';
+		$git->cat_async($blob, $cat_cb, $check);
+		if ((++$n % 8192) == 0) {
+			diag "n=$n eql=$check->[0] neq=$check->[1]";
+		}
+	}
+	$git->cat_async_wait;
+});
+is($check->[0], $n, 'all messages round tripped');
+is($check->[1], 0, 'no messages failed to round trip');
+done_testing;


* [REVERT?] xt: add eml ->as_string round trip checker
  2020-09-24 20:51 [PATCH] xt: add eml ->as_string round trip checker Eric Wong
@ 2020-10-17  8:17 ` Eric Wong
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Wong @ 2020-10-17  8:17 UTC
  To: meta

Eric Wong <e@80x24.org> wrote:
> Unlike Email::MIME, PublicInbox::Eml::as_string should be able
> to round trip from the Perl object to a raw scalar and back
> without changes.

Well, almost...  As long as we don't use ->each_part.
Will likely go with this revert:

----------8<----------
Subject: [PATCH] xt: remove eml_check_roundtrip

If there's no body ({bdy} field), ->each_part sets the {bdy}
field to "\n", and the ->as_string result afterwards is one
"\n" byte longer than the original.

It's not worth extra cycles in common ->each_part calls to
ensure 100% round-trip matches of header-only messages (which
are likely spam), especially when the only difference is a
trailing "\n".
---
 MANIFEST                 |  1 -
 xt/eml_check_roundtrip.t | 43 ----------------------------------------
 2 files changed, 44 deletions(-)
 delete mode 100644 xt/eml_check_roundtrip.t

diff --git a/MANIFEST b/MANIFEST
index 65fa8736..b6a681e9 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -385,7 +385,6 @@ t/xcpdb-reshard.t
 xt/cmp-msgstr.t
 xt/cmp-msgview.t
 xt/eml_check_limits.t
-xt/eml_check_roundtrip.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
 xt/httpd-async-stream.t
diff --git a/xt/eml_check_roundtrip.t b/xt/eml_check_roundtrip.t
deleted file mode 100644
index 9b216c53..00000000
-- 
--irreversible-delete was used with git-format-patch
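
For readers who want to see the mismatch described in the revert
message above, here is a minimal sketch.  ToyEml is a hypothetical
stand-in written for this illustration; it is not PublicInbox::Eml
and makes no claim about how the real ->new, ->each_part or
->as_string are implemented.  The {sep} field in particular is an
assumption of the toy.  It only models the behaviour stated above:
a header-only message has no {bdy}, ->each_part fills {bdy} with
"\n", and ->as_string then returns one extra byte.

```perl
#!perl -w
# Toy model only: ToyEml is a hypothetical stand-in, not PublicInbox::Eml.
use strict;

package ToyEml;

sub new {
	my ($class, $raw) = @_;
	my $pos = index($raw, "\n\n");
	if ($pos >= 0) { # header block, blank line, then body
		return bless {
			hdr => substr($raw, 0, $pos + 1),
			sep => "\n",
			bdy => substr($raw, $pos + 2),
		}, $class;
	}
	# header-only message: no separator and no {bdy} field at all
	bless { hdr => $raw, sep => '' }, $class;
}

sub as_string {
	my ($self) = @_;
	return $self->{hdr} unless defined $self->{bdy};
	$self->{hdr} . $self->{sep} . $self->{bdy};
}

# Mimics the side effect described above: a missing body becomes "\n".
sub each_part {
	my ($self, $cb) = @_;
	$self->{bdy} //= "\n";
	$cb->($self->{bdy});
}

package main;

my $raw = "Subject: header only\n";
my $before = ToyEml->new($raw)->as_string; # never touched by each_part
my $eml = ToyEml->new($raw);
$eml->each_part(sub { }); # the callback does nothing; only the side effect matters
my $after = $eml->as_string;

print "before each_part: ", ($before eq $raw ? "identical" : "differs"), "\n";
print "after each_part: ", length($after) - length($raw), " extra byte(s)\n";
```

In this toy, a message that does have a body round trips unchanged
whether or not ->each_part is called, so only header-only input
shows the one-byte difference.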

