unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
* [PATCH 00/13] smsg: remove tricky {mime} field
@ 2020-06-01 10:06 Eric Wong
  2020-06-01 10:06 ` [PATCH 01/13] inbox: introduce smsg_eml method Eric Wong
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

Storing a large PublicInbox::Eml (or in the past, Email::MIME)
object inside a small PublicInbox::Smsg object has historically
been bloat-prone[1] since there may be many small smsgs in
memory at once

Hundreds or thousands of $smsg objects can linger in memory due
to search results and message threading operations.  So keep
$eml and $smsg objects independent of each other, for now.
Instead, we'll introduce a $smsg->populate($eml) API to handle
filling in the keys for the importer, indexer, and
non-SQLite-using WWW users.

Furthermore, $smsg->$field dispatch has always been measurably
faster than $smsg->{$field} access in NNTP.  Since $smsg->$field
became read-only with the removal of $smsg->{mime}, we can
abandon the $smsg->$field invocations and favor of direct hash
access.

[1] the prime example being what commit 7d02b9e64455831d fixed
    ("view: stop storing all MIME objects on large threads")

Eric Wong (13):
  inbox: introduce smsg_eml method
  wwwatomstream: convert callers to use smsg_eml
  v2writable: fix non-sensical interpolation in BUG message
  import: modernize to use Perl 5.10 features
  smsg: introduce ->populate method
  smsg: get rid of ->wrap initializer, too
  inbox: msg_by_*: remove $(size)ref args
  www: remove smsg_mime API and adjust callers
  nntp: smsg_range_i: favor ->{$field} lookups when possible
  smsg: get rid of remaining {mime} users
  smsg: remove ->bytes and ->lines methods
  smsg: remove remaining accessor methods
  wwwatomstream: drop smsg->{mid} fallback for non-SQLite

 Documentation/mknews.perl        |   7 +-
 lib/PublicInbox/ExtMsg.pm        |   2 +-
 lib/PublicInbox/Feed.pm          |   8 +-
 lib/PublicInbox/Import.pm        |  69 ++++++++---------
 lib/PublicInbox/Inbox.pm         |  32 ++++----
 lib/PublicInbox/Mbox.pm          |   2 +-
 lib/PublicInbox/NNTP.pm          |  14 +++-
 lib/PublicInbox/OverIdx.pm       |   3 +-
 lib/PublicInbox/SearchIdx.pm     |  33 ++++-----
 lib/PublicInbox/SearchView.pm    |   6 +-
 lib/PublicInbox/Smsg.pm          | 123 +++++++++++--------------------
 lib/PublicInbox/SolverGit.pm     |   4 +-
 lib/PublicInbox/V2Writable.pm    |  11 +--
 lib/PublicInbox/View.pm          |  63 ++++++++--------
 lib/PublicInbox/WwwAtomStream.pm |   8 +-
 t/altid.t                        |   3 +-
 t/altid_v2.t                     |   3 +-
 t/import.t                       |   3 +-
 t/search.t                       |  46 +++++++-----
 t/v2mda.t                        |   4 +-
 t/v2writable.t                   |   5 +-
 21 files changed, 207 insertions(+), 242 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/13] inbox: introduce smsg_eml method
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 02/13] wwwatomstream: convert callers to use smsg_eml Eric Wong
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

The goal of this is to eventually remove the $smsg->{mime} field
which is easy-to-misuse and cause memory explosions which
necessitated fixes like commit 7d02b9e64455831d
("view: stop storing all MIME objects on large threads").
---
 lib/PublicInbox/Inbox.pm     | 6 ++++++
 lib/PublicInbox/SolverGit.pm | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index c295b2677e4..bd1489e3699 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -334,6 +334,12 @@ sub smsg_mime {
 	}
 }
 
+sub smsg_eml {
+	my ($self, $smsg) = @_;
+	my $bref = msg_by_smsg($self, $smsg) or return;
+	PublicInbox::Eml->new($bref);
+}
+
 sub mid2num($$) {
 	my ($self, $mid) = @_;
 	my $mm = mm($self) or return;
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index f718e28cbd5..b1cb1ae97c4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -233,8 +233,8 @@ sub find_extract_diffs ($$$) {
 
 	my $diffs = [];
 	foreach my $smsg (@$msgs) {
-		$ibx->smsg_mime($smsg) or next;
-		delete($smsg->{mime})->each_part(\&extract_diff,
+		my $eml = $ibx->smsg_eml($smsg) or next;
+		$eml->each_part(\&extract_diff,
 				[$self, $diffs, $pre, $post, $ibx, $smsg], 1);
 	}
 	@$diffs ? $diffs : undef;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 02/13] wwwatomstream: convert callers to use smsg_eml
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
  2020-06-01 10:06 ` [PATCH 01/13] inbox: introduce smsg_eml method Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 03/13] v2writable: fix non-sensical interpolation in BUG message Eric Wong
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

We can simplify WwwAtomStream callbacks by performing ->smsg_eml
calls in the `feed_entry' sub itself.  This simplifies callers,
by reducing the number of places which can load an Eml object
into memory.
---
 lib/PublicInbox/Feed.pm          | 2 +-
 lib/PublicInbox/SearchView.pm    | 2 +-
 lib/PublicInbox/WwwAtomStream.pm | 9 +++++----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index e64628be830..b770a35077c 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -12,7 +12,7 @@ use PublicInbox::Smsg; # this loads w/o Search::Xapian
 sub generate_i {
 	my ($ctx) = @_;
 	while (my $smsg = shift @{$ctx->{msgs}}) {
-		$ctx->{-inbox}->smsg_mime($smsg) and return $smsg;
+		return $smsg;
 	}
 }
 
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 4336e4d9b2d..249cf53926d 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -340,7 +340,7 @@ sub adump_i {
 		my $smsg = eval {
 			PublicInbox::Smsg::from_mitem($mi, $ctx->{srch});
 		} or next;
-		$ctx->{-inbox}->smsg_mime($smsg) and return $smsg;
+		return $smsg;
 	}
 }
 
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index c3fbb1a7cef..6ed0cb212d6 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -12,6 +12,7 @@ use warnings;
 use POSIX qw(strftime);
 use Digest::SHA qw(sha1_hex);
 use PublicInbox::Address;
+use PublicInbox::MID qw(mids);
 use PublicInbox::Hval qw(ascii_html mid_href);
 use PublicInbox::MsgTime qw(msg_timestamp);
 
@@ -99,9 +100,9 @@ sub atom_header {
 sub feed_entry {
 	my ($self, $smsg) = @_;
 	my $ctx = $self->{ctx};
-	my $mid = $smsg->mid; # may extract Message-ID from {mime}
-	my $mime = delete $smsg->{mime};
-	my $hdr = $mime->header_obj;
+	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return '';
+	my $hdr = $eml->header_obj;
+	my $mid = $smsg->{mid} // mids($hdr)->[0];
 	my $irt = PublicInbox::View::in_reply_to($hdr);
 	my $uuid = to_uuid($mid);
 	my $base = $ctx->{feed_base_url};
@@ -141,7 +142,7 @@ sub feed_entry {
 		qq(<pre\nstyle="white-space:pre-wrap">);
 	$ctx->{obuf} = \$s;
 	$ctx->{mhref} = $href;
-	PublicInbox::View::multipart_text_as_html($mime, $ctx);
+	PublicInbox::View::multipart_text_as_html($eml, $ctx);
 	delete $ctx->{obuf};
 	$s .= '</pre></div></content></entry>';
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 03/13] v2writable: fix non-sensical interpolation in BUG message
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
  2020-06-01 10:06 ` [PATCH 01/13] inbox: introduce smsg_eml method Eric Wong
  2020-06-01 10:06 ` [PATCH 02/13] wwwatomstream: convert callers to use smsg_eml Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 04/13] import: modernize to use Perl 5.10 features Eric Wong
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

No point in attempting to print the value of an undefined
variable if there's a bug.  Fortunately, (AFAIK) we've never hit
that bug check :>
---
 lib/PublicInbox/V2Writable.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index b393b31f0e2..1a824531f3c 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -182,7 +182,7 @@ sub _add {
 
 	my ($num, $mid0) = v2_num_for($self, $mime);
 	defined $num or return; # duplicate
-	defined $mid0 or die "BUG: $mid0 undefined\n";
+	defined $mid0 or die "BUG: \$mid0 undefined\n";
 	my $im = $self->importer;
 	my $smsg = bless { mid => $mid0, num => $num }, 'PublicInbox::Smsg';
 	my $cmt = $im->add($mime, undef, $smsg); # sets $smsg->{ds|ts|blob}

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 04/13] import: modernize to use Perl 5.10 features
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (2 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 03/13] v2writable: fix non-sensical interpolation in BUG message Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 05/13] smsg: introduce ->populate method Eric Wong
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

First, prefer the leaner "parent" module over the heavy "base"
module to establish ISA relationships, since "base" is only
needed for "fields".

The "//" and "//=" operators allow us simplify our code and fix
minor bugs where a value of "0" was disallowed.  Yes, we'll
allow "0" as an email address, too, since some twisted BOFH
could theoretically use it as a local user name.

Going forward, we'll also be avoiding "use warnings" and
instead rely on `-w' in the shebang.
---
 lib/PublicInbox/Import.pm | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index a901350402c..1a7ed9ce878 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -7,8 +7,8 @@
 # requires read-only access.
 package PublicInbox::Import;
 use strict;
-use warnings;
-use base qw(PublicInbox::Lock);
+use parent qw(PublicInbox::Lock);
+use v5.10.1;
 use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::MID qw(mids mid2path);
 use PublicInbox::Address;
@@ -24,10 +24,10 @@ sub new {
 	my ($class, $git, $name, $email, $ibx) = @_;
 	my $ref = 'refs/heads/master';
 	if ($ibx) {
-		$ref = $ibx->{ref_head} || 'refs/heads/master';
-		$name ||= $ibx->{name};
-		$email ||= $ibx->{-primary_address};
-		$git ||= $ibx->git;
+		$ref = $ibx->{ref_head} // 'refs/heads/master';
+		$name //= $ibx->{name};
+		$email //= $ibx->{-primary_address};
+		$git //= $ibx->git;
 	}
 	bless {
 		git => $git,
@@ -252,7 +252,7 @@ sub remove {
 	}
 	my $ident = $self->{ident};
 	my $now = now_raw();
-	$msg ||= 'rm';
+	$msg //= 'rm';
 	my $len = length($msg) + 1;
 	print $w "commit $ref\nmark :$commit\n",
 		"author $ident $now\n",
@@ -277,21 +277,17 @@ sub git_timestamp {
 
 sub extract_cmt_info ($;$) {
 	my ($mime, $smsg) = @_;
+	# $mime is PublicInbox::Eml, but remains Email::MIME-compatible
 
 	my $sender = '';
-	my $from = $mime->header('From');
-	$from ||= '';
+	my $hdr = $mime->header_obj;
+	my $from = $hdr->header('From') // '';
 	my ($email) = PublicInbox::Address::emails($from);
 	my ($name) = PublicInbox::Address::names($from);
 	if (!defined($name) || !defined($email)) {
-		$sender = $mime->header('Sender');
-		$sender ||= '';
-		if (!defined($name)) {
-			($name) = PublicInbox::Address::names($sender);
-		}
-		if (!defined($email)) {
-			($email) = PublicInbox::Address::emails($sender);
-		}
+		$sender = $hdr->header('Sender') // '';
+		$name //= (PublicInbox::Address::names($sender))[0];
+		$email //= (PublicInbox::Address::emails($sender))[0];
 	}
 	if (defined $email) {
 		# Email::Address::XS may leave quoted '<' in addresses,
@@ -317,11 +313,8 @@ sub extract_cmt_info ($;$) {
 		warn "no name in From: $from or Sender: $sender\n";
 	}
 
-	my $hdr = $mime->header_obj;
-
-	my $subject = $hdr->header('Subject');
-	$subject = '(no subject)' unless defined $subject;
-	# Mime decoding can create nulls replace them with spaces to protect git
+	my $subject = $hdr->header('Subject') // '(no subject)';
+	# MIME decoding can create nulls replace them with spaces to protect git
 	$subject =~ tr/\0/ /;
 	utf8::encode($subject);
 	my $at = git_timestamp(my @at = msg_datestamp($hdr));

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 05/13] smsg: introduce ->populate method
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (3 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 04/13] import: modernize to use Perl 5.10 features Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 06/13] smsg: get rid of ->wrap initializer, too Eric Wong
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

This will eventually replace the __hdr() calling methods and
eradicate {mime} usage from Smsg.  For now, we can eliminate
PublicInbox::Smsg->new since most callers already rely on an
open `bless' to avoid the old {mime} arg.
---
 lib/PublicInbox/Import.pm     | 40 ++++++++++++++++----------------
 lib/PublicInbox/SearchIdx.pm  | 31 +++++++++++--------------
 lib/PublicInbox/Smsg.pm       | 43 +++++++++++++++++++++++++++--------
 lib/PublicInbox/V2Writable.pm |  9 ++++----
 t/import.t                    |  3 ++-
 5 files changed, 73 insertions(+), 53 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 1a7ed9ce878..ab75aa00dc2 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -12,7 +12,8 @@ use v5.10.1;
 use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::MID qw(mids mid2path);
 use PublicInbox::Address;
-use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
+use PublicInbox::Smsg;
+use PublicInbox::MsgTime qw(msg_datestamp);
 use PublicInbox::ContentHash qw(content_digest);
 use PublicInbox::MDA;
 use PublicInbox::Eml;
@@ -269,8 +270,8 @@ sub remove {
 	(($self->{tip} = ":$commit"), $cur);
 }
 
-sub git_timestamp {
-	my ($ts, $zone) = @_;
+sub git_timestamp ($) {
+	my ($ts, $zone) = @{$_[0]};
 	$ts = 0 if $ts < 0; # git uses unsigned times
 	"$ts $zone";
 }
@@ -278,10 +279,13 @@ sub git_timestamp {
 sub extract_cmt_info ($;$) {
 	my ($mime, $smsg) = @_;
 	# $mime is PublicInbox::Eml, but remains Email::MIME-compatible
+	$smsg //= bless {}, 'PublicInbox::Smsg';
 
-	my $sender = '';
 	my $hdr = $mime->header_obj;
-	my $from = $hdr->header('From') // '';
+	$smsg->populate($hdr);
+
+	my $sender = '';
+	my $from = delete($smsg->{From}) // '';
 	my ($email) = PublicInbox::Address::emails($from);
 	my ($name) = PublicInbox::Address::names($from);
 	if (!defined($name) || !defined($email)) {
@@ -313,17 +317,11 @@ sub extract_cmt_info ($;$) {
 		warn "no name in From: $from or Sender: $sender\n";
 	}
 
-	my $subject = $hdr->header('Subject') // '(no subject)';
-	# MIME decoding can create nulls replace them with spaces to protect git
-	$subject =~ tr/\0/ /;
+	my $subject = delete($smsg->{Subject}) // '(no subject)';
 	utf8::encode($subject);
-	my $at = git_timestamp(my @at = msg_datestamp($hdr));
-	my $ct = git_timestamp(my @ct = msg_timestamp($hdr));
-	if ($smsg) {
-		$smsg->{ds} = $at[0];
-		$smsg->{ts} = $ct[0];
-	}
-	($name, $email, $at, $ct, $subject);
+	my $at = git_timestamp(delete $smsg->{-ds});
+	my $ct = git_timestamp(delete $smsg->{-ts});
+	("$name <$email>", $at, $ct, $subject);
 }
 
 # kill potentially confusing/misleading headers
@@ -370,7 +368,7 @@ sub clean_tree_v2 ($$$) {
 sub add {
 	my ($self, $mime, $check_cb, $smsg) = @_;
 
-	my ($name, $email, $at, $ct, $subject) = extract_cmt_info($mime, $smsg);
+	my ($author, $at, $ct, $subject) = extract_cmt_info($mime, $smsg);
 	my $path_type = $self->{path_type};
 	my $path;
 	if ($path_type eq '2/38') {
@@ -414,7 +412,7 @@ sub add {
 	}
 
 	print $w "commit $ref\nmark :$commit\n",
-		"author $name <$email> $at\n",
+		"author $author $at\n",
 		"committer $self->{ident} $ct\n" or wfail;
 	print $w "data ", (length($subject) + 1), "\n",
 		$subject, "\n\n" or wfail;
@@ -502,11 +500,11 @@ sub digest2mid ($$) {
 
 sub rewrite_commit ($$$$) {
 	my ($self, $oids, $buf, $mime) = @_;
-	my ($name, $email, $at, $ct, $subject);
+	my ($author, $at, $ct, $subject);
 	if ($mime) {
-		($name, $email, $at, $ct, $subject) = extract_cmt_info($mime);
+		($author, $at, $ct, $subject) = extract_cmt_info($mime);
 	} else {
-		$name = $email = '';
+		$author = '<>';
 		$subject = 'purged '.join(' ', @$oids);
 	}
 	@$oids = ();
@@ -515,7 +513,7 @@ sub rewrite_commit ($$$$) {
 		my $l = $buf->[$i];
 		if ($l =~ /^author .* ([0-9]+ [\+-]?[0-9]+)$/) {
 			$at //= $1;
-			$buf->[$i] = "author $name <$email> $at\n";
+			$buf->[$i] = "author $author $at\n";
 		} elsif ($l =~ /^committer .* ([0-9]+ [\+-]?[0-9]+)$/) {
 			$ct //= $1;
 			$buf->[$i] = "committer $self->{ident} $ct\n";
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index b4088933dbf..eb228e6bba7 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -156,16 +156,14 @@ sub index_text ($$$$) {
 	}
 }
 
-sub index_users ($$) {
+sub index_headers ($$) {
 	my ($self, $smsg) = @_;
-
-	my $from = $smsg->from;
-	my $to = $smsg->to;
-	my $cc = $smsg->cc;
-
-	index_text($self, $from, 1, 'A'); # A - author
-	index_text($self, $to, 1, 'XTO') if $to ne '';
-	index_text($self, $cc, 1, 'XCC') if $cc ne '';
+	my @x = (from => 'A', # Author
+		subject => 'S', to => 'XTO', cc => 'XCC');
+	while (my ($field, $pfx) = splice(@x, 0, 2)) {
+		my $val = $smsg->{$field};
+		index_text($self, $val, 1, $pfx) if $val ne '';
+	}
 }
 
 sub index_diff_inc ($$$$) {
@@ -285,9 +283,9 @@ sub index_xapian { # msg_iter callback
 	if ($part->{is_submsg}) {
 		my $mids = mids_for_index($part);
 		index_ids($self, $doc, $part, $mids);
-		my $smsg = PublicInbox::Smsg->new($part);
-		index_users($self, $smsg);
-		index_text($self, $smsg->subject, 1, 'S') if $smsg->subject;
+		my $smsg = bless {}, 'PublicInbox::Smsg';
+		$smsg->populate($part);
+		index_headers($self, $smsg);
 	}
 
 	my ($s, undef) = msg_part_text($part, $ct);
@@ -335,10 +333,8 @@ sub index_ids ($$$$) {
 
 sub add_xapian ($$$$) {
 	my ($self, $mime, $smsg, $mids) = @_;
-	$smsg->{mime} = $mime; # XXX dangerous
 	my $hdr = $mime->header_obj;
 	my $doc = $X->{Document}->new;
-	my $subj = $smsg->subject;
 	add_val($doc, PublicInbox::Search::TS(), $smsg->{ts});
 	my @ds = gmtime($smsg->{ds});
 	my $yyyymmdd = strftime('%Y%m%d', @ds);
@@ -348,8 +344,7 @@ sub add_xapian ($$$$) {
 
 	my $tg = term_generator($self);
 	$tg->set_document($doc);
-	index_text($self, $subj, 1, 'S') if $subj;
-	index_users($self, $smsg);
+	index_headers($self, $smsg);
 
 	msg_iter($mime, \&index_xapian, [ $self, $doc ]);
 	index_ids($self, $doc, $hdr, $mids);
@@ -392,8 +387,7 @@ sub add_message {
 	};
 
 	# v1 and tests only:
-	$smsg->{ds} //= msg_datestamp($hdr, $self->{autime});
-	$smsg->{ts} //= msg_timestamp($hdr, $self->{cotime});
+	$smsg->populate($hdr, $self);
 
 	eval {
 		# order matters, overview stores every possible piece of
@@ -649,6 +643,7 @@ sub read_log {
 		my $mime = do_cat_mail($git, $blob, \$bytes);
 		$del_cb->($self, $mime);
 	}
+	delete @$self{qw(autime cotime)};
 	$batch_cb->($nr, $latest, $newest);
 }
 
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 7a2766d8ff8..8e2771274a1 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -17,11 +17,6 @@ use PublicInbox::Address;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use Time::Local qw(timegm);
 
-sub new {
-	my ($class, $mime) = @_;
-	bless { mime => $mime }, $class;
-}
-
 sub wrap {
 	my ($class, $mid) = @_;
 	bless { mid => $mid }, $class;
@@ -36,11 +31,11 @@ sub get_val ($$) {
 sub to_doc_data {
 	my ($self) = @_;
 	join("\n",
-		$self->subject,
-		$self->from,
+		$self->{subject},
+		$self->{from},
 		$self->references,
-		$self->to,
-		$self->cc,
+		$self->{to},
+		$self->{cc},
 		$self->{blob},
 		$self->{mid},
 		$self->{bytes} // '',
@@ -115,6 +110,36 @@ sub __hdr ($$) {
 	};
 }
 
+# for Import and v1 WWW code paths
+sub populate {
+	my ($self, $hdr, $v2w) = @_;
+	for my $f (qw(From To Cc Subject)) {
+		my @all = $hdr->header($f);
+		my $val = join(', ', @all);
+		$val =~ tr/\r//d;
+		# MIME decoding can create NULs, replace them with spaces
+		# to protect git and NNTP clients
+		$val =~ tr/\0\t\n/   /;
+
+		# lower-case fields for read-only stuff
+		$self->{lc($f)} = $val;
+
+		# Capitalized From/Subject for git-fast-import
+		next if $f eq 'To' || $f eq 'Cc';
+		if (scalar(@all) > 1) {
+			$val = $all[0];
+			$val =~ tr/\r//d;
+			$val =~ tr/\0\t\n/   /;
+		}
+		$self->{$f} = $val if $val ne '';
+	}
+	$v2w //= {};
+	$self->{-ds} = [ my @ds = msg_datestamp($hdr, $v2w->{autime}) ];
+	$self->{-ts} = [ my @ts = msg_timestamp($hdr, $v2w->{cotime}) ];
+	$self->{ds} //= $ds[0]; # no zone
+	$self->{ts} //= $ts[0];
+}
+
 sub subject ($) { __hdr($_[0], 'Subject') }
 sub to ($) { __hdr($_[0], 'To') }
 sub cc ($) { __hdr($_[0], 'Cc') }
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index 1a824531f3c..79bee7f9f3d 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -19,7 +19,6 @@ use PublicInbox::OverIdx;
 use PublicInbox::Msgmap;
 use PublicInbox::Spawn qw(spawn popen_rd);
 use PublicInbox::SearchIdx;
-use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use PublicInbox::MultiMidQueue;
 use IO::Handle; # ->autoflush
 use File::Temp qw(tempfile);
@@ -156,8 +155,6 @@ sub add {
 # indexes a message, returns true if checkpointing is needed
 sub do_idx ($$$$) {
 	my ($self, $msgref, $mime, $smsg) = @_;
-	$smsg->{ds} //= msg_datestamp($mime->header_obj, $self->{autime});
-	$smsg->{ts} //= msg_timestamp($mime->header_obj, $self->{cotime});
 	$self->{over}->add_overview($mime, $smsg);
 	my $idx = idx_shard($self, $smsg->{num} % $self->{shards});
 	$idx->index_raw($msgref, $mime, $smsg);
@@ -575,6 +572,8 @@ W: $list
 			num => $smsg->{num},
 			mid => $smsg->{mid},
 		}, 'PublicInbox::Smsg';
+		my $v2w = { autime => $smsg->{ds}, cotime => $smsg->{ts} };
+		$new_smsg->populate($new_mime, $v2w);
 		do_idx($self, \$raw, $new_mime, $new_smsg);
 	}
 	$rewritten->{rewrites};
@@ -968,6 +967,7 @@ sub reindex_oid_m ($$$$;$) {
 		blob => $oid,
 		mid => $mid0,
 	}, 'PublicInbox::Smsg';
+	$smsg->populate($mime, $self);
 	if (do_idx($self, $msgref, $mime, $smsg)) {
 		reindex_checkpoint($self, $sync, $git);
 	}
@@ -1059,6 +1059,7 @@ sub reindex_oid ($$$$) {
 		blob => $oid,
 		mid => $mid0,
 	}, 'PublicInbox::Smsg';
+	$smsg->populate($mime, $self);
 	if (do_idx($self, $msgref, $mime, $smsg)) {
 		reindex_checkpoint($self, $sync, $git);
 	}
@@ -1298,7 +1299,7 @@ sub index_epoch ($$$) {
 		}
 	}
 	close $fh or die "git log failed: \$?=$?";
-	delete $self->{reindex_pipe};
+	delete @$self{qw(reindex_pipe autime cotime)};
 	update_last_commit($self, $git, $i, $cmt) if defined $cmt;
 }
 
diff --git a/t/import.t b/t/import.t
index 3f308299148..f987b1141f7 100644
--- a/t/import.t
+++ b/t/import.t
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use PublicInbox::Eml;
+use PublicInbox::Smsg;
 use PublicInbox::Git;
 use PublicInbox::Import;
 use PublicInbox::Spawn qw(spawn);
@@ -26,7 +27,7 @@ hello world
 EOF
 
 my $v2 = require_git(2.6, 1);
-my $smsg = {} if $v2;
+my $smsg = bless {}, 'PublicInbox::Smsg' if $v2;
 like($im->add($mime, undef, $smsg), qr/\A:[0-9]+\z/, 'added one message');
 
 if ($v2) {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 06/13] smsg: get rid of ->wrap initializer, too
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (4 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 05/13] smsg: introduce ->populate method Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 07/13] inbox: msg_by_*: remove $(size)ref args Eric Wong
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

We'll just use `bless' like most current PublicInbox::Smsg callers.
---
 lib/PublicInbox/SearchIdx.pm | 2 +-
 lib/PublicInbox/Smsg.pm      | 5 -----
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index eb228e6bba7..f10a9104e78 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -484,7 +484,7 @@ sub remove_by_oid {
 	for (; $head != $tail; $head++) {
 		my $docid = $head->get_docid;
 		my $doc = $db->get_document($docid);
-		my $smsg = PublicInbox::Smsg->wrap($mid);
+		my $smsg = bless { mid => $mid }, 'PublicInbox::Smsg';
 		$smsg->load_expand($doc);
 		if ($smsg->{blob} eq $oid) {
 			push(@delete, $docid);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 8e2771274a1..446bca81b48 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -17,11 +17,6 @@ use PublicInbox::Address;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use Time::Local qw(timegm);
 
-sub wrap {
-	my ($class, $mid) = @_;
-	bless { mid => $mid }, $class;
-}
-
 sub get_val ($$) {
 	my ($doc, $col) = @_;
 	# sortable_unserialise is defined by PublicInbox::Search::load_xapian()

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 07/13] inbox: msg_by_*: remove $(size)ref args
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (5 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 06/13] smsg: get rid of ->wrap initializer, too Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 08/13] www: remove smsg_mime API and adjust callers Eric Wong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

None of our current callers care about the size of the blob
we're retrieving, so stop wasting stack space and code for
it.
---
 lib/PublicInbox/Inbox.pm | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index bd1489e3699..38abdfe5847 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -310,25 +310,25 @@ sub nntp_usable {
 }
 
 # for v1 users w/o SQLite only
-sub msg_by_path ($$;$) {
-	my ($self, $path, $ref) = @_;
-	git($self)->cat_file('HEAD:'.$path, $ref);
+sub msg_by_path ($$) {
+	my ($self, $path) = @_;
+	git($self)->cat_file('HEAD:'.$path);
 }
 
-sub msg_by_smsg ($$;$) {
-	my ($self, $smsg, $ref) = @_;
+sub msg_by_smsg ($$) {
+	my ($self, $smsg) = @_;
 
 	# ghosts may have undef smsg (from SearchThread.node) or
 	# no {blob} field
 	return unless defined $smsg;
 	defined(my $blob = $smsg->{blob}) or return;
 
-	git($self)->cat_file($blob, $ref);
+	git($self)->cat_file($blob);
 }
 
 sub smsg_mime {
-	my ($self, $smsg, $ref) = @_;
-	if (my $s = msg_by_smsg($self, $smsg, $ref)) {
+	my ($self, $smsg) = @_;
+	if (my $s = msg_by_smsg($self, $smsg)) {
 		$smsg->{mime} = PublicInbox::Eml->new($s);
 		return $smsg;
 	}
@@ -355,14 +355,14 @@ sub smsg_by_mid ($$) {
 	PublicInbox::Smsg::psgi_cull($smsg);
 }
 
-sub msg_by_mid ($$;$) {
-	my ($self, $mid, $ref) = @_;
+sub msg_by_mid ($$) {
+	my ($self, $mid) = @_;
 
 	over($self) or
-		return msg_by_path($self, mid2path($mid), $ref);
+		return msg_by_path($self, mid2path($mid));
 
 	my $smsg = smsg_by_mid($self, $mid);
-	$smsg ? msg_by_smsg($self, $smsg, $ref) : undef;
+	$smsg ? msg_by_smsg($self, $smsg) : undef;
 }
 
 sub recent {

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 08/13] www: remove smsg_mime API and adjust callers
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (6 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 07/13] inbox: msg_by_*: remove $(size)ref args Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 09/13] nntp: smsg_range_i: favor ->{$field} lookups when possible Eric Wong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

To further simplify callers and avoid embarrasing memory
explosions[1], we can finally eliminate this method in
favor of smsg_eml.

[1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5
    ("view: stop storing all MIME objects on large threads")
    fixed a huge memory blowup.
---
 Documentation/mknews.perl     |  7 ++--
 lib/PublicInbox/Feed.pm       |  6 ++--
 lib/PublicInbox/Inbox.pm      | 12 ++-----
 lib/PublicInbox/SearchView.pm |  4 +--
 lib/PublicInbox/Smsg.pm       |  7 ++--
 lib/PublicInbox/View.pm       | 63 +++++++++++++++++------------------
 t/v2mda.t                     |  4 +--
 7 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/Documentation/mknews.perl b/Documentation/mknews.perl
index 3bdebfce7a5..965c30c1dcb 100755
--- a/Documentation/mknews.perl
+++ b/Documentation/mknews.perl
@@ -102,9 +102,10 @@ sub mime2txt {
 }
 
 sub mime2html {
-	my ($out, $mime, $ctx) = @_;
-	my $smsg = bless { mime => $mime }, 'PublicInbox::Smsg';
-	print $out PublicInbox::View::index_entry($smsg, $ctx, 1) or die;
+	my ($out, $eml, $ctx) = @_;
+	my $smsg = bless {}, 'PublicInbox::Smsg';
+	$smsg->populate($eml);
+	print $out PublicInbox::View::eml_entry($ctx, $smsg, $eml, 1) or die;
 }
 
 sub html_start {
diff --git a/lib/PublicInbox/Feed.pm b/lib/PublicInbox/Feed.pm
index b770a35077c..4c1056b4665 100644
--- a/lib/PublicInbox/Feed.pm
+++ b/lib/PublicInbox/Feed.pm
@@ -53,9 +53,9 @@ sub new_html_i {
 	my ($nr, $ctx) = @_;
 	my $msgs = $ctx->{msgs};
 	while (my $smsg = shift @$msgs) {
-		my $m = $ctx->{-inbox}->smsg_mime($smsg) or next;
-		my $more = scalar @$msgs;
-		return PublicInbox::View::index_entry($m, $ctx, $more);
+		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
+		return PublicInbox::View::eml_entry($ctx, $smsg, $eml,
+							scalar @$msgs);
 	}
 	PublicInbox::View::pagination_footer($ctx, './new.html');
 }
diff --git a/lib/PublicInbox/Inbox.pm b/lib/PublicInbox/Inbox.pm
index 38abdfe5847..af034358b15 100644
--- a/lib/PublicInbox/Inbox.pm
+++ b/lib/PublicInbox/Inbox.pm
@@ -326,18 +326,12 @@ sub msg_by_smsg ($$) {
 	git($self)->cat_file($blob);
 }
 
-sub smsg_mime {
-	my ($self, $smsg) = @_;
-	if (my $s = msg_by_smsg($self, $smsg)) {
-		$smsg->{mime} = PublicInbox::Eml->new($s);
-		return $smsg;
-	}
-}
-
 sub smsg_eml {
 	my ($self, $smsg) = @_;
 	my $bref = msg_by_smsg($self, $smsg) or return;
-	PublicInbox::Eml->new($bref);
+	my $eml = PublicInbox::Eml->new($bref);
+	$smsg->populate($eml) unless exists($smsg->{num}); # v1 w/o SQLite
+	$eml;
 }
 
 sub mid2num($$) {
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 249cf53926d..d53a533e53c 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -300,8 +300,8 @@ sub mset_thread_i {
 	my ($nr, $ctx) = @_;
 	my $msgs = $ctx->{msgs} or return;
 	while (my $smsg = pop @$msgs) {
-		$ctx->{-inbox}->smsg_mime($smsg) or next;
-		return PublicInbox::View::index_entry($smsg, $ctx,
+		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
+		return PublicInbox::View::eml_entry($ctx, $smsg, $eml,
 							scalar @$msgs);
 	}
 	my ($skel) = delete @$ctx{qw(skel msgs)};
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 446bca81b48..9688c5592a2 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -12,7 +12,7 @@ use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT_OK = qw(subject_normalized);
-use PublicInbox::MID qw/mid_mime/;
+use PublicInbox::MID qw(mid_mime mids);
 use PublicInbox::Address;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use Time::Local qw(timegm);
@@ -105,7 +105,7 @@ sub __hdr ($$) {
 	};
 }
 
-# for Import and v1 WWW code paths
+# for Import and v1 non-SQLite WWW code paths
 sub populate {
 	my ($self, $hdr, $v2w) = @_;
 	for my $f (qw(From To Cc Subject)) {
@@ -133,6 +133,9 @@ sub populate {
 	$self->{-ts} = [ my @ts = msg_timestamp($hdr, $v2w->{cotime}) ];
 	$self->{ds} //= $ds[0]; # no zone
 	$self->{ts} //= $ts[0];
+
+	# for v1 users w/o SQLite
+	$self->{mid} //= eval { mids($hdr)->[0] } // '';
 }
 
 sub subject ($) { __hdr($_[0], 'Subject') }
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a05ac4142f2..0bc2b06e4dc 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -74,11 +74,10 @@ sub msg_page_more { # cold
 	my $ibx = $ctx->{-inbox};
 	my $next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
 	$ctx->{more} = [ $id, $prev, $next ] if $next;
-	$smsg = $ibx->smsg_mime($smsg) or return '';
+	my $eml = $ibx->smsg_eml($smsg) or return '';
 	$ctx->{mhref} = '../' . mid_href($smsg->{mid}) . '/';
-	my $mime = delete $smsg->{mime};
-	$ctx->{obuf} = _msg_page_prepare_obuf($mime->header_obj, $ctx, $nr);
-	multipart_text_as_html($mime, $ctx);
+	$ctx->{obuf} = _msg_page_prepare_obuf($eml->header_obj, $ctx, $nr);
+	multipart_text_as_html($eml, $ctx);
 	${delete $ctx->{obuf}} .= '</pre><hr>';
 }
 
@@ -181,14 +180,14 @@ sub nr_to_s ($$$) {
 # human-friendly format
 sub fmt_ts ($) { strftime('%Y-%m-%d %k:%M', gmtime($_[0])) }
 
+# Displays the text of of the message for /$INBOX/$MSGID/[Tt]/ endpoint
 # this is already inside a <pre>
-sub index_entry {
-	my ($smsg, $ctx, $more) = @_;
-	my $subj = $smsg->subject;
-	my $mid_raw = $smsg->mid;
+sub eml_entry {
+	my ($ctx, $smsg, $eml, $more) = @_;
+	my $subj = delete $smsg->{subject};
+	my $mid_raw = $smsg->{mid};
 	my $id = id_compress($mid_raw, 1);
 	my $id_m = 'm'.$id;
-
 	my $root_anchor = $ctx->{root_anchor} || '';
 	my $irt;
 	my $obfs_ibx = $ctx->{-obfs_ibx};
@@ -201,12 +200,12 @@ sub index_entry {
 	$rv .= $subj . "\n";
 	$rv .= _th_index_lite($mid_raw, \$irt, $id, $ctx);
 	my @tocc;
-	my $ds = $smsg->ds; # for v1 non-Xapian/SQLite users
-	# deleting {mime} is critical to memory use,
-	# the rest of the fields saves about 400K as we iterate across 1K msgs
-	my ($mime) = delete @$smsg{qw(mime ds ts blob subject)};
+	my $ds = delete $smsg->{ds}; # for v1 non-Xapian/SQLite users
+
+	# Deleting these fields saves about 400K as we iterate across 1K msgs
+	delete @$smsg{qw(ts blob)};
 
-	my $hdr = $mime->header_obj;
+	my $hdr = $eml->header_obj;
 	my $from = _hdr_names_html($hdr, 'From');
 	obfuscate_addrs($obfs_ibx, $from) if $obfs_ibx;
 	$rv .= "From: $from @ ".fmt_ts($ds)." UTC";
@@ -244,7 +243,7 @@ sub index_entry {
 	# scan through all parts, looking for displayable text
 	$ctx->{mhref} = $mhref;
 	$ctx->{obuf} = \$rv;
-	$mime->each_part(\&add_text_body, $ctx, 1);
+	$eml->each_part(\&add_text_body, $ctx, 1);
 	delete $ctx->{obuf};
 
 	# add the footer
@@ -372,10 +371,10 @@ sub pre_thread  { # walk_thread callback
 	skel_dump($ctx, $level, $node);
 }
 
-sub thread_index_entry {
-	my ($ctx, $level, $smsg) = @_;
+sub thread_eml_entry {
+	my ($ctx, $level, $smsg, $eml) = @_;
 	my ($beg, $end) = thread_adj_level($ctx, $level);
-	$beg . '<pre>' . index_entry($smsg, $ctx, 0) . '</pre>' . $end;
+	$beg . '<pre>' . eml_entry($ctx, $smsg, $eml, 0) . '</pre>' . $end;
 }
 
 sub stream_thread_i { # PublicInbox::WwwStream::getline callback
@@ -387,8 +386,8 @@ sub stream_thread_i { # PublicInbox::WwwStream::getline callback
 		my $node = shift @$q or next;
 		my $cl = $level + 1;
 		unshift @$q, map { ($cl, $_) } @{$node->{children}};
-		if ($ctx->{-inbox}->smsg_mime($node)) {
-			return thread_index_entry($ctx, $level, $node);
+		if (my $eml = $ctx->{-inbox}->smsg_eml($node)) {
+			return thread_eml_entry($ctx, $level, $node, $eml);
 		} else {
 			return ghost_index_entry($ctx, $level, $node);
 		}
@@ -400,19 +399,19 @@ sub stream_thread ($$) {
 	my ($rootset, $ctx) = @_;
 	my $ibx = $ctx->{-inbox};
 	my @q = map { (0, $_) } @$rootset;
-	my ($smsg, $level);
+	my ($smsg, $eml, $level);
 	while (@q) {
 		$level = shift @q;
-		my $node = shift @q or next;
+		$smsg = shift @q or next;
 		my $cl = $level + 1;
-		unshift @q, map { ($cl, $_) } @{$node->{children}};
-		$smsg = $ibx->smsg_mime($node) and last;
+		unshift @q, map { ($cl, $_) } @{$smsg->{children}};
+		$eml = $ibx->smsg_eml($smsg) and last;
 	}
-	return missing_thread($ctx) unless $smsg;
+	return missing_thread($ctx) unless $eml;
 
 	$ctx->{-obfs_ibx} = $ibx->{obfuscate} ? $ibx : undef;
 	$ctx->{-title_html} = ascii_html($smsg->{subject});
-	$ctx->{-html_tip} = thread_index_entry($ctx, $level, $smsg);
+	$ctx->{-html_tip} = thread_eml_entry($ctx, $level, $smsg, $eml);
 	$ctx->{-queue} = \@q;
 	PublicInbox::WwwStream->response($ctx, 200, \&stream_thread_i);
 }
@@ -452,13 +451,13 @@ sub thread_html {
 	return stream_thread($rootset, $ctx) unless $ctx->{flat};
 
 	# flat display: lazy load the full message from smsg
-	my $smsg;
-	while (my $m = shift @$msgs) {
-		$smsg = $ibx->smsg_mime($m) and last;
+	my ($smsg, $eml);
+	while ($smsg = shift @$msgs) {
+		$eml = $ibx->smsg_eml($smsg) and last;
 	}
 	return missing_thread($ctx) unless $smsg;
 	$ctx->{-title_html} = ascii_html($smsg->{subject});
-	$ctx->{-html_tip} = '<pre>'.index_entry($smsg, $ctx, scalar @$msgs);
+	$ctx->{-html_tip} = '<pre>'.eml_entry($ctx, $smsg, $eml, scalar @$msgs);
 	$ctx->{msgs} = $msgs;
 	PublicInbox::WwwStream->response($ctx, 200, \&thread_html_i);
 }
@@ -467,8 +466,8 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 	my ($nr, $ctx) = @_;
 	my $msgs = $ctx->{msgs} or return;
 	while (my $smsg = shift @$msgs) {
-		$ctx->{-inbox}->smsg_mime($smsg) or next;
-		return index_entry($smsg, $ctx, scalar @$msgs);
+		my $eml = $ctx->{-inbox}->smsg_eml($smsg) or next;
+		return eml_entry($ctx, $smsg, $eml, scalar @$msgs);
 	}
 	my ($skel) = delete @$ctx{qw(skel msgs)};
 	$$skel;
diff --git a/t/v2mda.t b/t/v2mda.t
index 36f43ff096c..7666eb2dacd 100644
--- a/t/v2mda.t
+++ b/t/v2mda.t
@@ -52,8 +52,8 @@ if ($V == 1) {
 }
 my $msgs = $ibx->search->query('');
 is(scalar(@$msgs), 1, 'only got one message');
-my $saved = $ibx->smsg_mime($msgs->[0]);
-is($saved->{mime}->as_string, $mime->as_string, 'injected message');
+my $eml = $ibx->smsg_eml($msgs->[0]);
+is($eml->as_string, $mime->as_string, 'injected message');
 
 {
 	my @new = glob("$faildir/new/*");

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 09/13] nntp: smsg_range_i: favor ->{$field} lookups when possible
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (7 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 08/13] www: remove smsg_mime API and adjust callers Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 10/13] smsg: get rid of remaining {mime} users Eric Wong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

PublicInbox::Smsg::date remains the only exception which
requires any subroutine calls, here, so we'll just have
a branch just for that.
---
 lib/PublicInbox/NNTP.pm | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 54207500dd8..a37910d1739 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -722,8 +722,16 @@ sub smsg_range_i {
 	my $msgs = $over->query_xover($$beg, $end);
 	scalar(@$msgs) or return;
 	my $tmp = '';
-	foreach my $s (@$msgs) {
-		$tmp .= $s->{num} . ' ' . $s->$field . "\r\n";
+
+	# ->{$field} is faster than ->$field invocations, so favor that.
+	if ($field eq 'date') {
+		for my $s (@$msgs) {
+			$tmp .= "$s->{num} ".PublicInbox::Smsg::date($s)."\r\n"
+		}
+	} else {
+		for my $s (@$msgs) {
+			$tmp .= "$s->{num} $s->{$field}\r\n";
+		}
 	}
 	utf8::encode($tmp);
 	$self->msg_more($tmp);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 10/13] smsg: get rid of remaining {mime} users
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (8 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 09/13] nntp: smsg_range_i: favor ->{$field} lookups when possible Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 11/13] smsg: remove ->bytes and ->lines methods Eric Wong
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

We'll let $smsg->populate take care of everything all at once
without hanging onto the header object for too long.
---
 lib/PublicInbox/OverIdx.pm |  1 -
 lib/PublicInbox/Smsg.pm    | 33 +++------------------------------
 2 files changed, 3 insertions(+), 31 deletions(-)

diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index cb15baadf2b..a078f80451f 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -254,7 +254,6 @@ sub subject_path ($) {
 sub add_overview {
 	my ($self, $mime, $smsg) = @_;
 	$smsg->{lines} = $mime->body_raw =~ tr!\n!\n!;
-	$smsg->{mime} = $mime; # XXX temporary?
 	my $hdr = $mime->header_obj;
 	my $mids = mids_for_index($hdr);
 	my $refs = parse_references($smsg, $hdr, $mids);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 9688c5592a2..9e363a112c0 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -12,7 +12,7 @@ use strict;
 use warnings;
 use base qw(Exporter);
 our @EXPORT_OK = qw(subject_normalized);
-use PublicInbox::MID qw(mid_mime mids);
+use PublicInbox::MID qw(mids);
 use PublicInbox::Address;
 use PublicInbox::MsgTime qw(msg_timestamp msg_datestamp);
 use Time::Local qw(timegm);
@@ -96,13 +96,7 @@ sub lines ($) { $_[0]->{lines} }
 
 sub __hdr ($$) {
 	my ($self, $field) = @_;
-	$self->{lc($field)} //= do {
-		my $mime = $self->{mime} or return;
-		my $val = join(', ', $mime->header($field));
-		$val =~ tr/\r//d;
-		$val =~ tr/\t\n/  /;
-		$val;
-	};
+	$self->{lc($field)};
 }
 
 # for Import and v1 non-SQLite WWW code paths
@@ -174,34 +168,13 @@ sub from_name {
 	$self->{from_name};
 }
 
-sub ts {
-	my ($self) = @_;
-	$self->{ts} ||= eval { msg_timestamp($self->{mime}->header_obj) } || 0;
-}
-
-sub ds {
-	my ($self) = @_;
-	$self->{ds} ||= eval { msg_datestamp($self->{mime}->header_obj); } || 0;
-}
-
 sub references {
 	my ($self) = @_;
 	my $x = $self->{references};
 	defined $x ? $x : '';
 }
 
-sub mid ($;$) {
-	my ($self, $mid) = @_;
-
-	if (defined $mid) {
-		$self->{mid} = $mid;
-	} elsif (defined(my $rv = $self->{mid})) {
-		$rv;
-	} else {
-		die "NO {mime} for mid\n" unless $self->{mime};
-		mid_mime($self->{mime}) # v1 w/o Xapian
-	}
-}
+sub mid { $_[0]->{mid} }
 
 our $REPLY_RE = qr/^re:\s+/i;
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 11/13] smsg: remove ->bytes and ->lines methods
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (9 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 10/13] smsg: get rid of remaining {mime} users Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 12/13] smsg: remove remaining accessor methods Eric Wong
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

They're stored directly in Xapian and SQLite document data.
NNTP accesses those fields directly to avoid method invocation
overhead so there's no reason to waste several kilobytes for
each sub.
---
 lib/PublicInbox/Smsg.pm | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index 9e363a112c0..a7ee2e40939 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -90,10 +90,6 @@ sub from_mitem {
 	psgi_cull(load_expand($self, $mitem->get_document));
 }
 
-# :bytes and :lines metadata in RFC 3977
-sub bytes ($) { $_[0]->{bytes} }
-sub lines ($) { $_[0]->{lines} }
-
 sub __hdr ($$) {
 	my ($self, $field) = @_;
 	$self->{lc($field)};

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 12/13] smsg: remove remaining accessor methods
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (10 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 11/13] smsg: remove ->bytes and ->lines methods Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 10:06 ` [PATCH 13/13] wwwatomstream: drop smsg->{mid} fallback for non-SQLite Eric Wong
  2020-06-01 16:45 ` [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

We'll continue to favor simpler data models that can be
used directly rather than wasting time and memory with
accessor APIs.

The ->from, ->to, -cc, ->mid, ->subject, >references methods can
all be trivially replaced by hash lookups since all their values
are stored in doc_data.  Most remaining callers of those methods
were test cases, anyways.

->from_name is only used in the PSGI code, so we can just
use ->psgi_cull to take care of populating the {from_name}
field.
---
 lib/PublicInbox/ExtMsg.pm  |  2 +-
 lib/PublicInbox/Mbox.pm    |  2 +-
 lib/PublicInbox/NNTP.pm    |  2 +-
 lib/PublicInbox/OverIdx.pm |  2 +-
 lib/PublicInbox/Smsg.pm    | 45 ++++++-------------------------------
 t/altid.t                  |  3 ++-
 t/altid_v2.t               |  3 ++-
 t/search.t                 | 46 +++++++++++++++++++++-----------------
 t/v2writable.t             |  5 +++--
 9 files changed, 44 insertions(+), 66 deletions(-)

diff --git a/lib/PublicInbox/ExtMsg.pm b/lib/PublicInbox/ExtMsg.pm
index 1d17c2ce673..d7917b34fb4 100644
--- a/lib/PublicInbox/ExtMsg.pm
+++ b/lib/PublicInbox/ExtMsg.pm
@@ -30,7 +30,7 @@ our @EXT_URL = map { ascii_html($_) } (
 sub PARTIAL_MAX () { 100 }
 
 sub mids_from_mset { # Search::retry_reopen callback
-	[ map { PublicInbox::Smsg::from_mitem($_)->mid } $_[0]->items ];
+	[ map { PublicInbox::Smsg::from_mitem($_)->{mid} } $_[0]->items ];
 }
 
 sub search_partial ($$) {
diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 94e61d4d2ff..b46dacfdc88 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -151,7 +151,7 @@ sub thread_mbox {
 	return [404, [qw(Content-Type text/plain)], []] if !@$msgs;
 	$ctx->{prev} = $msgs->[-1];
 	$ctx->{over} = $over; # bump refcnt
-	PublicInbox::MboxGz->response($ctx, \&thread_cb, $msgs->[0]->subject);
+	PublicInbox::MboxGz->response($ctx, \&thread_cb, $msgs->[0]->{subject});
 }
 
 sub emit_range {
diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index a37910d1739..ac13c7df8ce 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -296,7 +296,7 @@ sub newnews_i {
 	my $msgs = $over->query_ts($ts, $$prev);
 	if (scalar @$msgs) {
 		more($self, '<' .
-			join(">\r\n<", map { $_->mid } @$msgs ).
+			join(">\r\n<", map { $_->{mid} } @$msgs ).
 			'>');
 		$$prev = $msgs->[-1]->{num};
 	} else {
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index a078f80451f..c7f45a6c910 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -257,7 +257,7 @@ sub add_overview {
 	my $hdr = $mime->header_obj;
 	my $mids = mids_for_index($hdr);
 	my $refs = parse_references($smsg, $hdr, $mids);
-	my $subj = $smsg->subject;
+	my $subj = $smsg->{subject};
 	my $xpath;
 	if ($subj ne '') {
 		$xpath = subject_path($subj);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index a7ee2e40939..e8f9c9a3681 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -28,7 +28,7 @@ sub to_doc_data {
 	join("\n",
 		$self->{subject},
 		$self->{from},
-		$self->references,
+		$self->{references} // '',
 		$self->{to},
 		$self->{cc},
 		$self->{blob},
@@ -74,11 +74,15 @@ sub load_expand {
 
 sub psgi_cull ($) {
 	my ($self) = @_;
-	from_name($self); # fill in {from_name} so we can delete {from}
+
+	# ghosts don't have ->{from}
+	my $from = delete($self->{from}) // '';
+	my @n = PublicInbox::Address::names($from);
+	$self->{from_name} = join(', ', @n);
 
 	# drop NNTP-only fields which aren't relevant to PSGI results:
 	# saves ~80K on a 200 item search result:
-	delete @$self{qw(from ts to cc bytes lines)};
+	delete @$self{qw(ts to cc bytes lines)};
 	$self;
 }
 
@@ -90,11 +94,6 @@ sub from_mitem {
 	psgi_cull(load_expand($self, $mitem->get_document));
 }
 
-sub __hdr ($$) {
-	my ($self, $field) = @_;
-	$self->{lc($field)};
-}
-
 # for Import and v1 non-SQLite WWW code paths
 sub populate {
 	my ($self, $hdr, $v2w) = @_;
@@ -128,10 +127,6 @@ sub populate {
 	$self->{mid} //= eval { mids($hdr)->[0] } // '';
 }
 
-sub subject ($) { __hdr($_[0], 'Subject') }
-sub to ($) { __hdr($_[0], 'To') }
-sub cc ($) { __hdr($_[0], 'Cc') }
-
 # no strftime, that is locale-dependent and not for RFC822
 my @DoW = qw(Sun Mon Tue Wed Thu Fri Sat);
 my @MoY = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
@@ -146,32 +141,6 @@ sub date ($) {
 
 }
 
-sub from ($) {
-	my ($self) = @_;
-	my $from = __hdr($self, 'From');
-	if (defined $from && !defined $self->{from_name}) {
-		my @n = PublicInbox::Address::names($from);
-		$self->{from_name} = join(', ', @n);
-	}
-	$from;
-}
-
-sub from_name {
-	my ($self) = @_;
-	my $from_name = $self->{from_name};
-	return $from_name if defined $from_name;
-	$self->from;
-	$self->{from_name};
-}
-
-sub references {
-	my ($self) = @_;
-	my $x = $self->{references};
-	defined $x ? $x : '';
-}
-
-sub mid { $_[0]->{mid} }
-
 our $REPLY_RE = qr/^re:\s+/i;
 
 sub subject_normalized ($) {
diff --git a/t/altid.t b/t/altid.t
index 670a3963375..f3c01520c6d 100644
--- a/t/altid.t
+++ b/t/altid.t
@@ -47,7 +47,8 @@ EOF
 {
 	my $ro = PublicInbox::Search->new($ibx);
 	my $msgs = $ro->query("gmane:1234");
-	is_deeply([map { $_->mid } @$msgs], ['a@example.com'], 'got one match');
+	$msgs = [ map { $_->{mid} } @$msgs ];
+	is_deeply($msgs, ['a@example.com'], 'got one match');
 
 	$msgs = $ro->query("gmane:666");
 	is_deeply([], $msgs, 'body did NOT match');
diff --git a/t/altid_v2.t b/t/altid_v2.t
index 28a047d992e..01ed9ed48db 100644
--- a/t/altid_v2.t
+++ b/t/altid_v2.t
@@ -42,7 +42,8 @@ EOF
 $v2w->done;
 
 my $msgs = $ibx->search->reopen->query("gmane:1234");
-is_deeply([map { $_->mid } @$msgs], ['a@example.com'], 'got one match');
+$msgs = [ map { $_->{mid} } @$msgs ];
+is_deeply($msgs, ['a@example.com'], 'got one match');
 $msgs = $ibx->search->query("gmane:666");
 is_deeply([], $msgs, 'body did NOT match');
 
diff --git a/t/search.t b/t/search.t
index 9d74f5e0532..6cf2bc2d6b4 100644
--- a/t/search.t
+++ b/t/search.t
@@ -92,7 +92,7 @@ EOF
 
 sub filter_mids {
 	my ($msgs) = @_;
-	sort(map { $_->mid } @$msgs);
+	sort(map { $_->{mid} } @$msgs);
 }
 
 {
@@ -100,7 +100,7 @@ sub filter_mids {
 	$ro->reopen;
 	my $found = $ro->query('m:root@s');
 	is(scalar(@$found), 1, "message found");
-	is($found->[0]->mid, 'root@s', 'mid set correctly') if scalar(@$found);
+	is($found->[0]->{mid}, 'root@s', 'mid set correctly') if @$found;
 
 	my ($res, @res);
 	my @exp = sort qw(root@s last@s);
@@ -176,7 +176,7 @@ EOF
 	# body
 	$res = $ro->query('goodbye');
 	is(scalar(@$res), 1, "goodbye message found");
-	is($res->[0]->mid, 'last@s', 'got goodbye message body') if scalar(@$res);
+	is($res->[0]->{mid}, 'last@s', 'got goodbye message body') if @$res;
 
 	# datestamp
 	$res = $ro->query('dt:20101002000001..20101002000001');
@@ -257,12 +257,13 @@ fade
 EOF
 	my $res = $rw->query("theatre");
 	is(scalar(@$res), 2, "got both matches");
-	is($res->[0]->mid, 'nquote@a', "non-quoted scores higher") if scalar(@$res);
-	is($res->[1]->mid, 'quote@a', "quoted result still returned") if scalar(@$res);
-
+	if (@$res == 2) {
+		is($res->[0]->{mid}, 'nquote@a', 'non-quoted scores higher');
+		is($res->[1]->{mid}, 'quote@a', 'quoted result still returned');
+	}
 	$res = $rw->query("illusions");
 	is(scalar(@$res), 1, "got a match for quoted text");
-	is($res->[0]->mid, 'quote@a',
+	is($res->[0]->{mid}, 'quote@a',
 		"quoted result returned if nothing else") if scalar(@$res);
 });
 
@@ -283,8 +284,10 @@ EOF
 	ok($doc_id > 0, "doc_id defined with circular reference");
 	my $smsg = $rw->query('m:circle@a', {limit=>1})->[0];
 	is(defined($smsg), 1, 'found m:circl@a');
-	is($smsg->references, '', "no references created") if defined($smsg);
-	is($smsg->subject, $s, 'long subject not rewritten') if defined($smsg);
+	if (defined $smsg) {
+		is($smsg->{references}, '', "no references created");
+		is($smsg->{subject}, $s, 'long subject not rewritten');
+	}
 });
 
 $ibx->with_umask(sub {
@@ -293,7 +296,10 @@ $ibx->with_umask(sub {
 	ok($doc_id > 0, 'message indexed doc_id with UTF-8');
 	my $msg = $rw->query('m:testmessage@example.com', {limit => 1})->[0];
 	is(defined($msg), 1, 'found testmessage@example.com');
-	is($mime->header('Subject'), $msg->subject, 'UTF-8 subject preserved') if defined($msg);
+	if (defined $msg) {
+		is($mime->header('Subject'), $msg->{subject},
+			'UTF-8 subject preserved');
+	}
 });
 
 {
@@ -311,14 +317,14 @@ $ibx->with_umask(sub {
 	is($mset->size, 6, 'searched To: successfully');
 	foreach my $m ($mset->items) {
 		my $smsg = $ro->{over_ro}->get_art($m->get_docid);
-		like($smsg->to, qr/\blist\@example\.com\b/, 'to appears');
+		like($smsg->{to}, qr/\blist\@example\.com\b/, 'to appears');
 	}
 
 	$mset = $ro->query('tc:list@example.com', {mset => 1});
 	is($mset->size, 6, 'searched To+Cc: successfully');
 	foreach my $m ($mset->items) {
 		my $smsg = $ro->{over_ro}->get_art($m->get_docid);
-		my $tocc = join("\n", $smsg->to, $smsg->cc);
+		my $tocc = join("\n", $smsg->{to}, $smsg->{cc});
 		like($tocc, qr/\blist\@example\.com\b/, 'tocc appears');
 	}
 
@@ -327,7 +333,7 @@ $ibx->with_umask(sub {
 		is($mset->items, 1, "searched $pfx successfully for Cc:");
 		foreach my $m ($mset->items) {
 			my $smsg = $ro->{over_ro}->get_art($m->get_docid);
-			like($smsg->cc, qr/\bfoo\@example\.com\b/,
+			like($smsg->{cc}, qr/\bfoo\@example\.com\b/,
 				'cc appears');
 		}
 	}
@@ -337,7 +343,7 @@ $ibx->with_umask(sub {
 		is(scalar(@$res), 1,
 			"searched $pfx successfully for From:");
 		foreach my $smsg (@$res) {
-			like($smsg->from_name, qr/Laggy Sender/,
+			like($smsg->{from_name}, qr/Laggy Sender/,
 				"From appears with $pfx");
 		}
 	}
@@ -354,18 +360,18 @@ $ibx->with_umask(sub {
 
 	$res = $ro->query('q:theatre');
 	is(scalar(@$res), 1, 'only one quoted body');
-	like($res->[0]->from_name, qr/\AQuoter/,
+	like($res->[0]->{from_name}, qr/\AQuoter/,
 		'got quoted body') if (scalar(@$res));
 
 	$res = $ro->query('nq:theatre');
 	is(scalar @$res, 1, 'only one non-quoted body');
-	like($res->[0]->from_name, qr/\ANon-Quoter/,
+	like($res->[0]->{from_name}, qr/\ANon-Quoter/,
 		'got non-quoted body') if (scalar(@$res));
 
 	foreach my $pfx (qw(b: bs:)) {
 		$res = $ro->query($pfx . 'theatre');
 		is(scalar @$res, 2, "searched both bodies for $pfx");
-		like($res->[0]->from_name, qr/\ANon-Quoter/,
+		like($res->[0]->{from_name}, qr/\ANon-Quoter/,
 			"non-quoter first for $pfx") if scalar(@$res);
 	}
 }
@@ -379,16 +385,16 @@ $ibx->with_umask(sub {
 	is(scalar @$n, 1, 'got result for n:');
 	my $res = $ro->query('part_deux.txt');
 	is(scalar @$res, 1, 'got result without n:');
-	is($n->[0]->mid, $res->[0]->mid,
+	is($n->[0]->{mid}, $res->[0]->{mid},
 		'same result with and without') if scalar(@$res);
 	my $txt = $ro->query('"inside another"');
 	is(scalar @$txt, 1, 'found inside another');
-	is($txt->[0]->mid, $res->[0]->mid,
+	is($txt->[0]->{mid}, $res->[0]->{mid},
 		'search inside text attachments works') if scalar(@$txt);
 
 	my $art;
 	if (scalar(@$n) >= 1) {
-		my $mid = $n->[0]->mid;
+		my $mid = $n->[0]->{mid};
 		my ($id, $prev);
 		$art = $ro->{over_ro}->next_by_mid($mid, \$id, \$prev);
 		ok($art, 'article exists in OVER DB');
diff --git a/t/v2writable.t b/t/v2writable.t
index fa5c786e151..2bd7a400978 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -224,12 +224,13 @@ EOF
 	like($tip, qr/\A[a-f0-9]+ test removal\n\z/s,
 		'commit message propagated to git');
 	is_deeply(\@after, \@before, 'only one commit written to git');
-	is($ibx->mm->num_for($smsg->mid), undef, 'no longer in Msgmap by mid');
+	my $mid = $smsg->{mid};
+	is($ibx->mm->num_for($mid), undef, 'no longer in Msgmap by mid');
 	my $num = $smsg->{num};
 	like($num, qr/\A\d+\z/, 'numeric number in return message');
 	is($ibx->mm->mid_for($num), undef, 'no longer in Msgmap by num');
 	my $srch = $ibx->search->reopen;
-	my $mset = $srch->query('m:'.$smsg->mid, { mset => 1});
+	my $mset = $srch->query('m:'.$mid, { mset => 1});
 	is($mset->size, 0, 'no longer found in Xapian');
 	my @log1 = (@log, qw(-1 --pretty=raw --raw -r --no-renames));
 	is($srch->{over_ro}->get_art($num), undef,

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 13/13] wwwatomstream: drop smsg->{mid} fallback for non-SQLite
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (11 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 12/13] smsg: remove remaining accessor methods Eric Wong
@ 2020-06-01 10:06 ` Eric Wong
  2020-06-01 16:45 ` [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 10:06 UTC (permalink / raw)
  To: meta

It's no longer necessary to populate the smsg->{mid} field now
that ->smsg_eml calls smsg->populate in rare cases where the
smsg did not originate from SQLite.
---
 lib/PublicInbox/WwwAtomStream.pm | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 6ed0cb212d6..9dc24e16e9f 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -12,7 +12,6 @@ use warnings;
 use POSIX qw(strftime);
 use Digest::SHA qw(sha1_hex);
 use PublicInbox::Address;
-use PublicInbox::MID qw(mids);
 use PublicInbox::Hval qw(ascii_html mid_href);
 use PublicInbox::MsgTime qw(msg_timestamp);
 
@@ -102,7 +101,7 @@ sub feed_entry {
 	my $ctx = $self->{ctx};
 	my $eml = $ctx->{-inbox}->smsg_eml($smsg) or return '';
 	my $hdr = $eml->header_obj;
-	my $mid = $smsg->{mid} // mids($hdr)->[0];
+	my $mid = $smsg->{mid};
 	my $irt = PublicInbox::View::in_reply_to($hdr);
 	my $uuid = to_uuid($mid);
 	my $base = $ctx->{feed_base_url};

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 00/13] smsg: remove tricky {mime} field
  2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
                   ` (12 preceding siblings ...)
  2020-06-01 10:06 ` [PATCH 13/13] wwwatomstream: drop smsg->{mid} fallback for non-SQLite Eric Wong
@ 2020-06-01 16:45 ` Eric Wong
  13 siblings, 0 replies; 15+ messages in thread
From: Eric Wong @ 2020-06-01 16:45 UTC (permalink / raw)
  To: meta

Eric Wong <e@yhbt.net> wrote:
> Furthermore, $smsg->$field dispatch has always been measurably
> faster than $smsg->{$field} access in NNTP.

s/faster/slower/  :x

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-06-01 16:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-01 10:06 [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong
2020-06-01 10:06 ` [PATCH 01/13] inbox: introduce smsg_eml method Eric Wong
2020-06-01 10:06 ` [PATCH 02/13] wwwatomstream: convert callers to use smsg_eml Eric Wong
2020-06-01 10:06 ` [PATCH 03/13] v2writable: fix non-sensical interpolation in BUG message Eric Wong
2020-06-01 10:06 ` [PATCH 04/13] import: modernize to use Perl 5.10 features Eric Wong
2020-06-01 10:06 ` [PATCH 05/13] smsg: introduce ->populate method Eric Wong
2020-06-01 10:06 ` [PATCH 06/13] smsg: get rid of ->wrap initializer, too Eric Wong
2020-06-01 10:06 ` [PATCH 07/13] inbox: msg_by_*: remove $(size)ref args Eric Wong
2020-06-01 10:06 ` [PATCH 08/13] www: remove smsg_mime API and adjust callers Eric Wong
2020-06-01 10:06 ` [PATCH 09/13] nntp: smsg_range_i: favor ->{$field} lookups when possible Eric Wong
2020-06-01 10:06 ` [PATCH 10/13] smsg: get rid of remaining {mime} users Eric Wong
2020-06-01 10:06 ` [PATCH 11/13] smsg: remove ->bytes and ->lines methods Eric Wong
2020-06-01 10:06 ` [PATCH 12/13] smsg: remove remaining accessor methods Eric Wong
2020-06-01 10:06 ` [PATCH 13/13] wwwatomstream: drop smsg->{mid} fallback for non-SQLite Eric Wong
2020-06-01 16:45 ` [PATCH 00/13] smsg: remove tricky {mime} field Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).