From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 2/2] www: deduplicate Message-ID in threading + skeleton
Date: Mon, 10 Jun 2024 11:34:27 +0000
Message-ID: <20240610113427.122371-3-e@80x24.org>
In-Reply-To: <20240610113427.122371-1-e@80x24.org>

xt/perf-threading.t reports a small 0.5-1.0% memory reduction for
threading alone (without rendering anything via View.pm) on
non-ancient Perls with CoW strings.

In informal tests using -httpd and giant Linux stable patch set
threads (700+ messages), this saves roughly 5MB in /T/ rendering
since the {mid} field is used again as a key in the
$ctx->{mapping} table.  This becomes even more beneficial when
handling parallel HTTP requests for messages in the same message
thread, even across different endpoints.
---
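 Illustration only: the hash round-trip used in the SearchThread.pm
 hunk can be sketched in isolation.  dedupe_str is a hypothetical
 helper name and the Message-IDs are made up.  Whether the string
 buffer really ends up shared with hash keys depends on the perl
 version and build, so treat this as a sketch under those assumptions
 rather than a measurement:

```perl
use strict;
use warnings;

# Hypothetical helper (not in public-inbox): pass a string through a
# temporary hash and read it back via keys().  On perls with CoW strings,
# the returned scalar may share its buffer with hash keys of equal value.
sub dedupe_str {
	my ($str) = @_;
	my %tmp = ($str => undef);
	my ($shared) = keys %tmp;
	return $shared;
}

# Made-up Message-IDs standing in for $_->{mid}; the third one repeats
# the first, like the "imposters" handled in thread().
my @mids = qw(a@example.com b@example.com a@example.com);
my (%id_table, @imposters);
for my $mid (@mids) {
	if (exists $id_table{$mid}) {
		push @imposters, $mid; # duplicate Message-ID
	} else {
		my $msg = { mid => dedupe_str($mid) };
		# the same string is now used as a hash value and a hash key
		$id_table{$msg->{mid}} = $msg;
	}
}
printf "%d unique, %d duplicate\n",
	scalar(keys %id_table), scalar(@imposters);
```

 The temporary hash is thrown away; only the string read back from
 keys() is kept and reused for later hash stores.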
 lib/PublicInbox/SearchThread.pm | 9 +++++++--
 lib/PublicInbox/View.pm         | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 00ae9fac..672c53ad 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -33,19 +33,24 @@ sub thread {
 	# can be shakier if somebody used In-Reply-To with multiple, disparate
 	# messages.  So, take the client Date: into account since we can't
 	# always determine ordering when somebody uses multiple In-Reply-To.
+	my (%dedupe, $mid);
 	my @kids = sort { $a->{ds} <=> $b->{ds} } grep {
 		# this delete saves around 4K across 1K messages
 		# TODO: move this to a more appropriate place, breaks tests
 		# if we do it during psgi_cull
 		delete $_->{num};
 		bless $_, 'PublicInbox::SearchThread::Msg';
-		if (exists $id_table{$_->{mid}}) {
+		$mid = $_->{mid};
+		if (exists $id_table{$mid}) {
 			$_->{children} = [];
 			push @imposters, $_; # we'll deal with them later
 			undef;
 		} else {
 			$_->{children} = {}; # will become arrayref later
-			$id_table{$_->{mid}} = $_;
+			%dedupe = ($mid => undef);
+			($mid) = keys %dedupe;
+			$_->{mid} = $mid;
+			$id_table{$mid} = $_;
 			defined($_->{references});
 		}
 	} @$msgs;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 958efa41..dcceb311 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -432,6 +432,7 @@ sub walk_thread ($$$) {
 
 sub pre_thread  { # walk_thread callback
 	my ($ctx, $level, $node, $idx) = @_;
+	# node->{mid} is deduplicated in PublicInbox::SearchThread::thread
 	$ctx->{mapping}->{$node->{mid}} = [ '', $node, $idx, $level ];
 	skel_dump($ctx, $level, $node);
 }
