unofficial mirror of meta@public-inbox.org
* [PATCH 0/4] favor shorter binary OID comparisons
@ 2021-07-25  0:43 Eric Wong
  2021-07-25  0:43 ` [PATCH 1/4] extsearchidx: favor binary comparison in common case Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2021-07-25  0:43 UTC
  To: meta

We were doing unnecessary 40-byte hex comparisons for SHA-1s
in our code when 20-byte comparisons would've been sufficient.
3/4 is a typo fix which didn't hit any warnings before,
either...
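
As a rough illustration of the size difference (a standalone sketch,
not code from this series; the sample OID is git's well-known
empty-blob SHA-1 and the variable names simply mirror the ones used
in the patches):

```perl
use strict;
use warnings;

# Sample value for illustration only: the SHA-1 of git's empty blob.
my $oidhex = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391';

# pack('H*', ...) turns each pair of hex digits into one raw byte.
my $oidbin = pack('H*', $oidhex);

printf "hex: %d bytes, bin: %d bytes\n", length($oidhex), length($oidbin);

# unpack('H*', ...) is the inverse and yields lowercase hex.
my $roundtrip = unpack('H*', $oidbin);
print "round trip ok\n" if $roundtrip eq $oidhex;
```

Comparing two $oidbin values with "eq"/"ne" touches half as many
bytes as comparing their hex forms.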

Eric Wong (4):
  extsearchidx: favor binary comparison in common case
  lei_search: favor binary OID comparisons
  lei_inspect: fix typo
  lei_mail_sync: locations_for API uses oidbin for comparisons

 lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
 lib/PublicInbox/LeiExportKw.pm  | 7 +++----
 lib/PublicInbox/LeiInspect.pm   | 7 ++++---
 lib/PublicInbox/LeiMailSync.pm  | 7 ++++---
 lib/PublicInbox/LeiSearch.pm    | 6 +++---
 t/lei_mail_sync.t               | 4 ++--
 6 files changed, 18 insertions(+), 17 deletions(-)


* [PATCH 1/4] extsearchidx: favor binary comparison in common case
  2021-07-25  0:43 [PATCH 0/4] favor shorter binary OID comparisons Eric Wong
@ 2021-07-25  0:43 ` Eric Wong
  2021-07-25  0:43 ` [PATCH 2/4] lei_search: favor binary OID comparisons Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-07-25  0:43 UTC
  To: meta

We'll use 20-byte SHA-1 comparisons instead of 40-byte
hex representations for a minor reduction in memory
traffic.
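
A minimal sketch of the pattern, assuming the stored blob ID is hex
and the incoming one is binary (check_blob() is a hypothetical helper
for illustration, not a public-inbox function): the comparison is done
on the binary form, and the hex string is only built on the uncommon
error path where it is needed for the message.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef when the OIDs match,
# or an error string when they do not.
sub check_blob {
	my ($stored_hex, $oidbin) = @_;
	# common case: compare raw bytes, no hex string for $oidbin
	return if pack('H*', $stored_hex) eq $oidbin;
	# uncommon case: build the hex form only for the message
	my $oidhex = unpack('H*', $oidbin);
	"mismatch: $oidhex (!= $stored_hex)";
}

my $ok  = check_blob('deadbeef', "\xde\xad\xbe\xef");
my $err = check_blob('deadbeef', "\xde\xad\xbe\xee");
print defined($ok) ? "unexpected\n" : "match\n";
print "$err\n";
```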
---
 lib/PublicInbox/ExtSearchIdx.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 51dbf54f..fb1f511e 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -784,16 +784,16 @@ ORDER BY docid,xnum ASC LIMIT 10000
 
 			$fetching = $min = $docid;
 			my $smsg = $ibx->over->get_art($xnum);
-			my $oidhex = unpack('H*', $oidbin);
 			my $err;
 			if (!$smsg) {
 				$err = 'stale';
-			} elsif ($smsg->{blob} ne $oidhex) {
+			} elsif (pack('H*', $smsg->{blob}) ne $oidbin) {
 				$err = "mismatch (!= $smsg->{blob})";
 			} else {
 				next; # likely, all good
 			}
 			# current_info already has eidx_key
+			my $oidhex = unpack('H*', $oidbin);
 			warn "$xnum:$oidhex (#$docid): $err\n";
 			my $del = $self->{oidx}->dbh->prepare_cached(<<'');
 DELETE FROM xref3 WHERE ibx_id = ? AND xnum = ? AND oidbin = ?


* [PATCH 2/4] lei_search: favor binary OID comparisons
  2021-07-25  0:43 [PATCH 0/4] favor shorter binary OID comparisons Eric Wong
  2021-07-25  0:43 ` [PATCH 1/4] extsearchidx: favor binary comparison in common case Eric Wong
@ 2021-07-25  0:43 ` Eric Wong
  2021-07-25  0:43 ` [PATCH 3/4] lei_inspect: fix typo Eric Wong
  2021-07-25  0:43 ` [PATCH 4/4] lei_mail_sync: locations_for API uses oidbin for comparisons Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-07-25  0:43 UTC
  To: meta

Reduce memory traffic and code, too.
---
 lib/PublicInbox/LeiExportKw.pm | 7 +++----
 lib/PublicInbox/LeiSearch.pm   | 6 +++---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LeiExportKw.pm b/lib/PublicInbox/LeiExportKw.pm
index 671a84df..42a5ff22 100644
--- a/lib/PublicInbox/LeiExportKw.pm
+++ b/lib/PublicInbox/LeiExportKw.pm
@@ -10,8 +10,7 @@ use Errno qw(EEXIST ENOENT);
 
 sub export_kw_md { # LeiMailSync->each_src callback
 	my ($oidbin, $id, $self, $mdir) = @_;
-	my $oidhex = unpack('H*', $oidbin);
-	my $sto_kw = $self->{lse}->oid_keywords($oidhex) or return;
+	my $sto_kw = $self->{lse}->oidbin_keywords($oidbin) or return;
 	my $bn = $$id;
 	my ($md_kw, $unknown, @try);
 	if ($bn =~ s/:2,([a-zA-Z]*)\z//) {
@@ -57,13 +56,13 @@ sub export_kw_md { # LeiMailSync->each_src callback
 	# both tries failed
 	my $e = $!;
 	my $orig = '['.join('|', @fail).']';
+	my $oidhex = unpack('H*', $oidbin);
 	$lei->child_error(1, "link($orig, $dst) ($oidhex): $e");
 }
 
 sub export_kw_imap { # LeiMailSync->each_src callback
 	my ($oidbin, $id, $self, $mic) = @_;
-	my $oidhex = unpack('H*', $oidbin);
-	my $sto_kw = $self->{lse}->oid_keywords($oidhex) or return;
+	my $sto_kw = $self->{lse}->oidbin_keywords($oidbin) or return;
 	$self->{imap_mod_kw}->($self->{nwr}, $mic, $id, [ keys %$sto_kw ]);
 }
 
diff --git a/lib/PublicInbox/LeiSearch.pm b/lib/PublicInbox/LeiSearch.pm
index 37bfc65e..79b2fd7d 100644
--- a/lib/PublicInbox/LeiSearch.pm
+++ b/lib/PublicInbox/LeiSearch.pm
@@ -42,9 +42,9 @@ sub _oid_kw { # retry_reopen callback
 }
 
 # returns undef if blob is unknown
-sub oid_keywords {
-	my ($self, $oidhex) = @_;
-	my @num = $self->over->blob_exists($oidhex) or return;
+sub oidbin_keywords {
+	my ($self, $oidbin) = @_;
+	my @num = $self->over->oidbin_exists($oidbin) or return;
 	$self->retry_reopen(\&_oid_kw, \@num);
 }
 


* [PATCH 3/4] lei_inspect: fix typo
  2021-07-25  0:43 [PATCH 0/4] favor shorter binary OID comparisons Eric Wong
  2021-07-25  0:43 ` [PATCH 1/4] extsearchidx: favor binary comparison in common case Eric Wong
  2021-07-25  0:43 ` [PATCH 2/4] lei_search: favor binary OID comparisons Eric Wong
@ 2021-07-25  0:43 ` Eric Wong
  2021-07-25  0:43 ` [PATCH 4/4] lei_mail_sync: locations_for API uses oidbin for comparisons Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-07-25  0:43 UTC
  To: meta

Not sure how this wasn't caught, earlier...
---
 lib/PublicInbox/LeiInspect.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/LeiInspect.pm b/lib/PublicInbox/LeiInspect.pm
index 574da7a7..c277520c 100644
--- a/lib/PublicInbox/LeiInspect.pm
+++ b/lib/PublicInbox/LeiInspect.pm
@@ -74,7 +74,7 @@ sub inspect_docid ($$;$) {
 	my $data = $doc->get_data;
 	$ent->{docid} = $docid;
 	$ent->{data_length} = length($data);
-	$ent->{description} => $doc->get_description;
+	$ent->{description} = $doc->get_description;
 	$ent->{$_} = $doc->$_ for (qw(termlist_count values_count));
 	my $cur = $doc->termlist_begin;
 	my $end = $doc->termlist_end;


* [PATCH 4/4] lei_mail_sync: locations_for API uses oidbin for comparisons
  2021-07-25  0:43 [PATCH 0/4] favor shorter binary OID comparisons Eric Wong
                   ` (2 preceding siblings ...)
  2021-07-25  0:43 ` [PATCH 3/4] lei_inspect: fix typo Eric Wong
@ 2021-07-25  0:43 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-07-25  0:43 UTC
  To: meta

Favor oidbin internally to reduce memory traffic.
---
 lib/PublicInbox/LeiInspect.pm  | 5 +++--
 lib/PublicInbox/LeiMailSync.pm | 7 ++++---
 t/lei_mail_sync.t              | 4 ++--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/LeiInspect.pm b/lib/PublicInbox/LeiInspect.pm
index c277520c..bf7a4836 100644
--- a/lib/PublicInbox/LeiInspect.pm
+++ b/lib/PublicInbox/LeiInspect.pm
@@ -14,10 +14,11 @@ sub inspect_blob ($$) {
 	my ($lei, $oidhex) = @_;
 	my $ent = {};
 	if (my $lse = $lei->{lse}) {
-		my @docids = $lse ? $lse->over->blob_exists($oidhex) : ();
+		my $oidbin = pack('H*', $oidhex);
+		my @docids = $lse ? $lse->over->oidbin_exists($oidbin) : ();
 		$ent->{'lei/store'} = \@docids if @docids;
 		my $lms = $lse->lms;
-		if (my $loc = $lms ? $lms->locations_for($oidhex) : undef) {
+		if (my $loc = $lms ? $lms->locations_for($oidbin) : undef) {
 			$ent->{'mail-sync'} = $loc;
 		}
 	}
diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index 49e521da..82740d59 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -206,16 +206,16 @@ SELECT $op(uid) FROM blob2num WHERE fid = ?
 
 # returns a { location => [ list-of-ids-or-names ] } mapping
 sub locations_for {
-	my ($self, $oidhex) = @_;
+	my ($self, $oidbin) = @_;
 	my ($fid, $sth, $id, %fid2id);
 	my $dbh = $self->{dbh} //= dbh_new($self);
 	$sth = $dbh->prepare('SELECT fid,uid FROM blob2num WHERE oidbin = ?');
-	$sth->execute(pack('H*', $oidhex));
+	$sth->execute($oidbin);
 	while (my ($fid, $uid) = $sth->fetchrow_array) {
 		push @{$fid2id{$fid}}, $uid;
 	}
 	$sth = $dbh->prepare('SELECT fid,name FROM blob2name WHERE oidbin = ?');
-	$sth->execute(pack('H*', $oidhex));
+	$sth->execute($oidbin);
 	while (my ($fid, $name) = $sth->fetchrow_array) {
 		push @{$fid2id{$fid}}, $name;
 	}
@@ -225,6 +225,7 @@ sub locations_for {
 		$sth->execute($fid);
 		my ($loc) = $sth->fetchrow_array;
 		unless (defined $loc) {
+			my $oidhex = unpack('H*', $oidbin);
 			warn "E: fid=$fid for $oidhex unknown:\n", map {
 					'E: '.(ref() ? $$_ : "#$_")."\n";
 				} @$ids;
diff --git a/t/lei_mail_sync.t b/t/lei_mail_sync.t
index f0605092..a5e5f5d3 100644
--- a/t/lei_mail_sync.t
+++ b/t/lei_mail_sync.t
@@ -24,7 +24,7 @@ is_deeply([$ro->folders($imap)], [$imap], 'IMAP folder with full GLOB');
 is_deeply([$ro->folders('imaps://bob@[::1]/INBOX')], [$imap],
 		'IMAP folder with partial GLOB');
 
-is_deeply($ro->locations_for('deadbeef'),
+is_deeply($ro->locations_for("\xde\xad\xbe\xef"),
 	{ $imap => [ 1 ] }, 'locations_for w/ imap');
 
 my $maildir = 'maildir:/home/user/md';
@@ -33,7 +33,7 @@ $lms->lms_begin;
 ok($lms->set_src('deadbeef', $maildir, \$fname), 'set Maildir once');
 ok($lms->set_src('deadbeef', $maildir, \$fname) == 0, 'set Maildir again');
 $lms->lms_commit;
-is_deeply($ro->locations_for('deadbeef'),
+is_deeply($ro->locations_for("\xde\xad\xbe\xef"),
 	{ $imap => [ 1 ], $maildir => [ $fname ] },
 	'locations_for w/ maildir + imap');
 

