unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [CFT] SQLite and mmap...
Date: Sun, 10 Oct 2021 09:14:43 +0000	[thread overview]
Message-ID: <20211010091443.M599445@dcvr> (raw)

Anybody feel like benchmarking the below patch?

I've been trying it out a bit with indexing/reindexing/gc and
read-only daemons, but I haven't noticed an improvement on my
old AMD CPUs[1].  If anything, it's maybe <0.5% slower with
mmap for me, but probably within the margin of error on noisy
machines.

[1] only affected by Spectre v1/v2, but not other CPU bugs AFAIK

DBI and Perl have their own overheads, so it's probably masked
somewhat.  I do notice huge differences between different SSDs,
though...

Theoretically, mmap-ing regular files is nice since it avoids
data copies, mallocs (from SQLite's page cache) and syscalls.
However, throughout the years (not just this project
nor SQLite) I haven't seen meaningful speedups from mmap over
pread in most workloads.  Maybe I'm working on the wrong
projects :P

There's also caveats listed at: <https://www.sqlite.org/mmap.html>.
By default, the SQLite compile-time limit is only 2GB (which
doesn't make sense to me on 64-bit Linux, at least), and
over.sqlite3 is ~12G for lore/all; so that could also have
something to do with it.

With more mmap use, there's also /proc/sys/vm/max_map_count
limits which I already bump into when testing git.  So I'm not
exactly thrilled with the possibility of users being more
likely to bump into that...

Anyways, SQLite will parse the "PRAGMA mmap_size" value as a
signed 64-bit int, the change below is using the largest
possible value and letting it clamp to whatever compile-time
limit was.

---
 lib/PublicInbox/LeiMailSync.pm | 1 +
 lib/PublicInbox/Over.pm        | 1 +
 lib/PublicInbox/OverIdx.pm     | 4 ++--
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/LeiMailSync.pm b/lib/PublicInbox/LeiMailSync.pm
index 91cd1c934a1f..56cd14e6ef02 100644
--- a/lib/PublicInbox/LeiMailSync.pm
+++ b/lib/PublicInbox/LeiMailSync.pm
@@ -30,6 +30,7 @@ sub dbh_new {
 	create_tables($self, $dbh) if $rw;
 	$dbh->do('PRAGMA journal_mode = WAL') if $creat;
 	$dbh->do('PRAGMA case_sensitive_like = ON');
+	$dbh->do('PRAGMA mmap_size = 0x7'.('f' x 15)); # SQLite will clamp
 	$dbh;
 }
 
diff --git a/lib/PublicInbox/Over.pm b/lib/PublicInbox/Over.pm
index 19da056a10af..bcd91067e267 100644
--- a/lib/PublicInbox/Over.pm
+++ b/lib/PublicInbox/Over.pm
@@ -43,6 +43,7 @@ sub dbh_new {
 	} while ($st ne $self->{st} && $tries++ < 3);
 	warn "W: $f: .st_dev, .st_ino unstable\n" if $st ne $self->{st};
 
+	$dbh->do('PRAGMA mmap_size = 0x7'.('f' x 15)); # SQLite will clamp
 	if ($rw) {
 		# TRUNCATE reduces I/O compared to the default (DELETE).
 		#
diff --git a/lib/PublicInbox/OverIdx.pm b/lib/PublicInbox/OverIdx.pm
index 985abbf4e693..ce9b86616594 100644
--- a/lib/PublicInbox/OverIdx.pm
+++ b/lib/PublicInbox/OverIdx.pm
@@ -24,8 +24,8 @@ sub dbh_new {
 
 	# 80000 pages (80MiB on SQLite <3.12.0, 320MiB on 3.12.0+)
 	# was found to be good in 2018 during the large LKML import
-	# at the time.  This ought to be configurable based on HW
-	# and inbox size; I suspect it's overkill for many inboxes.
+	# at the time.  SQLite will only use as much as it needs,
+	# and maybe it's irrelevant since we use mmap nowadays.
 	$dbh->do('PRAGMA cache_size = 80000');
 
 	create_tables($dbh);

                 reply	other threads:[~2021-10-10  9:14 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211010091443.M599445@dcvr \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).