From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Date: Thu, 5 Aug 2021 11:05:41 +0000
Message-ID: <20210805110541.GA6446@dcvr>
In-Reply-To: <20210801204024.ieab3lr6yl2yqpsd@nitro.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> > My gut says 1g batch-size seems too high (Xapian has extra
> > overhead) and could still eat too much into the kernel cache
> > (and slow down reads). 100m might be a more reasonable limit
> > for jobs=4 and 128G RAM.
>
> Okay, I have things up and running on one of the 4 edge nodes. You can access
> it and kick the tires at https://x-lore.kernel.org/. Initial observations:
>
> - I can't give any kind of reliable numbers for initial importing/indexing, as
> I was doing it piecemeal for a while to make sure that the indexer hooks
> were doing the right thing. Besides, this is a live system serving a lot of
> (static) content from the same partition where the indexing was done, so I/O
> was routinely under high and unpredictable load. Final import/index took 40+
> hours, but I'll have more reliable numbers once I do it on 3 other systems.
40 hours seems about reasonable.
> - Performance in /all/ seems laggy at times, probably depending on whether
> lvmcache has Xapian DBs in SSD cache or not. After a period of laggy
> performance, speed seems to dramatically improve, which is probably when
> most of the backend is in cache.
Yeah, SSDs still make a huge difference. I moved most of my
worktrees and personal mail back to HDDs, and it's been an
eye-opening and painful experience on a cold cache.
Try as I might, physics can't be beaten :< (And most of the
things that make us faster on HDDs make us faster on SSDs, too.)

As you add more inboxes, upgrading to git 2.33.0-rc0 or later
will help: it dramatically reduces memory use with many
alternates (and also makes startup time tolerable).

Reducing the number of loose objects by packing more frequently
will probably help, too.
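
A minimal sketch of what more frequent packing could look like,
assuming a v2 inbox whose epoch repositories live under a
hypothetical /path/to/inbox/git/ and an arbitrarily chosen nightly
schedule: a crontab entry that runs an incremental repack in each
epoch. The path and the time are placeholders, not a recommendation.

```
# Hypothetical crontab entry (sketch): at 03:00 every night, pack the
# loose objects in each epoch of a v2 inbox assumed to live under
# /path/to/inbox/git/. "repack -d" without "-a" packs only objects that
# are not yet packed, then deletes the now-redundant loose copies.
0 3 * * * for g in /path/to/inbox/git/*.git; do git -C "$g" repack -d -q; done
```

Without -a, repack leaves existing packs alone and only gathers loose
objects, so each run should stay fairly cheap. Whether to also run an
occasional full repack or git gc is a separate decision.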
> - I will bring up the rest of the nodes throughout the week, so
> x-lore.kernel.org will become more geoip-balanced. I will share any other
> observations once I have more data. Once all 4 nodes are up, I will share
> this more widely with kernel devs so they can kick some tires and report
> whether they are seeing decreased performance compared to current
> lore.kernel.org. It's entirely possible that my plan to use
> mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
> which case I may bring up several dedicated instances in multiple clouds
> instead.
Increasing the size of the SSD caches (or going SSD-only) would
yield the most dramatic improvement. Even consumer-grade SSDs
(MLC/TLC) leave enterprise HDDs in the dust.

If SSD wear is a concern: once the initial index is done, the
workload is not especially write-intensive (I always mount
everything with noatime).
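
For reference, noatime is a per-filesystem mount option. A sketch of
an /etc/fstab entry follows; the device, mount point and filesystem
type are all placeholders.

```
# Hypothetical /etc/fstab entry (sketch): device, mount point and
# filesystem type are placeholders. "noatime" stops the kernel from
# writing inode access-time updates when files are read.
/dev/vg0/lore  /srv/public-inbox  ext4  defaults,noatime  0  2
```

On an already-mounted filesystem, "mount -o remount,noatime
/srv/public-inbox" should apply the option without a reboot.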
> Thanks for all your work, Eric.
You're welcome and thanks for the support. It's been a very
rough time, especially with the pandemic still dragging on :<
Thread overview: 7+ messages
2021-07-29 20:28 --batch-size and --jobs combination Konstantin Ryabitsev
2021-07-29 21:13 ` Eric Wong
2021-07-29 21:24 ` Konstantin Ryabitsev
2021-07-29 22:06 ` Eric Wong
2021-08-01 20:40 ` Konstantin Ryabitsev
2021-08-05 11:05 ` Eric Wong [this message]
2021-08-05 20:59 ` Konstantin Ryabitsev