From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Date: Sun, 1 Aug 2021 16:40:24 -0400 [thread overview]
Message-ID: <20210801204024.ieab3lr6yl2yqpsd@nitro.local> (raw)
In-Reply-To: <20210729220629.GA29593@dcvr>
On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> My gut says 1g batch-size seems too high (Xapian has extra
> overhead) and could still eat too much into the kernel cache
> (and slow down reads). 100m might be a more reasonable limit
> for jobs=4 and 128G RAM.
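(For anyone following along: with public-inbox-index(1), those suggested limits
would be passed roughly as below. This is a sketch, not the exact command I ran;
the inbox path is a placeholder, and the right numbers depend on your RAM and
shard count.)

```shell
# Sketch: index with 4 parallel Xapian shard workers, committing
# Xapian transactions roughly every 100 megabytes instead of the
# much larger 1g batches discussed above.
# /path/to/inbox is a placeholder.
public-inbox-index --jobs=4 --batch-size=100m /path/to/inbox
```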
Okay, I have things up and running on one of the 4 edge nodes. You can access
it and kick the tires at https://x-lore.kernel.org/. Initial observations:
- I can't give any kind of reliable numbers for initial importing/indexing, as
I was doing it piecemeal for a while to make sure that the indexer hooks
were doing the right thing. Besides, this is a live system serving a lot of
(static) content from the same partition where the indexing was done, so I/O
was routinely under high and unpredictable load. Final import/index took 40+
hours, but I'll have more reliable numbers once I do it on 3 other systems.
- Performance in /all/ seems laggy at times, probably depending on whether
  lvmcache has the Xapian DBs in the SSD cache or not. After a period of laggy
  performance, speed improves dramatically, which is probably once most of the
  backend data is in cache.
- URLs are mapped a bit wonkily right now -- / redirects to /all/, since I
  expect that's what most devs would want (pending feedback; I could be
  totally wrong). The WwwListing is mapped to https://x-lore.kernel.org/lists.html,
  since that's the URL currently containing the full archive. All of this may,
  and probably will, change.
- I will bring up the rest of the nodes throughout the week, so
x-lore.kernel.org will become more geoip-balanced. I will share any other
observations once I have more data. Once all 4 nodes are up, I will share
this more widely with kernel devs so they can kick some tires and report
whether they are seeing decreased performance compared to current
lore.kernel.org. It's entirely possible that my plan to use
mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
which case I may bring up several dedicated instances in multiple clouds
instead.
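(For anyone mirroring the URL setup above: the / -> /all/ redirect lives in the
frontend, not in public-inbox itself. A minimal sketch, assuming nginx proxies
to public-inbox-httpd over a UNIX socket; the server name and socket path are
placeholders for whatever your deployment uses.)

```nginx
server {
    server_name x-lore.kernel.org;

    # Send the bare root to the /all/ pseudo-inbox.
    location = / {
        return 302 /all/;
    }

    # Everything else goes to the PSGI daemon.
    location / {
        proxy_pass http://unix:/run/public-inbox/httpd.sock:;
    }
}
```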
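(Side note on the lvmcache hunch above: one way to sanity-check it is to watch
the dm-cache hit/miss counters while browsing /all/. This assumes your lvm2 is
recent enough to expose the cache_* reporting fields; the vg0/lore volume name
is a placeholder.)

```shell
# Inspect dm-cache counters for the volume holding the Xapian DBs;
# a hit ratio that climbs as /all/ speeds up would match the
# "warms up after a laggy period" pattern described above.
lvs -o lv_name,cache_read_hits,cache_read_misses,cache_used_blocks vg0/lore

# Lower-level view of the same counters via device-mapper:
dmsetup status vg0-lore
```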
Thanks for all your work, Eric.
Best regards,
-K
Thread overview: 7+ messages
2021-07-29 20:28 --batch-size and --jobs combination Konstantin Ryabitsev
2021-07-29 21:13 ` Eric Wong
2021-07-29 21:24 ` Konstantin Ryabitsev
2021-07-29 22:06 ` Eric Wong
2021-08-01 20:40 ` Konstantin Ryabitsev [this message]
2021-08-05 11:05 ` Eric Wong
2021-08-05 20:59 ` Konstantin Ryabitsev