unofficial mirror of meta@public-inbox.org
From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Date: Thu, 5 Aug 2021 11:05:41 +0000
Message-ID: <20210805110541.GA6446@dcvr>
In-Reply-To: <20210801204024.ieab3lr6yl2yqpsd@nitro.local>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> > My gut says 1g batch-size seems too high (Xapian has extra
> > overhead) and could still eat too much into the kernel cache
> > (and slow down reads). 100m might be a more reasonable limit
> > for jobs=4 and 128G RAM.
> 
> Okay, I have things up and running on one of the 4 edge nodes. You can access
> it and kick the tires at https://x-lore.kernel.org/. Initial observations:
> 
> - I can't give any kind of reliable numbers for initial importing/indexing, as
>   I was doing it piecemeal for a while to make sure that the indexer hooks
>   were doing the right thing. Besides, this is a live system serving a lot of
>   (static) content from the same partition where the indexing was done, so I/O
>   was routinely under high and unpredictable load. Final import/index took 40+
>   hours, but I'll have more reliable numbers once I do it on 3 other systems.

40 hours seems about right.
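
For reference, the combination suggested in the quoted text
(jobs=4 with a 100m batch size) could be passed to
public-inbox-index(1) roughly as below.  This is only a sketch:
index_cmd, JOBS, BATCH_SIZE and /path/to/inbox are hypothetical
names, and the function prints the command instead of running it
so the values can be checked first.

```sh
#!/bin/sh
# Defaults mirror the values suggested in the thread (assumption:
# they suit a machine comparable to the one being discussed).
JOBS=${JOBS:-4}
BATCH_SIZE=${BATCH_SIZE:-100m}

# Print (do not run) an indexing command for one inbox directory.
# index_cmd is a hypothetical helper, not part of public-inbox.
index_cmd() {
	printf 'public-inbox-index --jobs=%s --batch-size=%s %s\n' \
		"$JOBS" "$BATCH_SIZE" "$1"
}

# /path/to/inbox is a placeholder for a real inbox directory.
index_cmd /path/to/inbox
```

Overriding the environment, e.g. "JOBS=2 BATCH_SIZE=50m", prints a
smaller configuration for a machine with less RAM.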

> - Performance in /all/ seems laggy at times, probably depending on whether
>   lvmcache has Xapian DBs in SSD cache or not. After a period of laggy
>   performance, speed seems to dramatically improve, which is probably when
>   most of the backend is in cache.

Yeah, SSDs still make a huge difference.  I moved most of my
worktrees and personal mail back to HDDs and it's been an
eye-opening and painful experience on a cold cache.

Try as I might, physics can't be beat :<  (And most of the
stuff that makes us faster on HDDs makes us faster on SSD, too)

As you get more inboxes, git 2.33.0-rc0+ dramatically reduces
memory use with many alternates (and also makes startup time
tolerable).

Reducing loose objects with more frequent packing will probably
be good, too.
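
One way to act on the packing suggestion is to check the
loose-object count and run git-gc(1) once it passes a threshold.
A minimal sketch, assuming a POSIX shell with git and awk
available; LOOSE_LIMIT, loose_count and maybe_repack are
hypothetical names, and the limit of 1000 is arbitrary rather
than a recommendation from this thread.

```sh
#!/bin/sh
# Hypothetical threshold; tune it for the repository in question.
LOOSE_LIMIT=${LOOSE_LIMIT:-1000}

# Extract the "count:" field (number of loose objects) from
# `git count-objects -v` output supplied on stdin.
loose_count() {
	awk '$1 == "count:" { print $2 }'
}

# Run `git gc` on the given git dir only if it holds more than
# LOOSE_LIMIT loose objects.
maybe_repack() {
	n=$(git --git-dir="$1" count-objects -v | loose_count)
	if [ "${n:-0}" -gt "$LOOSE_LIMIT" ]; then
		git --git-dir="$1" gc --quiet
	fi
}
```

Calling "maybe_repack /path/to/repo.git" from a periodic job
would keep loose objects bounded; the path is a placeholder.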

> - I will bring up the rest of the nodes throughout the week, so
>   x-lore.kernel.org will become more geoip-balanced. I will share any other
>   observations once I have more data. Once all 4 nodes are up, I will share
>   this more widely with kernel devs so they can kick some tires and report
>   whether they are seeing decreased performance compared to current
>   lore.kernel.org. It's entirely possible that my plan to use
>   mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
>   which case I may bring up several dedicated instances in multiple clouds
>   instead.

Increasing the size of the SSD caches would net the most
dramatic improvement (or going SSD-only).  Even consumer grade
SSDs (MLC/TLC) leave enterprise HDDs in the dust.

Once the initial index is done, the workload is not especially
write-intensive, either, if SSD wear is a concern (I always
mount everything with noatime).
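
To confirm that a filesystem is really mounted with noatime,
something like the sketch below could be used.  It assumes Linux
with findmnt(8) from util-linux; has_noatime and check_noatime
are hypothetical helper names.

```sh
#!/bin/sh
# Succeed if a comma-separated mount option string includes
# noatime (an exact match, so nodiratime alone does not count).
has_noatime() {
	case ",$1," in
	*,noatime,*) return 0 ;;
	*) return 1 ;;
	esac
}

# Report whether the filesystem holding a path is mounted noatime.
# Returns 2 if findmnt cannot resolve the path.
check_noatime() {
	opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
	if has_noatime "$opts"; then
		echo "$1: noatime is set"
	else
		echo "$1: noatime is NOT set ($opts)"
		return 1
	fi
}
```

If "check_noatime /path/to/inboxes" reports that the option is
missing, "mount -o remount,noatime MOUNTPOINT" (as root) applies
it until the next reboot; adding noatime to the options column in
/etc/fstab makes it persistent.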

> Thanks for all your work, Eric.

You're welcome and thanks for the support.  It's been a very
rough time, especially with the pandemic still dragging on :<


Thread overview: 7+ messages
2021-07-29 20:28 --batch-size and --jobs combination Konstantin Ryabitsev
2021-07-29 21:13 ` Eric Wong
2021-07-29 21:24   ` Konstantin Ryabitsev
2021-07-29 22:06     ` Eric Wong
2021-08-01 20:40       ` Konstantin Ryabitsev
2021-08-05 11:05         ` Eric Wong [this message]
2021-08-05 20:59           ` Konstantin Ryabitsev