From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Date: Sun, 1 Aug 2021 16:40:24 -0400 [thread overview]
Message-ID: <20210801204024.ieab3lr6yl2yqpsd@nitro.local> (raw)
In-Reply-To: <20210729220629.GA29593@dcvr>
On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> My gut says 1g batch-size seems too high (Xapian has extra
> overhead) and could still eat too much into the kernel cache
> (and slow down reads). 100m might be a more reasonable limit
> for jobs=4 and 128G RAM.
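(For anyone following along: with public-inbox-index(1), those suggested limits
would be passed roughly as below. This is a sketch, not the exact command I ran;
the inbox path is a placeholder, and the right numbers depend on your RAM and
shard count.)

```shell
# Sketch: index with 4 parallel Xapian shard workers, committing
# Xapian transactions roughly every 100 megabytes instead of the
# much larger 1g batches discussed above.
# /path/to/inbox is a placeholder.
public-inbox-index --jobs=4 --batch-size=100m /path/to/inbox
```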
Okay, I have things up and running on one of the 4 edge nodes. You can access
it and kick the tires at https://x-lore.kernel.org/. Initial observations:
- I can't give any kind of reliable numbers for initial importing/indexing, as
I was doing it piecemeal for a while to make sure that the indexer hooks
were doing the right thing. Besides, this is a live system serving a lot of
(static) content from the same partition where the indexing was done, so I/O
was routinely under high and unpredictable load. Final import/index took 40+
hours, but I'll have more reliable numbers once I do it on 3 other systems.
- Performance in /all/ seems laggy at times, probably depending on whether
  lvmcache has the Xapian DBs in the SSD cache or not. After a period of laggy
  performance, speed improves dramatically, which is probably once most of the
  backend data is in cache.
- URLs are mapped a bit wonkily right now -- / redirects to /all/, since I
  expect that's what most devs would want (pending feedback; I could be
  totally wrong). The WwwListing is mapped to https://x-lore.kernel.org/lists.html,
  since that's the URL currently containing the full archive. All of this may,
  and probably will, change.
- I will bring up the rest of the nodes throughout the week, so
x-lore.kernel.org will become more geoip-balanced. I will share any other
observations once I have more data. Once all 4 nodes are up, I will share
this more widely with kernel devs so they can kick some tires and report
whether they are seeing decreased performance compared to current
lore.kernel.org. It's entirely possible that my plan to use
mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
which case I may bring up several dedicated instances in multiple clouds
instead.
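(For anyone mirroring the URL setup above: the / -> /all/ redirect lives in the
frontend, not in public-inbox itself. A minimal sketch, assuming nginx proxies
to public-inbox-httpd over a UNIX socket; the server name and socket path are
placeholders for whatever your deployment uses.)

```nginx
server {
    server_name x-lore.kernel.org;

    # Send the bare root to the /all/ pseudo-inbox.
    location = / {
        return 302 /all/;
    }

    # Everything else goes to the PSGI daemon.
    location / {
        proxy_pass http://unix:/run/public-inbox/httpd.sock:;
    }
}
```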
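(Side note on the lvmcache hunch above: one way to sanity-check it is to watch
the dm-cache hit/miss counters while browsing /all/. This assumes your lvm2 is
recent enough to expose the cache_* reporting fields; the vg0/lore volume name
is a placeholder.)

```shell
# Inspect dm-cache counters for the volume holding the Xapian DBs;
# a hit ratio that climbs as /all/ speeds up would match the
# "warms up after a laggy period" pattern described above.
lvs -o lv_name,cache_read_hits,cache_read_misses,cache_used_blocks vg0/lore

# Lower-level view of the same counters via device-mapper:
dmsetup status vg0-lore
```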
Thanks for all your work, Eric.
Best regards,
-K
Thread overview: 7+ messages
2021-07-29 20:28 --batch-size and --jobs combination Konstantin Ryabitsev
2021-07-29 21:13 ` Eric Wong
2021-07-29 21:24 ` Konstantin Ryabitsev
2021-07-29 22:06 ` Eric Wong
2021-08-01 20:40 ` Konstantin Ryabitsev [this message]
2021-08-05 11:05 ` Eric Wong
2021-08-05 20:59 ` Konstantin Ryabitsev