unofficial mirror of meta@public-inbox.org
* --batch-size and --jobs combination
From: Konstantin Ryabitsev @ 2021-07-29 20:28 UTC
  To: meta

Hello:

Is there any specific logic for mixing --batch-size and --jobs? On a system
with plenty of CPUs and lots of RAM, does it make sense to have more --jobs,
larger --batch-size, or some balance of both?

-K

* Re: --batch-size and --jobs combination
From: Eric Wong @ 2021-07-29 21:13 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> Hello:
> 
> Is there any specific logic for mixing --batch-size and --jobs? On a system
> with plenty of CPUs and lots of RAM, does it make sense to have more --jobs,
> larger --batch-size, or some balance of both?

jobs will be bound by I/O capability in your case.  SATA-2 vs
SATA-3 vs NVMe will make a notable difference, as will the
quality of the device (MLC, TLC, QLC; cache/controller).

Xapian seems to do better with bigger batch-sizes up to a point.
I'm not sure I have enough RAM to accurately test >8m batch
sizes (since we also need to account for kernel caching).

batch-size * (jobs - 1) = rough total batch size
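
(e.g. --jobs=4 with --batch-size=100m comes out to roughly 300m
of batched changes in flight across the 3 shard workers; that's
back-of-the-envelope only, and actual RSS will be higher due to
Xapian's own overhead)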

If it's the initial index creation, I would definitely use
--no-fsync, too.  Perhaps that should be the default for new
indices.
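
For a fresh index, that'd be something along these lines (the
inbox path is just an example):

	public-inbox-index --no-fsync --jobs=4 --batch-size=100m \
		/path/to/inbox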

Also note: the recent RFC for --sequential-commit doesn't seem to
be working out performance-wise on my SATA-2 system; but I'm also
not sure about SSD life/degradation.

* Re: --batch-size and --jobs combination
From: Konstantin Ryabitsev @ 2021-07-29 21:24 UTC
  To: Eric Wong; +Cc: meta

On Thu, Jul 29, 2021 at 09:13:21PM +0000, Eric Wong wrote:
> jobs will be bound by I/O capability in your case.  SATA-2 vs
> SATA-3 vs NVMe will make a notable difference, as will the
> quality of the device (MLC, TLC, QLC; cache/controller).

So, on these systems with large lvmcache disks, a large --jobs number isn't
really going to help, right? Disk performance is not great, since these
systems are continuously serving traffic for mirrors.kernel.org, but writes
are spread across RAID-6, so I'm not sure --sequential-shard will make much
difference.

Let me try "--jobs=2 --batch-size=1g --no-fsync" for the initial index.
There's 128G of RAM on this system, so it should be able to take that easily.
I didn't use --no-fsync due to the admonition in the manpages that it will
probably result in worse performance with a large --batch-size, but if you
say it's likely to help, I'm willing to try it out.

Thanks for your help!

-K

* Re: --batch-size and --jobs combination
From: Eric Wong @ 2021-07-29 22:06 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jul 29, 2021 at 09:13:21PM +0000, Eric Wong wrote:
> > jobs will be bound by I/O capability in your case.  SATA-2 vs
> > SATA-3 vs NVMe will make a notable difference, as will the
> > quality of the device (MLC, TLC, QLC; cache/controller).
> 
> So, on these systems with large lvmcache disks, a large --jobs number isn't
> really going to help, right? Disk performance is not great, since these
> systems are continuously serving traffic for mirrors.kernel.org, but writes
> are spread across RAID-6, so I'm not sure --sequential-shard will make much
> difference.
> 
> Let me try "--jobs=2 --batch-size=1g --no-fsync" for the initial index.
> There's 128G of RAM on this system, so it should be able to take that easily.
> I didn't use --no-fsync due to the admonition in the manpages that it will
> probably result in worse performance with a large --batch-size, but if you
> say it's likely to help, I'm willing to try it out.

With 128G RAM, I'm thinking higher jobs (maybe 4) would still be
able to offset the I/O limitations read-wise (especially
read/seek latency), leaving writes as the bottleneck...

How much write throughput (MB/s) can you put through the system?
With SATA-2 (<300 MB/s) limits, jobs=4 seemed reasonable.
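
(if you don't have numbers handy, a crude way to gauge
sequential write throughput on that filesystem, bypassing the
page cache via O_DIRECT, would be something like this, path
being an example:

	dd if=/dev/zero of=/srv/lore/ddtest bs=1M count=4096 oflag=direct
	rm /srv/lore/ddtest

fio would give more realistic numbers, of course)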

My gut says 1g batch-size seems too high (Xapian has extra
overhead) and could still eat too much into the kernel cache
(and slow down reads). 100m might be a more reasonable limit
for jobs=4 and 128G RAM.

From what I've read on Xapian lists, the honey backend of Xapian
ought to solve problems with giant shards; but it's still
lacking write/update support.

I've also been thinking of introducing an epoch-like approach
on a per-shard basis to mitigate this; but it seems like a
problem better solved with honey.

> Thanks for your help!

No problem.  In any case, it'd be much appreciated if you could
report back on what works and what doesn't.

* Re: --batch-size and --jobs combination
From: Konstantin Ryabitsev @ 2021-08-01 20:40 UTC
  To: Eric Wong; +Cc: meta

On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> My gut says 1g batch-size seems too high (Xapian has extra
> overhead) and could still eat too much into the kernel cache
> (and slow down reads). 100m might be a more reasonable limit
> for jobs=4 and 128G RAM.

Okay, I have things up and running on one of the 4 edge nodes. You can access
it and kick the tires at https://x-lore.kernel.org/. Initial observations:

- I can't give any kind of reliable numbers for initial importing/indexing, as
  I was doing it piecemeal for a while to make sure that the indexer hooks
  were doing the right thing. Besides, this is a live system serving a lot of
  (static) content from the same partition where the indexing was done, so I/O
  was routinely under high and unpredictable load. Final import/index took 40+
  hours, but I'll have more reliable numbers once I do it on 3 other systems.

- Performance in /all/ seems laggy at times, probably depending on whether
  lvmcache has Xapian DBs in SSD cache or not. After a period of laggy
  performance, speed seems to dramatically improve, which is probably when
  most of the backend is in cache.

- URLs are mapped a bit wonkily right now -- / redirects to /all/ (a quick
  way to check this is sketched after this list), since I expect that's what
  most devs would want (pending feedback; I could be totally wrong).
  WwwListing is mapped to https://x-lore.kernel.org/lists.html since that's
  the URL currently containing the full archive. All of this may well change.

- I will bring up the rest of the nodes throughout the week, so
  x-lore.kernel.org will become more geoip-balanced. I will share any other
  observations once I have more data. Once all 4 nodes are up, I will share
  this more widely with kernel devs so they can kick some tires and report
  whether they are seeing decreased performance compared to current
  lore.kernel.org. It's entirely possible that my plan to use
  mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
  which case I may bring up several dedicated instances in multiple clouds
  instead.
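
(A quick way to see the / -> /all/ redirect mentioned above, if
anyone wants to check from their end:

	curl -sI https://x-lore.kernel.org/ | grep -i '^location'

should print a Location: header pointing at /all/.)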

Thanks for all your work, Eric.

Best regards,
-K

* Re: --batch-size and --jobs combination
From: Eric Wong @ 2021-08-05 11:05 UTC
  To: Konstantin Ryabitsev; +Cc: meta

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Thu, Jul 29, 2021 at 10:06:29PM +0000, Eric Wong wrote:
> > My gut says 1g batch-size seems too high (Xapian has extra
> > overhead) and could still eat too much into the kernel cache
> > (and slow down reads). 100m might be a more reasonable limit
> > for jobs=4 and 128G RAM.
> 
> Okay, I have things up and running on one of the 4 edge nodes. You can access
> it and kick the tires at https://x-lore.kernel.org/. Initial observations:
> 
> - I can't give any kind of reliable numbers for initial importing/indexing, as
>   I was doing it piecemeal for a while to make sure that the indexer hooks
>   were doing the right thing. Besides, this is a live system serving a lot of
>   (static) content from the same partition where the indexing was done, so I/O
>   was routinely under high and unpredictable load. Final import/index took 40+
>   hours, but I'll have more reliable numbers once I do it on 3 other systems.

40 hours seems about right.

> - Performance in /all/ seems laggy at times, probably depending on whether
>   lvmcache has Xapian DBs in SSD cache or not. After a period of laggy
>   performance, speed seems to dramatically improve, which is probably when
>   most of the backend is in cache.

Yeah, SSDs still make a huge difference.  I moved most of my
worktrees and personal mail back to HDDs and it's been an
eye-opening and painful experience on a cold cache.

Try as I might, physics can't be beat :<  (And most of the
stuff that makes us faster on HDDs makes us faster on SSD, too)

As you get more inboxes, git 2.33.0-rc0+ dramatically reduces
memory use with many alternates (and also makes startup time
tolerable).

Reducing loose objects with more frequent packing will probably
be good, too.
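
(e.g. an occasional "git gc" per epoch from cron; the path below
is just an example of where a v2 epoch lives:

	git --git-dir=/path/to/inbox/git/0.git gc --quiet

git-repack/prune knobs can be layered on as needed)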

> - I will bring up the rest of the nodes throughout the week, so
>   x-lore.kernel.org will become more geoip-balanced. I will share any other
>   observations once I have more data. Once all 4 nodes are up, I will share
>   this more widely with kernel devs so they can kick some tires and report
>   whether they are seeing decreased performance compared to current
>   lore.kernel.org. It's entirely possible that my plan to use
>   mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
>   which case I may bring up several dedicated instances in multiple clouds
>   instead.

Increasing the size of the SSD caches would net the most
dramatic improvement (or going SSD-only).  Even consumer-grade
SSDs (MLC/TLC) leave enterprise HDDs in the dust.

Once the initial index is done, the workload is not especially
write-intensive, either, if SSD wear is a concern (I always
mount everything with noatime).
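
In mount terms that's e.g.:

	mount -o remount,noatime /srv/lore

(mountpoint being an example), plus adding noatime to the
options column in fstab so it sticks across reboots.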

> Thanks for all your work, Eric.

You're welcome and thanks for the support.  It's been a very
rough time, especially with the pandemic still dragging on :<

* Re: --batch-size and --jobs combination
From: Konstantin Ryabitsev @ 2021-08-05 20:59 UTC
  To: Eric Wong; +Cc: meta

On Thu, Aug 05, 2021 at 11:05:41AM +0000, Eric Wong wrote:
> > - I will bring up the rest of the nodes throughout the week, so
> >   x-lore.kernel.org will become more geoip-balanced. I will share any other
> >   observations once I have more data. Once all 4 nodes are up, I will share
> >   this more widely with kernel devs so they can kick some tires and report
> >   whether they are seeing decreased performance compared to current
> >   lore.kernel.org. It's entirely possible that my plan to use
> >   mirrors.edge.kernel.org nodes for this isn't one of my brightest ideas, in
> >   which case I may bring up several dedicated instances in multiple clouds
> >   instead.
> 
> Increasing the size of the SSD caches would net the most
> dramatic improvement (or going SSD-only).  Even consumer grade
> SSDs (MLC/TLC) leave enterprise HDDs in the dust.

The more I poke at this, the more it seems that using faster storage is a much
better plan than trying to make this work reasonably well with our distro
mirroring nodes. Occasional high I/O latency will go largely unnoticed when
it's apt or dnf that's fronting the requests, but web or b4 calls that make
someone wait 10+ seconds seem like a recipe for some very irate developers.

I actually have a good opportunity to move our current git.kernel.org systems
to newer hardware at Equinix Metal. We'll go from 6 older nodes to 3 newer
nodes, each with more CPU/RAM and 1.2TB total of SSD. This should give us
plenty of room to colocate lore.kernel.org on the same systems as
git.kernel.org, and it's a much better plan than trying to make spinning rust
work well for this purpose.

> > Thanks for all your work, Eric.
> 
> You're welcome and thanks for the support.  It's been a very
> rough time, especially with the pandemic still dragging on :<

And we thought we were all done this spring, eh? Ah well, this too shall pass.

Best regards,
-K
