> I'd start by reducing the number of probed files, and then I'd
> benchmark the results and see if it's "good enough".  Threads add
> another dimension of complexity, so I'd only go there if we have a
> very good reason.

Just to show some numbers:
I run a fairly heavy Spacemacs configuration.
Without any patch, Emacs takes 80 seconds to start.
Reducing the number of probed files takes this down to 40 seconds.

The VTune profiler tells me Emacs is spending 80% of its time waiting for openp().
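
To illustrate why the number of probed files matters so much: a lookup in
the style of openp() tries every directory combined with every suffix, so
the probe count grows with the product of the two.  Below is a rough Python
sketch of that idea, not the actual C code; the names probe, dirs and
suffixes are made up for illustration, and the suffix lists are only
examples.

```python
import os
import tempfile


def probe(name, dirs, suffixes):
    """Search dirs in order for name + suffix.

    Returns (path_or_None, probes), where probes is the number of
    filesystem checks made before the search stopped.
    """
    probes = 0
    for d in dirs:
        for suffix in suffixes:
            probes += 1
            candidate = os.path.join(d, name + suffix)
            if os.path.isfile(candidate):
                return candidate, probes
    return None, probes


if __name__ == "__main__":
    # Hypothetical setup: 200 directories, the file lives in the last one.
    with tempfile.TemporaryDirectory() as root:
        dirs = []
        for i in range(200):
            d = os.path.join(root, "pkg%03d" % i)
            os.mkdir(d)
            dirs.append(d)
        open(os.path.join(dirs[-1], "foo.el"), "w").close()

        # Example suffix lists only; the real set depends on the build
        # and the configuration.
        for suffixes in ([".elc", ".so", ".el", ""], [".el"]):
            found, probes = probe("foo", dirs, suffixes)
            print(len(suffixes), "suffixes:", probes, "probes")
```

With a long load-path, as a large Spacemacs setup is likely to have, every
suffix that can be dropped removes roughly one probe per directory per
lookup.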

I'll rewrite the patch to reduce the number of probed files.