On 24/07/2023 16:26, Eli Zaretskii wrote:
>> Date: Mon, 24 Jul 2023 15:55:13 +0300
>> Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
>>  64735@debbugs.gnu.org
>> From: Dmitry Gutov
>>
>>>> 1. 'find' itself is much slower there. There is room for improvement in
>>>> the port.
>>>
>>> I think it's the filesystem, not the port (which I did myself in this
>>> case).
>>
>> But directory-files-recursively goes through the same filesystem,
>> doesn't it?
>
> It does (more or less; see below).  But I was not trying to explain
> why Find is slower than directory-files-recursively, I was trying to
> explain why Find on Windows is slower than Find on GNU/Linux.

Understood. But we probably don't need to worry about the differences
between platforms as much as about choosing the best option for each
platform (or not choosing the worst, at least). So I'm more interested
in why the find-based solution is more than 4x slower than the
built-in one on MS Windows.

> If you are asking why directory-files-recursively is so much faster on
> Windows than Find, then the main factors I can think about are:
>
>   . IPC, at least in how we implement it in Emacs on MS-Windows, via a
>     separate thread and OS-level events between them to signal that
>     stuff is available for reading, whereas
>     directory-files-recursively avoids this overhead completely;
>   . Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
>     Windows are emulated by wrappers around native APIs.  Moreover,
>     Find uses 'char *' for file names, so calling native APIs involves
>     transparent conversion to UTF-16 and back, which is what native
>     APIs accept and return.  By contrast, Emacs on Windows calls the
>     native APIs directly, and converts to UTF-16 from UTF-8, which is
>     faster.  (This last point also means that using Find on Windows
>     has another grave disadvantage: it cannot fully support non-ASCII
>     file names, only those that can be encoded by the current
>     single-byte system codepage.)

I seem to remember that Wine, which also does a similar dance of
translating library and system calls, is often very close to the native
performance for many programs. So this could be a problem, but not
necessarily a significant one. That said, text encoding conversion
seems like a prime suspect, if the problem is here.

>>>> 2. The process output handling is worse.
>>>
>>> Not sure what that means.
>>
>> Emacs's ability to process the output of a process on the particular
>> platform.
>>
>> You said:
>>
>>     Btw, the Find command with pipe to some other program, like wc,
>>     finishes much faster, like 2 to 4 times faster than when it is run
>>     from find-directory-files-recursively.  That's probably the slowdown
>>     due to communications with async subprocesses in action.
>
> I see this slowdown on GNU/Linux as well.
>
>> One thing to try is changing the -with-find implementation to use a
>> synchronous call, to compare (e.g. using 'process-file'). And repeat
>> these tests on GNU/Linux too.
>
> This still uses pipes, albeit without the pselect stuff.

I'm attaching an extended benchmark, one that includes a "synchronous"
implementation as well. Please give it a spin too.

Here (GNU/Linux) the reported numbers look like this:

 > (my-bench 1 default-directory "")

(("built-in" . "Elapsed time: 1.601649s (0.709108s in 22 GCs)")
 ("with-find" . "Elapsed time: 1.792383s (1.135869s in 38 GCs)")
 ("with-find-p" . "Elapsed time: 1.248543s (0.682827s in 20 GCs)")
 ("with-find-sync" . "Elapsed time: 0.922291s (0.343497s in 10 GCs)"))
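
For anyone reading this without the attachment, the general shape of a
synchronous variant could be sketched roughly like this. This is only an
illustration and not the attached code: the function name
my-sketch-files-with-find-sync is hypothetical, and it assumes a 'find'
program on PATH that supports -type and -print0.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical sketch, NOT the attached benchmark code.
;; Assumption: a `find' executable supporting -type and -print0.
(defun my-sketch-files-with-find-sync (dir)
  "Return the regular files under DIR, listed by a synchronous `find'."
  (let* ((dir (file-name-as-directory (expand-file-name dir)))
         ;; The temporary buffer created below inherits this value,
         ;; so `find' runs with DIR as its working directory.
         (default-directory dir))
    (with-temp-buffer
      ;; `process-file' blocks until `find' exits; BUFFER = t means
      ;; its standard output is inserted into the current buffer.
      (let ((status (process-file "find" nil t nil
                                  "." "-type" "f" "-print0")))
        (unless (eq status 0)
          (error "find exited with status %S" status)))
      ;; Names are NUL-separated, so newlines in file names are safe.
      (mapcar (lambda (name) (expand-file-name name dir))
              (split-string (buffer-string) "\0" t)))))
```

It can be timed with something like
(benchmark-run 1 (my-sketch-files-with-find-sync default-directory)).
Note that the result is not guaranteed to match
directory-files-recursively exactly: '-type f' skips symlinks, for
example, while the built-in lists them.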