* Native OS pipelines in eshell and Emacs
From: Spencer Baugh @ 2024-05-28 14:42 UTC
To: emacs-devel; +Cc: johnw, spwhitton, dmitry

eshell "pipelines" operate by reading the data in from one process and
writing it out to the next process. Thus the data flows from one
process, to Emacs, and then to the next process.

This differs from the native OS capability to make a pipe and pass one
end down to one process as stdout, and the other end down to another
process as stdin, which is more efficient.

Has there been work before on supporting this in eshell and Emacs?

I saw there was the new em-extpipe capability in eshell, but that
requires different syntax and bypasses Eshell's usual features - adding
the ability to create pipelines natively in Emacs would allow the
normal Eshell syntax to just be efficient on its own. (This would, I
think, remove the need for extpipe.)

This same ability would be useful for project.el, where it would be nice
for the output of project-files (e.g. "git ls-files") to be piped
directly to xargs grep for commands like project-find-regexp, instead of
sending the data through Emacs which makes it substantially slower.

Specifically, the new feature would be something like an :stdin argument
to make-process which allows a make-pipe-process (or other process) to
be passed as stdin, and grabs the output file descriptor from that
process (what Emacs would normally read) and passes it down as stdin for
the new process instead.

I'm working on a patch to do this, but I wonder if any work has been
done on this before?
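The difference between the two data paths described in the message above can be sketched outside Emacs. The following is only an illustration in Python, not Emacs or Eshell code: `relayed_pipeline` stands in for the current behavior (the parent reads from one child and writes to the next), and `native_pipeline` stands in for the proposed behavior (the OS pipe connects the two children directly). The function names and the two toy child programs are assumptions made for this sketch.

```python
import subprocess
import sys

# Two tiny child programs, run with the current Python interpreter so the
# sketch does not depend on particular shell utilities being installed.
PRODUCER = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def relayed_pipeline() -> str:
    """Parent reads the producer's output, then writes it to the consumer."""
    produced = subprocess.run(PRODUCER, capture_output=True, check=True).stdout
    # Every byte passes through the parent process here.
    result = subprocess.run(CONSUMER, input=produced,
                            capture_output=True, check=True)
    return result.stdout.decode()


def native_pipeline() -> str:
    """An OS pipe connects producer stdout directly to consumer stdin."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(CONSUMER, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the read end so only the consumer holds it.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode()


if __name__ == "__main__":
    print(relayed_pipeline(), end="")
    print(native_pipeline(), end="")
```

Both functions produce the same output; the difference is only whether the intermediate data is copied through the parent, which is the cost the proposed :stdin argument is meant to avoid.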
* Re: Native OS pipelines in eshell and Emacs
From: Jim Porter @ 2024-05-28 16:33 UTC
To: Spencer Baugh, emacs-devel; +Cc: johnw, spwhitton, dmitry

On 5/28/2024 7:42 AM, Spencer Baugh wrote:
> eshell "pipelines" operate by reading the data in from one process and
> writing it out to the next process. Thus the data flows from one
> process, to Emacs, and then to the next process.
>
> This differs from the native OS capability to make a pipe and pass one
> end down to one process as stdout, and the other end down to another
> process as stdin, which is more efficient.
>
> Has there been work before on supporting this in eshell and Emacs?

I've worked on this previously, and even put together a hacky sketch of
how it would work before abandoning it due to a bunch of complexities in
Eshell that make this infeasible (in my opinion, anyway). As the current
Eshell maintainer, I'd (softly) suggest you turn back now, unless you're
willing to go down a fairly deep rabbit hole.

I'll also note: the benefits here are also somewhat reduced by
improvements to Eshell pipelines in Emacs 29. As of commit d7b89ea4077
(bug#56025), piped processes in Eshell no longer use PTYs for output,
which resulted in a ~35x improvement in my limited tests. (Still 5-10x
slower than in Bash though.) I didn't test this extensively at the time
though since the main goal was fixing incorrect behavior; the perf
improvement was just a nice bonus.

> Specifically, the new feature would be something like an :stdin argument
> to make-process which allows a make-pipe-process (or other process) to
> be passed as stdin, and grabs the output file descriptor from that
> process (what Emacs would normally read) and passes it down as stdin for
> the new process instead.
It's not quite as simple as that, I'm afraid. The C side is perfectly
reasonable I think, and would likely make some parts of Eshell easier to
manage, but there still needs to be some extra sorcery for Eshell.
Eshell commands can either be Lisp-based or they can be external
programs. That sounds simple, but it's not actually possible to
determine ahead of time which Eshell will choose.

Consider "cat". The implementation of "cat random" that Eshell uses
depends on your cwd: if "random" is a regular file in your cwd, we use a
Lisp implementation. But if your cwd is /dev, then "random" is a
character device file, and the Lisp implementation replaces itself
(*after* starting execution) with the external program. This makes it a
lot harder to determine how to connect this command in a pipeline.

Another issue is Tramp. If Eshell runs each remote process as an
independent 'make-process' invocation as it is today, then we're stuck
with a whole lot of extra indirection, and any pipe (native or
otherwise) would be *local* instead of remote (where we want it). This
even applies to not-really-remote cases like sudo, which Eshell manages
via Tramp.

Both of these cases are worked around via extpipes: in the former, the
extpipe mandates that all connected commands are external programs, and
in the latter, it constructs an 'sh' invocation that runs the entire
pipeline as a unit on the remote host.

With enough work it might be possible to overcome some of these problems
for Eshell, but I haven't been able to produce a satisfactory design for
this that doesn't involve major incompatible changes.

It's a different strategy, but I wonder if improving the scheduling in
Emacs' process handling would get us close to "native" performance here?
See <https://tdodge.consulting/blog/eshell/background-output-thread> for
a discussion of the issue and a WIP(?) fix.
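The idea of running an entire pipeline as one 'sh' invocation, as mentioned above for extpipes, can be illustrated with a small sketch. This is not how em-extpipe is implemented; it only shows, in Python, how a list of argument vectors might be turned into a single quoted `sh -c` command so that the shell (local or remote) creates the pipes itself. The helper name `build_sh_pipeline` is hypothetical.

```python
import shlex
from typing import List, Sequence


def build_sh_pipeline(commands: Sequence[Sequence[str]]) -> List[str]:
    """Return an argv that runs all COMMANDS as one 'sh -c' pipeline.

    Each command is an argument vector; every argument is shell-quoted
    so that spaces and metacharacters survive the trip through 'sh'.
    """
    if not commands:
        raise ValueError("a pipeline needs at least one command")
    script = " | ".join(shlex.join(argv) for argv in commands)
    return ["sh", "-c", script]


if __name__ == "__main__":
    argv = build_sh_pipeline([["git", "ls-files"],
                              ["xargs", "grep", "-n", "foo bar"]])
    print(argv)
```

Because the whole pipeline is handed to one shell, this approach only works when every stage is an external program, which matches the restriction described above.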
* Re: Native OS pipelines in eshell and Emacs
From: Spencer Baugh @ 2024-05-28 18:38 UTC
To: emacs-devel

Jim Porter <jporterbugs@gmail.com> writes:
> On 5/28/2024 7:42 AM, Spencer Baugh wrote:
>> eshell "pipelines" operate by reading the data in from one process and
>> writing it out to the next process. Thus the data flows from one
>> process, to Emacs, and then to the next process.
>> This differs from the native OS capability to make a pipe and pass one
>> end down to one process as stdout, and the other end down to another
>> process as stdin, which is more efficient.
>> Has there been work before on supporting this in eshell and Emacs?
>
> I've worked on this previously, and even put together a hacky sketch
> of how it would work before abandoning it due to a bunch of
> complexities in Eshell that make this infeasible (in my opinion,
> anyway). As the current Eshell maintainer, I'd (softly) suggest you
> turn back now, unless you're willing to go down a fairly deep rabbit
> hole.

That is fair, but since supporting :stdin in Emacs would be useful for
project.el anyway, I'm motivated to do this even if eshell won't
immediately benefit.

So I'm especially interested in anything you have to share about the C
side of this, if anything.

> I'll also note: the benefits here are also somewhat reduced by
> improvements to Eshell pipelines in Emacs 29. As of commit d7b89ea4077
> (bug#56025), piped processes in Eshell no longer use PTYs for output,
> which resulted in a ~35x improvement in my limited tests. (Still 5-10x
> slower than in Bash though.) I didn't test this extensively at the
> time though since the main goal was fixing incorrect behavior; the
> perf improvement was just a nice bonus.
>
>> Specifically, the new feature would be something like an :stdin argument
>> to make-process which allows a make-pipe-process (or other process) to
>> be passed as stdin, and grabs the output file descriptor from that
>> process (what Emacs would normally read) and passes it down as stdin for
>> the new process instead.
>
> It's not quite as simple as that, I'm afraid. The C side is perfectly
> reasonable I think, and would likely make some parts of Eshell easier
> to manage, but there still needs to be some extra sorcery for
> Eshell. Eshell commands can either be Lisp-based or they can be
> external programs. That sounds simple, but it's not actually possible
> to determine ahead of time which Eshell will choose.
>
> Consider "cat". The implementation of "cat random" that Eshell uses
> depends on your cwd: if "random" is a regular file in your cwd, we use
> a Lisp implementation. But if your cwd is /dev, then "random" is a
> character device file, and the Lisp implementation replaces itself
> (*after* starting execution) with the external program. This makes it
> a lot harder to determine how to connect this command in a pipeline.
>
> Another issue is Tramp. If Eshell runs each remote process as an
> independent 'make-process' invocation as it is today, then we're stuck
> with a whole lot of extra indirection, and any pipe (native or
> otherwise) would be *local* instead of remote (where we want it). This
> even applies to not-really-remote cases like sudo, which Eshell
> manages via Tramp.
>
> Both of these cases are worked around via extpipes: in the former, the
> extpipe mandates that all connected commands are external programs,
> and in the latter, it constructs an 'sh' invocation that runs the
> entire pipeline as a unit on the remote host.

Ah, those are indeed real and annoying concerns.

But if extpipe is able to mandate this, doesn't that mean there is some
way to get this information?
That is, we're able to detect "all the commands are external programs"
and "we're running on the local host".

In that case, we could start by using the native pipes only when both
those conditions are true. Or, slightly more aggressively: whenever two
adjacent commands in an eshell pipeline are both external programs on
the local host.

> With enough work it might be possible to overcome some of these
> problems for Eshell, but I haven't been able to produce a satisfactory
> design for this that doesn't involve major incompatible changes.
>
> It's a different strategy, but I wonder if improving the scheduling in
> Emacs' process handling would get us close to "native" performance
> here? See
> <https://tdodge.consulting/blog/eshell/background-output-thread> for a
> discussion of the issue and a WIP(?) fix.

Very interesting idea, but I personally am motivated to get performance
which is not just closer to shell, but equivalent to shell. Right now I
only infrequently use eshell, because every time I write and wait for a
pipeline I think "I would have to wait less if I was in M-x shell", and
I'd like to never think that :)
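The "slightly more aggressive" rule proposed above (use a native pipe between any two adjacent stages that are both external programs on the local host) can be sketched as a small planning function. This is a Python illustration only, with hypothetical names (`Stage`, `link_kinds`), and it assumes each stage's classification is already known up front, which, as noted earlier in the thread, Eshell cannot always determine ahead of time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    external: bool  # True for an external program, False for a Lisp command
    local: bool     # True if the stage runs on the local host


def native_eligible(stage: Stage) -> bool:
    """A stage can take part in a native pipe only if external and local."""
    return stage.external and stage.local


def link_kinds(stages: List[Stage]) -> List[str]:
    """For each adjacent pair of stages, pick 'native' or 'relay'.

    'native' means an OS pipe could connect the two processes directly;
    'relay' means the data still has to pass through the parent.
    """
    return [
        "native" if native_eligible(a) and native_eligible(b) else "relay"
        for a, b in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    pipeline = [
        Stage("git", external=True, local=True),
        Stage("grep", external=True, local=True),
        Stage("lisp-filter", external=False, local=True),
        Stage("wc", external=True, local=True),
    ]
    print(link_kinds(pipeline))
```

The per-link decision means a single Lisp-based stage only forces relaying on the links that touch it, rather than on the whole pipeline.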
* Re: Native OS pipelines in eshell and Emacs
From: Jim Porter @ 2024-05-28 19:56 UTC
To: Spencer Baugh, emacs-devel

On 5/28/2024 11:38 AM, Spencer Baugh wrote:
> Jim Porter <jporterbugs@gmail.com> writes:
>> I've worked on this previously, and even put together a hacky sketch
>> of how it would work before abandoning it due to a bunch of
>> complexities in Eshell that make this infeasible (in my opinion,
>> anyway). As the current Eshell maintainer, I'd (softly) suggest you
>> turn back now, unless you're willing to go down a fairly deep rabbit
>> hole.
>
> That is fair, but since supporting :stdin in Emacs would be useful for
> project.el anyway, I'm motivated to do this even if eshell won't
> immediately benefit.
>
> So I'm especially interested in anything you have to share about the C
> side of this, if anything.

Don't let me discourage you from the C side of things. :) I think it
would probably be pretty useful overall.

As for advice on the C implementation, I don't have anything much in
particular. I *think* this should be reasonably straightforward. The one
wrinkle I got a bit bogged down in was with 'make-pipe-process'. That
really creates 2 pipes under the hood, and for various reasons, I wanted
to be able to use a single OS pipe from that, where I could (depending
on the specific task) either write into it (pushing the data into stdin
of a child process) or read out of it (pulling the data out of a child
process's stdout).

The above would probably make it easier for Eshell, since my plan was to
use 'make-pipe-process' and hook it up into the Eshell IO code. (See
'eshell-get-target' and friends.)

> Ah, those are indeed real and annoying concerns.
>
> But if extpipe is able to mandate this, doesn't that mean there is some
> way to get this information?
> That is, we're able to detect "all the
> commands are external programs" and "we're running on the local host".

Well, extpipe does this by forcing the user to knowingly opt into a
totally different behavior (just one that we hope is similar to the
"default").

One option might be to use an enhanced 'make-pipe-process' object (as
described above) as the common glue between commands in an Eshell
pipeline. If you had the ability to set a process's :stdin to a
"pipe-process", and then Eshell had the ability to use a "pipe-process"
as an output target, I think that would get you (close to?) native pipes
while still allowing one end of the pipe to be something other than a
process.

This doesn't help the Tramp case, since the pipe is still on your local
host, but the Tramp case is hard enough that it might be best just to
focus on not making the situation any worse than it already is.

One final wrinkle to all this: I've been intending for quite a long time
to add the ability to pipe data in Eshell *to* Lisp code; currently you
can only pipe *from* Lisp code. I want to be extra careful that any
changes to Eshell pipelines won't render these plans impossible. (This
project has taken a long time since I want to be sure that when I
eventually merge these Eshell "pseudo-processes", I get them as close to
correct the first time as I can; incompatible changes down the line
would be a pain.)
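The "single OS pipe" idea above, where one end of the pipe belongs to a child process and the other end is driven by something that is not a process, can be sketched at the OS level. The Python below is only an illustration of the write direction (the parent pushes data into a child's stdin through one pipe); it says nothing about how 'make-pipe-process' is or would be implemented, and the function name `feed_child_through_pipe` is hypothetical.

```python
import os
import subprocess
import sys

# Toy child program: upper-case everything read from stdin.
UPPERCASE_CHILD = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def feed_child_through_pipe(data: bytes) -> bytes:
    """Create one OS pipe, give its read end to a child as stdin,
    and write DATA into the write end from the parent."""
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(UPPERCASE_CHILD, stdin=read_fd,
                             stdout=subprocess.PIPE)
    # The child now has its own copy of the read end; drop the parent's.
    os.close(read_fd)
    # Writing and then closing the write end signals EOF to the child.
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(data)
    output, _ = child.communicate()
    return output


if __name__ == "__main__":
    print(feed_child_through_pipe(b"hello from the parent\n").decode(), end="")
```

In this sketch the parent plays the role of the non-process end of the pipe, which is roughly the position a Lisp-based command would occupy in such a design.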
* Re: Native OS pipelines in eshell and Emacs
From: Dmitry Gutov @ 2024-05-29 1:21 UTC
To: Spencer Baugh, emacs-devel; +Cc: johnw, spwhitton

On 28/05/2024 17:42, Spencer Baugh wrote:
> This same ability would be useful for project.el, where it would be nice
> for the output of project-files (e.g. "git ls-files") to be piped
> directly to xargs grep for commands like project-find-regexp, instead of
> sending the data through Emacs which makes it substantially slower.

I have indeed been considering something like that for project-files ->
xref-matches-in-files. But mostly in broad strokes.

> Specifically, the new feature would be something like an :stdin argument
> to make-process which allows a make-pipe-process (or other process) to
> be passed as stdin, and grabs the output file descriptor from that
> process (what Emacs would normally read) and passes it down as stdin for
> the new process instead.

It would be doubly interesting if we manage to implement it so that
Tramp would be able to connect two processes directly without
round-tripping the i/o from the remote host to local and back to remote.
That's a major source of latency in project-find-regexp on remote.
* Re: Native OS pipelines in eshell and Emacs
From: Spencer Baugh @ 2024-05-29 1:43 UTC
To: Dmitry Gutov; +Cc: emacs-devel, johnw, spwhitton

On Tue, May 28, 2024, 9:21 PM Dmitry Gutov <dmitry@gutov.dev> wrote:
> > Specifically, the new feature would be something like an :stdin argument
> > to make-process which allows a make-pipe-process (or other process) to
> > be passed as stdin, and grabs the output file descriptor from that
> > process (what Emacs would normally read) and passes it down as stdin for
> > the new process instead.
>
> It would be doubly interesting if we manage to implement it so that
> Tramp would be able to connect two processes directly without
> round-tripping the i/o from the remote host to local and back to remote.
> That's a major source of latency in project-find-regexp on remote.

Unfortunately this is almost impossibly hard. But, I actually have
worked extensively on doing this specific impossible thing (remote
process APIs that are powerful enough to do this) so I will eventually
try to implement them for Emacs and TRAMP. It would allow full
make-process support in TRAMP as well as a make-pipe-process which
represents a pipe existing on a remote system.

Anyway, in the short term it will probably only work efficiently for
local processes, with remote project-files having to roundtrip through
the local Emacs.
* Re: Native OS pipelines in eshell and Emacs
From: Dmitry Gutov @ 2024-05-29 2:08 UTC
To: Spencer Baugh; +Cc: emacs-devel, johnw, spwhitton

On 29/05/2024 04:43, Spencer Baugh wrote:
>     It would be doubly interesting if we manage to implement it so that
>     Tramp would be able to connect two processes directly without
>     round-tripping the i/o from the remote host to local and back to
>     remote.
>     That's a major source of latency in project-find-regexp on remote.
>
> Unfortunately this is almost impossibly hard. But, I actually have
> worked extensively on doing this specific impossible thing (remote
> process APIs that are powerful enough to do this) so I will eventually
> try to implement them for Emacs and TRAMP. It would allow full
> make-process support in TRAMP as well as a make-pipe-process which
> represents a pipe existing on a remote system.

On the remote, it would be fine if the pipe is not direct between such
processes, but goes through the shell, or maybe some other processes as
well (maybe a temp file?). That would still be faster than doing the
round-trip.

> Anyway, in the short term it will probably only work efficiently for
> local processes, with remote project-files having to roundtrip through
> the local Emacs.

Yes, well. For local processes, it would at least help with
"asynchronous regexp search" Xref UI.
* Re: Native OS pipelines in eshell and Emacs
From: Michael Albinus @ 2024-05-29 8:01 UTC
To: Dmitry Gutov; +Cc: Spencer Baugh, emacs-devel, johnw, spwhitton

Dmitry Gutov <dmitry@gutov.dev> writes:

Hi Dmitry,

>> Unfortunately this is almost impossibly hard. But, I actually have
>> worked extensively on doing this specific impossible thing (remote
>> process APIs that are powerful enough to do this) so I will eventually
>> try to implement them for Emacs and TRAMP. It would allow full
>> make-process support in TRAMP as well as a make-pipe-process which
>> represents a pipe existing on a remote system.
>
> On the remote, it would be fine if the pipe is not direct between such
> processes, but goes through the shell, or maybe some other processes as
> well (maybe a temp file?). That would still be faster than doing the
> round-trip.

If both processes are on different remote hosts, you have the problem of
how to transfer the tmpfile from one host to the other. You have no
knowledge of how these two hosts see each other.

A special case is if both hosts are accessed via Tramp's scp method, and
you can use the tramp-use-scp-direct-remote-copying user option.
See (info "(tramp) Ssh setup")

Best regards, Michael.
* Re: Native OS pipelines in eshell and Emacs
From: Dmitry Gutov @ 2024-05-29 10:31 UTC
To: Michael Albinus; +Cc: Spencer Baugh, emacs-devel, johnw, spwhitton

Hi Michael,

On 29/05/2024 11:01, Michael Albinus wrote:
>>> Unfortunately this is almost impossibly hard. But, I actually have
>>> worked extensively on doing this specific impossible thing (remote
>>> process APIs that are powerful enough to do this) so I will eventually
>>> try to implement them for Emacs and TRAMP. It would allow full
>>> make-process support in TRAMP as well as a make-pipe-process which
>>> represents a pipe existing on a remote system.
>> On the remote, it would be fine if the pipe is not direct between such
>> processes, but goes through the shell, or maybe some other processes as
>> well (maybe a temp file?). That would still be faster than doing the
>> round-trip.
> If both processes are on different remote hosts, you have the problem
> how to transfer the tmpfile from one host to the other. You have no
> knowledge how these two hosts see each other.
>
> A special case is if both hosts are accessed via Tramp's scp method, and
> you can use the tramp-use-scp-direct-remote-copying user option.
> See (info "(tramp) Ssh setup")

The case I had in mind is when both hosts are the same. This one can be
optimized at least in theory.
* Re: Native OS pipelines in eshell and Emacs
From: Michael Albinus @ 2024-05-29 7:53 UTC
To: Spencer Baugh; +Cc: Dmitry Gutov, emacs-devel, johnw, spwhitton

Spencer Baugh <sbaugh@janestreet.com> writes:

Hi Spencer,

>     It would be doubly interesting if we manage to implement it so
>     that Tramp would be able to connect two processes directly without
>     round-tripping the i/o from the remote host to local and back to
>     remote. That's a major source of latency in project-find-regexp
>     on remote.
>
> Unfortunately this is almost impossibly hard. But, I actually have
> worked extensively on doing this specific impossible thing (remote
> process APIs that are powerful enough to do this) so I will eventually
> try to implement them for Emacs and TRAMP. It would allow full
> make-process support in TRAMP as well as a make-pipe-process which
> represents a pipe existing on a remote system.

Much appreciated! A while ago, on a boring rainy day, I thought about
adding "Implement remote make-pipe-process" to my TODO. I didn't because
I have no idea yet how to do it.

Best regards, Michael.