Native OS pipelines in eshell and Emacs

unofficial mirror of emacs-devel@gnu.org 

* Native OS pipelines in eshell and Emacs
@ 2024-05-28 14:42 Spencer Baugh
  2024-05-28 16:33 ` Jim Porter
  2024-05-29  1:21 ` Dmitry Gutov
  0 siblings, 2 replies; 10+ messages in thread
From: Spencer Baugh @ 2024-05-28 14:42 UTC (permalink / raw)
  To: emacs-devel; +Cc: johnw, spwhitton, dmitry

eshell "pipelines" operate by reading the data in from one process and
writing it out to the next process.  Thus the data flows from one
process, to Emacs, and then to the next process.

This differs from the native OS capability to make a pipe and pass one
end down to one process as stdout, and the other end down to another
process as stdin, which is more efficient.
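
To make the contrast concrete, here is a rough sketch in Python rather than Emacs Lisp (so nothing below is Emacs or Eshell API). It assumes a POSIX-like system; PRODUCER, CONSUMER, relay_pipeline and native_pipeline are names invented for the illustration. The first function copies every chunk through the parent, which is the shape of what eshell does today; the second hands the producer's stdout to the consumer and lets the OS move the data.

```python
import subprocess
import sys

# Two trivial child programs, run with the current interpreter so the sketch
# does not depend on any particular shell utilities (illustrative only).
PRODUCER = [sys.executable, "-c", "print('b\\na\\nc')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))"]


def relay_pipeline():
    """Relay: the parent reads the producer's output and re-sends it."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(CONSUMER, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
    # Every chunk is copied into the parent and back out again.
    while chunk := producer.stdout.read(4096):
        consumer.stdin.write(chunk)
    producer.stdout.close()
    consumer.stdin.close()          # flushes and signals end-of-file
    out = consumer.stdout.read()
    consumer.stdout.close()
    producer.wait()
    consumer.wait()
    return out.decode()


def native_pipeline():
    """Native: the producer's stdout becomes the consumer's stdin."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(CONSUMER, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Drop the parent's copy of the read end; the data between the two
    # children never passes through the parent.
    producer.stdout.close()
    out = consumer.stdout.read()
    consumer.stdout.close()
    producer.wait()
    consumer.wait()
    return out.decode()
```

Both functions return the same text; the difference is only in who moves the bytes between the two children.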

Has there been work before on supporting this in eshell and Emacs?

I saw there was the new em-extpipe capability in eshell, but that
requires different syntax and bypasses Eshell's usual features - adding
the ability to create pipelines natively in Emacs would allow the normal
Eshell syntax to just be efficient on its own.  (This would, I think,
remove the need for extpipe)

This same ability would be useful for project.el, where it would be nice
for the output of project-files (e.g. "git ls-files") to be piped
directly to xargs grep for commands like project-find-regexp, instead of
sending the data through Emacs which makes it substantially slower.

Specifically, the new feature would be something like an :stdin argument
to make-process which allows a make-pipe-process (or other process) to
be passed as stdin, and grabs the output file descriptor from that
process (what Emacs would normally read) and passes it down as stdin for
the new process instead.
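
At the file-descriptor level, the idea might look roughly like the following Python sketch (again not Emacs code, and no :stdin argument exists in make-process today; run_with_shared_pipe is a made-up name, and a POSIX-like system is assumed). The pipe is created first, much as a pipe object would be, and its read end is then handed to the new process as stdin instead of being read by the parent.

```python
import os
import subprocess
import sys


def run_with_shared_pipe(producer_argv, consumer_argv):
    """Connect producer stdout to consumer stdin through one OS pipe.

    Hypothetical helper for illustration; returns the consumer's output.
    """
    read_fd, write_fd = os.pipe()
    try:
        # The write end becomes the producer's stdout ...
        producer = subprocess.Popen(producer_argv, stdout=write_fd)
        # ... and the read end is passed down as the consumer's stdin.
        consumer = subprocess.Popen(consumer_argv, stdin=read_fd,
                                    stdout=subprocess.PIPE)
    finally:
        # The children hold their own copies.  The parent must close its
        # copies, or the consumer would never see end-of-file.
        os.close(read_fd)
        os.close(write_fd)
    out, _ = consumer.communicate()
    producer.wait()
    return out


if __name__ == "__main__":
    result = run_with_shared_pipe(
        [sys.executable, "-c", "print('hello')"],
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"])
    print(result)
```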

I'm working on a patch to do this, but I wonder if any work has been
done on this before?


* Re: Native OS pipelines in eshell and Emacs
  2024-05-28 14:42 Native OS pipelines in eshell and Emacs Spencer Baugh
@ 2024-05-28 16:33 ` Jim Porter
  2024-05-28 18:38   ` Spencer Baugh
  2024-05-29  1:21 ` Dmitry Gutov
  1 sibling, 1 reply; 10+ messages in thread
From: Jim Porter @ 2024-05-28 16:33 UTC (permalink / raw)
  To: Spencer Baugh, emacs-devel; +Cc: johnw, spwhitton, dmitry

On 5/28/2024 7:42 AM, Spencer Baugh wrote:
> eshell "pipelines" operate by reading the data in from one process and
> writing it out to the next process.  Thus the data flows from one
> process, to Emacs, and then to the next process.
> 
> This differs from the native OS capability to make a pipe and pass one
> end down to one process as stdout, and the other end down to another
> process as stdin, which is more efficient.
> 
> Has there been work before on supporting this in eshell and Emacs?

I've worked on this previously, and even put together a hacky sketch of 
how it would work before abandoning it due to a bunch of complexities in 
Eshell that make this infeasible (in my opinion, anyway). As the current 
Eshell maintainer, I'd (softly) suggest you turn back now, unless you're 
willing to go down a fairly deep rabbit hole.

I'll also note: the benefits here are also somewhat reduced by 
improvements to Eshell pipelines in Emacs 29. As of commit d7b89ea4077 
(bug#56025), piped processes in Eshell no longer use PTYs for output, 
which resulted in a ~35x improvement in my limited tests. (Still 5-10x 
slower than in Bash though.) I didn't test this extensively at the time 
though since the main goal was fixing incorrect behavior; the perf 
improvement was just a nice bonus.

> Specifically, the new feature would be something like an :stdin argument
> to make-process which allows a make-pipe-process (or other process) to
> be passed as stdin, and grabs the output file descriptor from that
> process (what Emacs would normally read) and passes it down as stdin for
> the new process instead.

It's not quite as simple as that, I'm afraid. The C side is perfectly 
reasonable I think, and would likely make some parts of Eshell easier to 
manage, but there still needs to be some extra sorcery for Eshell. 
Eshell commands can either be Lisp-based or they can be external 
programs. That sounds simple, but it's not actually possible to 
determine ahead of time which Eshell will choose.

Consider "cat". The implementation of "cat random" that Eshell uses 
depends on your cwd: if "random" is a regular file in your cwd, we use a 
Lisp implementation. But if your cwd is /dev, then "random" is a 
character device file, and the Lisp implementation replaces itself 
(*after* starting execution) with the external program. This makes it a 
lot harder to determine how to connect this command in a pipeline.
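
To illustrate why this cannot be decided up front, here is a simplified Python sketch. It is not Eshell's actual logic; choose_cat_implementation is a hypothetical stand-in, and it assumes a POSIX-like system where os.devnull is a character device. The point is only that the answer depends on the state of the filesystem at the moment the command runs.

```python
import os
import stat
import tempfile


def choose_cat_implementation(path):
    """Return 'builtin' for regular files and 'external' otherwise.

    Hypothetical stand-in for the kind of run-time check described above.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return "external"  # let the external program report the error
    return "builtin" if stat.S_ISREG(mode) else "external"


# The same command name resolves differently depending on its argument.
with tempfile.NamedTemporaryFile() as regular:
    print(choose_cat_implementation(regular.name))  # a regular file
print(choose_cat_implementation(os.devnull))        # a character device
```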

Another issue is Tramp. If Eshell runs each remote process as an 
independent 'make-process' invocation as it is today, then we're stuck 
with a whole lot of extra indirection, and any pipe (native or 
otherwise) would be *local* instead of remote (where we want it). This 
even applies to not-really-remote cases like sudo, which Eshell manages 
via Tramp.

Both of these cases are worked around via extpipes: in the former, the 
extpipe mandates that all connected commands are external programs, and 
in the latter, it constructs an 'sh' invocation that runs the entire 
pipeline as a unit on the remote host.
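
The second workaround amounts to turning a list of commands into one shell invocation. A minimal Python sketch of that idea (not the em-extpipe implementation; build_sh_pipeline is a name made up here) could be:

```python
import shlex


def build_sh_pipeline(commands):
    """Join argv lists into one `sh -c` invocation that runs a pipeline.

    Illustrative helper only: each element of `commands` is an argv list,
    and every argument is quoted so the shell sees it as a single word.
    """
    if not commands:
        raise ValueError("pipeline needs at least one command")
    script = " | ".join(shlex.join(argv) for argv in commands)
    return ["sh", "-c", script]


if __name__ == "__main__":
    print(build_sh_pipeline([["git", "ls-files"],
                             ["xargs", "grep", "-n", "foo bar"]]))
```

Because the whole pipeline is a single command, whichever host runs that `sh` also owns the pipes between the stages.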

With enough work it might be possible to overcome some of these problems 
for Eshell, but I haven't been able to produce a satisfactory design for 
this that doesn't involve major incompatible changes.

It's a different strategy, but I wonder if improving the scheduling in 
Emacs' process handling would get us close to "native" performance here? 
See <https://tdodge.consulting/blog/eshell/background-output-thread> for 
a discussion of the issue and a WIP(?) fix.


* Re: Native OS pipelines in eshell and Emacs
  2024-05-28 16:33 ` Jim Porter
@ 2024-05-28 18:38   ` Spencer Baugh
  2024-05-28 19:56     ` Jim Porter
  0 siblings, 1 reply; 10+ messages in thread
From: Spencer Baugh @ 2024-05-28 18:38 UTC (permalink / raw)
  To: emacs-devel

Jim Porter <jporterbugs@gmail.com> writes:
> On 5/28/2024 7:42 AM, Spencer Baugh wrote:
>> eshell "pipelines" operate by reading the data in from one process and
>> writing it out to the next process.  Thus the data flows from one
>> process, to Emacs, and then to the next process.
>> This differs from the native OS capability to make a pipe and pass
>> one
>> end down to one process as stdout, and the other end down to another
>> process as stdin, which is more efficient.
>> Has there been work before on supporting this in eshell and Emacs?
>
> I've worked on this previously, and even put together a hacky sketch
> of how it would work before abandoning it due to a bunch of
> complexities in Eshell that make this infeasible (in my opinion,
> anyway). As the current Eshell maintainer, I'd (softly) suggest you
> turn back now, unless you're willing to go down a fairly deep rabbit
> hole.

That is fair, but since supporting :stdin in Emacs would be useful for
project.el anyway, I'm motivated to do this even if eshell won't
immediately benefit.

So I'm especially interested in anything you have to share about the C
side of this, if anything.

> I'll also note: the benefits here are also somewhat reduced by
> improvements to Eshell pipelines in Emacs 29. As of commit d7b89ea4077
> (bug#56025), piped processes in Eshell no longer use PTYs for output,
> which resulted in a ~35x improvement in my limited tests. (Still 5-10x
> slower than in Bash though.) I didn't test this extensively at the
> time though since the main goal was fixing incorrect behavior; the
> perf improvement was just a nice bonus.
>
>> Specifically, the new feature would be something like an :stdin argument
>> to make-process which allows a make-pipe-process (or other process) to
>> be passed as stdin, and grabs the output file descriptor from that
>> process (what Emacs would normally read) and passes it down as stdin for
>> the new process instead.
>
> It's not quite as simple as that, I'm afraid. The C side is perfectly
> reasonable I think, and would likely make some parts of Eshell easier
> to manage, but there still needs to be some extra sorcery for
> Eshell. Eshell commands can either be Lisp-based or they can be
> external programs. That sounds simple, but it's not actually possible
> to determine ahead of time which Eshell will choose.
>
> Consider "cat". The implementation of "cat random" that Eshell uses
> depends on your cwd: if "random" is a regular file in your cwd, we use
> a Lisp implementation. But if your cwd is /dev, then "random" is a
> character device file, and the Lisp implementation replaces itself
> (*after* starting execution) with the external program. This makes it
> a lot harder to determine how to connect this command in a pipeline.
>
> Another issue is Tramp. If Eshell runs each remote process as an
> independent 'make-process' invocation as it is today, then we're stuck
> with a whole lot of extra indirection, and any pipe (native or
> otherwise) would be *local* instead of remote (where we want it). This
> even applies to not-really-remote cases like sudo, which Eshell
> manages via Tramp.
>
> Both of these cases are worked around via extpipes: in the former, the
> extpipe mandates that all connected commands are external programs,
> and in the latter, it constructs an 'sh' invocation that runs the
> entire pipeline as a unit on the remote host.

Ah, those are indeed real and annoying concerns.

But if extpipe is able to mandate this, doesn't that mean there is some
way to get this information?  That is, we're able to detect "all the
commands are external programs" and "we're running on the local host".

In that case, we could start by using the native pipes only when both
those conditions are true.

Or, slightly more aggressively: whenever two adjacent commands in an
eshell pipeline are both external programs on the local host.
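
As a sketch of that per-pair rule (in Python, with made-up names Command and plan_links, and assuming the external/local status of each command is already known when the pipeline is set up):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical description of one pipeline stage."""
    name: str
    external: bool  # False for a Lisp-implemented command
    local: bool     # False when the command would run on a remote host


def plan_links(pipeline):
    """For each adjacent pair, pick 'native' or 'relay' as the connection."""
    links = []
    for left, right in zip(pipeline, pipeline[1:]):
        both_ok = all(c.external and c.local for c in (left, right))
        links.append("native" if both_ok else "relay")
    return links


if __name__ == "__main__":
    stages = [Command("git", external=True, local=True),
              Command("grep", external=True, local=True),
              Command("some-lisp-command", external=False, local=True)]
    print(plan_links(stages))
```

A pipeline of N commands has N-1 links, and only the links with an external local program on both sides would get a native pipe.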

> With enough work it might be possible to overcome some of these
> problems for Eshell, but I haven't been able to produce a satisfactory
> design for this that doesn't involve major incompatible changes.
>
> It's a different strategy, but I wonder if improving the scheduling in
> Emacs' process handling would get us close to "native" performance
> here? See
> <https://tdodge.consulting/blog/eshell/background-output-thread> for a
> discussion of the issue and a WIP(?) fix.

Very interesting idea, but I personally am motivated to get performance
which is not just closer to shell, but equivalent to shell.  Right now I
only infrequently use eshell, because every time I write and wait for a
pipeline I think "I would have to wait less if I was in M-x shell", and
I'd like to never think that :)





* Re: Native OS pipelines in eshell and Emacs
  2024-05-28 18:38   ` Spencer Baugh
@ 2024-05-28 19:56     ` Jim Porter
  0 siblings, 0 replies; 10+ messages in thread
From: Jim Porter @ 2024-05-28 19:56 UTC (permalink / raw)
  To: Spencer Baugh, emacs-devel

On 5/28/2024 11:38 AM, Spencer Baugh wrote:
> Jim Porter <jporterbugs@gmail.com> writes:
>> I've worked on this previously, and even put together a hacky sketch
>> of how it would work before abandoning it due to a bunch of
>> complexities in Eshell that make this infeasible (in my opinion,
>> anyway). As the current Eshell maintainer, I'd (softly) suggest you
>> turn back now, unless you're willing to go down a fairly deep rabbit
>> hole.
> 
> That is fair, but since supporting :stdin in Emacs would be useful for
> project.el anyway, I'm motivated to do this even if eshell won't
> immediately benefit.
> 
> So I'm especially interested in anything you have to share about the C
> side of this, if anything.

Don't let me discourage you from the C side of things. :) I think it 
would probably be pretty useful overall.

As for advice on the C implementation, I don't have anything much in 
particular. I *think* this should be reasonably straightforward. The one 
wrinkle I got a bit bogged down in was with 'make-pipe-process'. That 
really creates 2 pipes under the hood, and for various reasons, I wanted 
to be able to use a single OS pipe from that, where I could (depending 
on the specific task) either write into it (pushing the data into stdin 
of a child process) or read out of it (pulling the data out of a child 
process's stdout).

The above would probably make it easier for Eshell, since my plan was to 
use 'make-pipe-process' and hook it up into the Eshell IO code. (See 
'eshell-get-target' and friends.)
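
For what it's worth, here is the "single OS pipe, used in either direction" idea as a Python sketch. This is not Emacs code; write_into_child and read_from_child are names invented for the illustration, and a POSIX-like system is assumed. Each function creates exactly one pipe and keeps only one end of it in the parent.

```python
import os
import subprocess
import sys


def write_into_child(argv, data):
    """One pipe: parent keeps the write end, child gets the read end as stdin."""
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(argv, stdin=read_fd)
    os.close(read_fd)                 # only the child needs the read end
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(data)            # closing the writer signals end-of-file
    return child.wait()               # the child's exit status


def read_from_child(argv):
    """One pipe: parent keeps the read end, child gets the write end as stdout."""
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(argv, stdout=write_fd)
    os.close(write_fd)                # otherwise the read below never sees EOF
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    child.wait()
    return data


if __name__ == "__main__":
    print(read_from_child([sys.executable, "-c", "print('hi')"]))
```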

> Ah, those are indeed real and annoying concerns.
> 
> But if extpipe is able to mandate this, doesn't that mean there is some
> way to get this information?  That is, we're able to detect "all the
> commands are external programs" and "we're running on the local host".

Well, extpipe does this by forcing the user to knowingly opt into a 
totally different behavior (just one that we hope is similar to the 
"default").

One option might be to use an enhanced 'make-pipe-process' object (as 
described above) as the common glue between commands in an Eshell 
pipeline. If you had the ability to set a process's :stdin to a 
"pipe-process", and then Eshell had the ability to use a "pipe-process" 
as an output target, I think that would get you (close to?) native pipes 
while still allowing one end of the pipe to be something other than a 
process.

This doesn't help the Tramp case, since the pipe is still on your local 
host, but the Tramp case is hard enough that it might be best just to 
focus on not making the situation any worse than it already is.

One final wrinkle to all this: I've been intending for quite a long time 
to add the ability to pipe data in Eshell *to* Lisp code; currently you 
can only pipe *from* Lisp code. I want to be extra careful that any 
changes to Eshell pipelines won't render these plans impossible. (This 
project has taken a long time since I want to be sure that when I 
eventually merge these Eshell "pseudo-processes", I get them as close to 
correct the first time as I can; incompatible changes down the line 
would be a pain.)


* Re: Native OS pipelines in eshell and Emacs
  2024-05-28 14:42 Native OS pipelines in eshell and Emacs Spencer Baugh
  2024-05-28 16:33 ` Jim Porter
@ 2024-05-29  1:21 ` Dmitry Gutov
  2024-05-29  1:43   ` Spencer Baugh
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Gutov @ 2024-05-29  1:21 UTC (permalink / raw)
  To: Spencer Baugh, emacs-devel; +Cc: johnw, spwhitton

On 28/05/2024 17:42, Spencer Baugh wrote:
> This same ability would be useful for project.el, where it would be nice
> for the output of project-files (e.g. "git ls-files") to be piped
> directly to xargs grep for commands like project-find-regexp, instead of
> sending the data through Emacs which makes it substantially slower.

I have indeed been considering something like that for project-files -> 
xref-matches-in-files. But mostly in broad strokes.

> Specifically, the new feature would be something like an :stdin argument
> to make-process which allows a make-pipe-process (or other process) to
> be passed as stdin, and grabs the output file descriptor from that
> process (what Emacs would normally read) and passes it down as stdin for
> the new process instead.

It would be doubly interesting if we manage to implement it so that 
Tramp would be able to connect two processes directly without 
round-tripping the i/o from the remote host to local and back to remote. 
That's a major source of latency in project-find-regexp on remote.




* Re: Native OS pipelines in eshell and Emacs
  2024-05-29  1:21 ` Dmitry Gutov
@ 2024-05-29  1:43   ` Spencer Baugh
  2024-05-29  2:08     ` Dmitry Gutov
  2024-05-29  7:53     ` Michael Albinus
  0 siblings, 2 replies; 10+ messages in thread
From: Spencer Baugh @ 2024-05-29  1:43 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel, johnw, spwhitton


On Tue, May 28, 2024, 9:21 PM Dmitry Gutov <dmitry@gutov.dev> wrote:

> > Specifically, the new feature would be something like an :stdin argument
> > to make-process which allows a make-pipe-process (or other process) to
> > be passed as stdin, and grabs the output file descriptor from that
> > process (what Emacs would normally read) and passes it down as stdin for
> > the new process instead.
>
> It would be doubly interesting if we manage to implement it so that
> Tramp would be able to connect two processes directly without
> round-tripping the i/o from the remote host to local and back to remote.
> That's a major source of latency in project-find-regexp on remote.
>

Unfortunately this is almost impossibly hard.  But, I actually have worked
extensively on doing this specific impossible thing (remote process APIs
that are powerful enough to do this) so I will eventually try to implement
them for Emacs and TRAMP.  It would allow full make-process support in
TRAMP as well as a make-pipe-process which represents a pipe existing on a
remote system.

Anyway, in the short term it will probably only work efficiently for local
processes, with remote project-files having to roundtrip through the local
Emacs.


* Re: Native OS pipelines in eshell and Emacs
  2024-05-29  1:43   ` Spencer Baugh
@ 2024-05-29  2:08     ` Dmitry Gutov
  2024-05-29  8:01       ` Michael Albinus
  2024-05-29  7:53     ` Michael Albinus
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Gutov @ 2024-05-29  2:08 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: emacs-devel, johnw, spwhitton

On 29/05/2024 04:43, Spencer Baugh wrote:

>     It would be doubly interesting if we manage to implement it so that
>     Tramp would be able to connect two processes directly without
>     round-tripping the i/o from the remote host to local and back to
>     remote.
>     That's a major source of latency in project-find-regexp on remote.
> 
> 
> Unfortunately this is almost impossibly hard.  But, I actually have 
> worked extensively on doing this specific impossible thing (remote 
> process APIs that are powerful enough to do this) so I will eventually 
> try to implement them for Emacs and TRAMP.  It would allow full 
> make-process support in TRAMP as well as a make-pipe-process which 
> represents a pipe existing on a remote system.

On the remote, it would be fine if the pipe is not direct between such 
processes, but goes through the shell, or maybe some other processes as 
well (maybe a temp file?). That would still be faster than doing the 
round-trip.

> Anyway, in the short term it will probably only work efficiently for 
> local processes, with remote project-files having to roundtrip through 
> the local Emacs.

Yes, well. For local processes, it would at least help with 
"asynchronous regexp search" Xref UI.




* Re: Native OS pipelines in eshell and Emacs
  2024-05-29  2:08     ` Dmitry Gutov
@ 2024-05-29  8:01       ` Michael Albinus
  2024-05-29 10:31         ` Dmitry Gutov
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Albinus @ 2024-05-29  8:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Spencer Baugh, emacs-devel, johnw, spwhitton

Dmitry Gutov <dmitry@gutov.dev> writes:

Hi Dmitry,

>> Unfortunately this is almost impossibly hard.  But, I actually have
>> worked extensively on doing this specific impossible thing (remote
>> process APIs that are powerful enough to do this) so I will eventually
>> try to implement them for Emacs and TRAMP.  It would allow full
>> make-process support in TRAMP as well as a make-pipe-process which
>> represents a pipe existing on a remote system.
>
> On the remote, it would be fine if the pipe is not direct between such
> processes, but goes through the shell, or maybe some other processes as
> well (maybe a temp file?). That would still be faster than doing the
> round-trip.

If both processes are on different remote hosts, you have the problem
of how to transfer the tmpfile from one host to the other. You have no
knowledge of how these two hosts see each other.

A special case is if both hosts are accessed via Tramp's scp method, and
you can use the tramp-use-scp-direct-remote-copying user option.
See (info "(tramp) Ssh setup")

Best regards, Michael.




* Re: Native OS pipelines in eshell and Emacs
  2024-05-29  8:01       ` Michael Albinus
@ 2024-05-29 10:31         ` Dmitry Gutov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry Gutov @ 2024-05-29 10:31 UTC (permalink / raw)
  To: Michael Albinus; +Cc: Spencer Baugh, emacs-devel, johnw, spwhitton

Hi Michael,

On 29/05/2024 11:01, Michael Albinus wrote:
>>> Unfortunately this is almost impossibly hard.  But, I actually have
>>> worked extensively on doing this specific impossible thing (remote
>>> process APIs that are powerful enough to do this) so I will eventually
>>> try to implement them for Emacs and TRAMP.  It would allow full
>>> make-process support in TRAMP as well as a make-pipe-process which
>>> represents a pipe existing on a remote system.
>> On the remote, it would be fine if the pipe is not direct between such
>> processes, but goes through the shell, or maybe some other processes as
>> well (maybe a temp file?). That would still be faster than doing the
>> round-trip.
> If both processes are on different remote hosts, you have the problem
> of how to transfer the tmpfile from one host to the other. You have no
> knowledge of how these two hosts see each other.
> 
> A special case is if both hosts are accessed via Tramp's scp method, and
> you can use the tramp-use-scp-direct-remote-copying user option.
> See (info "(tramp) Ssh setup")

The case I had in mind is when both hosts are the same. This one can be 
optimized at least in theory.




* Re: Native OS pipelines in eshell and Emacs
  2024-05-29  1:43   ` Spencer Baugh
  2024-05-29  2:08     ` Dmitry Gutov
@ 2024-05-29  7:53     ` Michael Albinus
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Albinus @ 2024-05-29  7:53 UTC (permalink / raw)
  To: Spencer Baugh; +Cc: Dmitry Gutov, emacs-devel, johnw, spwhitton

Spencer Baugh <sbaugh@janestreet.com> writes:

Hi Spencer,

>     It would be doubly interesting if we manage to implement it so
>     that Tramp would be able to connect two processes directly without
>     round-tripping the i/o from the remote host to local and back to
>     remote.  That's a major source of latency in project-find-regexp
>     on remote.
>
> Unfortunately this is almost impossibly hard.  But, I actually have
> worked extensively on doing this specific impossible thing (remote
> process APIs that are powerful enough to do this) so I will eventually
> try to implement them for Emacs and TRAMP.  It would allow full
> make-process support in TRAMP as well as a make-pipe-process which
> represents a pipe existing on a remote system.

Much appreciated! A while ago, on a boring rainy day, I thought about
adding "Implement remote make-pipe-process" to my TODO. I didn't, because
I have no idea yet how to do it.

Best regards, Michael.


