`process-send-*` performance seems ... bad?

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* `process-send-*` performance seems ... bad?
@ 2019-06-18 21:41 Daniel Pittman
  2019-06-19 14:57 ` Noam Postavsky
  2019-06-19 15:44 ` Eli Zaretskii
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Pittman @ 2019-06-18 21:41 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1.1: Type: text/plain, Size: 4811 bytes --]

G'day.  I've been trying to work out of the fzf "fuzzy finder" system could
be used effectively within Emacs; it has the closest to "right" algorithm.

Unfortunately it is written in Go, and the upstream are not interested in
making it usable as either a pipe-process ala ispell pipe mode, or as a
library so that such a front-end could be developed.

It does support batch matching, though, by invoking the command with the
filter string, and sending the possible completions through it.  Given
that, I figured I'd test it out and see if the cost of running that
external process was too high.

The problem seems to be that Emacs writes (or reads) from the piped process
unreasonably slowly.  Output to a buffer, or a process filter (#'ignore)
seems to make no difference, so I'm assuming the write performance rather
than read-and-store performance is the issue.

My question here is: is there something I'm doing wrong, or is this the
very best I could expect from Emacs with an external process?

Emacs itself is built from git at commit 44a086e, and doesn't have
debugging stuff enabled or whatever.  macOS x86-64 system.  Have not tested
on GNU/Linux or anything at this time.

It doesn't look to me like pipes are inherently the problem on the
platform, given this testing shows that the same test command in the same
situation (pipe input and output) performs significantly faster that Emacs
will in the tests following:

] mkfifo in out
] cat obarray.data > in & wc < out & time fzf -f "hook mode" < in > out ;
wait
[1] 45534
[2] 45535
[1]  - 45534 done       gunzip -c obarray.data.gz > in
     604     604   15841
fzf -f "hook mode" < in > out  0.05s user 0.01s system 184% cpu 0.033 total
[2]  + 45535 done       wc < out
] wc obarray.data
   59176   59708  955548 obarray.data

Use of named pipes and cat to ensure with certainty we had the same basic
throughput model without, eg, the shell doing something too smart or
whatever.

That obarray.data file contains the result of inserting all symbols in the
default obarray in my Emacs into a file, newline separated, and then saving
it.  This is a hard, but not impossible, test of the filtering system I
think, intended to show bottlenecks.  Smaller data sets were second in line
to test, but...

Unfortunately Emacs performance is shockingly slow; all code is
byte-compiled and warmed up before the benchmark tests are run.  This is
the result of 5 warmup rounds, followed by 10 measured rounds, through the
external process.  GC effectively disabled, and forced before each
benchmark pass, to avoid that shaking things up.

Note that the cost of fzf is ~ 0.05s, so we attribute best case 1.747
seconds to Emacs.

========== pass 1 ==========
write-with-temp-buffer: 5.1272s with 0 GCs taking 0.0000s: (5.127218 0 0.0)
write-with-two-calls: 1.9974s with 0 GCs taking 0.0000s: (1.99745 0 0.0)
write-with-one-call: 1.6549s with 0 GCs taking 0.0000s: (1.6549049999999998
0 0.0)
========== pass 2 ==========
write-with-temp-buffer: 5.0828s with 0 GCs taking 0.0000s: (5.082764 0 0.0)
write-with-two-calls: 2.0536s with 0 GCs taking 0.0000s: (2.053559 0 0.0)
write-with-one-call: 1.7901s with 0 GCs taking 0.0000s: (1.790106 0 0.0)
========== pass 3 ==========
write-with-temp-buffer: 5.1175s with 0 GCs taking 0.0000s: (5.117541 0 0.0)
write-with-two-calls: 2.1159s with 0 GCs taking 0.0000s: (2.115893 0 0.0)
write-with-one-call: 1.7970s with 0 GCs taking 0.0000s: (1.797042 0 0.0)

The code behind those results is attached, and the obarray content is
available on request – it is still 375k gzipped, and isn't particularly
interesting, as most any "production" Emacs instance will have a large
enough obarray to be interesting.

I also pre-calculate the set of strings, to avoid that being part of the
benchmark; handling the output is also skipped for the same reason.

Using `tail -n 604` to get roughly the same amount of output, and to force
the command to run identically (ie: consume all input before output) makes
no difference, as expected given the observed performance of fzf above.
(middle run, others identical and omitted for brevity.)

write-with-temp-buffer: 5.4254s with 0 GCs taking 0.0000s: (5.425443 0 0.0)
write-with-two-calls: 2.3092s with 0 GCs taking 0.0000s: (2.309152 0 0.0)
write-with-one-call: 2.0219s with 0 GCs taking 0.0000s: (2.021912 0 0.0)

Using a newline rather than null byte delimiter makes no difference, as you
would probably expect.  Using a list rather than a vector for `data`, the
input, makes the temp buffer case ~ 1.3 seconds faster consistently, but
the other two cases ... roughly equally, maybe a fraction slower, but this
is adhoc enough that I think it is not statistically significant.

Thanks.

[-- Attachment #1.2: Type: text/html, Size: 5426 bytes --]

[-- Attachment #2: fzf-benchmark.el --]
[-- Type: application/octet-stream, Size: 3770 bytes --]

;; -*- lexical -*-
(let* ((gc-cons-percentage most-positive-fixnum)
       (gc-cons-threshold most-positive-fixnum)
       (fn
        (lambda (writer)
          (let ((default-directory
                  "/gssh:slippycheeze.c.googlers.com:/google/src/cloud/slippycheeze/work/google3/production/storage/chronicle/")
                (process-adaptive-read-buffering t)
                stdout stderr proc)
            (unwind-protect
                (progn
                  ;; done here because we need to safely unwind if generate-new-buffer
                  ;; fails the second time, but not the first.
                  (setq stdout (generate-new-buffer " *fzf-stdout*")
                        stderr (generate-new-buffer " *fzf-stderr*")
                        proc (make-process :name "fzf-for-benchmark"
                                           :buffer stdout
                                           :stderr stderr
                                           :sentinel #'ignore
                                           ;; :command '("fzf" "--read0" "--print0" "-f" "hook mode")
                                           :command '("tail" "-n" "604")
                                           :coding 'utf-8-auto
                                           :noquery t
                                           :connection-type 'pipe))
                  (funcall writer proc)
                  (process-send-eof proc)
                  ;; wait for completion
                  (while (accept-process-output proc 1))
                  ;; this is where we would parse and return results, but for benchmarking we skip that.
                  ))
            (progn
              (when (process-live-p proc)
                (kill-process proc t))
              (when (and stdout (buffer-name stdout))
                (kill-buffer stdout))
              (when (and stderr (buffer-name stderr))
                (kill-buffer stderr))))))
       (data (let (tmp)
               (mapatoms (lambda (symbol) (setq tmp (cons (symbol-name symbol) tmp))))
               (apply #'vector (reverse tmp))))
       (write-with-temp-buffer (lambda (proc)
                                 (with-temp-buffer
                                   (mapc (lambda (choice) (insert choice "\0")) data)
                                   (process-send-region proc (point-min) (point-max)))))
       (write-with-two-calls (lambda (proc)
                               (mapc (lambda (choice)
                                       (process-send-string proc choice)
                                       (process-send-string proc "\0"))
                                     data)))
       (write-with-one-call (lambda (proc)
                              (mapc (lambda (choice) (process-send-string proc (format "%s\0" choice))) data))))
  ;; (debug)
  ;; (funcall fn write-with-temp-buffer)
  (let ((compiled-fn (byte-compile fn)))
    (with-current-buffer-window "fzf benchmark results" nil nil
      (pop-to-buffer (current-buffer))
      (dotimes (pass 3)
        (insert (format "========== pass %d ==========" (1+ pass)) "\n")
        (mapc (lambda (test)
                (let* ((testfn (byte-compile (symbol-value test)))
                       ;; warm up the function.
                       (garbage-collect)
                       (warmup (dotimes (warmup-counter 5) (funcall compiled-fn testfn)))
                       (garbage-collect)
                       (result (benchmark-run 10 (funcall compiled-fn testfn))))
                  (insert (format "%18s: %.4fs with %d GCs taking %.4fs: %s" test (nth 0 result) (nth 1 result) (nth 2 result) result) "\n")))
              '(write-with-temp-buffer write-with-two-calls write-with-one-call)))
      (goto-char (point-min)))))

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: `process-send-*` performance seems ... bad?
  2019-06-18 21:41 `process-send-*` performance seems ... bad? Daniel Pittman
@ 2019-06-19 14:57 ` Noam Postavsky
  2019-06-19 18:02   ` Daniel Pittman
  2019-06-19 15:44 ` Eli Zaretskii
  1 sibling, 1 reply; 6+ messages in thread
From: Noam Postavsky @ 2019-06-19 14:57 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: emacs-devel

On Tue, 18 Jun 2019 at 17:55, Daniel Pittman <slippycheeze@google.com> wrote:

> ] cat obarray.data > in & wc < out & time fzf -f "hook mode" < in > out ; wait

> The code behind those results is attached

          (let ((default-directory

"/gssh:slippycheeze.c.googlers.com:/google/src/cloud/slippycheeze/work/google3/production/storage/chronicle/")

It looks you're running locally in the shell case, but over TRAMP in
the elisp case. I imagine that could have a pretty large effect on
timing. Is there a reason for binding default-directory like this?



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: `process-send-*` performance seems ... bad?
  2019-06-18 21:41 `process-send-*` performance seems ... bad? Daniel Pittman
  2019-06-19 14:57 ` Noam Postavsky
@ 2019-06-19 15:44 ` Eli Zaretskii
  1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2019-06-19 15:44 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: emacs-devel

> From: Daniel Pittman <slippycheeze@google.com>
> Date: Tue, 18 Jun 2019 17:41:22 -0400
> 
> My question here is: is there something I'm doing wrong, or is this the very best I could expect from Emacs
> with an external process?

It's hard to say: the program and the setup are entirely non-trivial,
so I wouldn't expect anyone to know the answer off hand.

A useful first step towards understanding what's going on is to
profile your runs using the built-in profiler, and post the full
profiles here.  It is advisable to load all the Lisp files involved in
this as .el, not .elc, because then the profile will be much more
detailed.

Armed with the profile, we can see where the program spends most of
its time, and then take it from there.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: `process-send-*` performance seems ... bad?
  2019-06-19 14:57 ` Noam Postavsky
@ 2019-06-19 18:02   ` Daniel Pittman
  2019-06-19 18:15     ` Noam Postavsky
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Pittman @ 2019-06-19 18:02 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1774 bytes --]

Ah, no, that is running locally – the call to make-process is not using
file handlers, and the purpose of setting that default directory was to
ensure that it didn't.

That said, I retested with `(default-directory (expand-file-name "~"))`
there, and performance is identical:

========== pass 1 ==========
write-with-temp-buffer: 5.8827s with 0 GCs taking 0.0000s: (5.882696 0 0.0)
write-with-two-calls: 2.4786s with 0 GCs taking 0.0000s: (2.478616 0 0.0)
write-with-one-call: 1.9073s with 0 GCs taking 0.0000s: (1.907256 0 0.0)
========== pass 2 ==========
write-with-temp-buffer: 5.6867s with 0 GCs taking 0.0000s: (5.686659 0 0.0)
write-with-two-calls: 2.2408s with 0 GCs taking 0.0000s: (2.240782 0 0.0)
write-with-one-call: 1.8802s with 0 GCs taking 0.0000s: (1.8801780000000001
0 0.0)
========== pass 3 ==========
write-with-temp-buffer: 5.7305s with 0 GCs taking 0.0000s:
(5.730538999999999 0 0.0)
write-with-two-calls: 2.3394s with 0 GCs taking 0.0000s: (2.339416 0 0.0)
write-with-one-call: 1.9350s with 0 GCs taking 0.0000s: (1.9350479999999999
0 0.0)


On Wed, Jun 19, 2019 at 10:58 AM Noam Postavsky <npostavs@gmail.com> wrote:

> On Tue, 18 Jun 2019 at 17:55, Daniel Pittman <slippycheeze@google.com>
> wrote:
>
> > ] cat obarray.data > in & wc < out & time fzf -f "hook mode" < in > out
> ; wait
>
> > The code behind those results is attached
>
>           (let ((default-directory
>
> "/gssh:slippycheeze.c.googlers.com:
> /google/src/cloud/slippycheeze/work/google3/production/storage/chronicle/")
>
> It looks you're running locally in the shell case, but over TRAMP in
> the elisp case. I imagine that could have a pretty large effect on
> timing. Is there a reason for binding default-directory like this?
>

[-- Attachment #2: Type: text/html, Size: 2279 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: `process-send-*` performance seems ... bad?
  2019-06-19 18:02   ` Daniel Pittman
@ 2019-06-19 18:15     ` Noam Postavsky
  2019-06-19 19:35       ` Daniel Pittman
  0 siblings, 1 reply; 6+ messages in thread
From: Noam Postavsky @ 2019-06-19 18:15 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: emacs-devel

On Wed, 19 Jun 2019 at 14:03, Daniel Pittman <slippycheeze@google.com> wrote:
>
> Ah, no, that is running locally – the call to make-process is not using file handlers

Oh right, make-process in master has support for file-handlers, but I
forgot that it's only if given an explicit :file-handlers t.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: `process-send-*` performance seems ... bad?
  2019-06-19 18:15     ` Noam Postavsky
@ 2019-06-19 19:35       ` Daniel Pittman
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Pittman @ 2019-06-19 19:35 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

On Wed, Jun 19, 2019 at 2:15 PM Noam Postavsky <npostavs@gmail.com> wrote:

> On Wed, 19 Jun 2019 at 14:03, Daniel Pittman <slippycheeze@google.com>
> wrote:
> >
> > Ah, no, that is running locally – the call to make-process is not using
> file handlers
>
> Oh right, make-process in master has support for file-handlers, but I
> forgot that it's only if given an explicit :file-handlers t.
>

Absolutely legitimate thing to question, and I'm glad you did.  It was good
to eliminate it as a possible cause of problems in my testing.

I'm working on the profiling that Eli requested, which is very reasonable,
and on ensuring the test code is simpler and more hermetic.   It'll take a
bit of time to get that done, and early results suggest that it'll probably
require getting out the C-level profiler to get useful information, so
that'll be on the list.

Please forgive delays in getting that through.

[-- Attachment #2: Type: text/html, Size: 1345 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-06-19 19:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-06-18 21:41 `process-send-*` performance seems ... bad? Daniel Pittman
2019-06-19 14:57 ` Noam Postavsky
2019-06-19 18:02   ` Daniel Pittman
2019-06-19 18:15     ` Noam Postavsky
2019-06-19 19:35       ` Daniel Pittman
2019-06-19 15:44 ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).