all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Daniel Pittman <slippycheeze@google.com>
To: emacs-devel <emacs-devel@gnu.org>
Subject: `process-send-*` performance seems ... bad?
Date: Tue, 18 Jun 2019 17:41:22 -0400	[thread overview]
Message-ID: <CAC45yQuk95Y0VC_pYf6RKpS0pU+=BYZtgsc1A=bUk2088C67LA@mail.gmail.com> (raw)


[-- Attachment #1.1: Type: text/plain, Size: 4811 bytes --]

G'day.  I've been trying to work out of the fzf "fuzzy finder" system could
be used effectively within Emacs; it has the closest to "right" algorithm.

Unfortunately it is written in Go, and the upstream are not interested in
making it usable as either a pipe-process ala ispell pipe mode, or as a
library so that such a front-end could be developed.

It does support batch matching, though, by invoking the command with the
filter string, and sending the possible completions through it.  Given
that, I figured I'd test it out and see if the cost of running that
external process was too high.

The problem seems to be that Emacs writes (or reads) from the piped process
unreasonably slowly.  Output to a buffer, or a process filter (#'ignore)
seems to make no difference, so I'm assuming the write performance rather
than read-and-store performance is the issue.

My question here is: is there something I'm doing wrong, or is this the
very best I could expect from Emacs with an external process?

Emacs itself is built from git at commit 44a086e, and doesn't have
debugging stuff enabled or whatever.  macOS x86-64 system.  Have not tested
on GNU/Linux or anything at this time.

It doesn't look to me like pipes are inherently the problem on the
platform, given this testing shows that the same test command in the same
situation (pipe input and output) performs significantly faster that Emacs
will in the tests following:

] mkfifo in out
] cat obarray.data > in & wc < out & time fzf -f "hook mode" < in > out ;
wait
[1] 45534
[2] 45535
[1]  - 45534 done       gunzip -c obarray.data.gz > in
     604     604   15841
fzf -f "hook mode" < in > out  0.05s user 0.01s system 184% cpu 0.033 total
[2]  + 45535 done       wc < out
] wc obarray.data
   59176   59708  955548 obarray.data

Use of named pipes and cat to ensure with certainty we had the same basic
throughput model without, eg, the shell doing something too smart or
whatever.

That obarray.data file contains the result of inserting all symbols in the
default obarray in my Emacs into a file, newline separated, and then saving
it.  This is a hard, but not impossible, test of the filtering system I
think, intended to show bottlenecks.  Smaller data sets were second in line
to test, but...

Unfortunately Emacs performance is shockingly slow; all code is
byte-compiled and warmed up before the benchmark tests are run.  This is
the result of 5 warmup rounds, followed by 10 measured rounds, through the
external process.  GC effectively disabled, and forced before each
benchmark pass, to avoid that shaking things up.

Note that the cost of fzf is ~ 0.05s, so we attribute best case 1.747
seconds to Emacs.

========== pass 1 ==========
write-with-temp-buffer: 5.1272s with 0 GCs taking 0.0000s: (5.127218 0 0.0)
write-with-two-calls: 1.9974s with 0 GCs taking 0.0000s: (1.99745 0 0.0)
write-with-one-call: 1.6549s with 0 GCs taking 0.0000s: (1.6549049999999998
0 0.0)
========== pass 2 ==========
write-with-temp-buffer: 5.0828s with 0 GCs taking 0.0000s: (5.082764 0 0.0)
write-with-two-calls: 2.0536s with 0 GCs taking 0.0000s: (2.053559 0 0.0)
write-with-one-call: 1.7901s with 0 GCs taking 0.0000s: (1.790106 0 0.0)
========== pass 3 ==========
write-with-temp-buffer: 5.1175s with 0 GCs taking 0.0000s: (5.117541 0 0.0)
write-with-two-calls: 2.1159s with 0 GCs taking 0.0000s: (2.115893 0 0.0)
write-with-one-call: 1.7970s with 0 GCs taking 0.0000s: (1.797042 0 0.0)

The code behind those results is attached, and the obarray content is
available on request – it is still 375k gzipped, and isn't particularly
interesting, as most any "production" Emacs instance will have a large
enough obarray to be interesting.

I also pre-calculate the set of strings, to avoid that being part of the
benchmark; handling the output is also skipped for the same reason.

Using `tail -n 604` to get roughly the same amount of output, and to force
the command to run identically (ie: consume all input before output) makes
no difference, as expected given the observed performance of fzf above.
(middle run, others identical and omitted for brevity.)

write-with-temp-buffer: 5.4254s with 0 GCs taking 0.0000s: (5.425443 0 0.0)
write-with-two-calls: 2.3092s with 0 GCs taking 0.0000s: (2.309152 0 0.0)
write-with-one-call: 2.0219s with 0 GCs taking 0.0000s: (2.021912 0 0.0)

Using a newline rather than null byte delimiter makes no difference, as you
would probably expect.  Using a list rather than a vector for `data`, the
input, makes the temp buffer case ~ 1.3 seconds faster consistently, but
the other two cases ... roughly equally, maybe a fraction slower, but this
is adhoc enough that I think it is not statistically significant.

Thanks.

[-- Attachment #1.2: Type: text/html, Size: 5426 bytes --]

[-- Attachment #2: fzf-benchmark.el --]
[-- Type: application/octet-stream, Size: 3770 bytes --]

;; -*- lexical -*-
(let* ((gc-cons-percentage most-positive-fixnum)
       (gc-cons-threshold most-positive-fixnum)
       (fn
        (lambda (writer)
          (let ((default-directory
                  "/gssh:slippycheeze.c.googlers.com:/google/src/cloud/slippycheeze/work/google3/production/storage/chronicle/")
                (process-adaptive-read-buffering t)
                stdout stderr proc)
            (unwind-protect
                (progn
                  ;; done here because we need to safely unwind if generate-new-buffer
                  ;; fails the second time, but not the first.
                  (setq stdout (generate-new-buffer " *fzf-stdout*")
                        stderr (generate-new-buffer " *fzf-stderr*")
                        proc (make-process :name "fzf-for-benchmark"
                                           :buffer stdout
                                           :stderr stderr
                                           :sentinel #'ignore
                                           ;; :command '("fzf" "--read0" "--print0" "-f" "hook mode")
                                           :command '("tail" "-n" "604")
                                           :coding 'utf-8-auto
                                           :noquery t
                                           :connection-type 'pipe))
                  (funcall writer proc)
                  (process-send-eof proc)
                  ;; wait for completion
                  (while (accept-process-output proc 1))
                  ;; this is where we would parse and return results, but for benchmarking we skip that.
                  ))
            (progn
              (when (process-live-p proc)
                (kill-process proc t))
              (when (and stdout (buffer-name stdout))
                (kill-buffer stdout))
              (when (and stderr (buffer-name stderr))
                (kill-buffer stderr))))))
       (data (let (tmp)
               (mapatoms (lambda (symbol) (setq tmp (cons (symbol-name symbol) tmp))))
               (apply #'vector (reverse tmp))))
       (write-with-temp-buffer (lambda (proc)
                                 (with-temp-buffer
                                   (mapc (lambda (choice) (insert choice "\0")) data)
                                   (process-send-region proc (point-min) (point-max)))))
       (write-with-two-calls (lambda (proc)
                               (mapc (lambda (choice)
                                       (process-send-string proc choice)
                                       (process-send-string proc "\0"))
                                     data)))
       (write-with-one-call (lambda (proc)
                              (mapc (lambda (choice) (process-send-string proc (format "%s\0" choice))) data))))
  ;; (debug)
  ;; (funcall fn write-with-temp-buffer)
  (let ((compiled-fn (byte-compile fn)))
    (with-current-buffer-window "fzf benchmark results" nil nil
      (pop-to-buffer (current-buffer))
      (dotimes (pass 3)
        (insert (format "========== pass %d ==========" (1+ pass)) "\n")
        (mapc (lambda (test)
                (let* ((testfn (byte-compile (symbol-value test)))
                       ;; warm up the function.
                       (garbage-collect)
                       (warmup (dotimes (warmup-counter 5) (funcall compiled-fn testfn)))
                       (garbage-collect)
                       (result (benchmark-run 10 (funcall compiled-fn testfn))))
                  (insert (format "%18s: %.4fs with %d GCs taking %.4fs: %s" test (nth 0 result) (nth 1 result) (nth 2 result) result) "\n")))
              '(write-with-temp-buffer write-with-two-calls write-with-one-call)))
      (goto-char (point-min)))))

             reply	other threads:[~2019-06-18 21:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-18 21:41 Daniel Pittman [this message]
2019-06-19 14:57 ` `process-send-*` performance seems ... bad? Noam Postavsky
2019-06-19 18:02   ` Daniel Pittman
2019-06-19 18:15     ` Noam Postavsky
2019-06-19 19:35       ` Daniel Pittman
2019-06-19 15:44 ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAC45yQuk95Y0VC_pYf6RKpS0pU+=BYZtgsc1A=bUk2088C67LA@mail.gmail.com' \
    --to=slippycheeze@google.com \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.