From: "Clément Pit-Claudel" <cpitclaudel@gmail.com>
To: Emacs developers <emacs-devel@gnu.org>
Subject: Why does adding a useless copy-sequence and discarding the result make my ELisp 40 times faster?
Date: Sat, 25 Mar 2017 00:06:18 -0400
Message-ID: <e96962c4-f0b4-9769-87c3-94548c44ce22@gmail.com>

Hi all,

I can't make sense of the following three observations:

(1) Adding a useless (copy-sequence …) and discarding the result makes my code roughly 40 times faster.
(2) Sending strings of length 2^16+1 to a subprocess is several times slower than sending strings of length 2^16.
(3) Sending the same amount of data as several shorter strings is much faster than sending it as a single string of length 2^16.

Here's my function (read.py is a python script containing just "import sys; sys.stdin.read()"):

(defun bench (cargo-cult size)
  (let* ((message (make-string size ?a))
         (process-connection-type nil)
         (proc (start-process "reader" nil "python2" "read.py")))
    (dotimes (_ 100)
      (when cargo-cult
        (copy-sequence message))
      (process-send-string proc message))
    (kill-process proc)))

And here's the experimental data; benchmark-run returns (ELAPSED-TIME GC-COUNT GC-TIME):

(benchmark-run 1 (bench t 65536))   ;; (0.048931082 0 0.0)
(benchmark-run 1 (bench nil 65536)) ;; (1.844878298 0 0.0)

(benchmark-run 1 (bench t 65537))   ;; (2.028768849 0 0.0)
(benchmark-run 1 (bench nil 65537)) ;; (1.175113209 0 0.0)

From this we learn that:

(1) Setting cargo-cult to *t* and running a useless `(copy-sequence message)' makes the code roughly 40 times *faster* on strings of length 65536.
(2) Adding one character to the string length yields horrible performance (about 3.2 MB/s), but now the version without the useless copies is the faster one.

What's going on here?  strace suggests that the second benchmark is spending 98% of its time in pselect6 calls.

I confirmed that this isn't dependent on the version of Emacs in use (the same issue appears in Emacs 24.5).  It is not dependent on the program on the receiving end of the pipe, either: I tried cat, read.py as above, and a minimal C program (attached).  Replacing copy-sequence with concat preserves the surprising behavior, but using (ignore message) makes it disappear (everything is slow).  There are no transmission issues: the receiving side does read a string of "a"s of the right length.  And sending the same strings from a program other than Emacs doesn't show these performance issues.
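
For clarity, here are the inner-loop variants I compared (only the form inside the `when' changes):

(when cargo-cult
  (copy-sequence message))  ; original: fast at 65536
(when cargo-cult
  (concat message))         ; still shows the surprising speed-up
(when cargo-cult
  (ignore message))         ; speed-up gone; as slow as cargo-cult = nil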

Here is a bit more data, generated using the attached loop.sh and send-string-cli.el:

;; (benchmark-send-string CARGO-CULT N-INNER-LOOPS SIZE PROG ARGS...) → (ELAPSED-TIME GC-COUNT GC-TIME)
(benchmark-send-string t   200  1024  "python2" "read.py") → (0.067296848         1   0.004180474)
(benchmark-send-string nil 200  1024  "python2" "read.py") → (0.061934555         0   0.0)
(benchmark-send-string t   200  16384 "python2" "read.py") → (0.159620501         12  0.04479207899999999)
(benchmark-send-string nil 200  16384 "python2" "read.py") → (1.556031994         0   0.0)
(benchmark-send-string t   200  65536 "python2" "read.py") → (0.334425146         46  0.16486607800000005)
(benchmark-send-string nil 200  65536 "python2" "read.py") → (3.4018179170000002  0   0.0)
(benchmark-send-string t   200  65537 "python2" "read.py") → (12.418711961        46  0.239757055)
(benchmark-send-string nil 200  65537 "python2" "read.py") → (6.102755695         0   0.0)
(benchmark-send-string t   200  1024  "cat")               → (0.10330761          1   0.004161472999999999)
(benchmark-send-string nil 200  1024  "cat")               → (0.12250944600000001 1   0.0035327989999999997)
(benchmark-send-string t   200  16384 "cat")               → (1.3808745009999999  24  0.09748934199999998)
(benchmark-send-string nil 200  16384 "cat")               → (2.521698033         12  0.05619970500000001)
(benchmark-send-string t   200  65536 "cat")               → (4.120669224         99  0.4042515129999999)
(benchmark-send-string nil 200  65536 "cat")               → (9.412745340999999   50  0.218435563)
(benchmark-send-string t   200  65537 "cat")               → (12.176100455        99  0.502071993)
(benchmark-send-string nil 200  65537 "cat")               → (6.322310322000001   51  0.22764341199999993)
(benchmark-send-string t   200  1024  "read")              → (0.026363418         1   0.004118800999999998)
(benchmark-send-string nil 200  1024  "read")              → (0.061907866         0   0.0)
(benchmark-send-string t   200  16384 "read")              → (0.091360628         12  0.038132967000000004)
(benchmark-send-string nil 200  16384 "read")              → (0.12431489          0   0.0)
(benchmark-send-string t   200  65536 "read")              → (0.18765710700000002 46  0.14072835099999995)
(benchmark-send-string nil 200  65536 "read")              → (3.397287521         0   0.0)
(benchmark-send-string t   200  65537 "read")              → (12.389248452999999  46  0.257472908)
(benchmark-send-string nil 200  65537 "read")              → (3.258524461         0   0.0)
(benchmark-send-string t   1600 8192  "read")              → (0.258897164 49 0.17045154200000004)
(benchmark-send-string nil 1600 8192  "read")              → (0.07095391 0 0.0)

This shows one more thing:

(3) Sending 1600 strings of 8192 characters (the same total amount of data) is much faster than sending 200 strings of length 65536.
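
Based on (3), the workaround I'm currently experimenting with is to chunk long strings myself before handing them to process-send-string.  This is only a sketch, and the 8192-byte chunk size is a guess rather than something I've tuned:

(defun my-send-string-in-chunks (proc string &optional chunk-size)
  "Send STRING to PROC in CHUNK-SIZE pieces (default 8192).
Workaround sketch only; the chunk size is a guess."
  (let ((chunk-size (or chunk-size 8192))
        (start 0)
        (len (length string)))
    (while (< start len)
      (process-send-string
       proc (substring string start (min len (+ start chunk-size))))
      (setq start (+ start chunk-size)))))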

I managed to reproduce these results on two different physical GNU/Linux machines.  Their configurations are:

    GNU Emacs 26.0.50 (build 9, x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 2017-03-11 built on clem-w50-mint
    Repository revision: fe68818be218537d341365c014f1a6234c9705d6
    Windowing system distributor 'The X.Org Foundation', version 11.0.11804000
    System Description:	Linux Mint 18.1 Serena

    GNU Emacs 26.0.50.2 (x86_64-pc-linux-gnu, GTK+ Version 3.18.9) of 2017-02-08 built on goodsight
    Repository revision: eff901b8a39f42ddedf4c1db833b9071cae5962f
    System Description:	Ubuntu 16.04.2 LTS

Can anyone reproduce these issues, and help shed light on them? I'm trying to understand three things:

* Why does running additional, supposedly useless code speed things up so much? (1)
* Why does sending strings longer than 2^16 characters slow things down so much? (2)
* What is the right way to get reliable performance from process-send-string? (3)
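
Regarding (3), one mitigation I've been wondering about, though I haven't measured it carefully and it may not address the root cause at all, is to explicitly let Emacs service the subprocess between sends, along these lines:

(dotimes (_ 100)
  (process-send-string proc message)
  ;; Let Emacs read any pending output from the subprocess between sends.
  ;; Purely a guess at a mitigation, not an explanation.
  (accept-process-output proc 0))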

Thanks!
Clément.

[-- Attachment #2: loop.sh --]
[-- Type: application/x-shellscript, Size: 385 bytes --]

[-- Attachment #3: read.c --]
[-- Type: text/x-csrc, Size: 143 bytes --]

#include <unistd.h>

#define BUFSIZE (1 << 20)
char BUFFER[BUFSIZE];

int main (void) {
  /* Read and discard stdin until EOF. */
  while (read(STDIN_FILENO, BUFFER, BUFSIZE) > 0)
    ;
  return 0;
}

[-- Attachment #4: read.py --]
[-- Type: text/x-python-script, Size: 28 bytes --]

import sys
sys.stdin.read()

[-- Attachment #5: send-string-cli.el --]
[-- Type: text/x-emacs-lisp; name="send-string-cli.el", Size: 1059 bytes --]

(defun bench (cargo-cult size)
  (let* ((message (make-string size ?a))
         (process-connection-type nil)
         (proc (start-process "reader" nil "python2" "read.py")))
    (dotimes (_ 100)
      (when cargo-cult
        (copy-sequence message))
      (process-send-string proc message))
    (kill-process proc)))

(defun benchmark-send-string (cargo-cult n-inner-loops size prog &rest args)
  (let* ((message (make-string size ?a))
         (process-connection-type nil)
         (proc (apply #'start-process "reader" nil prog args)))
    (dotimes (_ n-inner-loops)
      (when cargo-cult
        (copy-sequence message))
      (process-send-string proc message))
    (kill-process proc)))

;; Batch-mode driver: ARGV is expected to be (CARGO-CULT N-INNER-LOOPS SIZE PROG ARGS...).
(pcase-let ((`(,cargo-cult ,n-inner-loops ,size . ,prog) argv))
  (setq cargo-cult (read cargo-cult))
  (setq size (string-to-number size))
  (setq n-inner-loops (string-to-number n-inner-loops))
  (let ((form `(benchmark-send-string ,cargo-cult ,n-inner-loops ,size ,@prog)))
    (message "%S → %S" form (eval `(benchmark-run 3 ,form)))))

(setq argv nil)

