[PATCH] Re: Bignum performance (was: Shrinking the C core)

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 12:38                   ` Emanuel Berg
@ 2023-08-11 14:07                     ` Ihor Radchenko
  2023-08-11 18:06                       ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-11 14:07 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3633 bytes --]

Emanuel Berg <incal@dataswamp.org> writes:

>> Maybe we could somehow re-use the already allocated bignum
>> objects, similar to what is done for cons cells (see
>> src/alloc.c:Fcons).
>
> Sounds reasonable :)

And... is has been already done, actually.
allocate_vectorlike calls allocate_vector_from_block, which re-uses
pre-allocated objects.

And looking into the call graph, this exact branch calling
allocate_vector_from_block is indeed called for the bignums:

   33.05%     0.00%  emacs    [unknown]                       [.] 0000000000000000
            |
            ---0
               |          
               |--28.04%--allocate_vectorlike
               |          |          
               |           --27.78%--allocate_vector_from_block (inlined)
               |                     |          
               |                     |--2.13%--next_vector (inlined)
               |                     |          
               |                      --0.74%--setup_on_free_list (inlined)

If it manually cut off `allocate_vector_from_block', the benchmark time
increases twice. So, there is already some improvement coming from
re-using allocated memory.

I looked deeper into the code tried to cut down on unnecessary looping
over the pre-allocated `vector_free_lists'. See the attached patch.

Without the patch:

 perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
 2.321 s
 
     28.60%  emacs    emacs                        [.] allocate_vectorlike
    24.36%  emacs    emacs                        [.] process_mark_stack
     3.76%  emacs    libgmp.so.10.5.0             [.] __gmpz_sizeinbase
     3.59%  emacs    emacs                        [.] pdumper_marked_p_impl
     3.53%  emacs    emacs                        [.] mark_char_table

With the patch:

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
1.968 s

    33.17%  emacs    emacs                          [.] process_mark_stack
     5.51%  emacs    libgmp.so.10.5.0               [.] __gmpz_sizeinbase
     5.05%  emacs    emacs                          [.] mark_char_table
     4.88%  emacs    emacs                          [.] pdumper_marked_p_impl
     3.30%  emacs    emacs                          [.] pdumper_set_marked_impl
...
     2.52%  emacs    emacs                          [.] allocate_vectorlike

allocate_vectorlike clearly takes a lot less time by not trying to loop
over all the ~500 empty elements of vector_free_lists.

We can further get rid of the GC by temporarily disabling it (just for
demonstration):

(let ((beg (float-time)))
  (setq gc-cons-threshold most-positive-fixnum)
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)) )

perf record  ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

    17.11%  emacs    libgmp.so.10.5.0      [.] __gmpz_sizeinbase
     7.35%  emacs    libgmp.so.10.5.0      [.] __gmpz_add
     6.51%  emacs    emacs                 [.] arith_driver
     6.03%  emacs    libc.so.6             [.] malloc
     5.57%  emacs    emacs                 [.] allocate_vectorlike
     5.20%  emacs    [unknown]             [k] 0xffffffffaae01857
     4.16%  emacs    libgmp.so.10.5.0      [.] __gmpn_add_n_coreisbr
     3.72%  emacs    emacs                 [.] check_number_coerce_marker
     3.35%  emacs    fib.eln               [.] F666962_fib_0
     3.29%  emacs    emacs                 [.] allocate_pseudovector
     2.30%  emacs    emacs                 [.] Flss

Now, the actual bignum arithmetics (lisp/gmp.c) takes most of the CPU time.

I am not sure what differs between Elisp gmp bindings and analogous SBCL
binding so that SBCL is so much faster.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: allocate_vector_from_block.diff --]
[-- Type: text/x-patch, Size: 1712 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index 17ca5c725d0..62e96b4c9de 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p)
    vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
 
 static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
 
 /* Singly-linked list of large vectors.  */
 
@@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t nbytes)
   set_next_vector (v, vector_free_lists[vindex]);
   ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
   vector_free_lists[vindex] = v;
+  if ( vindex < vector_free_lists_min_idx )
+    vector_free_lists_min_idx = vindex;
 }
 
 /* Get a new vector block.  */
@@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes)
   /* Next, check free lists containing larger vectors.  Since
      we will split the result, we should have remaining space
      large enough to use for one-slot vector at least.  */
-  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
-       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+  for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN), vector_free_lists_min_idx );
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++, vector_free_lists_min_idx++)
     if (vector_free_lists[index])
       {
 	/* This vector is larger than requested.  */
@@ -3413,6 +3416,7 @@ sweep_vectors (void)
   gcstat.total_vectors = 0;
   gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
   memset (vector_free_lists, 0, sizeof (vector_free_lists));
+  vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
 
   /* Looking through vector blocks.  */
 

[-- Attachment #3: Type: text/plain, Size: 224 bytes --]


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 14:07                     ` [PATCH] " Ihor Radchenko
@ 2023-08-11 18:06                       ` Emanuel Berg
  2023-08-11 19:41                         ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-11 18:06 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>>> Maybe we could somehow re-use the already allocated bignum
>>> objects, similar to what is done for cons cells (see
>>> src/alloc.c:Fcons).
>>
>> Sounds reasonable :)
>
> And... is has been already done, actually.
> allocate_vectorlike calls allocate_vector_from_block, which
> re-uses pre-allocated objects.
>
> And looking into the call graph, this exact branch calling
> allocate_vector_from_block is indeed called for the bignums [...]

Are we talking a list of Emacs C functions executing with the
corresponding times they have been in execution in a tree data
structure? :O

E.g. where do we find allocate_vectorlike ?

> With the patch:
>
> perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 1.968 s
>
>     33.17%  emacs    emacs                          [.] process_mark_stack
>      5.51%  emacs    libgmp.so.10.5.0               [.] __gmpz_sizeinbase
>      5.05%  emacs    emacs                          [.] mark_char_table
>      4.88%  emacs    emacs                          [.] pdumper_marked_p_impl
>      3.30%  emacs    emacs                          [.] pdumper_set_marked_impl
> ...
>      2.52%  emacs    emacs                          [.] allocate_vectorlike
>
> allocate_vectorlike clearly takes a lot less time by not trying to loop
> over all the ~500 empty elements of vector_free_lists.
>
> We can further get rid of the GC by temporarily disabling it (just for
> demonstration):
>
> (let ((beg (float-time)))
>   (setq gc-cons-threshold most-positive-fixnum)
>   (fib 10000 1000)
>   (message "%.3f s" (- (float-time) beg)) )
>
> perf record  ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
>
>     17.11%  emacs    libgmp.so.10.5.0      [.] __gmpz_sizeinbase
>      7.35%  emacs    libgmp.so.10.5.0      [.] __gmpz_add
>      6.51%  emacs    emacs                 [.] arith_driver
>      6.03%  emacs    libc.so.6             [.] malloc
>      5.57%  emacs    emacs                 [.] allocate_vectorlike
>      5.20%  emacs    [unknown]             [k] 0xffffffffaae01857
>      4.16%  emacs    libgmp.so.10.5.0      [.] __gmpn_add_n_coreisbr
>      3.72%  emacs    emacs                 [.] check_number_coerce_marker
>      3.35%  emacs    fib.eln               [.] F666962_fib_0
>      3.29%  emacs    emacs                 [.] allocate_pseudovector
>      2.30%  emacs    emacs                 [.] Flss
>
> Now, the actual bignum arithmetics (lisp/gmp.c) takes most of the CPU time.
>
> I am not sure what differs between Elisp gmp bindings and analogous SBCL
> binding so that SBCL is so much faster.
>
> diff --git a/src/alloc.c b/src/alloc.c
> index 17ca5c725d0..62e96b4c9de 100644
> --- a/src/alloc.c
> +++ b/src/alloc.c
> @@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p)
>     vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
>
>  static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
> +static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
>
>  /* Singly-linked list of large vectors.  */
>
> @@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t
>  nbytes)
>    set_next_vector (v, vector_free_lists[vindex]);
>    ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
>    vector_free_lists[vindex] = v;
> +  if ( vindex < vector_free_lists_min_idx )
> +    vector_free_lists_min_idx = vindex;
>  }
>
>  /* Get a new vector block.  */
> @@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes)
>    /* Next, check free lists containing larger vectors.  Since
>       we will split the result, we should have remaining space
>       large enough to use for one-slot vector at least.  */
> -  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
> -       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
> +  for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN),
>  vector_free_lists_min_idx );
> +       index < VECTOR_MAX_FREE_LIST_INDEX; index++,
>  vector_free_lists_min_idx++)
>      if (vector_free_lists[index])
>        {
>  	/* This vector is larger than requested.  */
> @@ -3413,6 +3416,7 @@ sweep_vectors (void)
>    gcstat.total_vectors = 0;
>    gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
>    memset (vector_free_lists, 0, sizeof (vector_free_lists));
> +  vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
>
>    /* Looking through vector blocks.  */

Amazing! :O

See if you can do my original test, which was 1-3 Elisp,
byte-compiled Elisp, and natively compiled Elisp, and the
Common Lisp execution (on your computer), if you'd like.

Actually it is a bit of a bummer to the community since Emacs
is like THE portal into Lisp. We should have the best Lisp in
the business, and I don't see why not? Emacs + SBCL + CL +
Elisp anyone?

I.e. real CL not the cl- which is actually in Elisp. Not that
there is anything wrong with that! On the contrary ;)

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 18:06                       ` Emanuel Berg
@ 2023-08-11 19:41                         ` Ihor Radchenko
  2023-08-11 19:50                           ` Emanuel Berg
  2023-08-11 22:46                           ` Emanuel Berg
  0 siblings, 2 replies; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-11 19:41 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> And... is has been already done, actually.
>> allocate_vectorlike calls allocate_vector_from_block, which
>> re-uses pre-allocated objects.
>>
>> And looking into the call graph, this exact branch calling
>> allocate_vector_from_block is indeed called for the bignums [...]
>
> Are we talking a list of Emacs C functions executing with the
> corresponding times they have been in execution in a tree data
> structure? :O

That's what GNU perf does - it is a sampling profiler in GNU/Linux.
The Elisp equivalent is profiler.el, but it does not reveal underlying C
functions.

> E.g. where do we find allocate_vectorlike ?

I have listed the commands I used (from terminal):

1. perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
   <records CPU stats while running emacs>
2. perf report
   <displays the stats>

You need Emacs compiled with debug symbols the get meaningful output.

See more at https://www.brendangregg.com/perf.html

> See if you can do my original test, which was 1-3 Elisp,
> byte-compiled Elisp, and natively compiled Elisp, and the
> Common Lisp execution (on your computer), if you'd like.

As you wish:

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el    [5.783 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc   [1.961 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln   [1.901 s]
$ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/fib.cl [0.007 s]

without the patch (on my system)

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el    [6.546 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc   [2.498 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln   [2.518 s]


Also, the patch gives improvements for more than just bignums.
I ran elisp-benchmarks
(https://elpa.gnu.org/packages/elisp-benchmarks.html) and got


(before the patch)
  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | bubble             |           0.70 |       0.06 |       1 |        0.76 |            0.07 |
  | bubble-no-cons     |           1.17 |       0.00 |       0 |        1.17 |            0.02 |
  | bytecomp           |           1.74 |       0.29 |      13 |        2.03 |            0.12 |
  | dhrystone          |           2.30 |       0.00 |       0 |        2.30 |            0.07 |
  | eieio              |           1.25 |       0.13 |       7 |        1.38 |            0.03 |
  | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-named-let     |           1.53 |       0.00 |       0 |        1.53 |            0.03 |
  | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | flet               |           1.48 |       0.00 |       0 |        1.48 |            0.04 |
  | inclist            |           1.07 |       0.00 |       0 |        1.07 |            0.02 |
  | inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
  | listlen-tc         |           0.13 |       0.00 |       0 |        0.13 |            0.03 |
  | map-closure        |           5.26 |       0.00 |       0 |        5.26 |            0.09 |
  | nbody              |           1.61 |       0.17 |       1 |        1.78 |            0.06 |
  | pack-unpack        |           0.31 |       0.02 |       1 |        0.33 |            0.00 |
  | pack-unpack-old    |           0.50 |       0.05 |       3 |        0.55 |            0.02 |
  | pcase              |           1.85 |       0.00 |       0 |        1.85 |            0.05 |
  | pidigits           |           4.41 |       0.96 |      17 |        5.37 |            0.13 |
  | scroll             |           0.64 |       0.00 |       0 |        0.64 |            0.01 |
  | smie               |           1.59 |       0.04 |       2 |        1.63 |            0.03 |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | total              |          28.54 |       1.72 |      45 |       30.26 |            0.26 |


(after the patch)
  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | bubble             |           0.68 |       0.05 |       1 |        0.73 |            0.04 |
  | bubble-no-cons     |           1.00 |       0.00 |       0 |        1.00 |            0.04 |
  | bytecomp           |           1.60 |       0.23 |      13 |        1.82 |            0.16 |
  | dhrystone          |           2.03 |       0.00 |       0 |        2.03 |            0.05 |
  | eieio              |           1.08 |       0.12 |       7 |        1.20 |            0.07 |
  | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-named-let     |           1.44 |       0.00 |       0 |        1.44 |            0.12 |
  | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | flet               |           1.36 |       0.00 |       0 |        1.36 |            0.09 |
  | inclist            |           1.00 |       0.00 |       0 |        1.00 |            0.00 |
  | inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
  | listlen-tc         |           0.11 |       0.00 |       0 |        0.11 |            0.02 |
  | map-closure        |           4.91 |       0.00 |       0 |        4.91 |            0.12 |
  | nbody              |           1.47 |       0.17 |       1 |        1.64 |            0.08 |
  | pack-unpack        |           0.29 |       0.02 |       1 |        0.31 |            0.01 |
  | pack-unpack-old    |           0.43 |       0.05 |       3 |        0.48 |            0.01 |
  | pcase              |           1.84 |       0.00 |       0 |        1.84 |            0.07 |
  | pidigits           |           3.16 |       0.94 |      17 |        4.11 |            0.10 |
  | scroll             |           0.58 |       0.00 |       0 |        0.58 |            0.00 |
  | smie               |           1.40 |       0.04 |       2 |        1.44 |            0.06 |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | total              |          25.38 |       1.62 |      45 |       27.00 |            0.32 |

About ~10% improvement, with each individual benchmark being faster.

Note how fibn test takes 0.00 seconds. It is limited to fixnum range.

> Actually it is a bit of a bummer to the community since Emacs
> is like THE portal into Lisp. We should have the best Lisp in
> the business, and I don't see why not? Emacs + SBCL + CL +
> Elisp anyone?

This is a balancing act. Elisp is tailored for Emacs as an editor. So,
trade-offs are inevitable. I am skeptical about Elisp overperforming CL.
But it does not mean that we should not try to improve things.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 19:41                         ` Ihor Radchenko
@ 2023-08-11 19:50                           ` Emanuel Berg
  2023-08-12  8:24                             ` Ihor Radchenko
  2023-08-11 22:46                           ` Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-11 19:50 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> See if you can do my original test, which was 1-3 Elisp,
>> byte-compiled Elisp, and natively compiled Elisp, and the
>> Common Lisp execution (on your computer), if you'd like.
>
> As you wish:
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el    [5.783 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc   [1.961 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln   [1.901 s]
> $ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/fib.cl [0.007 s]
>
> without the patch (on my system)
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el    [6.546 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc   [2.498 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln   [2.518 s]

The stats seem to speak one language ...

> Also, the patch gives improvements for more than just
> bignums
>   | test | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s)
> | tot avg err (s) |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | bubble | 0.70 | 0.06 | 1 | 0.76 | 0.07 |
>   | bubble-no-cons | 1.17 | 0.00 | 0 | 1.17 | 0.02 |
>   | bytecomp | 1.74 | 0.29 | 13 | 2.03 | 0.12 |
>   | dhrystone | 2.30 | 0.00 | 0 | 2.30 | 0.07 |
>   | eieio | 1.25 | 0.13 | 7 | 1.38 | 0.03 |
>   | fibn | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | fibn-named-let | 1.53 | 0.00 | 0 | 1.53 | 0.03 |
>   | fibn-rec | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | fibn-tc | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | flet | 1.48 | 0.00 | 0 | 1.48 | 0.04 |
>   | inclist | 1.07 | 0.00 | 0 | 1.07 | 0.02 |
>   | inclist-type-hints | 1.00 | 0.00 | 0 | 1.00 | 0.07 |
>   | listlen-tc | 0.13 | 0.00 | 0 | 0.13 | 0.03 |
>   | map-closure | 5.26 | 0.00 | 0 | 5.26 | 0.09 |
>   | nbody | 1.61 | 0.17 | 1 | 1.78 | 0.06 |
>   | pack-unpack | 0.31 | 0.02 | 1 | 0.33 | 0.00 |
>   | pack-unpack-old | 0.50 | 0.05 | 3 | 0.55 | 0.02 |
>   | pcase | 1.85 | 0.00 | 0 | 1.85 | 0.05 |
>   | pidigits | 4.41 | 0.96 | 17 | 5.37 | 0.13 |
>   | scroll | 0.64 | 0.00 | 0 | 0.64 | 0.01 |
>   | smie | 1.59 | 0.04 | 2 | 1.63 | 0.03 |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | total | 28.54 | 1.72 | 45 | 30.26 | 0.26 |
>
> (after the patch)
>   | test | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s)
> | tot avg err (s) |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | bubble | 0.68 | 0.05 | 1 | 0.73 | 0.04 |
>   | bubble-no-cons | 1.00 | 0.00 | 0 | 1.00 | 0.04 |
>   | bytecomp | 1.60 | 0.23 | 13 | 1.82 | 0.16 |
>   | dhrystone | 2.03 | 0.00 | 0 | 2.03 | 0.05 |
>   | eieio | 1.08 | 0.12 | 7 | 1.20 | 0.07 |
>   | fibn | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | fibn-named-let | 1.44 | 0.00 | 0 | 1.44 | 0.12 |
>   | fibn-rec | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | fibn-tc | 0.00 | 0.00 | 0 | 0.00 | 0.00 |
>   | flet | 1.36 | 0.00 | 0 | 1.36 | 0.09 |
>   | inclist | 1.00 | 0.00 | 0 | 1.00 | 0.00 |
>   | inclist-type-hints | 1.00 | 0.00 | 0 | 1.00 | 0.07 |
>   | listlen-tc | 0.11 | 0.00 | 0 | 0.11 | 0.02 |
>   | map-closure | 4.91 | 0.00 | 0 | 4.91 | 0.12 |
>   | nbody | 1.47 | 0.17 | 1 | 1.64 | 0.08 |
>   | pack-unpack | 0.29 | 0.02 | 1 | 0.31 | 0.01 |
>   | pack-unpack-old | 0.43 | 0.05 | 3 | 0.48 | 0.01 |
>   | pcase | 1.84 | 0.00 | 0 | 1.84 | 0.07 |
>   | pidigits | 3.16 | 0.94 | 17 | 4.11 | 0.10 |
>   | scroll | 0.58 | 0.00 | 0 | 0.58 | 0.00 |
>   | smie | 1.40 | 0.04 | 2 | 1.44 | 0.06 |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | total | 25.38 | 1.62 | 45 | 27.00 | 0.32 |
>
> About ~10% improvement, with each individual benchmark being faster.

Ten percent? We take it :)

> Note how fibn test takes 0.00 seconds. It is limited to
> fixnum range.

What does that say/mean?

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 19:41                         ` Ihor Radchenko
  2023-08-11 19:50                           ` Emanuel Berg
@ 2023-08-11 22:46                           ` Emanuel Berg
  2023-08-12  8:30                             ` Ihor Radchenko
  1 sibling, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-11 22:46 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> Are we talking a list of Emacs C functions executing with
>> the corresponding times they have been in execution in
>> a tree data structure? :O
>
> That's what GNU perf does - it is a sampling profiler in
> GNU/Linux. The Elisp equivalent is profiler.el, but it does
> not reveal underlying C functions.

Ah, we are compiling C in a special way to see this with the
tool later.

What one should do then is run it for 100 hours for 100 Emacs
users' arbitrary Emacs use, then we would see what everyone
was up to in the C part as well.

Some would say even slow C is pretty fast but as we just saw
even that can be improved a lot actually ...

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 19:50                           ` Emanuel Berg
@ 2023-08-12  8:24                             ` Ihor Radchenko
  2023-08-12 16:03                               ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-12  8:24 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> Note how fibn test takes 0.00 seconds. It is limited to
>> fixnum range.
>
> What does that say/mean?

It tells that normal int operations are much, much faster compared to bigint.
So, your benchmark is rather esoteric if we consider normal usage patterns.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-11 22:46                           ` Emanuel Berg
@ 2023-08-12  8:30                             ` Ihor Radchenko
  2023-08-12 16:22                               ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-12  8:30 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> What one should do then is run it for 100 hours for 100 Emacs
> users' arbitrary Emacs use, then we would see what everyone
> was up to in the C part as well.

There are known parts of Emacs C code that could see performance
optimization. But we need someone to actually go ahead and provide
patches.

For example, regexp search can see more optimizations (e.g. see
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=63225). As you can
imagine, it is rather commonly used in Emacs as an editor.

Another example is marker processing - Emacs re-adjusts all the markers
in buffer by altering each single marker in O(N_markers). As the number
of markers grow, we get a problem. See
https://yhetil.org/emacs-devel/jwvsfntduas.fsf-monnier+emacs@gnu.org,
for example (note that the new overlay implementation already provides
the necessary data structures).

etc etc.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-12  8:24                             ` Ihor Radchenko
@ 2023-08-12 16:03                               ` Emanuel Berg
  2023-08-13  9:09                                 ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-12 16:03 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>>> Note how fibn test takes 0.00 seconds. It is limited to
>>> fixnum range.
>>
>> What does that say/mean?
>
> It tells that normal int operations are much, much faster
> compared to bigint. So, your benchmark is rather esoteric if
> we consider normal usage patterns.

Didn't you provide a whole set of benchmarks with an
approximated gain of 10%? Maybe some esoteric nature is
built-in in the benchmark concept ...

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-12  8:30                             ` Ihor Radchenko
@ 2023-08-12 16:22                               ` Emanuel Berg
  2023-08-13  9:12                                 ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-12 16:22 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> What one should do then is run it for 100 hours for 100
>> Emacs users' arbitrary Emacs use, then we would see what
>> everyone was up to in the C part as well.
>
> There are known parts of Emacs C code that could see
> performance optimization. But we need someone to actually go
> ahead and provide patches.

So to make Emacs faster, we need to:

1. Fix certain individual and identified areas in C that are
   currently slow for known reasons of the properties of the
   algorithms and/or data structure involved.

2. On a system level, find and change areas in C that have to
   do with how the Lisp model is implemented and
   upheld generally.

3. I don't know where the native/byte compiler would go from
   there, it it depends how big changes are in step 2.

4. Actual and existing Elisp code doesn't have to be changed
   a lot, and it would still be Elisp only it would be
   compiled with the methods from the CL world and in
   particular from SBCL.

5. This would not be a complete integration between Elisp and
   CL, but since the methods, tools and supply chain from CL
   would now be working for Elisp, it would be a big step
   towards such a possible integration in the future,
   if desired.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-12 16:03                               ` Emanuel Berg
@ 2023-08-13  9:09                                 ` Ihor Radchenko
  2023-08-13  9:49                                   ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-13  9:09 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> It tells that normal int operations are much, much faster
>> compared to bigint. So, your benchmark is rather esoteric if
>> we consider normal usage patterns.
>
> Didn't you provide a whole set of benchmarks with an
> approximated gain of 10%? Maybe some esoteric nature is
> built-in in the benchmark concept ...

The main problem your benchmark demonstrated is with bignum.
By accident, it also revealed slight inefficiency in vector allocation,
but this inefficiency is nowhere near SBCL 0.007 sec vs. Elisp 2.5 sec.

In practice, as more generic benchmarks demonstrated, we only had 10%
performance hit. Not something to claim that Elisp is much slower
compared to CL.

It would be more useful to compare CL with Elisp using less specialized
benchmarks that do not involve bignums. As Mattias commented, we do not
care much about bignum performance in Elisp - it is a rarely used
feature; we are content that it simply works, even if not fast, and the
core contributors (at least, Mattias) are not seeing improving bignums
as their priority.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-12 16:22                               ` Emanuel Berg
@ 2023-08-13  9:12                                 ` Ihor Radchenko
  0 siblings, 0 replies; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-13  9:12 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> 4. Actual and existing Elisp code doesn't have to be changed
>    a lot, and it would still be Elisp only it would be
>    compiled with the methods from the CL world and in
>    particular from SBCL.
>
> 5. This would not be a complete integration between Elisp and
>    CL, but since the methods, tools and supply chain from CL
>    would now be working for Elisp, it would be a big step
>    towards such a possible integration in the future,
>    if desired.

I do not think that it is possible in practice. Elisp and CL internals
are different, and you cannot magically overcome this.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-13  9:09                                 ` Ihor Radchenko
@ 2023-08-13  9:49                                   ` Emanuel Berg
  2023-08-13 10:21                                     ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-13  9:49 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

> The main problem your benchmark demonstrated is with bignum.
> By accident, it also revealed slight inefficiency in vector
> allocation, but this inefficiency is nowhere near SBCL 0.007
> sec vs. Elisp 2.5 sec.

Yeah, we can't have that.

> In practice, as more generic benchmarks demonstrated, we
> only had 10% performance hit. Not something to claim that
> Elisp is much slower compared to CL.

What do you mean, generic +10% is a huge improvement.

> It would be more useful to compare CL with Elisp using less
> specialized benchmarks that do not involve bignums.
> As Mattias commented, we do not care much about bignum
> performance in Elisp - it is a rarely used feature; we are
> content that it simply works, even if not fast, and the core
> contributors (at least, Mattias) are not seeing improving
> bignums as their priority.

But didn't your patch do that already?

It would indicate that it is possible to do it all in/to
Elisp, which would be the best way to solve this problem _and_
not have any of the integration, maybe portability issues
described ...

So 1, the first explanation why CL is much faster is another
implementation of bignums handling which is faster in CL, if
that has already been solved here absolutely no reason not to
include it as 10% is a huge gain, even more so for a whole set
of benchmarks.

Instead of relying on a single benchmark, one should have
a set of benchmarks and every benchmark should have a purpose,
this doesn't have to be so involved tho, for example "bignums"
could be the purpose of my benchmark, so one would have
several, say a dozen, each with the purpose of slowing the
computer down with respect to some aspect or known
situation that one would try to provoke ... It can be
well-known algorithms for that matter.

One would then do the same thing in CL and see, where do CL
perform much better? The next question would be, why?

If it is just about piling up +10%, let's do it!

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-13  9:49                                   ` Emanuel Berg
@ 2023-08-13 10:21                                     ` Ihor Radchenko
  2023-08-14  2:20                                       ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-13 10:21 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> In practice, as more generic benchmarks demonstrated, we
>> only had 10% performance hit. Not something to claim that
>> Elisp is much slower compared to CL.
>
> What do you mean, generic +10% is a huge improvement.

It is, but it is also tangent to comparison between Elisp and CL. The
main (AFAIU) difference between Elisp and CL is in how the bignums are
stored. Elisp uses its own internal object type while CL uses GMP's
native format. And we have huge overheads converting things
back-and-forth between GMP and Elisp formats. It is by choice. And my
patch did not do anything about this difference.

Also, +10% is just on my machine. We need someone else to test things
before jumping to far-reaching conclusions. I plan to submit the patch
in a less ad-hoc state later, as a separate ticket.

>> It would be more useful to compare CL with Elisp using less
>> specialized benchmarks that do not involve bignums.
>> As Mattias commented, we do not care much about bignum
>> performance in Elisp - it is a rarely used feature; we are
>> content that it simply works, even if not fast, and the core
>> contributors (at least, Mattias) are not seeing improving
>> bignums as their priority.
>
> But didn't your patch do that already?

No. The benchmark only compared between Elisp before/after the patch.
Not with CL.

> Instead of relying on a single benchmark, one should have
> a set of benchmarks and every benchmark should have a purpose,
> this doesn't have to be so involved tho, for example "bignums"
> could be the purpose of my benchmark, so one would have
> several, say a dozen, each with the purpose of slowing the
> computer down with respect to some aspect or known
> situation that one would try to provoke ... It can be
> well-known algorithms for that matter.
>
> One would then do the same thing in CL and see, where do CL
> perform much better? The next question would be, why?

Sure. Feel free to share such benchmark for Elisp vs. CL. I only know
the benchmark library for Elisp. No equivalent comparable benchmark for
CL.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-13 10:21                                     ` Ihor Radchenko
@ 2023-08-14  2:20                                       ` Emanuel Berg
  2023-08-14  7:20                                         ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-14  2:20 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>>> In practice, as more generic benchmarks demonstrated, we
>>> only had 10% performance hit. Not something to claim that
>>> Elisp is much slower compared to CL.
>>
>> What do you mean, generic +10% is a huge improvement.
>
> It is, but it is also tangent to comparison between Elisp
> and CL. The main (AFAIU) difference between Elisp and CL is
> in how the bignums are stored. Elisp uses its own internal
> object type while CL uses GMP's native format.

GMP = GNU Multiple Precision Arithmetic Library.

https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library

> And we have huge overheads converting things back-and-forth
> between GMP and Elisp formats. It is by choice. And my patch
> did not do anything about this difference.

But that's all the better, your patch solved (very likely) the
problem and did so without causing havoc by trying to forcibly
merge opposing solutions.

And the method was: instead of reallocating new objects for
bignums, we are no reusing existing allocations for new data?

>>> It would be more useful to compare CL with Elisp using
>>> less specialized benchmarks that do not involve bignums.
>>> As Mattias commented, we do not care much about bignum
>>> performance in Elisp - it is a rarely used feature; we are
>>> content that it simply works, even if not fast, and the
>>> core contributors (at least, Mattias) are not seeing
>>> improving bignums as their priority.
>>
>> But didn't your patch do that already?
>
> No. The benchmark only compared between Elisp before/after
> the patch. Not with CL.

No, that much I understood. It was Elisp before and after the
patch, as you say. Isn't before/after all data you need? Nah,
it can be useful to have an external reference as well and
here were are also hoping we can use the benchmarks to answer
they question if CL is just so much faster in general, or if
there are certain areas where it excels - and if so - what
those areas are and what they contain to unlock all
that speed.

>> Instead of relying on a single benchmark, one should have
>> a set of benchmarks and every benchmark should have
>> a purpose, this doesn't have to be so involved tho, for
>> example "bignums" could be the purpose of my benchmark, so
>> one would have several, say a dozen, each with the purpose
>> of slowing the computer down with respect to some aspect or
>> known situation that one would try to provoke ... It can be
>> well-known algorithms for that matter.
>>
>> One would then do the same thing in CL and see, where do CL
>> perform much better? The next question would be, why?
>
> Sure. Feel free to share such benchmark for Elisp vs. CL.
> I only know the benchmark library for Elisp. No equivalent
> comparable benchmark for CL.

I'm working on it! This will be very interesting, for sure.

The need for speed - but in a very methodical way ...

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
@ 2023-08-14  6:28 Gerd Möllmann
  2023-08-14  6:56 ` Gerd Möllmann
  0 siblings, 1 reply; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-14  6:28 UTC (permalink / raw)
  To: incal; +Cc: emacs-devel

>> It is, but it is also tangent to comparison between Elisp
>> and CL. The main (AFAIU) difference between Elisp and CL is
>> in how the bignums are stored. Elisp uses its own internal
>> object type while CL uses GMP's native format.

Just for the record, SBCL/CMUCL don't use GMP.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  6:28 [PATCH] Re: Bignum performance (was: Shrinking the C core) Gerd Möllmann
@ 2023-08-14  6:56 ` Gerd Möllmann
  2023-08-14  7:04   ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-14  6:56 UTC (permalink / raw)
  To: incal; +Cc: emacs-devel

> Just for the record, SBCL/CMUCL don't use GMP.

Hm, thinking of this - did someone measure how much time is spent in 
malloc/realloc/free in the benchmarks?  That is what GMP uses, and SBCL 
doesn't.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  6:56 ` Gerd Möllmann
@ 2023-08-14  7:04   ` Ihor Radchenko
  2023-08-14  7:35     ` Gerd Möllmann
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-14  7:04 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> Just for the record, SBCL/CMUCL don't use GMP.
>
> Hm, thinking of this - did someone measure how much time is spent in 
> malloc/realloc/free in the benchmarks?  That is what GMP uses, and SBCL 
> doesn't.

https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost/

We can further get rid of the GC by temporarily disabling it (just for
demonstration):

(let ((beg (float-time)))
  (setq gc-cons-threshold most-positive-fixnum)
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)) )

perf record  ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

    17.11%  emacs    libgmp.so.10.5.0      [.] __gmpz_sizeinbase
     7.35%  emacs    libgmp.so.10.5.0      [.] __gmpz_add
     6.51%  emacs    emacs                 [.] arith_driver
     6.03%  emacs    libc.so.6             [.] malloc
     5.57%  emacs    emacs                 [.] allocate_vectorlike
     5.20%  emacs    [unknown]             [k] 0xffffffffaae01857
     4.16%  emacs    libgmp.so.10.5.0      [.] __gmpn_add_n_coreisbr
     3.72%  emacs    emacs                 [.] check_number_coerce_marker
     3.35%  emacs    fib.eln               [.] F666962_fib_0
     3.29%  emacs    emacs                 [.] allocate_pseudovector
     2.30%  emacs    emacs                 [.] Flss
-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  2:20                                       ` Emanuel Berg
@ 2023-08-14  7:20                                         ` Ihor Radchenko
  0 siblings, 0 replies; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-14  7:20 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> And the method was: instead of reallocating new objects for
> bignums, we are no reusing existing allocations for new data?

Nope. Reusing existing allocations was already in place. I just
optimized how Elisp searches if there are any existing allocations that
can be reused.

>> Sure. Feel free to share such benchmark for Elisp vs. CL.
>> I only know the benchmark library for Elisp. No equivalent
>> comparable benchmark for CL.
>
> I'm working on it! This will be very interesting, for sure.
>
> The need for speed - but in a very methodical way ...

Yup. It is always better to give very specific benchmark than bluntly
claiming that Elisp is slow. Because detailed benchmark provides
concrete data on what might be improved (or not; but we can at least
discuss without degrading the discussion quality).

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  7:04   ` Ihor Radchenko
@ 2023-08-14  7:35     ` Gerd Möllmann
  2023-08-14  8:09       ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-14  7:35 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: incal, emacs-devel

On 14.08.23 09:04, Ihor Radchenko wrote:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
>>> Just for the record, SBCL/CMUCL don't use GMP.
>>
>> Hm, thinking of this - did someone measure how much time is spent in
>> malloc/realloc/free in the benchmarks?  That is what GMP uses, and SBCL
>> doesn't.
> 
> https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost/
> 
> We can further get rid of the GC by temporarily disabling it (just for
> demonstration):
> 
> (let ((beg (float-time)))
>    (setq gc-cons-threshold most-positive-fixnum)
>    (fib 10000 1000)
>    (message "%.3f s" (- (float-time) beg)) )
> 
> perf record  ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
> 
>      17.11%  emacs    libgmp.so.10.5.0      [.] __gmpz_sizeinbase
>       7.35%  emacs    libgmp.so.10.5.0      [.] __gmpz_add
>       6.51%  emacs    emacs                 [.] arith_driver
>       6.03%  emacs    libc.so.6             [.] malloc
>       5.57%  emacs    emacs                 [.] allocate_vectorlike
>       5.20%  emacs    [unknown]             [k] 0xffffffffaae01857
>       4.16%  emacs    libgmp.so.10.5.0      [.] __gmpn_add_n_coreisbr
>       3.72%  emacs    emacs                 [.] check_number_coerce_marker
>       3.35%  emacs    fib.eln               [.] F666962_fib_0
>       3.29%  emacs    emacs                 [.] allocate_pseudovector
>       2.30%  emacs    emacs                 [.] Flss

Hm, then maybe we can look at the disassembly of then benchmark in SBCL? 
  Not that in the end the compiler is so smart that it optimizes a lot 
of stuff simply away because it can prove that the result of the 
computations cannot possibly be observed?



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  7:35     ` Gerd Möllmann
@ 2023-08-14  8:09       ` Ihor Radchenko
  2023-08-14  9:28         ` Gerd Möllmann
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-14  8:09 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Hm, then maybe we can look at the disassembly of then benchmark in SBCL? 
>   Not that in the end the compiler is so smart that it optimizes a lot 
> of stuff simply away because it can prove that the result of the 
> computations cannot possibly be observed?

Sorry, but I do not know how to do it. Not familiar with CL.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  8:09       ` Ihor Radchenko
@ 2023-08-14  9:28         ` Gerd Möllmann
  2023-08-14  9:42           ` Ihor Radchenko
  2023-08-14 16:51           ` Emanuel Berg
  0 siblings, 2 replies; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-14  9:28 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: incal, emacs-devel

On 14.08.23 10:09, Ihor Radchenko wrote:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
>> Hm, then maybe we can look at the disassembly of then benchmark in SBCL?
>>    Not that in the end the compiler is so smart that it optimizes a lot
>> of stuff simply away because it can prove that the result of the
>> computations cannot possibly be observed?
> 
> Sorry, but I do not know how to do it. Not familiar with CL.
> 
Ok.  I used the code from https://dataswamp.org/~incal/cl/fib.cl/fib.cl.

And the first thing that stares at me in this function:

(defun fib (reps num)
   (declare (optimize speed (safety 0) (debug 0)))
   (let ((z 0))
     (declare (type (unsigned-byte 53) reps num z))
     (dotimes (r reps)
       (let*((p1 1)
             (p2 1))
         (dotimes (i (- num 2))
           (setf z (+ p1 p2)
                 p2 p1
                 p1 z))))
     z))

is the declaration (unsigned-byte 53).

The declaration means we are lying to the compiler because Z gets bigger 
than 53 bits eventually.  And all bets are off because of the OPTIMIZE 
declaration.  The result is that everything is done in fixnums on 64-bit 
machines.

; disassembly for FIB
; Size: 92 bytes. Origin: #x700530086C                        ; FIB
; 6C:       030080D2         MOVZ NL3, #0
; 70:       040080D2         MOVZ NL4, #0
; 74:       0E000014         B L3
; 78: L0:   410080D2         MOVZ NL1, #2
; 7C:       E20301AA         MOV NL2, NL1
; 80:       EB030CAA         MOV R1, R2
; 84:       651100D1         SUB NL5, R1, #4
; 88:       000080D2         MOVZ NL0, #0
; 8C:       05000014         B L2
; 90: L1:   2300028B         ADD NL3, NL1, NL2
; 94:       E20301AA         MOV NL2, NL1
; 98:       E10303AA         MOV NL1, NL3
; 9C:       00080091         ADD NL0, NL0, #2
; A0: L2:   1F0005EB         CMP NL0, NL5
; A4:       6BFFFF54         BLT L1
; A8:       84080091         ADD NL4, NL4, #2
; AC: L3:   9F000AEB         CMP NL4, R0
; B0:       4BFEFF54         BLT L0
; B4:       EA0303AA         MOV R0, NL3
; B8:       FB031AAA         MOV CSP, CFP
; BC:       5A7B40A9         LDP CFP, LR, [CFP]
; C0:       BF0300F1         CMP NULL, #0
; C4:       C0035FD6         RET

Tada!




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:28         ` Gerd Möllmann
@ 2023-08-14  9:42           ` Ihor Radchenko
  2023-08-15 14:03             ` Emanuel Berg
  2023-08-14 16:51           ` Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-14  9:42 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Ok.  I used the code from https://dataswamp.org/~incal/cl/fib.cl/fib.cl.
> ...
> The declaration means we are lying to the compiler because Z gets bigger 
> than 53 bits eventually.  And all bets are off because of the OPTIMIZE 
> declaration.  The result is that everything is done in fixnums on 64-bit 
> machines.

That explains a lot :)

I now tried

(defun fib (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    ;; (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let*((p1 1)
            (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

and got

$ SBCL_HOME=/usr/lib64/sbcl perf record sbcl --load /tmp/fib.cl

;;;  0.263333 s real time
;;;  0.263641 s run time

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

Still ~3x faster compared to Elisp, but not orders of magnitude.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:28         ` Gerd Möllmann
  2023-08-14  9:42           ` Ihor Radchenko
@ 2023-08-14 16:51           ` Emanuel Berg
  2023-08-15  4:58             ` Gerd Möllmann
  2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
  1 sibling, 2 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-14 16:51 UTC (permalink / raw)
  To: emacs-devel

Gerd Möllmann wrote:

>> Sorry, but I do not know how to do it. Not familiar
>> with CL.
>> 
> Ok. I used the code from
> https://dataswamp.org/~incal/cl/fib.cl/fib.cl

Yikes, how did that happen, some slip involving symbolic
links ...

Here it is: 
  https://dataswamp.org/~incal/cl/bench/fib.cl

And timing is done with this:
  https://dataswamp.org/~incal/cl/bench/timing.cl

Note the ugly absolute path in fib.cl BTW, otherwise you get
the path not of the file but of SBCL or Slime, maybe. I think
one is supposed to use ASDF, but surely there must some easy
way to just load a file using a relative path to the
current file?

(load "~/public_html/cl/bench/timing.cl")

> is the declaration (unsigned-byte 53).
>
> The declaration means we are lying to the compiler because
> Z gets bigger than 53 bits eventually. And all bets are off
> because of the OPTIMIZE declaration. The result is that
> everything is done in fixnums on 64-bit machines.

A very impressive optimization indeed, and expressed in
a cryptic way.

> ; disassembly for FIB
> ; Size: 92 bytes. Origin: #x700530086C                        ; FIB
> ; 6C:       030080D2         MOVZ NL3, #0
> ; 70:       040080D2         MOVZ NL4, #0
> ; 74:       0E000014         B L3
> ; 78: L0:   410080D2         MOVZ NL1, #2
> ; 7C:       E20301AA         MOV NL2, NL1
> ; 80:       EB030CAA         MOV R1, R2
> ; 84:       651100D1         SUB NL5, R1, #4
> ; 88:       000080D2         MOVZ NL0, #0
> ; 8C:       05000014         B L2
> ; 90: L1:   2300028B         ADD NL3, NL1, NL2
> ; 94:       E20301AA         MOV NL2, NL1
> ; 98:       E10303AA         MOV NL1, NL3
> ; 9C:       00080091         ADD NL0, NL0, #2
> ; A0: L2:   1F0005EB         CMP NL0, NL5
> ; A4:       6BFFFF54         BLT L1
> ; A8:       84080091         ADD NL4, NL4, #2
> ; AC: L3:   9F000AEB         CMP NL4, R0
> ; B0:       4BFEFF54         BLT L0
> ; B4:       EA0303AA         MOV R0, NL3
> ; B8:       FB031AAA         MOV CSP, CFP
> ; BC:       5A7B40A9         LDP CFP, LR, [CFP]
> ; C0:       BF0300F1         CMP NULL, #0
> ; C4:       C0035FD6         RET
>
> Tada!

How do you see only fixnums are used?

Are we talking 1 word = 2 bytes = 16 bits here, s2c?

If so, the range of fixnums are -32 768 to 32 767 inclusive,
so those are hardly huge numbers.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14 16:51           ` Emanuel Berg
@ 2023-08-15  4:58             ` Gerd Möllmann
  2023-08-15 14:20               ` Emanuel Berg
  2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
  1 sibling, 1 reply; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-15  4:58 UTC (permalink / raw)
  To: incal; +Cc: emacs-devel

> How do you see only fixnums are used?

By reading the assembly, and remembering a thing or two from my times as 
a CMUCL contributor, e.g. its fixnum representation.  SBCL is a fork of 
CMUCL.

> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
> 
> If so, the range of fixnums are -32 768 to 32 767 inclusive,
> so those are hardly huge numbers.

It's a 64-bit machine, more specifically an M1. i.e. arm64.  How you get 
from there to these 16 bits escapes me.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-14 16:51           ` Emanuel Berg
  2023-08-15  4:58             ` Gerd Möllmann
@ 2023-08-15  6:26             ` Po Lu
  2023-08-15 14:33               ` Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Po Lu @ 2023-08-15  6:26 UTC (permalink / raw)
  To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>
> If so, the range of fixnums are -32 768 to 32 767 inclusive,
> so those are hardly huge numbers.

Under Arm64, general purpose integer registers are 64 bits wide.  That
is also the word size of said machine.

I agree that vague terminology such as ``word'' is confusing, because of
its use by the Unix assembler.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:42           ` Ihor Radchenko
@ 2023-08-15 14:03             ` Emanuel Berg
  2023-08-15 15:01               ` Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:03 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> The declaration means we are lying to the compiler because
>> Z gets bigger than 53 bits eventually. And all bets are off
>> because of the OPTIMIZE declaration. The result is that
>> everything is done in fixnums on 64-bit machines.
>
> That explains a lot :)
>
> I now tried
>
> (defun fib (reps num)
>   (declare (optimize speed (safety 0) (debug 0)))
>   (let ((z 0))
>     ;; (declare (type (unsigned-byte 53) reps num z))
>     (dotimes (r reps)
>       (let*((p1 1)
>             (p2 1))
>         (dotimes (i (- num 2))
>           (setf z (+ p1 p2)
>                 p2 p1
>                 p1 z))))
>     z))
>
> and got
>
> $ SBCL_HOME=/usr/lib64/sbcl perf record sbcl --load /tmp/fib.cl
>
> ;;;  0.263333 s real time
> ;;;  0.263641 s run time
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
>
> Still ~3x faster compared to Elisp, but not orders
> of magnitude.

A pretty good optimization! :O

But what kind of optimization is it?

Also, what happens if you remove the OPTIMIZE declaration
as well?

Still, isn't the rule of the "beat the benchmark" game to beat
it as fast as possible?

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15  4:58             ` Gerd Möllmann
@ 2023-08-15 14:20               ` Emanuel Berg
  0 siblings, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:20 UTC (permalink / raw)
  To: emacs-devel

Gerd Möllmann wrote:

>> Are we talking 1 word = 2 bytes = 16 bits here, s2c? If so,
>> the range of fixnums are -32 768 to 32 767 inclusive, so
>> those are hardly huge numbers.
>
> It's a 64-bit machine, more specifically an M1. i.e. arm64.
> How you get from there to these 16 bits escapes me.

Here [1] it says a word for ARM is 32 bits.

So to store zero in a fixnum word it will look like this

  00000000 00000000 00000000 00000000

If one bit, the MSB, is used as the sign bit, the interval,
inclusive, is

  (list (* -1 (expt 2 (1- 32)))
        (1-   (expt 2 (1- 32))) )

  (-2147483648 2147483647)

Okay, now we are talking some pretty big number alltho I have
seen even bigger ...

[1] https://modexp.wordpress.com/2018/10/30/arm64-assembly/

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
@ 2023-08-15 14:33               ` Emanuel Berg
  2023-08-15 17:07                 ` tomas
  2023-08-16  1:31                 ` Po Lu
  0 siblings, 2 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:33 UTC (permalink / raw)
  To: emacs-devel

Po Lu wrote:

>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>
>> If so, the range of fixnums are -32 768 to 32 767
>> inclusive, so those are hardly huge numbers.
>
> Under Arm64, general purpose integer registers are 64 bits
> wide. That is also the word size of said machine.

If they are, the range for fixnums is

(list (* -1 (expt 2 (1- 64)))
      (1-   (expt 2 (1- 64))) )

(-9223372036854775808 9223372036854775807)

Only after that it gets slower :P

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 14:03             ` Emanuel Berg
@ 2023-08-15 15:01               ` Ihor Radchenko
  2023-08-15 22:21                 ` Emanuel Berg
  2023-08-15 22:33                 ` Emanuel Berg
  0 siblings, 2 replies; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-15 15:01 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>>     ;; (declare (type (unsigned-byte 53) reps num z))
>> ...
>> Still ~3x faster compared to Elisp, but not orders
>> of magnitude.
>
> A pretty good optimization! :O
>
> But what kind of optimization is it?

The commented "optimization" is: "Hey, SBCL, do not use bignums. If ints
overflow, so be it".

> Also, what happens if you remove the OPTIMIZE declaration
> as well?

No difference.

> Still, isn't the rule of the "beat the benchmark" game to beat
> it as fast as possible?

Yes, but when CBCL is orders of magnitude faster, it indicates something
conceptually wrong in the algo. 3x is a matter of variation in the
internal details (like extra type checking in Elisp that Po Lu outlined).

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-15 14:33               ` Emanuel Berg
@ 2023-08-15 17:07                 ` tomas
  2023-08-15 22:46                   ` Emanuel Berg
  2023-08-16  1:31                 ` Po Lu
  1 sibling, 1 reply; 56+ messages in thread
From: tomas @ 2023-08-15 17:07 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

On Tue, Aug 15, 2023 at 04:33:04PM +0200, Emanuel Berg wrote:
> Po Lu wrote:
> 
> >> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
> >>
> >> If so, the range of fixnums are -32 768 to 32 767
> >> inclusive, so those are hardly huge numbers.
> >
> > Under Arm64, general purpose integer registers are 64 bits
> > wide. That is also the word size of said machine.
> 
> If they are, the range for fixnums is
> 
> (list (* -1 (expt 2 (1- 64)))
>       (1-   (expt 2 (1- 64))) )
> 
> (-9223372036854775808 9223372036854775807)
> 
> Only after that it gets slower :P

Unless SBCL uses tagged object representation. Unless the
compiler can prove that some "thing" is going to be an
int. Unless...

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 15:01               ` Ihor Radchenko
@ 2023-08-15 22:21                 ` Emanuel Berg
  2023-08-15 22:33                 ` Emanuel Berg
  1 sibling, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:21 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> A pretty good optimization! :O
>>
>> But what kind of optimization is it?
>
> The commented "optimization" is: "Hey, SBCL, do not use
> bignums. If ints overflow, so be it".

?

But then how can the algorithm execute correctly?

>> Still, isn't the rule of the "beat the benchmark" game to
>> beat it as fast as possible?
>
> Yes, but when CBCL is orders of magnitude faster, it
> indicates something conceptually wrong in the algo. 3x is
> a matter of variation in the internal details (like extra
> type checking in Elisp that Po Lu outlined).

If you are saying the algorithm doesn't output correct data
for the conventional conception of the Fibonacci algorithm,
then that optimization and whatever time it makes isn't valid,
I'll remove it this instant.

Hm, maybe we need something like unit testing to confirm that
the algorithms perform not just fast, but also as intended to
solve whatever problem they were designed to ...

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 15:01               ` Ihor Radchenko
  2023-08-15 22:21                 ` Emanuel Berg
@ 2023-08-15 22:33                 ` Emanuel Berg
  2023-08-16  4:36                   ` tomas
  1 sibling, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:33 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

> Yes, but when CBCL is orders of magnitude faster, it
> indicates something conceptually wrong in the algo.

Indeed, I'll remove it, thanks.

But my CL skills aren't at that level so someone else added
it. A strange optimization indeed, that breaks the code. Hm,
maybe not that unusual when I think about it. But that is for
normal code, not supposed benchmarks ...

So this is the explanation for the +78 875% speed disadvantage
for Elisp! As reported a long time ago when comparing Elisp
and CL. I.e., what is documented in this file

  https://dataswamp.org/~incal/emacs-init/fib.el

and discussed here (or be it gnu.emacs.help) several
months ago.

\o/

Fires up a cigar!

Always a pleasure when a mystery gets solved ... but TBH
I actually believed Elisp was that much slower. Turns out, the
CL implementation wasn't even correct. Bummer, but ultimately
good for us as it turned out.

> 3x is a matter of variation in the internal details (like
> extra type checking in Elisp that Po Lu outlined).

I'll remove the supposed optimization, and we'll take it from
there. We (you) have already improved the bignum object
allocation reuse and Mr. Möllmann solved this issue, so we
have a positive trajectory already. But 78 875% or 3x doesn't
matter in principle, we do it until we are done regardless ...

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-15 17:07                 ` tomas
@ 2023-08-15 22:46                   ` Emanuel Berg
  0 siblings, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:46 UTC (permalink / raw)
  To: emacs-devel

tomas wrote:

>>>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>>>
>>>> If so, the range of fixnums are -32 768 to 32 767
>>>> inclusive, so those are hardly huge numbers.
>>>
>>> Under Arm64, general purpose integer registers are 64 bits
>>> wide. That is also the word size of said machine.
>> 
>> If they are, the range for fixnums is
>> 
>> (list (* -1 (expt 2 (1- 64)))
>>       (1-   (expt 2 (1- 64))) )
>> 
>> (-9223372036854775808 9223372036854775807)
>> 
>> Only after that it gets slower :P
>
> Unless SBCL uses tagged object representation. Unless the
> compiler can prove that some "thing" is going to be an int.
> Unless...

Actually the rules are quite simple. As long as the expected
execution and correct return value is observed and ultimately
achieved by the algorithm, all optimizations are fair.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-15 14:33               ` Emanuel Berg
  2023-08-15 17:07                 ` tomas
@ 2023-08-16  1:31                 ` Po Lu
  2023-08-16  1:37                   ` Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Po Lu @ 2023-08-16  1:31 UTC (permalink / raw)
  To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Po Lu wrote:
>
>>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>>
>>> If so, the range of fixnums are -32 768 to 32 767
>>> inclusive, so those are hardly huge numbers.
>>
>> Under Arm64, general purpose integer registers are 64 bits
>> wide. That is also the word size of said machine.
>
> If they are, the range for fixnums is
>
> (list (* -1 (expt 2 (1- 64)))
>       (1-   (expt 2 (1- 64))) )
>
> (-9223372036854775808 9223372036854775807)
>
> Only after that it gets slower :P

Lisp systems normally set aside several of the high or low bits of a
register as a tag linking a type to the object represented.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  1:31                 ` Po Lu
@ 2023-08-16  1:37                   ` Emanuel Berg
  2023-08-16  3:17                     ` Po Lu
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-16  1:37 UTC (permalink / raw)
  To: emacs-devel

Po Lu wrote:

>>> Under Arm64, general purpose integer registers are 64 bits
>>> wide. That is also the word size of said machine.
>>
>> If they are, the range for fixnums is
>>
>> (list (* -1 (expt 2 (1- 64)))
>>       (1-   (expt 2 (1- 64))) )
>>
>> (-9223372036854775808 9223372036854775807)
>>
>> Only after that it gets slower :P
>
> Lisp systems normally set aside several of the high or low
> bits of a register as a tag linking a type to the
> object represented.

But here we are at the CPU architecture level (register
length), surely Lisp don't meddle with that?

No, I sense that it is, actually. So please explain, then, how
it works. And in particular, how many bits do we (Elisp and
CL) actually have for our fixnums?

Or, ar we talking bignums now?

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  1:37                   ` Emanuel Berg
@ 2023-08-16  3:17                     ` Po Lu
  2023-08-16  4:44                       ` tomas
  2023-08-16  5:18                       ` Gerd Möllmann
  0 siblings, 2 replies; 56+ messages in thread
From: Po Lu @ 2023-08-16  3:17 UTC (permalink / raw)
  To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Po Lu wrote:
>
>>>> Under Arm64, general purpose integer registers are 64 bits
>>>> wide. That is also the word size of said machine.
>>>
>>> If they are, the range for fixnums is
>>>
>>> (list (* -1 (expt 2 (1- 64)))
>>>       (1-   (expt 2 (1- 64))) )
>>>
>>> (-9223372036854775808 9223372036854775807)
>>>
>>> Only after that it gets slower :P
>>
>> Lisp systems normally set aside several of the high or low
>> bits of a register as a tag linking a type to the
>> object represented.
>
> But here we are at the CPU architecture level (register
> length), surely Lisp don't meddle with that?
>
> No, I sense that it is, actually. So please explain, then, how
> it works. And in particular, how many bits do we (Elisp and
> CL) actually have for our fixnums?

I don't know about SBCL, but as for Emacs, refer to the definition of
VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
I have no idea where they've disappeared to.)



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 22:33                 ` Emanuel Berg
@ 2023-08-16  4:36                   ` tomas
  2023-08-16  5:23                     ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: tomas @ 2023-08-16  4:36 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]

On Wed, Aug 16, 2023 at 12:33:33AM +0200, Emanuel Berg wrote:
> Ihor Radchenko wrote:
> 
> > Yes, but when CBCL is orders of magnitude faster, it
> > indicates something conceptually wrong in the algo.
> 
> Indeed, I'll remove it, thanks.
> 
> But my CL skills aren't at that level so someone else added
> it. A strange optimization indeed, that breaks the code.

It only breaks the code if you "don't know what you are doing".

See, without the optimization the code will have, at each and
every arithmetic operation, to check "Hmm... Is this thing going
to overflow? Hm. It might, so better use bignums. Phew, it didn't,
so back to fixnums".

Now we know that modern CPU architectures have a hard time with
conditional statements (pipeline stalls, cache mispredictions,
all that nasty stuff). So this "Hmm..." above is costing real
money. Even in cases you won't need it, because things ain't
gonna overflow.

The compiler tries to do a good job of looking into calculations
and deciding "this incf down there won't ever push us over the
fixnum limit, because we know we are starting with a number
below 10".

But the programmer sometimes has more knowledge and can prove
that things won't overflow, ever. Or that, should things overflow,
it won't matter anyway.

It's for those cases that this kind of optimizations are made.

C, by the way, always runs in this mode. Unsigned integers will
silently wrap around, that's documented behaviour. Signed integers
will do whatever their thing is (technically this is called
"unspecified behaviour".

Perhaps you wanted just to compute fib modulo some big power
of two? Then your program was correct, after all...

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  3:17                     ` Po Lu
@ 2023-08-16  4:44                       ` tomas
  2023-08-16  5:18                       ` Gerd Möllmann
  1 sibling, 0 replies; 56+ messages in thread
From: tomas @ 2023-08-16  4:44 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1277 bytes --]

On Wed, Aug 16, 2023 at 11:17:01AM +0800, Po Lu wrote:
> Emanuel Berg <incal@dataswamp.org> writes:
> 
> > Po Lu wrote:

[...]

> >> Lisp systems normally set aside several of the high or low
> >> bits of a register as a tag linking a type to the
> >> object represented.
> >
> > But here we are at the CPU architecture level (register
> > length), surely Lisp don't meddle with that?
> >
> > No, I sense that it is, actually. So please explain, then, how
> > it works. And in particular, how many bits do we (Elisp and
> > CL) actually have for our fixnums?
> 
> I don't know about SBCL, but as for Emacs, refer to the definition of
> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
> I have no idea where they've disappeared to.)

That is what I was hinting at with "tagged representation": Emacs
Lisp does it, we don't know about SBCL. Typically, a good implementation
has small stretches of code where the values are as-is because the
compiler can prove what their type is (fixnum, whatever). But that
means that fixnums are usually limited to less than the full machine
word's width (e.g. 60 bits if your tag is four bits wide), because
your lisp has to be able to stuff them back into such a place.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  3:17                     ` Po Lu
  2023-08-16  4:44                       ` tomas
@ 2023-08-16  5:18                       ` Gerd Möllmann
  2023-08-16  5:35                         ` Emanuel Berg
                                           ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-16  5:18 UTC (permalink / raw)
  To: luangruo; +Cc: emacs-devel

> I don't know about SBCL, but as for Emacs, refer to the definition of
> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
> I have no idea where they've disappeared to.)

The SBCL I have here, a Homebrew installation, uses the scheme where

....0	-> fixnum
....1   -> other objects (discriminated by additional tag bits)

I had the same for Emacs in the branch gerd_int in the 2000s, if memory 
serves me.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-16  4:36                   ` tomas
@ 2023-08-16  5:23                     ` Emanuel Berg
  0 siblings, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-16  5:23 UTC (permalink / raw)
  To: emacs-devel

tomas wrote:

> Perhaps you wanted just to compute fib modulo some big power
> of two? Then your program was correct, after all...

So, it works for fixnums but not for bignums?

The code must always work for valid indata, if it
doesn't, even for a single such case, the optimization breaks
the algorithm and will be removed.

Maybe we can duplicate it and remove the declaration in one
of them. So we would have one Fibonacci to test the speed of
fixnums, and one for bignums. In the fixnums one, the
declaration would still be legal.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  5:18                       ` Gerd Möllmann
@ 2023-08-16  5:35                         ` Emanuel Berg
  2023-08-18  7:14                           ` Simon Leinen
  2023-08-16  5:41                         ` Gerd Möllmann
  2023-08-16  6:42                         ` Po Lu
  2 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-16  5:35 UTC (permalink / raw)
  To: emacs-devel

Gerd Möllmann wrote:

>> I don't know about SBCL, but as for Emacs, refer to the
>> definition of VALBITS in lisp.h (maybe also the right files
>> among m/*.h and s/*.h, but I have no idea where they've
>> disappeared to.)
>
> The SBCL I have here, a Homebrew installation, uses the
> scheme where
>
> ....0	-> fixnum
> ....1   -> other objects (discriminated by additional tag bits)
>
> I had the same for Emacs in the branch gerd_int in the
> 2000s, if memory serves me.

Ah, there is `fixnump' (and `bignump') to test for a specific
number,

  (fixnump (expt 2 60)) ; t
  (fixnump (expt 2 61)) ; nil

2^60 = 1152921504606846976, so that's pretty big.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  5:18                       ` Gerd Möllmann
  2023-08-16  5:35                         ` Emanuel Berg
@ 2023-08-16  5:41                         ` Gerd Möllmann
  2023-08-16  6:42                         ` Po Lu
  2 siblings, 0 replies; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-16  5:41 UTC (permalink / raw)
  To: luangruo; +Cc: emacs-devel

On 16.08.23 07:18, Gerd Möllmann wrote:
>> I don't know about SBCL, but as for Emacs, refer to the definition of
>> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
>> I have no idea where they've disappeared to.)
> 
> The SBCL I have here, a Homebrew installation, uses the scheme where

Oops, I'm actually not running the SBCL from Homebrew, but my own...
Anyway, the tagging is like this:

@c 64-bit lowtag assignment (wider-fixnums)
@c xyz0 -- Fixnum (where z or yz may also be 0 depending on 
n-fixnum-tag-bits)
@c xx01 -- Other-immediate
@c xx11 -- Pointer
@c 0011 --   Instance-pointer
@c 0111 --   List-pointer
@c 1011 --   Function-pointer
@c 1111 --   Other-pointer

https://github.com/sbcl/sbcl/blob/master/doc/internals/objects-in-memory.texinfo



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  5:18                       ` Gerd Möllmann
  2023-08-16  5:35                         ` Emanuel Berg
  2023-08-16  5:41                         ` Gerd Möllmann
@ 2023-08-16  6:42                         ` Po Lu
  2023-08-16  8:05                           ` Gerd Möllmann
  2 siblings, 1 reply; 56+ messages in thread
From: Po Lu @ 2023-08-16  6:42 UTC (permalink / raw)
  To: Gerd Möllmann; +Cc: emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> I don't know about SBCL, but as for Emacs, refer to the definition of
>> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
>> I have no idea where they've disappeared to.)
>
> The SBCL I have here, a Homebrew installation, uses the scheme where
>
> ....0	-> fixnum
> ....1   -> other objects (discriminated by additional tag bits)
>
> I had the same for Emacs in the branch gerd_int in the 2000s, if
> memory serves me.

In today's Emacs, 0 is Lisp_Symbol; this facilitates representing Qnil
as C NULL.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  6:42                         ` Po Lu
@ 2023-08-16  8:05                           ` Gerd Möllmann
  0 siblings, 0 replies; 56+ messages in thread
From: Gerd Möllmann @ 2023-08-16  8:05 UTC (permalink / raw)
  To: Po Lu; +Cc: emacs-devel

On 16.08.23 08:42, Po Lu wrote:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
>>> I don't know about SBCL, but as for Emacs, refer to the definition of
>>> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but
>>> I have no idea where they've disappeared to.)
>>
>> The SBCL I have here, a Homebrew installation, uses the scheme where
>>
>> ....0	-> fixnum
>> ....1   -> other objects (discriminated by additional tag bits)
>>
>> I had the same for Emacs in the branch gerd_int in the 2000s, if
>> memory serves me.
> 
> In today's Emacs, 0 is Lisp_Symbol; this facilitates representing Qnil
> as C NULL.

Yes, and I don't see a need to change anything in this regard in Emacs. 
IMHO, the fixnum range is more than sufficient nowadays, in general.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-16  5:35                         ` Emanuel Berg
@ 2023-08-18  7:14                           ` Simon Leinen
  2023-08-19 13:10                             ` Emanuel Berg
  2023-09-04  4:13                             ` Emanuel Berg
  0 siblings, 2 replies; 56+ messages in thread
From: Simon Leinen @ 2023-08-18  7:14 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1215 bytes --]

Emacs also has `most-negative-fixnum' and `most-positive-fixnum' (borrowed
from Common Lisp but now part of the core).

On my 64-bit system (GNU Emacs 30.0.50 on aarch64-apple-darwin22.5.0):

(= (- (expt 2 61)) most-negative-fixnum) ⇒ t
(= (1- (expt 2 61)) most-positive-fixnum) ⇒ t

(Same as on Emanuel's.)
-- 
Simon.

On Wed, Aug 16, 2023 at 1:12 PM Emanuel Berg <incal@dataswamp.org> wrote:

> Gerd Möllmann wrote:
>
> >> I don't know about SBCL, but as for Emacs, refer to the
> >> definition of VALBITS in lisp.h (maybe also the right files
> >> among m/*.h and s/*.h, but I have no idea where they've
> >> disappeared to.)
> >
> > The SBCL I have here, a Homebrew installation, uses the
> > scheme where
> >
> > ....0 -> fixnum
> > ....1   -> other objects (discriminated by additional tag bits)
> >
> > I had the same for Emacs in the branch gerd_int in the
> > 2000s, if memory serves me.
>
> Ah, there is `fixnump' (and `bignump') to test for a specific
> number,
>
>   (fixnump (expt 2 60)) ; t
>   (fixnump (expt 2 61)) ; nil
>
> 2^60 = 1152921504606846976, so that's pretty big.
>
> --
> underground experts united
> https://dataswamp.org/~incal
>
>
>

[-- Attachment #2: Type: text/html, Size: 1806 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-18  7:14                           ` Simon Leinen
@ 2023-08-19 13:10                             ` Emanuel Berg
  2023-08-20  5:07                               ` Ihor Radchenko
  2023-09-04  4:13                             ` Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-08-19 13:10 UTC (permalink / raw)
  To: emacs-devel

Simon Leinen wrote:

> Emacs also has `most-negative-fixnum' and
> `most-positive-fixnum' (borrowed from Common Lisp but now
> part of the core).
>
> On my 64-bit system (GNU Emacs 30.0.50 on
> aarch64-apple-darwin22.5.0):
>
> (= (- (expt 2 61)) most-negative-fixnum) → t
> (= (1- (expt 2 61)) most-positive-fixnum) → t
>
> (Same as on Emanuel's.)

At least that computation was correct then!

Now that Gerd has solved the little mystery why Fibonacci was
seemingly so much faster on Common Lisp and we also got that
performance gain patch from Ihor I don't know how much sense
it makes to continue translating the emacs-benchmarks to
Common Lisp, or rather if anyone is motivated enough to do it,
but this is how far I got:

  https://dataswamp.org/~incal/cl/bench/

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-19 13:10                             ` Emanuel Berg
@ 2023-08-20  5:07                               ` Ihor Radchenko
  2023-08-20  6:20                                 ` Emanuel Berg
  2023-08-28  5:32                                 ` Emanuel Berg
  0 siblings, 2 replies; 56+ messages in thread
From: Ihor Radchenko @ 2023-08-20  5:07 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Now that Gerd has solved the little mystery why Fibonacci was
> seemingly so much faster on Common Lisp and we also got that
> performance gain patch from Ihor I don't know how much sense
> it makes to continue translating the emacs-benchmarks to
> Common Lisp, or rather if anyone is motivated enough to do it,
> but this is how far I got:
>
>   https://dataswamp.org/~incal/cl/bench/

You can again compare Elisp with CL and let us know what is being
noticeably slower. It will be an indication that something might be
improved.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-20  5:07                               ` Ihor Radchenko
@ 2023-08-20  6:20                                 ` Emanuel Berg
  2023-08-28  5:32                                 ` Emanuel Berg
  1 sibling, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-20  6:20 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> Now that Gerd has solved the little mystery why Fibonacci
>> was seemingly so much faster on Common Lisp and we also got
>> that performance gain patch from Ihor I don't know how much
>> sense it makes to continue translating the emacs-benchmarks
>> to Common Lisp, or rather if anyone is motivated enough to
>> do it, but this is how far I got:
>>
>>   https://dataswamp.org/~incal/cl/bench/
>
> You can again compare Elisp with CL and let us know what is
> being noticeably slower. It will be an indication that
> something might be improved.

Thanks, you are right, hopefully it will happen.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-20  5:07                               ` Ihor Radchenko
  2023-08-20  6:20                                 ` Emanuel Berg
@ 2023-08-28  5:32                                 ` Emanuel Berg
  2023-09-03  0:48                                   ` Emanuel Berg
  2023-09-03  1:57                                   ` [PATCH] Re: Bignum performance Emanuel Berg
  1 sibling, 2 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-08-28  5:32 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

> You can again compare Elisp with CL and let us know what is
> being noticeably slower. It will be an indication that
> something might be improved.

Here is a new file:

  https://dataswamp.org/~incal/cl/bench/flet.cl

The CL time is 0.84 vs Elisp time at 1.35.

So here, CL is 61% faster!

(format "%d%%" (round (* 100 (1- (/ 1.35 0.84)))))

flet
0.839996 s real time
0.839262 s run time

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-28  5:32                                 ` Emanuel Berg
@ 2023-09-03  0:48                                   ` Emanuel Berg
  2023-09-03  8:50                                     ` Ihor Radchenko
  2023-09-03  1:57                                   ` [PATCH] Re: Bignum performance Emanuel Berg
  1 sibling, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-09-03  0:48 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

> You can again compare Elisp with CL and let us know what is
> being noticeably slower. It will be an indication that
> something might be improved.

Her is yet another file,

  https://dataswamp.org/~incal/cl/bench/inclist.cl

The Elisp time for this benchmark is, with `native-comp-speed'
at the default - for the benchmarks package - maximal
optimization level, namely 3 - at 5.86 sec, while the CL is at
1.635993 sec.

So here, CL is 258% faster.

I run the benchmarks from Emacs, and the CL also from Emacs,
with SLIME and SBCL. So that should be pretty fair, but maybe
when all benchmarks are translated one could do the Elisp with
batch and the SBCL not using Emacs at all.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-28  5:32                                 ` Emanuel Berg
  2023-09-03  0:48                                   ` Emanuel Berg
@ 2023-09-03  1:57                                   ` Emanuel Berg
  1 sibling, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-09-03  1:57 UTC (permalink / raw)
  To: emacs-devel

Yet another file,

  https://dataswamp.org/~incal/cl/bench/inclist-type-hints.cl

Here we have type hints, answering my own questions a while
back how to do it!

It is done with `declare', as in this

  (declare (optimize (speed 3) (safety 0)))

for the concerned function, then they are actually put to use
with `the'. ("Type hints enabled"?)

Elisp vs SBCL, in seconds:

  (faster 5.73 0.675997) ; CL is 748% faster

Note that the optimization worked a lot better for SBCL than
in did for Elisp, if we compare with the non-optimized (i.e.
no type hints) file I just posted [1] [it hasn't arrived
yet on this ML as I type this, but should arrive] - anyway

  (faster 5.86     5.73)     ;   2% - Elisp vs `cl-the' Elisp
  (faster 1.635993 0.675997) ; 142% - SBCL  vs `the' SBCL

SBCL become 142% faster with the optimization, and the Elisp -
just 2%.

Another interesting thing is that here, superficially or on
the interface level at least, we have the same type hint
possibilities as in SBCL.

[1] https://dataswamp.org/~incal/cl/bench/inclist.cl

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-09-03  0:48                                   ` Emanuel Berg
@ 2023-09-03  8:50                                     ` Ihor Radchenko
  2023-09-03  9:05                                       ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-09-03  8:50 UTC (permalink / raw)
  To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Her is yet another file,
>
>   https://dataswamp.org/~incal/cl/bench/inclist.cl

Unfortunately, I cannot use this file because it is referring to
~/public_html/cl/bench/timing.cl, which I do not have.

> The Elisp time for this benchmark is, with `native-comp-speed'
> at the default - for the benchmarks package - maximal
> optimization level, namely 3 - at 5.86 sec, while the CL is at
> 1.635993 sec.

For me, Elisp runs in 1.2 sec.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-09-03  8:50                                     ` Ihor Radchenko
@ 2023-09-03  9:05                                       ` Emanuel Berg
  2023-09-03 10:30                                         ` Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) Ihor Radchenko
  0 siblings, 1 reply; 56+ messages in thread
From: Emanuel Berg @ 2023-09-03  9:05 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

>> Her is yet another file,
>>
>>   https://dataswamp.org/~incal/cl/bench/inclist.cl
>
> Unfortunately, I cannot use this file because it is referring to
> ~/public_html/cl/bench/timing.cl, which I do not have.

It is in the same directory

  https://dataswamp.org/~incal/cl/bench/timing.cl

but how to not have to use an absolute path when loading it
I don't know how to do in CL, maybe it cannot be done without
adsf or some other such solution.

So you have to set that manually as for now.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance)
  2023-09-03  9:05                                       ` Emanuel Berg
@ 2023-09-03 10:30                                         ` Ihor Radchenko
  2023-09-04  1:03                                           ` Emanuel Berg
  0 siblings, 1 reply; 56+ messages in thread
From: Ihor Radchenko @ 2023-09-03 10:30 UTC (permalink / raw)
  To: Emanuel Berg, Andrea Corallo; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> Unfortunately, I cannot use this file because it is referring to
>> ~/public_html/cl/bench/timing.cl, which I do not have.
>
> It is in the same directory
>
>   https://dataswamp.org/~incal/cl/bench/timing.cl

It would be nice if you attached the files to email.
Otherwise, people examining this threads a few years in future may not
be able to access the files.

> So you have to set that manually as for now.

I did, and the results are very different from yours:

$  SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/inclist.cl

;; 1.096667 s real time
;; 1.096235 s run time

$ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/inclist-type-hints.cl 
;; 0.55 s real time
;; 0.549992 s run time

(emacs master)
$ perf record ./src/emacs -batch -l /home/yantar92/.emacs.d/straight/repos/elisp-benchmarks/elisp-benchmarks.el --eval '(setq elb-speed 2)' --eval '(elisp-benchmarks-run "inclist")'

* Results

  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | inclist            |           1.20 |       0.00 |       0 |        1.20 |            0.02 |
  | inclist-type-hints |           1.05 |       0.00 |       0 |        1.05 |            0.02 |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | total              |           2.26 |       0.00 |       0 |        2.26 |            0.02 |

inclist: 1.1 sec vs. 1.2 sec
inclist-type-hints: 0.55 sec vs. 1.05 sec

with native-comp speed 3, the results are on par for inclist

$ perf record ./src/emacs -batch -l /home/yantar92/.emacs.d/straight/repos/elisp-benchmarks/elisp-benchmarks.el --eval '(setq elb-speed 3)' --eval '(elisp-benchmarks-run "inclist")'

* Results

  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | inclist            |           1.07 |       0.00 |       0 |        1.07 |            0.02 |
  | inclist-type-hints |           0.99 |       0.00 |       0 |        0.99 |            0.00 |
  |--------------------+----------------+------------+---------+-------------+-----------------|

inclist: 1.1 sec vs. 1.07 sec
inclist-type-hints: 0.55 sec vs. 0.99 sec

There is nothing obvious that is slower in Elisp - most of the time is
spent in the native compiled functions:

    47.83%  emacs     inclist-b6453dcf-34842bf7.eln                  [.] F656c622d696e636c697374_elb_inclist_0
    44.80%  emacs     inclist-type-hints-bb635d76-535ebfb0.eln       [.] F656c622d696e636c6973742d7468_elb_inclist_th_0
     1.45%  emacs     emacs                                          [.] process_mark_stack

So, might be some missing optimization in native-comp itself.
CCing Andrea, in case if he has some insight about further analysis.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance)
  2023-09-03 10:30                                         ` Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) Ihor Radchenko
@ 2023-09-04  1:03                                           ` Emanuel Berg
  0 siblings, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-09-04  1:03 UTC (permalink / raw)
  To: emacs-devel

Ihor Radchenko wrote:

> I did, and the results are very different from yours

It is probably because of the Emacs batch and/or the SBCL
non-SLIME style of execution, using commands similar to
yours [last] I get the following results.

Elisp vs SBCL

inclist:            (faster 1.04 1.59) ; CL 35% slower
inclist-type-hints: (faster 1.05 0.69) ; CL 52% faster

Elisp optimization: (faster 1.04 1.05) ; Elisp 1% slower from optimization
CL optimization:    (faster 1.59 0.69) ; CL 130% faster from optimization

We see that, surprisingly, CL is slower for plain inclist.

With type hints tho, CL benefits hugely to beat the Elisp
non-optimized record.

While Elisp doesn't seem to benefit from the optimization
at all.

#! /bin/zsh
#
# this file:
#   https://dataswamp.org/~incal/cl/bench/inc2-cl

sbcl --noinform --load inclist.cl --load inclist-type-hints.cl --quit

#! /bin/zsh
#
# this file:
#   https://dataswamp.org/~incal/cl/bench/inc2-el

emacs                                                            \
    -batch                                                       \
    -l ~/.emacs.d/elpa/elisp-benchmarks-1.14/elisp-benchmarks.el \
    --eval '(setq elb-speed 2)'                                  \
    --eval '(elisp-benchmarks-run "inclist")'

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] Re: Bignum performance
  2023-08-18  7:14                           ` Simon Leinen
  2023-08-19 13:10                             ` Emanuel Berg
@ 2023-09-04  4:13                             ` Emanuel Berg
  1 sibling, 0 replies; 56+ messages in thread
From: Emanuel Berg @ 2023-09-04  4:13 UTC (permalink / raw)
  To: emacs-devel

Simon Leinen wrote:

> Emacs also has `most-negative-fixnum' and
> `most-positive-fixnum' (borrowed from Common Lisp but now
> part of the core).
>
> On my 64-bit system (GNU Emacs 30.0.50 on aarch64-apple-darwin22.5.0):
>
> (= (- (expt 2 61)) most-negative-fixnum) → t
> (= (1- (expt 2 61)) most-positive-fixnum) → t
>
> (Same as on Emanuel's.)

(let ((bits 62))
  (and (= (-  (expt 2 (1- bits))) most-negative-fixnum)
       (= (1- (expt 2 (1- bits))) most-positive-fixnum) )) ; t

Sweet B)

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2023-09-04  4:13 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-14  6:28 [PATCH] Re: Bignum performance (was: Shrinking the C core) Gerd Möllmann
2023-08-14  6:56 ` Gerd Möllmann
2023-08-14  7:04   ` Ihor Radchenko
2023-08-14  7:35     ` Gerd Möllmann
2023-08-14  8:09       ` Ihor Radchenko
2023-08-14  9:28         ` Gerd Möllmann
2023-08-14  9:42           ` Ihor Radchenko
2023-08-15 14:03             ` Emanuel Berg
2023-08-15 15:01               ` Ihor Radchenko
2023-08-15 22:21                 ` Emanuel Berg
2023-08-15 22:33                 ` Emanuel Berg
2023-08-16  4:36                   ` tomas
2023-08-16  5:23                     ` Emanuel Berg
2023-08-14 16:51           ` Emanuel Berg
2023-08-15  4:58             ` Gerd Möllmann
2023-08-15 14:20               ` Emanuel Berg
2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
2023-08-15 14:33               ` Emanuel Berg
2023-08-15 17:07                 ` tomas
2023-08-15 22:46                   ` Emanuel Berg
2023-08-16  1:31                 ` Po Lu
2023-08-16  1:37                   ` Emanuel Berg
2023-08-16  3:17                     ` Po Lu
2023-08-16  4:44                       ` tomas
2023-08-16  5:18                       ` Gerd Möllmann
2023-08-16  5:35                         ` Emanuel Berg
2023-08-18  7:14                           ` Simon Leinen
2023-08-19 13:10                             ` Emanuel Berg
2023-08-20  5:07                               ` Ihor Radchenko
2023-08-20  6:20                                 ` Emanuel Berg
2023-08-28  5:32                                 ` Emanuel Berg
2023-09-03  0:48                                   ` Emanuel Berg
2023-09-03  8:50                                     ` Ihor Radchenko
2023-09-03  9:05                                       ` Emanuel Berg
2023-09-03 10:30                                         ` Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) Ihor Radchenko
2023-09-04  1:03                                           ` Emanuel Berg
2023-09-03  1:57                                   ` [PATCH] Re: Bignum performance Emanuel Berg
2023-09-04  4:13                             ` Emanuel Berg
2023-08-16  5:41                         ` Gerd Möllmann
2023-08-16  6:42                         ` Po Lu
2023-08-16  8:05                           ` Gerd Möllmann
  -- strict thread matches above, loose matches on Subject: below --
2023-08-09  9:46 Shrinking the C core Eric S. Raymond
2023-08-09 12:34 ` Po Lu
2023-08-09 15:51   ` Eric S. Raymond
2023-08-09 23:56     ` Po Lu
2023-08-10  1:19       ` Eric S. Raymond
2023-08-10  7:44         ` Eli Zaretskii
2023-08-10 21:54           ` Emanuel Berg
2023-08-11 10:27             ` Bignum performance (was: Shrinking the C core) Ihor Radchenko
2023-08-11 12:10               ` Emanuel Berg
2023-08-11 12:32                 ` Ihor Radchenko
2023-08-11 12:38                   ` Emanuel Berg
2023-08-11 14:07                     ` [PATCH] " Ihor Radchenko
2023-08-11 18:06                       ` Emanuel Berg
2023-08-11 19:41                         ` Ihor Radchenko
2023-08-11 19:50                           ` Emanuel Berg
2023-08-12  8:24                             ` Ihor Radchenko
2023-08-12 16:03                               ` Emanuel Berg
2023-08-13  9:09                                 ` Ihor Radchenko
2023-08-13  9:49                                   ` Emanuel Berg
2023-08-13 10:21                                     ` Ihor Radchenko
2023-08-14  2:20                                       ` Emanuel Berg
2023-08-14  7:20                                         ` Ihor Radchenko
2023-08-11 22:46                           ` Emanuel Berg
2023-08-12  8:30                             ` Ihor Radchenko
2023-08-12 16:22                               ` Emanuel Berg
2023-08-13  9:12                                 ` Ihor Radchenko

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).