* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
@ 2023-08-14 6:28 Gerd Möllmann
2023-08-14 6:56 ` Gerd Möllmann
0 siblings, 1 reply; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-14 6:28 UTC (permalink / raw)
To: incal; +Cc: emacs-devel
>> It is, but it is also tangent to comparison between Elisp
>> and CL. The main (AFAIU) difference between Elisp and CL is
>> in how the bignums are stored. Elisp uses its own internal
>> object type while CL uses GMP's native format.
Just for the record, SBCL/CMUCL don't use GMP.
^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  6:28 [PATCH] Re: Bignum performance (was: Shrinking the C core) Gerd Möllmann
@ 2023-08-14  6:56 ` Gerd Möllmann
  2023-08-14  7:04   ` Ihor Radchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-14 6:56 UTC (permalink / raw)
To: incal; +Cc: emacs-devel

> Just for the record, SBCL/CMUCL don't use GMP.

Hm, thinking of this - did someone measure how much time is spent in
malloc/realloc/free in the benchmarks? That is what GMP uses, and SBCL
doesn't.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  6:56 ` Gerd Möllmann
@ 2023-08-14  7:04   ` Ihor Radchenko
  2023-08-14  7:35     ` Gerd Möllmann
  0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-14 7:04 UTC (permalink / raw)
To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> Just for the record, SBCL/CMUCL don't use GMP.
>
> Hm, thinking of this - did someone measure how much time is spent in
> malloc/realloc/free in the benchmarks? That is what GMP uses, and SBCL
> doesn't.

https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost/

We can further get rid of the GC by temporarily disabling it (just for
demonstration):

(let ((beg (float-time)))
  (setq gc-cons-threshold most-positive-fixnum)
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)))

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

  17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
   7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
   6.51%  emacs  emacs             [.] arith_driver
   6.03%  emacs  libc.so.6         [.] malloc
   5.57%  emacs  emacs             [.] allocate_vectorlike
   5.20%  emacs  [unknown]         [k] 0xffffffffaae01857
   4.16%  emacs  libgmp.so.10.5.0  [.] __gmpn_add_n_coreisbr
   3.72%  emacs  emacs             [.] check_number_coerce_marker
   3.35%  emacs  fib.eln           [.] F666962_fib_0
   3.29%  emacs  emacs             [.] allocate_pseudovector
   2.30%  emacs  emacs             [.] Flss

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  7:04   ` Ihor Radchenko
@ 2023-08-14  7:35     ` Gerd Möllmann
  2023-08-14  8:09       ` Ihor Radchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-14 7:35 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: incal, emacs-devel

On 14.08.23 09:04, Ihor Radchenko wrote:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>>> Just for the record, SBCL/CMUCL don't use GMP.
>>
>> Hm, thinking of this - did someone measure how much time is spent in
>> malloc/realloc/free in the benchmarks? That is what GMP uses, and SBCL
>> doesn't.
>
> https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost/
>
> We can further get rid of the GC by temporarily disabling it (just for
> demonstration):
>
> (let ((beg (float-time)))
>   (setq gc-cons-threshold most-positive-fixnum)
>   (fib 10000 1000)
>   (message "%.3f s" (- (float-time) beg)))
>
> perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
>
>   17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
>    7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
>    6.51%  emacs  emacs             [.] arith_driver
>    6.03%  emacs  libc.so.6         [.] malloc
>    5.57%  emacs  emacs             [.] allocate_vectorlike
>    5.20%  emacs  [unknown]         [k] 0xffffffffaae01857
>    4.16%  emacs  libgmp.so.10.5.0  [.] __gmpn_add_n_coreisbr
>    3.72%  emacs  emacs             [.] check_number_coerce_marker
>    3.35%  emacs  fib.eln           [.] F666962_fib_0
>    3.29%  emacs  emacs             [.] allocate_pseudovector
>    2.30%  emacs  emacs             [.] Flss

Hm, then maybe we can look at the disassembly of the benchmark in SBCL?
Not that in the end the compiler is so smart that it optimizes a lot of
stuff simply away because it can prove that the result of the
computations cannot possibly be observed?

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  7:35     ` Gerd Möllmann
@ 2023-08-14  8:09       ` Ihor Radchenko
  2023-08-14  9:28         ` Gerd Möllmann
  0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-14 8:09 UTC (permalink / raw)
To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Hm, then maybe we can look at the disassembly of the benchmark in SBCL?
> Not that in the end the compiler is so smart that it optimizes a lot
> of stuff simply away because it can prove that the result of the
> computations cannot possibly be observed?

Sorry, but I do not know how to do it. Not familiar with CL.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  8:09       ` Ihor Radchenko
@ 2023-08-14  9:28         ` Gerd Möllmann
  2023-08-14  9:42           ` Ihor Radchenko
  2023-08-14 16:51           ` Emanuel Berg
  0 siblings, 2 replies; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-14 9:28 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: incal, emacs-devel

On 14.08.23 10:09, Ihor Radchenko wrote:
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
>
>> Hm, then maybe we can look at the disassembly of the benchmark in SBCL?
>> Not that in the end the compiler is so smart that it optimizes a lot
>> of stuff simply away because it can prove that the result of the
>> computations cannot possibly be observed?
>
> Sorry, but I do not know how to do it. Not familiar with CL.

Ok. I used the code from https://dataswamp.org/~incal/cl/fib.cl/fib.cl.
And the first thing that stares at me in this function:

(defun fib (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let* ((p1 1)
             (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

is the declaration (unsigned-byte 53).

The declaration means we are lying to the compiler because Z gets bigger
than 53 bits eventually. And all bets are off because of the OPTIMIZE
declaration. The result is that everything is done in fixnums on 64-bit
machines.

; disassembly for FIB
; Size: 92 bytes. Origin: #x700530086C              ; FIB
; 6C:       030080D2   MOVZ NL3, #0
; 70:       040080D2   MOVZ NL4, #0
; 74:       0E000014   B L3
; 78: L0:   410080D2   MOVZ NL1, #2
; 7C:       E20301AA   MOV NL2, NL1
; 80:       EB030CAA   MOV R1, R2
; 84:       651100D1   SUB NL5, R1, #4
; 88:       000080D2   MOVZ NL0, #0
; 8C:       05000014   B L2
; 90: L1:   2300028B   ADD NL3, NL1, NL2
; 94:       E20301AA   MOV NL2, NL1
; 98:       E10303AA   MOV NL1, NL3
; 9C:       00080091   ADD NL0, NL0, #2
; A0: L2:   1F0005EB   CMP NL0, NL5
; A4:       6BFFFF54   BLT L1
; A8:       84080091   ADD NL4, NL4, #2
; AC: L3:   9F000AEB   CMP NL4, R0
; B0:       4BFEFF54   BLT L0
; B4:       EA0303AA   MOV R0, NL3
; B8:       FB031AAA   MOV CSP, CFP
; BC:       5A7B40A9   LDP CFP, LR, [CFP]
; C0:       BF0300F1   CMP NULL, #0
; C4:       C0035FD6   RET

Tada!

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:28         ` Gerd Möllmann
@ 2023-08-14  9:42           ` Ihor Radchenko
  2023-08-15 14:03             ` Emanuel Berg
  0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-14 9:42 UTC (permalink / raw)
To: Gerd Möllmann; +Cc: incal, emacs-devel

Gerd Möllmann <gerd.moellmann@gmail.com> writes:

> Ok. I used the code from https://dataswamp.org/~incal/cl/fib.cl/fib.cl.
> ...
> The declaration means we are lying to the compiler because Z gets bigger
> than 53 bits eventually. And all bets are off because of the OPTIMIZE
> declaration. The result is that everything is done in fixnums on 64-bit
> machines.

That explains a lot :)

I now tried

(defun fib (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    ;; (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let* ((p1 1)
             (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

and got

$ SBCL_HOME=/usr/lib64/sbcl perf record sbcl --load /tmp/fib.cl
;;; 0.263333 s real time
;;; 0.263641 s run time

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

Still ~3x faster compared to Elisp, but not orders of magnitude.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:42           ` Ihor Radchenko
@ 2023-08-15 14:03             ` Emanuel Berg
  2023-08-15 15:01               ` Ihor Radchenko
  0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:03 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>> The declaration means we are lying to the compiler because
>> Z gets bigger than 53 bits eventually. And all bets are off
>> because of the OPTIMIZE declaration. The result is that
>> everything is done in fixnums on 64-bit machines.
>
> That explains a lot :)
>
> I now tried
>
> (defun fib (reps num)
>   (declare (optimize speed (safety 0) (debug 0)))
>   (let ((z 0))
>     ;; (declare (type (unsigned-byte 53) reps num z))
>     (dotimes (r reps)
>       (let* ((p1 1)
>              (p2 1))
>         (dotimes (i (- num 2))
>           (setf z (+ p1 p2)
>                 p2 p1
>                 p1 z))))
>     z))
>
> and got
>
> $ SBCL_HOME=/usr/lib64/sbcl perf record sbcl --load /tmp/fib.cl
>
> ;;; 0.263333 s real time
> ;;; 0.263641 s run time
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
>
> Still ~3x faster compared to Elisp, but not orders
> of magnitude.

A pretty good optimization! :O

But what kind of optimization is it?

Also, what happens if you remove the OPTIMIZE declaration
as well?

Still, isn't the rule of the "beat the benchmark" game to beat
it as fast as possible?

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 14:03             ` Emanuel Berg
@ 2023-08-15 15:01               ` Ihor Radchenko
  2023-08-15 22:21                 ` Emanuel Berg
  2023-08-15 22:33                 ` Emanuel Berg
  0 siblings, 2 replies; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-15 15:01 UTC (permalink / raw)
To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> ;; (declare (type (unsigned-byte 53) reps num z))
>> ...
>> Still ~3x faster compared to Elisp, but not orders
>> of magnitude.
>
> A pretty good optimization! :O
>
> But what kind of optimization is it?

The commented "optimization" is: "Hey, SBCL, do not use
bignums. If ints overflow, so be it".

> Also, what happens if you remove the OPTIMIZE declaration
> as well?

No difference.

> Still, isn't the rule of the "beat the benchmark" game to beat
> it as fast as possible?

Yes, but when SBCL is orders of magnitude faster, it indicates
something conceptually wrong in the algo. 3x is a matter of
variation in the internal details (like extra type checking in
Elisp that Po Lu outlined).

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 15:01               ` Ihor Radchenko
@ 2023-08-15 22:21                 ` Emanuel Berg
  0 siblings, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:21 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>> A pretty good optimization! :O
>>
>> But what kind of optimization is it?
>
> The commented "optimization" is: "Hey, SBCL, do not use
> bignums. If ints overflow, so be it".

? But then how can the algorithm execute correctly?

>> Still, isn't the rule of the "beat the benchmark" game to
>> beat it as fast as possible?
>
> Yes, but when SBCL is orders of magnitude faster, it
> indicates something conceptually wrong in the algo. 3x is
> a matter of variation in the internal details (like extra
> type checking in Elisp that Po Lu outlined).

If you are saying the algorithm doesn't output correct data
for the conventional conception of the Fibonacci algorithm,
then that optimization and whatever time it makes isn't valid;
I'll remove it this instant.

Hm, maybe we need something like unit testing to confirm that
the algorithms perform not just fast, but also as intended to
solve whatever problem they were designed to ...

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 15:01               ` Ihor Radchenko
  2023-08-15 22:21                 ` Emanuel Berg
@ 2023-08-15 22:33                 ` Emanuel Berg
  2023-08-16  4:36                   ` tomas
  1 sibling, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:33 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

> Yes, but when SBCL is orders of magnitude faster, it
> indicates something conceptually wrong in the algo.

Indeed, I'll remove it, thanks.

But my CL skills aren't at that level, so someone else added
it. A strange optimization indeed, that breaks the code. Hm,
maybe not that unusual when I think about it. But that is for
normal code, not supposed benchmarks ...

So this is the explanation for the +78 875% speed disadvantage
for Elisp! As reported a long time ago when comparing Elisp
and CL. I.e., what is documented in this file

  https://dataswamp.org/~incal/emacs-init/fib.el

and discussed here (or be it gnu.emacs.help) several
months ago.

\o/ Fires up a cigar!

Always a pleasure when a mystery gets solved ... but TBH
I actually believed Elisp was that much slower. Turns out, the
CL implementation wasn't even correct. Bummer, but ultimately
good for us as it turned out.

> 3x is a matter of variation in the internal details (like
> extra type checking in Elisp that Po Lu outlined).

I'll remove the supposed optimization, and we'll take it from
there. We (you) have already improved the bignum object
allocation reuse and Mr. Möllmann solved this issue, so we
have a positive trajectory already. But 78 875% or 3x doesn't
matter in principle, we do it until we are done regardless ...

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15 22:33                 ` Emanuel Berg
@ 2023-08-16  4:36                   ` tomas
  2023-08-16  5:23                     ` Emanuel Berg
  0 siblings, 1 reply; 47+ messages in thread
From: tomas @ 2023-08-16 4:36 UTC (permalink / raw)
To: emacs-devel

On Wed, Aug 16, 2023 at 12:33:33AM +0200, Emanuel Berg wrote:
> Ihor Radchenko wrote:
>
>> Yes, but when SBCL is orders of magnitude faster, it
>> indicates something conceptually wrong in the algo.
>
> Indeed, I'll remove it, thanks.
>
> But my CL skills aren't at that level so someone else added
> it. A strange optimization indeed, that breaks the code.

It only breaks the code if you "don't know what you are
doing".

See, without the optimization the code will have, at each and
every arithmetic operation, to check "Hmm... Is this thing
going to overflow? Hm. It might, so better use bignums. Phew,
it didn't, so back to fixnums".

Now we know that modern CPU architectures have a hard time
with conditional statements (pipeline stalls, branch
mispredictions, all that nasty stuff). So this "Hmm..." above
is costing real money. Even in cases you won't need it,
because things ain't gonna overflow.

The compiler tries to do a good job of looking into
calculations and deciding "this incf down there won't ever
push us over the fixnum limit, because we know we are starting
with a number below 10". But the programmer sometimes has more
knowledge and can prove that things won't overflow, ever. Or
that, should things overflow, it won't matter anyway. It's for
those cases that this kind of optimization is made.

C, by the way, always runs in this mode. Unsigned integers
will silently wrap around, that's documented behaviour. Signed
integers will do whatever their thing is (technically this is
called "undefined behaviour").

Perhaps you wanted just to compute fib modulo some big power
of two? Then your program was correct, after all...

Cheers
-- t

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-16  4:36                   ` tomas
@ 2023-08-16  5:23                     ` Emanuel Berg
  0 siblings, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-16 5:23 UTC (permalink / raw)
To: emacs-devel

tomas wrote:

> Perhaps you wanted just to compute fib modulo some big power
> of two? Then your program was correct, after all...

So, it works for fixnums but not for bignums?

The code must always work for valid indata; if it doesn't,
even for a single such case, the optimization breaks the
algorithm and will be removed.

Maybe we can duplicate it and remove the declaration in one of
them. So we would have one Fibonacci to test the speed of
fixnums, and one for bignums. In the fixnums one, the
declaration would still be legal.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14  9:28         ` Gerd Möllmann
  2023-08-14  9:42           ` Ihor Radchenko
@ 2023-08-14 16:51           ` Emanuel Berg
  2023-08-15  4:58             ` Gerd Möllmann
  2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
  1 sibling, 2 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-14 16:51 UTC (permalink / raw)
To: emacs-devel

Gerd Möllmann wrote:

>> Sorry, but I do not know how to do it. Not familiar
>> with CL.
>
> Ok. I used the code from
> https://dataswamp.org/~incal/cl/fib.cl/fib.cl

Yikes, how did that happen, some slip involving symbolic
links ... Here it is:

  https://dataswamp.org/~incal/cl/bench/fib.cl

And timing is done with this:

  https://dataswamp.org/~incal/cl/bench/timing.cl

Note the ugly absolute path in fib.cl BTW, otherwise you get
the path not of the file but of SBCL or Slime, maybe. I think
one is supposed to use ASDF, but surely there must be some
easy way to just load a file using a relative path to the
current file?

(load "~/public_html/cl/bench/timing.cl")

> is the declaration (unsigned-byte 53).
>
> The declaration means we are lying to the compiler because
> Z gets bigger than 53 bits eventually. And all bets are off
> because of the OPTIMIZE declaration. The result is that
> everything is done in fixnums on 64-bit machines.

A very impressive optimization indeed, and expressed in
a cryptic way.

> ; disassembly for FIB
> ; Size: 92 bytes. Origin: #x700530086C              ; FIB
> ; 6C:       030080D2   MOVZ NL3, #0
> ; 70:       040080D2   MOVZ NL4, #0
> ; 74:       0E000014   B L3
> ; 78: L0:   410080D2   MOVZ NL1, #2
> ; 7C:       E20301AA   MOV NL2, NL1
> ; 80:       EB030CAA   MOV R1, R2
> ; 84:       651100D1   SUB NL5, R1, #4
> ; 88:       000080D2   MOVZ NL0, #0
> ; 8C:       05000014   B L2
> ; 90: L1:   2300028B   ADD NL3, NL1, NL2
> ; 94:       E20301AA   MOV NL2, NL1
> ; 98:       E10303AA   MOV NL1, NL3
> ; 9C:       00080091   ADD NL0, NL0, #2
> ; A0: L2:   1F0005EB   CMP NL0, NL5
> ; A4:       6BFFFF54   BLT L1
> ; A8:       84080091   ADD NL4, NL4, #2
> ; AC: L3:   9F000AEB   CMP NL4, R0
> ; B0:       4BFEFF54   BLT L0
> ; B4:       EA0303AA   MOV R0, NL3
> ; B8:       FB031AAA   MOV CSP, CFP
> ; BC:       5A7B40A9   LDP CFP, LR, [CFP]
> ; C0:       BF0300F1   CMP NULL, #0
> ; C4:       C0035FD6   RET
>
> Tada!

How do you see only fixnums are used?

Are we talking 1 word = 2 bytes = 16 bits here, s2c?

If so, the range of fixnums is -32 768 to 32 767 inclusive,
so those are hardly huge numbers.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-14 16:51           ` Emanuel Berg
@ 2023-08-15  4:58             ` Gerd Möllmann
  2023-08-15 14:20               ` Emanuel Berg
  0 siblings, 1 reply; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-15 4:58 UTC (permalink / raw)
To: incal; +Cc: emacs-devel

> How do you see only fixnums are used?

By reading the assembly, and remembering a thing or two from
my times as a CMUCL contributor, e.g. its fixnum
representation. SBCL is a fork of CMUCL.

> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>
> If so, the range of fixnums are -32 768 to 32 767 inclusive,
> so those are hardly huge numbers.

It's a 64-bit machine, more specifically an M1, i.e. arm64.
How you get from there to these 16 bits escapes me.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
  2023-08-15  4:58             ` Gerd Möllmann
@ 2023-08-15 14:20               ` Emanuel Berg
  0 siblings, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:20 UTC (permalink / raw)
To: emacs-devel

Gerd Möllmann wrote:

>> Are we talking 1 word = 2 bytes = 16 bits here, s2c? If so,
>> the range of fixnums are -32 768 to 32 767 inclusive, so
>> those are hardly huge numbers.
>
> It's a 64-bit machine, more specifically an M1, i.e. arm64.
> How you get from there to these 16 bits escapes me.

Here [1] it says a word for ARM is 32 bits. So to store zero
in a fixnum word it will look like this

  00000000 00000000 00000000 00000000

If one bit, the MSB, is used as the sign bit, the interval,
inclusive, is

  (list (* -1 (expt 2 (1- 32)))
        (1- (expt 2 (1- 32))) )
  ; (-2147483648 2147483647)

Okay, now we are talking some pretty big numbers, although
I have seen even bigger ...

[1] https://modexp.wordpress.com/2018/10/30/arm64-assembly/

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-14 16:51           ` Emanuel Berg
  2023-08-15  4:58             ` Gerd Möllmann
@ 2023-08-15  6:26             ` Po Lu
  2023-08-15 14:33               ` Emanuel Berg
  1 sibling, 1 reply; 47+ messages in thread
From: Po Lu @ 2023-08-15 6:26 UTC (permalink / raw)
To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>
> If so, the range of fixnums are -32 768 to 32 767 inclusive,
> so those are hardly huge numbers.

Under Arm64, general purpose integer registers are 64 bits
wide. That is also the word size of said machine.

I agree that vague terminology such as ``word'' is confusing,
because of its use by the Unix assembler.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-15  6:26             ` [PATCH] Re: Bignum performance Po Lu
@ 2023-08-15 14:33               ` Emanuel Berg
  2023-08-15 17:07                 ` tomas
  2023-08-16  1:31                 ` Po Lu
  0 siblings, 2 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:33 UTC (permalink / raw)
To: emacs-devel

Po Lu wrote:

>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>
>> If so, the range of fixnums are -32 768 to 32 767
>> inclusive, so those are hardly huge numbers.
>
> Under Arm64, general purpose integer registers are 64 bits
> wide. That is also the word size of said machine.

If they are, the range for fixnums is

  (list (* -1 (expt 2 (1- 64)))
        (1- (expt 2 (1- 64))) )
  ; (-9223372036854775808 9223372036854775807)

Only after that it gets slower :P

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-15 14:33               ` Emanuel Berg
@ 2023-08-15 17:07                 ` tomas
  2023-08-15 22:46                   ` Emanuel Berg
  0 siblings, 1 reply; 47+ messages in thread
From: tomas @ 2023-08-15 17:07 UTC (permalink / raw)
To: emacs-devel

On Tue, Aug 15, 2023 at 04:33:04PM +0200, Emanuel Berg wrote:
> Po Lu wrote:
>
>>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>>
>>> If so, the range of fixnums are -32 768 to 32 767
>>> inclusive, so those are hardly huge numbers.
>>
>> Under Arm64, general purpose integer registers are 64 bits
>> wide. That is also the word size of said machine.
>
> If they are, the range for fixnums is
>
>   (list (* -1 (expt 2 (1- 64)))
>         (1- (expt 2 (1- 64))) )
>   ; (-9223372036854775808 9223372036854775807)
>
> Only after that it gets slower :P

Unless SBCL uses tagged object representation. Unless the
compiler can prove that some "thing" is going to be an int.
Unless...

Cheers
-- t

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-15 17:07                 ` tomas
@ 2023-08-15 22:46                   ` Emanuel Berg
  0 siblings, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 22:46 UTC (permalink / raw)
To: emacs-devel

tomas wrote:

>>>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>>>
>>>> If so, the range of fixnums are -32 768 to 32 767
>>>> inclusive, so those are hardly huge numbers.
>>>
>>> Under Arm64, general purpose integer registers are 64 bits
>>> wide. That is also the word size of said machine.
>>
>> If they are, the range for fixnums is
>>
>>   (list (* -1 (expt 2 (1- 64)))
>>         (1- (expt 2 (1- 64))) )
>>   ; (-9223372036854775808 9223372036854775807)
>>
>> Only after that it gets slower :P
>
> Unless SBCL uses tagged object representation. Unless the
> compiler can prove that some "thing" is going to be an int.
> Unless...

Actually the rules are quite simple. As long as the expected
execution and correct return value is observed and ultimately
achieved by the algorithm, all optimizations are fair.

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-15 14:33               ` Emanuel Berg
  2023-08-15 17:07                 ` tomas
@ 2023-08-16  1:31                 ` Po Lu
  2023-08-16  1:37                   ` Emanuel Berg
  1 sibling, 1 reply; 47+ messages in thread
From: Po Lu @ 2023-08-16 1:31 UTC (permalink / raw)
To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Po Lu wrote:
>
>>> Are we talking 1 word = 2 bytes = 16 bits here, s2c?
>>>
>>> If so, the range of fixnums are -32 768 to 32 767
>>> inclusive, so those are hardly huge numbers.
>>
>> Under Arm64, general purpose integer registers are 64 bits
>> wide. That is also the word size of said machine.
>
> If they are, the range for fixnums is
>
>   (list (* -1 (expt 2 (1- 64)))
>         (1- (expt 2 (1- 64))) )
>   ; (-9223372036854775808 9223372036854775807)
>
> Only after that it gets slower :P

Lisp systems normally set aside several of the high or low
bits of a register as a tag linking a type to the object
represented.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-16  1:31                 ` Po Lu
@ 2023-08-16  1:37                   ` Emanuel Berg
  2023-08-16  3:17                     ` Po Lu
  0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-16 1:37 UTC (permalink / raw)
To: emacs-devel

Po Lu wrote:

>>> Under Arm64, general purpose integer registers are 64 bits
>>> wide. That is also the word size of said machine.
>>
>> If they are, the range for fixnums is
>>
>>   (list (* -1 (expt 2 (1- 64)))
>>         (1- (expt 2 (1- 64))) )
>>   ; (-9223372036854775808 9223372036854775807)
>>
>> Only after that it gets slower :P
>
> Lisp systems normally set aside several of the high or low
> bits of a register as a tag linking a type to the
> object represented.

But here we are at the CPU architecture level (register
length); surely Lisp doesn't meddle with that?

No, I sense that it does, actually. So please explain, then,
how it works. And in particular, how many bits do we (Elisp
and CL) actually have for our fixnums? Or are we talking
bignums now?

-- 
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-16  1:37                   ` Emanuel Berg
@ 2023-08-16  3:17                     ` Po Lu
  2023-08-16  4:44                       ` tomas
  2023-08-16  5:18                       ` Gerd Möllmann
  0 siblings, 2 replies; 47+ messages in thread
From: Po Lu @ 2023-08-16 3:17 UTC (permalink / raw)
To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Po Lu wrote:
>
>> Lisp systems normally set aside several of the high or low
>> bits of a register as a tag linking a type to the
>> object represented.
>
> But here we are at the CPU architecture level (register
> length), surely Lisp don't meddle with that?
>
> No, I sense that it is, actually. So please explain, then, how
> it works. And in particular, how many bits do we (Elisp and
> CL) actually have for our fixnums?

I don't know about SBCL, but as for Emacs, refer to the
definition of VALBITS in lisp.h (maybe also the right files
among m/*.h and s/*.h, but I have no idea where they've
disappeared to.)

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-16  3:17                     ` Po Lu
@ 2023-08-16  4:44                       ` tomas
  1 sibling, 0 replies; 47+ messages in thread
From: tomas @ 2023-08-16 4:44 UTC (permalink / raw)
To: Po Lu; +Cc: emacs-devel

On Wed, Aug 16, 2023 at 11:17:01AM +0800, Po Lu wrote:
> Emanuel Berg <incal@dataswamp.org> writes:
>
>> Po Lu wrote:
[...]
>>> Lisp systems normally set aside several of the high or low
>>> bits of a register as a tag linking a type to the
>>> object represented.
>>
>> But here we are at the CPU architecture level (register
>> length), surely Lisp don't meddle with that?
>>
>> No, I sense that it is, actually. So please explain, then, how
>> it works. And in particular, how many bits do we (Elisp and
>> CL) actually have for our fixnums?
>
> I don't know about SBCL, but as for Emacs, refer to the
> definition of VALBITS in lisp.h (maybe also the right files
> among m/*.h and s/*.h, but I have no idea where they've
> disappeared to.)

That is what I was hinting at with "tagged representation":
Emacs Lisp does it, we don't know about SBCL.

Typically, a good implementation has small stretches of code
where the values are as-is because the compiler can prove what
their type is (fixnum, whatever). But that means that fixnums
are usually limited to less than the full machine word's width
(e.g. 60 bits if your tag is four bits wide), because your
Lisp has to be able to stuff them back into such a place.

Cheers
-- t

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
  2023-08-16  3:17                     ` Po Lu
  2023-08-16  4:44                       ` tomas
@ 2023-08-16  5:18                       ` Gerd Möllmann
  2023-08-16  5:35                         ` Emanuel Berg
    ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Gerd Möllmann @ 2023-08-16 5:18 UTC (permalink / raw)
To: luangruo; +Cc: emacs-devel

> I don't know about SBCL, but as for Emacs, refer to the
> definition of VALBITS in lisp.h (maybe also the right files
> among m/*.h and s/*.h, but I have no idea where they've
> disappeared to.)

The SBCL I have here, a Homebrew installation, uses the scheme
where

  ....0 -> fixnum
  ....1 -> other objects (discriminated by additional tag bits)

I had the same for Emacs in the branch gerd_int in the 2000s,
if memory serves me.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-16 5:18 ` Gerd Möllmann @ 2023-08-16 5:35 ` Emanuel Berg 2023-08-18 7:14 ` Simon Leinen 2023-08-16 5:41 ` Gerd Möllmann 2023-08-16 6:42 ` Po Lu 2 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-16 5:35 UTC (permalink / raw) To: emacs-devel Gerd Möllmann wrote: >> I don't know about SBCL, but as for Emacs, refer to the >> definition of VALBITS in lisp.h (maybe also the right files >> among m/*.h and s/*.h, but I have no idea where they've >> disappeared to.) > > The SBCL I have here, a Homebrew installation, uses the > scheme where > > ....0 -> fixnum > ....1 -> other objects (discriminated by additional tag bits) > > I had the same for Emacs in the branch gerd_int in the > 2000s, if memory serves me. Ah, there is `fixnump' (and `bignump') to test for a specific number, (fixnump (expt 2 60)) ; t (fixnump (expt 2 61)) ; nil 2^60 = 1152921504606846976, so that's pretty big. -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-16 5:35 ` Emanuel Berg @ 2023-08-18 7:14 ` Simon Leinen 2023-08-19 13:10 ` Emanuel Berg 2023-09-04 4:13 ` Emanuel Berg 0 siblings, 2 replies; 47+ messages in thread From: Simon Leinen @ 2023-08-18 7:14 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1215 bytes --] Emacs also has `most-negative-fixnum' and `most-positive-fixnum' (borrowed from Common Lisp but now part of the core). On my 64-bit system (GNU Emacs 30.0.50 on aarch64-apple-darwin22.5.0): (= (- (expt 2 61)) most-negative-fixnum) ⇒ t (= (1- (expt 2 61)) most-positive-fixnum) ⇒ t (Same as on Emanuel's.) -- Simon. On Wed, Aug 16, 2023 at 1:12 PM Emanuel Berg <incal@dataswamp.org> wrote: > Gerd Möllmann wrote: > > >> I don't know about SBCL, but as for Emacs, refer to the > >> definition of VALBITS in lisp.h (maybe also the right files > >> among m/*.h and s/*.h, but I have no idea where they've > >> disappeared to.) > > > > The SBCL I have here, a Homebrew installation, uses the > > scheme where > > > > ....0 -> fixnum > > ....1 -> other objects (discriminated by additional tag bits) > > > > I had the same for Emacs in the branch gerd_int in the > > 2000s, if memory serves me. > > Ah, there is `fixnump' (and `bignump') to test for a specific > number, > > (fixnump (expt 2 60)) ; t > (fixnump (expt 2 61)) ; nil > > 2^60 = 1152921504606846976, so that's pretty big. > > -- > underground experts united > https://dataswamp.org/~incal > > > [-- Attachment #2: Type: text/html, Size: 1806 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-18 7:14 ` Simon Leinen @ 2023-08-19 13:10 ` Emanuel Berg 2023-08-20 5:07 ` Ihor Radchenko 2023-09-04 4:13 ` Emanuel Berg 1 sibling, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-19 13:10 UTC (permalink / raw) To: emacs-devel Simon Leinen wrote: > Emacs also has `most-negative-fixnum' and > `most-positive-fixnum' (borrowed from Common Lisp but now > part of the core). > > On my 64-bit system (GNU Emacs 30.0.50 on > aarch64-apple-darwin22.5.0): > > (= (- (expt 2 61)) most-negative-fixnum) → t > (= (1- (expt 2 61)) most-positive-fixnum) → t > > (Same as on Emanuel's.) At least that computation was correct then! Now that Gerd has solved the little mystery why Fibonacci was seemingly so much faster on Common Lisp and we also got that performance gain patch from Ihor I don't know how much sense it makes to continue translating the emacs-benchmarks to Common Lisp, or rather if anyone is motivated enough to do it, but this is how far I got: https://dataswamp.org/~incal/cl/bench/ -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-19 13:10 ` Emanuel Berg @ 2023-08-20 5:07 ` Ihor Radchenko 2023-08-20 6:20 ` Emanuel Berg 2023-08-28 5:32 ` Emanuel Berg 0 siblings, 2 replies; 47+ messages in thread From: Ihor Radchenko @ 2023-08-20 5:07 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel Emanuel Berg <incal@dataswamp.org> writes: > Now that Gerd has solved the little mystery why Fibonacci was > seemingly so much faster on Common Lisp and we also got that > performance gain patch from Ihor I don't know how much sense > it makes to continue translating the emacs-benchmarks to > Common Lisp, or rather if anyone is motivated enough to do it, > but this is how far I got: > > https://dataswamp.org/~incal/cl/bench/ You can again compare Elisp with CL and let us know what is being noticeably slower. It will be an indication that something might be improved. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-20 5:07 ` Ihor Radchenko @ 2023-08-20 6:20 ` Emanuel Berg 2023-08-28 5:32 ` Emanuel Berg 1 sibling, 0 replies; 47+ messages in thread From: Emanuel Berg @ 2023-08-20 6:20 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: >> Now that Gerd has solved the little mystery why Fibonacci >> was seemingly so much faster on Common Lisp and we also got >> that performance gain patch from Ihor I don't know how much >> sense it makes to continue translating the emacs-benchmarks >> to Common Lisp, or rather if anyone is motivated enough to >> do it, but this is how far I got: >> >> https://dataswamp.org/~incal/cl/bench/ > > You can again compare Elisp with CL and let us know what is > being noticeably slower. It will be an indication that > something might be improved. Thanks, you are right, hopefully it will happen. -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-20 5:07 ` Ihor Radchenko 2023-08-20 6:20 ` Emanuel Berg @ 2023-08-28 5:32 ` Emanuel Berg 2023-09-03 0:48 ` Emanuel Berg 2023-09-03 1:57 ` [PATCH] Re: Bignum performance Emanuel Berg 1 sibling, 2 replies; 47+ messages in thread From: Emanuel Berg @ 2023-08-28 5:32 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: > You can again compare Elisp with CL and let us know what is > being noticeably slower. It will be an indication that > something might be improved. Here is a new file: https://dataswamp.org/~incal/cl/bench/flet.cl The CL time is 0.84 vs Elisp time at 1.35. So here, CL is 61% faster! (format "%d%%" (round (* 100 (1- (/ 1.35 0.84))))) flet 0.839996 s real time 0.839262 s run time -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
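The percentage above comes straight from the ratio of the run times; the Elisp `format' call in the message, sketched in Python:

```python
def percent_faster(slower, faster):
    """Percent by which the faster time beats the slower one,
    relative to the faster time: 100 * (slower / faster - 1)."""
    return round(100 * (slower / faster - 1))

# the flet benchmark above: Elisp 1.35 s vs. CL 0.84 s
print(f"{percent_faster(1.35, 0.84)}%")  # prints 61%
```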
* Re: [PATCH] Re: Bignum performance 2023-08-28 5:32 ` Emanuel Berg @ 2023-09-03 0:48 ` Emanuel Berg 2023-09-03 8:50 ` Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-09-03 0:48 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: > You can again compare Elisp with CL and let us know what is > being noticeably slower. It will be an indication that > something might be improved. Here is yet another file, https://dataswamp.org/~incal/cl/bench/inclist.cl The Elisp time for this benchmark is, with `native-comp-speed' at the default - for the benchmarks package - maximal optimization level, namely 3 - at 5.86 sec, while the CL is at 1.635993 sec. So here, CL is 258% faster. I run the benchmarks from Emacs, and the CL also from Emacs, with SLIME and SBCL. So that should be pretty fair, but maybe when all benchmarks are translated one could do the Elisp with batch and the SBCL not using Emacs at all. -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-09-03 0:48 ` Emanuel Berg @ 2023-09-03 8:50 ` Ihor Radchenko 2023-09-03 9:05 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-09-03 8:50 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel Emanuel Berg <incal@dataswamp.org> writes: > Her is yet another file, > > https://dataswamp.org/~incal/cl/bench/inclist.cl Unfortunately, I cannot use this file because it is referring to ~/public_html/cl/bench/timing.cl, which I do not have. > The Elisp time for this benchmark is, with `native-comp-speed' > at the default - for the benchmarks package - maximal > optimization level, namely 3 - at 5.86 sec, while the CL is at > 1.635993 sec. For me, Elisp runs in 1.2 sec. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-09-03 8:50 ` Ihor Radchenko @ 2023-09-03 9:05 ` Emanuel Berg 2023-09-03 10:30 ` Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-09-03 9:05 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: >> Here is yet another file, >> >> https://dataswamp.org/~incal/cl/bench/inclist.cl > > Unfortunately, I cannot use this file because it is referring to > ~/public_html/cl/bench/timing.cl, which I do not have. It is in the same directory, https://dataswamp.org/~incal/cl/bench/timing.cl but I don't know how to avoid using an absolute path when loading it in CL; maybe it cannot be done without ASDF or some other such solution. So you have to set that manually for now. -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) 2023-09-03 9:05 ` Emanuel Berg @ 2023-09-03 10:30 ` Ihor Radchenko 2023-09-04 1:03 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-09-03 10:30 UTC (permalink / raw) To: Emanuel Berg, Andrea Corallo; +Cc: emacs-devel Emanuel Berg <incal@dataswamp.org> writes: >> Unfortunately, I cannot use this file because it is referring to >> ~/public_html/cl/bench/timing.cl, which I do not have. > > It is in the same directory > > https://dataswamp.org/~incal/cl/bench/timing.cl It would be nice if you attached the files to the email. Otherwise, people examining this thread a few years in the future may not be able to access the files. > So you have to set that manually for now. I did, and the results are very different from yours: $ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/inclist.cl ;; 1.096667 s real time ;; 1.096235 s run time $ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/inclist-type-hints.cl ;; 0.55 s real time ;; 0.549992 s run time (emacs master) $ perf record ./src/emacs -batch -l /home/yantar92/.emacs.d/straight/repos/elisp-benchmarks/elisp-benchmarks.el --eval '(setq elb-speed 2)' --eval '(elisp-benchmarks-run "inclist")' * Results | test | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) | |--------------------+----------------+------------+---------+-------------+-----------------| | inclist | 1.20 | 0.00 | 0 | 1.20 | 0.02 | | inclist-type-hints | 1.05 | 0.00 | 0 | 1.05 | 0.02 | |--------------------+----------------+------------+---------+-------------+-----------------| | total | 2.26 | 0.00 | 0 | 2.26 | 0.02 | inclist: 1.1 sec vs. 1.2 sec inclist-type-hints: 0.55 sec vs.
1.05 sec With native-comp speed 3, the results are on par for inclist: $ perf record ./src/emacs -batch -l /home/yantar92/.emacs.d/straight/repos/elisp-benchmarks/elisp-benchmarks.el --eval '(setq elb-speed 3)' --eval '(elisp-benchmarks-run "inclist")' * Results | test | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) | |--------------------+----------------+------------+---------+-------------+-----------------| | inclist | 1.07 | 0.00 | 0 | 1.07 | 0.02 | | inclist-type-hints | 0.99 | 0.00 | 0 | 0.99 | 0.00 | |--------------------+----------------+------------+---------+-------------+-----------------| inclist: 1.1 sec vs. 1.07 sec inclist-type-hints: 0.55 sec vs. 0.99 sec There is nothing obvious that is slower in Elisp - most of the time is spent in the native-compiled functions: 47.83% emacs inclist-b6453dcf-34842bf7.eln [.] F656c622d696e636c697374_elb_inclist_0 44.80% emacs inclist-type-hints-bb635d76-535ebfb0.eln [.] F656c622d696e636c6973742d7468_elb_inclist_th_0 1.45% emacs emacs [.] process_mark_stack So, there might be some missing optimization in native-comp itself. CCing Andrea, in case he has some insight about further analysis. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) 2023-09-03 10:30 ` Elisp native-comp vs. SBCL for inclist-type-hints benchmark (was: [PATCH] Re: Bignum performance) Ihor Radchenko @ 2023-09-04 1:03 ` Emanuel Berg 0 siblings, 0 replies; 47+ messages in thread From: Emanuel Berg @ 2023-09-04 1:03 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: > I did, and the results are very different from yours It is probably because of the Emacs batch and/or the non-SLIME SBCL style of execution. Using commands similar to yours [last], I get the following results. Elisp vs SBCL inclist: (faster 1.04 1.59) ; CL 35% slower inclist-type-hints: (faster 1.05 0.69) ; CL 52% faster Elisp optimization: (faster 1.04 1.05) ; Elisp 1% slower from optimization CL optimization: (faster 1.59 0.69) ; CL 130% faster from optimization We see that, surprisingly, CL is slower for plain inclist. With type hints though, CL benefits hugely to beat the Elisp non-optimized record, while Elisp doesn't seem to benefit from the optimization at all. #! /bin/zsh # # this file: # https://dataswamp.org/~incal/cl/bench/inc2-cl sbcl --noinform --load inclist.cl --load inclist-type-hints.cl --quit #! /bin/zsh # # this file: # https://dataswamp.org/~incal/cl/bench/inc2-el emacs \ -batch \ -l ~/.emacs.d/elpa/elisp-benchmarks-1.14/elisp-benchmarks.el \ --eval '(setq elb-speed 2)' \ --eval '(elisp-benchmarks-run "inclist")' -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
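The `faster' helper used in these comparisons is not shown anywhere in the thread; the following reconstruction is an assumption on my part, but it reproduces every percentage quoted in the messages (positive means the second timing is faster, negative that it is slower):

```python
def faster(old, new):
    """Hypothetical reconstruction of the `faster' comparison helper
    (the real definition is not in the thread).  Returns the percent
    difference: positive if NEW is faster than OLD, negative if slower."""
    if new <= old:
        return round(100 * (old / new - 1))   # speedup, relative to new
    return -round(100 * (1 - old / new))      # slowdown, relative to new

# inclist above: (faster 1.04 1.59) -> -35, i.e. "CL 35% slower"
```

Note the asymmetry: speedups are expressed relative to the smaller time and slowdowns relative to the larger one, which is why 1.04 s vs. 1.59 s reads as "35% slower" rather than 53%.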
* Re: [PATCH] Re: Bignum performance 2023-08-28 5:32 ` Emanuel Berg 2023-09-03 0:48 ` Emanuel Berg @ 2023-09-03 1:57 ` Emanuel Berg 1 sibling, 0 replies; 47+ messages in thread From: Emanuel Berg @ 2023-09-03 1:57 UTC (permalink / raw) To: emacs-devel Yet another file, https://dataswamp.org/~incal/cl/bench/inclist-type-hints.cl Here we have type hints, answering my own question from a while back about how to do it! It is done with `declare', as in (declare (optimize (speed 3) (safety 0))) for the function concerned, then they are actually put to use with `the'. ("Type hints enabled"?) Elisp vs SBCL, in seconds: (faster 5.73 0.675997) ; CL is 748% faster Note that the optimization worked a lot better for SBCL than it did for Elisp, if we compare with the non-optimized (i.e. no type hints) file I just posted [1] [it hasn't arrived yet on this ML as I type this, but should arrive] - anyway (faster 5.86 5.73) ; 2% - Elisp vs `cl-the' Elisp (faster 1.635993 0.675997) ; 142% - SBCL vs `the' SBCL SBCL became 142% faster with the optimization, and Elisp just 2%. Another interesting thing is that here, superficially or on the interface level at least, we have the same type hint possibilities as in SBCL. [1] https://dataswamp.org/~incal/cl/bench/inclist.cl -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-18 7:14 ` Simon Leinen 2023-08-19 13:10 ` Emanuel Berg @ 2023-09-04 4:13 ` Emanuel Berg 1 sibling, 0 replies; 47+ messages in thread From: Emanuel Berg @ 2023-09-04 4:13 UTC (permalink / raw) To: emacs-devel Simon Leinen wrote: > Emacs also has `most-negative-fixnum' and > `most-positive-fixnum' (borrowed from Common Lisp but now > part of the core). > > On my 64-bit system (GNU Emacs 30.0.50 on aarch64-apple-darwin22.5.0): > > (= (- (expt 2 61)) most-negative-fixnum) → t > (= (1- (expt 2 61)) most-positive-fixnum) → t > > (Same as on Emanuel's.) (let ((bits 62)) (and (= (- (expt 2 (1- bits))) most-negative-fixnum) (= (1- (expt 2 (1- bits))) most-positive-fixnum) )) ; t Sweet B) -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
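The check above generalizes to any tag width; a small sketch of the two's-complement range for a given number of value bits (the messages above correspond to 62 value bits on a 64-bit build):

```python
def fixnum_range(value_bits):
    """Most negative and most positive integers representable in
    value_bits two's-complement bits."""
    return -(1 << (value_bits - 1)), (1 << (value_bits - 1)) - 1

lo, hi = fixnum_range(62)
print(lo, hi)  # prints -2305843009213693952 2305843009213693951
```

These two values match `most-negative-fixnum' and `most-positive-fixnum' on the 64-bit builds discussed in the thread.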
* Re: [PATCH] Re: Bignum performance 2023-08-16 5:18 ` Gerd Möllmann 2023-08-16 5:35 ` Emanuel Berg @ 2023-08-16 5:41 ` Gerd Möllmann 2023-08-16 6:42 ` Po Lu 2 siblings, 0 replies; 47+ messages in thread From: Gerd Möllmann @ 2023-08-16 5:41 UTC (permalink / raw) To: luangruo; +Cc: emacs-devel On 16.08.23 07:18, Gerd Möllmann wrote: >> I don't know about SBCL, but as for Emacs, refer to the definition of >> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but >> I have no idea where they've disappeared to.) > > The SBCL I have here, a Homebrew installation, uses the scheme where Oops, I'm actually not running the SBCL from Homebrew, but my own... Anyway, the tagging is like this: @c 64-bit lowtag assignment (wider-fixnums) @c xyz0 -- Fixnum (where z or yz may also be 0 depending on n-fixnum-tag-bits) @c xx01 -- Other-immediate @c xx11 -- Pointer @c 0011 -- Instance-pointer @c 0111 -- List-pointer @c 1011 -- Function-pointer @c 1111 -- Other-pointer https://github.com/sbcl/sbcl/blob/master/doc/internals/objects-in-memory.texinfo ^ permalink raw reply [flat|nested] 47+ messages in thread
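The lowtag table Gerd quotes can be decoded mechanically; a sketch following the documented bit patterns (an illustration of the table, not SBCL's actual implementation):

```python
def lowtag_kind(word):
    """Classify a word by the SBCL-style lowtags quoted above."""
    if word & 0b1 == 0:
        return "fixnum"             # xyz0: low bit clear
    if word & 0b11 == 0b01:
        return "other-immediate"    # xx01
    # xx11: a pointer; the next two bits select the pointer class
    return {0b0011: "instance-pointer",
            0b0111: "list-pointer",
            0b1011: "function-pointer",
            0b1111: "other-pointer"}[word & 0b1111]
```

Giving list pointers their own lowtag is what makes a `consp`-style check a single mask-and-compare.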
* Re: [PATCH] Re: Bignum performance 2023-08-16 5:18 ` Gerd Möllmann 2023-08-16 5:35 ` Emanuel Berg 2023-08-16 5:41 ` Gerd Möllmann @ 2023-08-16 6:42 ` Po Lu 2023-08-16 8:05 ` Gerd Möllmann 2 siblings, 1 reply; 47+ messages in thread From: Po Lu @ 2023-08-16 6:42 UTC (permalink / raw) To: Gerd Möllmann; +Cc: emacs-devel Gerd Möllmann <gerd.moellmann@gmail.com> writes: >> I don't know about SBCL, but as for Emacs, refer to the definition of >> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but >> I have no idea where they've disappeared to.) > > The SBCL I have here, a Homebrew installation, uses the scheme where > > ....0 -> fixnum > ....1 -> other objects (discriminated by additional tag bits) > > I had the same for Emacs in the branch gerd_int in the 2000s, if > memory serves me. In today's Emacs, 0 is Lisp_Symbol; this facilitates representing Qnil as C NULL. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance 2023-08-16 6:42 ` Po Lu @ 2023-08-16 8:05 ` Gerd Möllmann 0 siblings, 0 replies; 47+ messages in thread From: Gerd Möllmann @ 2023-08-16 8:05 UTC (permalink / raw) To: Po Lu; +Cc: emacs-devel On 16.08.23 08:42, Po Lu wrote: > Gerd Möllmann <gerd.moellmann@gmail.com> writes: > >>> I don't know about SBCL, but as for Emacs, refer to the definition of >>> VALBITS in lisp.h (maybe also the right files among m/*.h and s/*.h, but >>> I have no idea where they've disappeared to.) >> >> The SBCL I have here, a Homebrew installation, uses the scheme where >> >> ....0 -> fixnum >> ....1 -> other objects (discriminated by additional tag bits) >> >> I had the same for Emacs in the branch gerd_int in the 2000s, if >> memory serves me. > > In today's Emacs, 0 is Lisp_Symbol; this facilitates representing Qnil > as C NULL. Yes, and I don't see a need to change anything in this regard in Emacs. IMHO, the fixnum range is more than sufficient nowadays, in general. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Shrinking the C core @ 2023-08-09 9:46 Eric S. Raymond 2023-08-09 12:34 ` Po Lu 0 siblings, 1 reply; 47+ messages in thread From: Eric S. Raymond @ 2023-08-09 9:46 UTC (permalink / raw) To: emacs-devel Recently I have been refamiliarizing myself with the Emacs C core. Some days ago, as a test that I understand the C core API and the current build recipe, I made and pushed a small commit that moved the policy code in delete-file out to Lisp, basing it on a smaller and simpler new entry point named delete-file-internal (this is parallel to the way delete-directory is already partitioned). I've since been poking around the C core code and am now wondering why there is so much C-core code that seems like it could be pushed out to Lisp. For example, in src/fileio.c: DEFUN ("unhandled-file-name-directory", Funhandled_file_name_directory, Sunhandled_file_name_directory, 1, 1, 0, doc: /* Return a directly usable directory name somehow associated with FILENAME. A `directly usable' directory name is one that may be used without the intervention of any file name handler. If FILENAME is a directly usable file itself, return \(file-name-as-directory FILENAME). If FILENAME refers to a file which is not accessible from a local process, then this should return nil. The `call-process' and `start-process' functions use this function to get a current directory to run processes in. */) (Lisp_Object filename) { Lisp_Object handler; /* If the file name has special constructs in it, call the corresponding file name handler. */ handler = Ffind_file_name_handler (filename, Qunhandled_file_name_directory); if (!NILP (handler)) { Lisp_Object handled_name = call2 (handler, Qunhandled_file_name_directory, filename); return STRINGP (handled_name) ? handled_name : Qnil; } return Ffile_name_as_directory (filename); } Why is this in C? Is there any reason not to push it out to Lisp and reduce the core complexity? More generally, if I do this kind of refactor, will I be stepping on anyone's toes? 
-- >>esr>> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Shrinking the C core 2023-08-09 9:46 Shrinking the C core Eric S. Raymond @ 2023-08-09 12:34 ` Po Lu 2023-08-09 15:51 ` Eric S. Raymond 0 siblings, 1 reply; 47+ messages in thread From: Po Lu @ 2023-08-09 12:34 UTC (permalink / raw) To: Eric S. Raymond; +Cc: emacs-devel "Eric S. Raymond" <esr@thyrsus.com> writes: > Recently I have been refamiliarizing myself with the Emacs C core. > > Some days ago, as a test that I understand the C core API and the > current build recipe, I made and pushed a small commit that moved > the policy code in delete-file out to Lisp, basing it on a smaller > and simpler new entry point named delete-file-internal (this is > parallel to the way delete-directory is already partitioned). > > I've since been poking around the C core code and am now wondering why > there is so much C-core code that seems like it could be pushed out to > Lisp. For example, in src/fileio.c: > > DEFUN ("unhandled-file-name-directory", Funhandled_file_name_directory, > Sunhandled_file_name_directory, 1, 1, 0, > doc: /* Return a directly usable directory name somehow associated with FILENAME. > A `directly usable' directory name is one that may be used without the > intervention of any file name handler. > If FILENAME is a directly usable file itself, return > \(file-name-as-directory FILENAME). > If FILENAME refers to a file which is not accessible from a local process, > then this should return nil. > The `call-process' and `start-process' functions use this function to > get a current directory to run processes in. */) > (Lisp_Object filename) > { > Lisp_Object handler; > > /* If the file name has special constructs in it, > call the corresponding file name handler. */ > handler = Ffind_file_name_handler (filename, Qunhandled_file_name_directory); > if (!NILP (handler)) > { > Lisp_Object handled_name = call2 (handler, Qunhandled_file_name_directory, > filename); > return STRINGP (handled_name) ? 
handled_name : Qnil; > } > > return Ffile_name_as_directory (filename); > } > > Why is this in C? Is there any reason not to push it out to Lisp and > reduce the core complexity? There is a plenitude of such reasons. Whenever some code is moved to Lisp, its structure and history are lost. Often, comments within the extracted C code remain, but the code itself is left ajar. Bootstrap problems are frequently introduced, as well as latent bugs. And Emacs becomes ever so much slower. These are not simply theoretical concerns, but problems I've encountered many times in practice. Compounding that, fileio.c is abundant with complex logic amended and iteratively refined over many years, which is dangerously prone to loss or mistranscription during a refactor or a rewrite. Finally, this specific case exists because we don't want to provide Lisp with an easy means to bypass file name handlers. All primitives operating on file names should thus consult file name handlers, enabling packages like TRAMP to continue operating correctly. > More generally, if I do this kind of refactor, will I be stepping on > anyone's toes? Probably. I think a better idea for a first project is this item in etc/TODO: ** A better display of the bar cursor Distribute a bar cursor of width > 1 evenly between the two glyphs on each side of the bar (what to do at the edges?). which has been on my plate for a while. I would be grateful to anyone who decides to preempt me. Thanks in advance! ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Shrinking the C core 2023-08-09 12:34 ` Po Lu @ 2023-08-09 15:51 ` Eric S. Raymond 2023-08-09 23:56 ` Po Lu 0 siblings, 1 reply; 47+ messages in thread From: Eric S. Raymond @ 2023-08-09 15:51 UTC (permalink / raw) To: Po Lu; +Cc: emacs-devel Po Lu <luangruo@yahoo.com>: > There is a plenitude of such reasons. Whenever some code is moved to > Lisp, its structure and history is lost. Often, comments within the > extracted C code remain, but the code itself is left ajar. Bootstrap > problems are frequently introduced, as well as latent bugs. And Emacs > becomes ever so much slower. When I first worked on Emacs code in the 1980s Lisp was already fast enough, and machine speeds have gone up by something like 10^3 since. I plain don't believe the "slower" part can be an issue on modern hardware, not even on tiny SBCs. > Finally, this specific case is because we don't want to provide Lisp > with an easy means to bypass file name handlers. All primitives > operating on file names should thus consult file name handlers, enabling > packages like TRAMP to continue operating correctly. If calling the file-name handlers through Lisp is a crash landing, you were already out of luck. Go have a look at delete-directory. > Probably. I think a better idea for a first project is this item in > etc/TODO: This would ... not be my first project. :-) -- <a href="http://www.catb.org/~esr/">Eric S. Raymond</a> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Shrinking the C core 2023-08-09 15:51 ` Eric S. Raymond @ 2023-08-09 23:56 ` Po Lu 2023-08-10 1:19 ` Eric S. Raymond 0 siblings, 1 reply; 47+ messages in thread From: Po Lu @ 2023-08-09 23:56 UTC (permalink / raw) To: Eric S. Raymond; +Cc: emacs-devel "Eric S. Raymond" <esr@thyrsus.com> writes: > When I first worked on Emacs code in the 1980s Lisp was already fast > enough, and machine speeds have gone up by something like 10^3 since. > I plain don't believe the "slower" part can be an issue on modern > hardware, not even on tiny SBCs. Can you promise the same, if your changes are not restricted to one or two functions in fileio.c, but instead pervade throughout C source? Finally, you haven't addressed the remainder of the reasons I itemized. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Shrinking the C core 2023-08-09 23:56 ` Po Lu @ 2023-08-10 1:19 ` Eric S. Raymond 2023-08-10 7:44 ` Eli Zaretskii 0 siblings, 1 reply; 47+ messages in thread From: Eric S. Raymond @ 2023-08-10 1:19 UTC (permalink / raw) To: Po Lu; +Cc: emacs-devel Po Lu <luangruo@yahoo.com>: > "Eric S. Raymond" <esr@thyrsus.com> writes: > > > When I first worked on Emacs code in the 1980s Lisp was already fast > > enough, and machine speeds have gone up by something like 10^3 since. > > I plain don't believe the "slower" part can be an issue on modern > > hardware, not even on tiny SBCs. > > Can you promise the same, if your changes are not restricted to one or > two functions in fileio.c, but instead pervade throughout C source? Yes, in fact, I can. Because if by some miracle we were able to instantly rewrite the entirety of Emacs in Python (which I'm not advocating, I chose it because it's the slowest of the major modern scripting languages) basic considerations of clocks per second would predict it to run a *dead minimum* of two orders of magnitude faster than the Emacs of, say, 1990. And 1990 Emacs was already way fast enough for the human eye and brain, which can't even register interface lag of less than 0.17 seconds (look up the story of Jef Raskin and how he exploited this psychophysical fact in the design of the Canon Cat sometime; it's very instructive). The human auditory system can perceive finer timeslices, down to about 0.02s in skilled musicians, but we're not using elisp for audio signal processing. If you take away nothing else from this conversation, at least get it through your head that "more Lisp might make Emacs too slow" is a deeply, *deeply* silly idea. It's 2023 and the only ways you can make a user-facing program slow enough for response lag to be noticeable are disk latency on spinning rust, network round-trips, or operations with a superlinear big-O in critical paths. Mere interpretive overhead won't do it. 
> Finally, you haven't addressed the remainder of the reasons I itemized. They were too obvious, describing problems that competent software engineers know how to prevent or hedge against, and you addressed me as though I were a n00b that just fell off a cabbage truck. My earliest contributions to Emacs were done so long ago that they predated the systematic Changelog convention; have you heard the expression "teaching your grandmother to suck eggs"? My patience for that sort of thing is limited. -- <a href="http://www.catb.org/~esr/">Eric S. Raymond</a> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Shrinking the C core 2023-08-10 1:19 ` Eric S. Raymond @ 2023-08-10 7:44 ` Eli Zaretskii 2023-08-10 21:54 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Eli Zaretskii @ 2023-08-10 7:44 UTC (permalink / raw) To: esr; +Cc: luangruo, emacs-devel > Date: Wed, 9 Aug 2023 21:19:11 -0400 > From: "Eric S. Raymond" <esr@thyrsus.com> > Cc: emacs-devel@gnu.org > > Po Lu <luangruo@yahoo.com>: > > "Eric S. Raymond" <esr@thyrsus.com> writes: > > > > > When I first worked on Emacs code in the 1980s Lisp was already fast > > > enough, and machine speeds have gone up by something like 10^3 since. > > > I plain don't believe the "slower" part can be an issue on modern > > > hardware, not even on tiny SBCs. > > > > Can you promise the same, if your changes are not restricted to one or > > two functions in fileio.c, but instead pervade throughout C source? > > Yes, in fact, I can. Because if by some miracle we were able to > instantly rewrite the entirety of Emacs in Python (which I'm not > advocating, I chose it because it's the slowest of the major modern > scripting languages) basic considerations of clocks per second would > predict it to run a *dead minimum* of two orders of magnitude faster > than the Emacs of, say, 1990. > > And 1990 Emacs was already way fast enough for the human eye and > brain, which can't even register interface lag of less than 0.17 > seconds (look up the story of Jef Raskin and how he exploited this > psychophysical fact in the design of the Canon Cat sometime; it's very > instructive). The human auditory system can perceive finer timeslices, > down to about 0.02s in skilled musicians, but we're not using elisp > for audio signal processing. This kind of argument is inherently flawed: it's true that today's machines are much faster than those in, say, 1990, but Emacs nowadays demands much more horsepower from the CPU than it did back then. 
What's more, Emacs is still a single-threaded Lisp machine, although in the last 10 years CPU power has developed more and more in the direction of multiple cores and execution units, with single execution units being basically as fast (or as slow) today as they were a decade ago. And if these theoretical arguments don't convince you, then there are facts: the Emacs display engine, for example, was completely rewritten since the 1990s, and is significantly more expensive than the old one (because it lifts several of the gravest limitations of the old redisplay). Similarly with some other core parts and internals. We are trying to make Lisp programs faster all the time, precisely because users do complain about annoying delays and slowness. Various optimizations in the byte-compiler and the whole native-compilation feature are parts of this effort, and are further evidence that the performance concerns in Emacs are not illusory. And we are still not there yet: people still do complain from time to time, and not always because someone selected a sub-optimal algorithm where better ones exist. The slowdown caused by moving one primitive to Lisp might be insignificant, but these slowdowns add up and eventually do show in user-experience reports. Rewriting code in Lisp also increases the GC pressure, and GC cycles are known to be one of the significant causes of slow performance in quite a few cases. We are currently tracking the GC performance (see the emacs-gc-stats@gnu.org mailing list) for that reason, in the hope that we can modify GC and/or its thresholds to improve performance. > If you take away nothing else from this conversation, at least get it > through your head that "more Lisp might make Emacs too slow" is a > deeply, *deeply* silly idea. It's 2023 and the only ways you can make > a user-facing program slow enough for response lag to be noticeable > are disk latency on spinning rust, network round-trips, or operations > with a superlinear big-O in critical paths. 
Mere interpretive overhead > won't do it. We found this conclusion to be false in practice, at least in Emacs practice. > > Finally, you haven't addressed the remainder of the reasons I itemized. > > They were too obvious, describing problems that competent software > engineers know how to prevent or hedge against, and you addressed me > as though I were a n00b that just fell off a cabbage truck. My > earliest contributions to Emacs were done so long ago that they > predated the systematic Changelog convention; have you heard the > expression "teaching your grandmother to suck eggs"? My patience for > that sort of thing is limited. Please be more patient, and please consider what others here say to be mostly in good-faith and based on non-trivial experience. If something in what others here say sounds like an offense to your intelligence, it is most probably a misunderstanding: for most people here English is not their first language, so don't expect them to always be able to find the best words to express what they want to say. ^ permalink raw reply [flat|nested] 47+ messages in thread
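The GC pressure described above can be observed directly from Lisp: `benchmark-run' reports total time, number of GCs, and time spent in GC for a form. A small illustrative snippet (figures vary by machine and GC settings):

```elisp
;; benchmark-run returns (TOTAL-TIME GC-COUNT GC-TIME);
;; an allocation-heavy form triggers several GC cycles.
(require 'benchmark)
(benchmark-run 10 (make-list 100000 0))
;; => e.g. (0.3 4 0.2) - exact numbers depend on the machine
```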
* Re: Shrinking the C core 2023-08-10 7:44 ` Eli Zaretskii @ 2023-08-10 21:54 ` Emanuel Berg 2023-08-11 10:27 ` Bignum performance (was: Shrinking the C core) Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-10 21:54 UTC (permalink / raw) To: emacs-devel Eli Zaretskii wrote: > We are trying to make Lisp programs faster all the time, > precisely because users do complain about annoying delays > and slowness. Various optimizations in the byte-compiler and > the whole native-compilation feature are parts of this > effort It's very fast with that; we should encourage more people to use native-compilation. >> If you take away nothing else from this conversation, at least get it >> through your head that "more Lisp might make Emacs too slow" is a >> deeply, *deeply* silly idea. It's 2023 and the only ways you can make >> a user-facing program slow enough for response lag to be noticeable >> are disk latency on spinning rust, network round-trips, or operations >> with a superlinear big-O in critical paths. Mere interpretive overhead >> won't do it. > > We found this conclusion to be false in practice, at least in Emacs > practice. In theory Lisp can be as fast as any other language, but in practice this is not the case, at least with Elisp and Emacs. Here is an experiment with stats on how Emacs/Elisp compares to SBCL/CL; for this particular one it shows that Elisp, even natively compiled, is still +78875% slower than Common Lisp. 
;;; -*- lexical-binding: t -*-
;;
;; this file:
;; https://dataswamp.org/~incal/emacs-init/fib.el
;;
;; the CL:
;; https://dataswamp.org/~incal/cl/fib.cl
;;
;; code from:
;; elisp-benchmarks-1.14
;;
;; commands: [results]
;; $ emacs -Q -batch -l fib.el  [8.660 s]
;; $ emacs -Q -batch -l fib.elc [3.386 s]
;; $ emacs -Q -batch -l fib-54a44480-bad305eb.eln [3.159 s]
;; $ sbcl -l fib.cl [0.004 s]
;;
;; (stats)
;; plain -> byte:   +156%
;; plain -> native: +174%
;; plain -> sbcl:   +216400%
;;
;; byte -> native:  +7%
;; byte -> sbcl:    +84550%
;;
;; native -> sbcl:  +78875%

(require 'cl-lib)

(defun compare-table (l)
  (cl-loop for (ni ti) in l
           with first = t
           do (setq first t)
              (cl-loop for (nj tj) in l
                       do (when first
                            (insert "\n")
                            (setq first nil))
                          (unless (string= ni nj)
                            (let ((imp (* (- (/ ti tj) 1.0) 100)))
                              (when (< 0 imp)
                                (insert (format ";; %s -> %s: %+.0f%%\n" ni nj imp))))))))

(defun stats ()
  (let ((p '("plain"  8.660))
        (b '("byte"   3.386))
        (n '("native" 3.159))
        (s '("sbcl"   0.004)))
    (compare-table (list p b n s))))

(defun fib (reps num)
  (let ((z 0))
    (dotimes (_ reps)
      (let ((p1 1)
            (p2 1))
        (dotimes (_ (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

(let ((beg (float-time)))
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)))

;; (shell-command "emacs -Q -batch -l \"~/.emacs.d/emacs-init/fib.el\"")
;; (shell-command "emacs -Q -batch -l \"~/.emacs.d/emacs-init/fib.elc\"")
;; (shell-command "emacs -Q -batch -l \"~/.emacs.d/eln-cache/30.0.50-3b889b4a/fib-54a44480-8bbda87b.eln\"")

(provide 'fib)

-- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
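The fib.cl file is only linked above, not quoted in the thread. A minimal Common Lisp counterpart of the Elisp code, as a sketch (the actual file may differ), would be:

```lisp
;; Hypothetical transcription of the linked fib.cl; same algorithm as
;; the Elisp version above, timed with CL's internal real-time clock.
(defun fib (reps num)
  (let ((z 0))
    (dotimes (i reps)
      (let ((p1 1)
            (p2 1))
        (dotimes (j (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

(let ((beg (get-internal-real-time)))
  (fib 10000 1000)
  (format t "~,3f s~%"
          (/ (- (get-internal-real-time) beg)
             (float internal-time-units-per-second))))
```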
* Bignum performance (was: Shrinking the C core) 2023-08-10 21:54 ` Emanuel Berg @ 2023-08-11 10:27 ` Ihor Radchenko 2023-08-11 12:10 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-08-11 10:27 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> In theory Lisp can be as fast as any other language but in
> practice it is not the case with Elisp and Emacs at least.
>
> Here is an experiment with stats on how Emacs/Elisp compares
> to SBCL/CL; for this particular one it shows that Elisp, even
> natively compiled, is still +78875% slower than Common Lisp.
>
> ...
> (defun fib (reps num)
>   (let ((z 0))
>     (dotimes (_ reps)
>       (let ((p1 1)
>             (p2 1))
>         (dotimes (_ (- num 2))
>           (setf z (+ p1 p2)
>                 p2 p1
>                 p1 z))))
>     z))
>
> (let ((beg (float-time)))
>   (fib 10000 1000)
>   (message "%.3f s" (- (float-time) beg)))

Most of the time is spent in (1) GC; (2) creating bignums:

perf record emacs -Q -batch -l /tmp/fib.eln

perf report:

Creating bignums:
  40.95%  emacs  emacs             [.] allocate_vectorlike
GC:
  20.21%  emacs  emacs             [.] process_mark_stack
   3.41%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
GC:
   3.21%  emacs  emacs             [.] mark_char_table
   2.82%  emacs  emacs             [.] pdumper_marked_p_impl
   2.23%  emacs  libc.so.6         [.] 0x0000000000090076
   1.78%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
   1.71%  emacs  emacs             [.] pdumper_set_marked_impl
   1.59%  emacs  emacs             [.] arith_driver
   1.31%  emacs  libc.so.6         [.] malloc
GC:
   1.15%  emacs  emacs             [.] sweep_vectors
   1.03%  emacs  libgmp.so.10.5.0  [.] __gmpn_add_n_coreisbr
   0.88%  emacs  libc.so.6         [.] cfree
   0.87%  emacs  fib.eln           [.] F666962_fib_0
   0.85%  emacs  emacs             [.] check_number_coerce_marker
   0.80%  emacs  libc.so.6         [.] 0x0000000000091043
   0.74%  emacs  emacs             [.] allocate_pseudovector
   0.65%  emacs  emacs             [.] Flss
   0.57%  emacs  libgmp.so.10.5.0  [.] __gmpz_realloc
   0.56%  emacs  emacs             [.] make_bignum_bits

My conclusion from this is that the bignum implementation is not optimal. Mostly because it does not reuse the existing bignum objects and always creates new ones - every single time we perform an arithmetic operation.

-- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
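The bignum objects discussed above are a separate heap-allocated type; the boundary where Elisp switches from immediate fixnums to allocated bignums can be checked directly. A small illustrative snippet (assuming a 64-bit build):

```elisp
;; On a 64-bit build most-positive-fixnum is 2^61 - 1; any result past
;; it becomes a GMP-backed bignum, and each bignum result is a freshly
;; allocated Lisp object.
(fixnump most-positive-fixnum)       ; => t
(fixnump (1+ most-positive-fixnum))  ; => nil
(bignump (1+ most-positive-fixnum))  ; => t
```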
* Re: Bignum performance (was: Shrinking the C core) 2023-08-11 10:27 ` Bignum performance (was: Shrinking the C core) Ihor Radchenko @ 2023-08-11 12:10 ` Emanuel Berg 2023-08-11 12:32 ` Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-11 12:10 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: >> In theory Lisp can be as fast as any other language but in >> practice it is not the case with Elisp and Emacs at least. >> >> Here is a n experiment with stats how Emacs/Elisp compares >> to SBCL/CL, for this particular one it shows that Elisp, >> even natively compiled, is still +78875% slower than >> Common Lisp. >> >> ... >> (defun fib (reps num) >> (let ((z 0)) >> (dotimes (_ reps) >> (let ((p1 1) >> (p2 1)) >> (dotimes (_ (- num 2)) >> (setf z (+ p1 p2) >> p2 p1 >> p1 z)))) >> z)) >> >> (let ((beg (float-time))) >> (fib 10000 1000) >> (message "%.3f s" (- (float-time) beg)) ) > > Most of the time is spent in (1) GC; (2) Creating bigint: > > perf record emacs -Q -batch -l /tmp/fib.eln > > perf report: > > Creating bignums: > 40.95% emacs emacs [.] allocate_vectorlike > GC: > 20.21% emacs emacs [.] process_mark_stack > 3.41% emacs libgmp.so.10.5.0 [.] __gmpz_sizeinbase > GC: > 3.21% emacs emacs [.] mark_char_table > 2.82% emacs emacs [.] pdumper_marked_p_impl > 2.23% emacs libc.so.6 [.] 0x0000000000090076 > 1.78% emacs libgmp.so.10.5.0 [.] __gmpz_add > 1.71% emacs emacs [.] pdumper_set_marked_impl > 1.59% emacs emacs [.] arith_driver > 1.31% emacs libc.so.6 [.] malloc > GC: > 1.15% emacs emacs [.] sweep_vectors > 1.03% emacs libgmp.so.10.5.0 [.] __gmpn_add_n_coreisbr > 0.88% emacs libc.so.6 [.] cfree > 0.87% emacs fib.eln [.] F666962_fib_0 > 0.85% emacs emacs [.] check_number_coerce_marker > 0.80% emacs libc.so.6 [.] 0x0000000000091043 > 0.74% emacs emacs [.] allocate_pseudovector > 0.65% emacs emacs [.] Flss > 0.57% emacs libgmp.so.10.5.0 [.] __gmpz_realloc > 0.56% emacs emacs [.] 
make_bignum_bits > > My conclusion from this is that big number implementation is > not optimal. Mostly because it does not reuse the existing > bignum objects and always create new ones - every single > time we perform an arithmetic operation. Okay, interesting, how can you see that from the above data? So is this a problem with the compiler? Or some associated library? If so, I'll see if I can upgrade gcc to gcc 13 and see if that improves it, maybe they already fixed it ... -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Bignum performance (was: Shrinking the C core) 2023-08-11 12:10 ` Emanuel Berg @ 2023-08-11 12:32 ` Ihor Radchenko 2023-08-11 12:38 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-08-11 12:32 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel Emanuel Berg <incal@dataswamp.org> writes: >> perf record emacs -Q -batch -l /tmp/fib.eln >> >> perf report: >> >> Creating bignums: >> 40.95% emacs emacs [.] allocate_vectorlike >> GC: >> 20.21% emacs emacs [.] process_mark_stack >> ... >> My conclusion from this is that big number implementation is >> not optimal. Mostly because it does not reuse the existing >> bignum objects and always create new ones - every single >> time we perform an arithmetic operation. > > Okay, interesting, how can you see that from the above data? process_mark_stack is the GC routine. And I see no other reason to call allocate_vectorlike so much except allocating new bignum objects (which are vectorlike; see src/lisp.h:pvec_type and src/bignum.h:Lisp_Bignum). > So is this a problem with the compiler? Or some > associated library? GC is the well-known problem of garbage-collector being slow when we allocate a large number of objects. And the fact that we allocate many objects is related to immutability of bignums. Every time we do (setq bignum (* bignum fixint)), we abandon the old object holding BIGNUM value and allocate a new bignum object with a new value. Clearly, this allocation is not free and takes a lot of CPU time. While the computation itself is fast. Maybe we could somehow re-use the already allocated bignum objects, similar to what is done for cons cells (see src/alloc.c:Fcons). -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
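The cons-cell reuse pointed to above can be sketched generically: freed objects are chained onto a free list and popped before new memory is requested. This is an illustrative sketch in the spirit of what src/alloc.c does for conses, not Emacs's actual code:

```c
#include <stdlib.h>

/* Simplified free-list allocator: node_free chains a node onto the
   free list, and node_alloc reuses a chained node before falling back
   to malloc.  Illustration only, not Emacs's implementation. */
struct node { struct node *next; long payload; };

static struct node *free_list = NULL;

struct node *node_alloc (void)
{
  if (free_list)
    {
      struct node *n = free_list;
      free_list = n->next;   /* pop a recycled node */
      return n;
    }
  return malloc (sizeof (struct node));
}

void node_free (struct node *n)
{
  n->next = free_list;       /* push onto the free list */
  free_list = n;
}
```

The point of the pattern is that a steady-state allocate/free cycle stops touching malloc at all.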
* Re: Bignum performance (was: Shrinking the C core) 2023-08-11 12:32 ` Ihor Radchenko @ 2023-08-11 12:38 ` Emanuel Berg 2023-08-11 14:07 ` [PATCH] " Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-11 12:38 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: > And the fact that we allocate many objects is related to > immutability of bignums. Every time we do (setq bignum (* > bignum fixint)), we abandon the old object holding BIGNUM > value and allocate a new bignum object with a new value. > Clearly, this allocation is not free and takes a lot of CPU > time. While the computation itself is fast. So this happens in Emacs C code, OK. > Maybe we could somehow re-use the already allocated bignum > objects, similar to what is done for cons cells (see > src/alloc.c:Fcons). Sounds reasonable :) -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH] Re: Bignum performance (was: Shrinking the C core) 2023-08-11 12:38 ` Emanuel Berg @ 2023-08-11 14:07 ` Ihor Radchenko 2023-08-11 18:06 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-08-11 14:07 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3633 bytes --]

Emanuel Berg <incal@dataswamp.org> writes:

>> Maybe we could somehow re-use the already allocated bignum
>> objects, similar to what is done for cons cells (see
>> src/alloc.c:Fcons).
>
> Sounds reasonable :)

And... it has already been done, actually. allocate_vectorlike calls allocate_vector_from_block, which re-uses pre-allocated objects.

And looking into the call graph, this exact branch calling allocate_vector_from_block is indeed called for the bignums:

    33.05%     0.00%  emacs  [unknown]  [.] 0000000000000000
            |
            ---0
               |
               |--28.04%--allocate_vectorlike
               |          |
               |           --27.78%--allocate_vector_from_block (inlined)
               |                      |
               |                      |--2.13%--next_vector (inlined)
               |                      |
               |                       --0.74%--setup_on_free_list (inlined)

If I manually cut off `allocate_vector_from_block', the benchmark time doubles. So, there is already some improvement coming from re-using allocated memory.

I looked deeper into the code and tried to cut down on unnecessary looping over the pre-allocated `vector_free_lists'. See the attached patch.

Without the patch:

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
2.321 s

  28.60%  emacs  emacs             [.] allocate_vectorlike
  24.36%  emacs  emacs             [.] process_mark_stack
   3.76%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
   3.59%  emacs  emacs             [.] pdumper_marked_p_impl
   3.53%  emacs  emacs             [.] mark_char_table

With the patch:

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
1.968 s

  33.17%  emacs  emacs             [.] process_mark_stack
   5.51%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
   5.05%  emacs  emacs             [.] mark_char_table
   4.88%  emacs  emacs             [.] pdumper_marked_p_impl
   3.30%  emacs  emacs             [.] pdumper_set_marked_impl
   ...
   2.52%  emacs  emacs             [.] allocate_vectorlike

allocate_vectorlike clearly takes a lot less time by not trying to loop over all the ~500 empty elements of vector_free_lists.

We can further get rid of the GC by temporarily disabling it (just for demonstration):

(let ((beg (float-time)))
  (setq gc-cons-threshold most-positive-fixnum)
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)))

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s

  17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
   7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
   6.51%  emacs  emacs             [.] arith_driver
   6.03%  emacs  libc.so.6         [.] malloc
   5.57%  emacs  emacs             [.] allocate_vectorlike
   5.20%  emacs  [unknown]         [k] 0xffffffffaae01857
   4.16%  emacs  libgmp.so.10.5.0  [.] __gmpn_add_n_coreisbr
   3.72%  emacs  emacs             [.] check_number_coerce_marker
   3.35%  emacs  fib.eln           [.] F666962_fib_0
   3.29%  emacs  emacs             [.] allocate_pseudovector
   2.30%  emacs  emacs             [.] Flss

Now, the actual bignum arithmetic (libgmp) takes most of the CPU time.

I am not sure what differs between the Elisp gmp bindings and the analogous SBCL binding so that SBCL is so much faster.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: allocate_vector_from_block.diff --]
[-- Type: text/x-patch, Size: 1712 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index 17ca5c725d0..62e96b4c9de 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p)
    vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
 
 static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
 
 /* Singly-linked list of large vectors.  */
 
@@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t nbytes)
   set_next_vector (v, vector_free_lists[vindex]);
   ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
   vector_free_lists[vindex] = v;
+  if ( vindex < vector_free_lists_min_idx )
+    vector_free_lists_min_idx = vindex;
 }
 
 /* Get a new vector block.  */
 
@@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes)
   /* Next, check free lists containing larger vectors.  Since
      we will split the result, we should have remaining space
      large enough to use for one-slot vector at least.  */
-  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
-       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+  for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN), vector_free_lists_min_idx );
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++, vector_free_lists_min_idx++)
     if (vector_free_lists[index])
       {
	/* This vector is larger than requested.  */
 
@@ -3413,6 +3416,7 @@ sweep_vectors (void)
   gcstat.total_vectors = 0;
   gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
   memset (vector_free_lists, 0, sizeof (vector_free_lists));
+  vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX;
 
   /* Looking through vector blocks.  */

[-- Attachment #3: Type: text/plain, Size: 224 bytes --]

-- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply related [flat|nested] 47+ messages in thread
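A side note on the GC-disabling demonstration above: `setq' leaves `gc-cons-threshold' raised for the rest of the session. A `let'-binding restores it automatically after the measurement (illustrative; assumes `fib' from earlier in the thread is defined):

```elisp
;; Same measurement, but the raised threshold is scoped to the form,
;; so normal GC behavior resumes once it returns.
(let ((gc-cons-threshold most-positive-fixnum)
      (beg (float-time)))
  (fib 10000 1000)
  (message "%.3f s" (- (float-time) beg)))
```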
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core) 2023-08-11 14:07 ` [PATCH] " Ihor Radchenko @ 2023-08-11 18:06 ` Emanuel Berg 2023-08-11 19:41 ` Ihor Radchenko 0 siblings, 1 reply; 47+ messages in thread From: Emanuel Berg @ 2023-08-11 18:06 UTC (permalink / raw) To: emacs-devel Ihor Radchenko wrote: >>> Maybe we could somehow re-use the already allocated bignum >>> objects, similar to what is done for cons cells (see >>> src/alloc.c:Fcons). >> >> Sounds reasonable :) > > And... is has been already done, actually. > allocate_vectorlike calls allocate_vector_from_block, which > re-uses pre-allocated objects. > > And looking into the call graph, this exact branch calling > allocate_vector_from_block is indeed called for the bignums [...] Are we talking a list of Emacs C functions executing with the corresponding times they have been in execution in a tree data structure? :O E.g. where do we find allocate_vectorlike ? > With the patch: > > perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln > 1.968 s > > 33.17% emacs emacs [.] process_mark_stack > 5.51% emacs libgmp.so.10.5.0 [.] __gmpz_sizeinbase > 5.05% emacs emacs [.] mark_char_table > 4.88% emacs emacs [.] pdumper_marked_p_impl > 3.30% emacs emacs [.] pdumper_set_marked_impl > ... > 2.52% emacs emacs [.] allocate_vectorlike > > allocate_vectorlike clearly takes a lot less time by not trying to loop > over all the ~500 empty elements of vector_free_lists. > > We can further get rid of the GC by temporarily disabling it (just for > demonstration): > > (let ((beg (float-time))) > (setq gc-cons-threshold most-positive-fixnum) > (fib 10000 1000) > (message "%.3f s" (- (float-time) beg)) ) > > perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln > 0.739 s > > 17.11% emacs libgmp.so.10.5.0 [.] __gmpz_sizeinbase > 7.35% emacs libgmp.so.10.5.0 [.] __gmpz_add > 6.51% emacs emacs [.] arith_driver > 6.03% emacs libc.so.6 [.] malloc > 5.57% emacs emacs [.] 
allocate_vectorlike > 5.20% emacs [unknown] [k] 0xffffffffaae01857 > 4.16% emacs libgmp.so.10.5.0 [.] __gmpn_add_n_coreisbr > 3.72% emacs emacs [.] check_number_coerce_marker > 3.35% emacs fib.eln [.] F666962_fib_0 > 3.29% emacs emacs [.] allocate_pseudovector > 2.30% emacs emacs [.] Flss > > Now, the actual bignum arithmetics (lisp/gmp.c) takes most of the CPU time. > > I am not sure what differs between Elisp gmp bindings and analogous SBCL > binding so that SBCL is so much faster. > > diff --git a/src/alloc.c b/src/alloc.c > index 17ca5c725d0..62e96b4c9de 100644 > --- a/src/alloc.c > +++ b/src/alloc.c > @@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p) > vectors of the same NBYTES size, so NTH == VINDEX (NBYTES). */ > > static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX]; > +static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX; > > /* Singly-linked list of large vectors. */ > > @@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t > nbytes) > set_next_vector (v, vector_free_lists[vindex]); > ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size); > vector_free_lists[vindex] = v; > + if ( vindex < vector_free_lists_min_idx ) > + vector_free_lists_min_idx = vindex; > } > > /* Get a new vector block. */ > @@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes) > /* Next, check free lists containing larger vectors. Since > we will split the result, we should have remaining space > large enough to use for one-slot vector at least. */ > - for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN); > - index < VECTOR_MAX_FREE_LIST_INDEX; index++) > + for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN), > vector_free_lists_min_idx ); > + index < VECTOR_MAX_FREE_LIST_INDEX; index++, > vector_free_lists_min_idx++) > if (vector_free_lists[index]) > { > /* This vector is larger than requested. 
*/ > @@ -3413,6 +3416,7 @@ sweep_vectors (void) > gcstat.total_vectors = 0; > gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0; > memset (vector_free_lists, 0, sizeof (vector_free_lists)); > + vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX; > > /* Looking through vector blocks. */ Amazing! :O See if you can do my original test, which was 1-3 Elisp, byte-compiled Elisp, and natively compiled Elisp, and the Common Lisp execution (on your computer), if you'd like. Actually it is a bit of a bummer to the community since Emacs is like THE portal into Lisp. We should have the best Lisp in the business, and I don't see why not? Emacs + SBCL + CL + Elisp anyone? I.e. real CL not the cl- which is actually in Elisp. Not that there is anything wrong with that! On the contrary ;) -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core) 2023-08-11 18:06 ` Emanuel Berg @ 2023-08-11 19:41 ` Ihor Radchenko 2023-08-11 19:50 ` Emanuel Berg 0 siblings, 1 reply; 47+ messages in thread From: Ihor Radchenko @ 2023-08-11 19:41 UTC (permalink / raw) To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> And... it has already been done, actually.
>> allocate_vectorlike calls allocate_vector_from_block, which
>> re-uses pre-allocated objects.
>>
>> And looking into the call graph, this exact branch calling
>> allocate_vector_from_block is indeed called for the bignums [...]
>
> Are we talking a list of Emacs C functions executing with the
> corresponding times they have been in execution in a tree data
> structure? :O

That's what perf does - it is a sampling profiler on GNU/Linux. The Elisp equivalent is profiler.el, but it does not reveal the underlying C functions.

> E.g. where do we find allocate_vectorlike ?

I have listed the commands I used (from a terminal):

1. perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
   <records CPU stats while running emacs>
2. perf report
   <displays the stats>

You need Emacs compiled with debug symbols to get meaningful output. See more at https://www.brendangregg.com/perf.html

> See if you can do my original test, which was 1-3 Elisp,
> byte-compiled Elisp, and natively compiled Elisp, and the
> Common Lisp execution (on your computer), if you'd like.

As you wish:

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el  [5.783 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc [1.961 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln [1.901 s]
$ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/fib.cl [0.007 s]

without the patch (on my system)

$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el  [6.546 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc [2.498 s]
$ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln [2.518 s]

Also, the patch gives improvements for more than just bignums. I ran elisp-benchmarks (https://elpa.gnu.org/packages/elisp-benchmarks.html) and got (before the patch)

| test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|--------------------+----------------+------------+---------+-------------+-----------------|
| bubble             |           0.70 |       0.06 |       1 |        0.76 |            0.07 |
| bubble-no-cons     |           1.17 |       0.00 |       0 |        1.17 |            0.02 |
| bytecomp           |           1.74 |       0.29 |      13 |        2.03 |            0.12 |
| dhrystone          |           2.30 |       0.00 |       0 |        2.30 |            0.07 |
| eieio              |           1.25 |       0.13 |       7 |        1.38 |            0.03 |
| fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| fibn-named-let     |           1.53 |       0.00 |       0 |        1.53 |            0.03 |
| fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| flet               |           1.48 |       0.00 |       0 |        1.48 |            0.04 |
| inclist            |           1.07 |       0.00 |       0 |        1.07 |            0.02 |
| inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
| listlen-tc         |           0.13 |       0.00 |       0 |        0.13 |            0.03 |
| map-closure        |           5.26 |       0.00 |       0 |        5.26 |            0.09 |
| nbody              |           1.61 |       0.17 |       1 |        1.78 |            0.06 |
| pack-unpack        |           0.31 |       0.02 |       1 |        0.33 |            0.00 |
| pack-unpack-old    |           0.50 |       0.05 |       3 |        0.55 |            0.02 |
| pcase              |           1.85 |       0.00 |       0 |        1.85 |            0.05 |
| pidigits           |           4.41 |       0.96 |      17 |        5.37 |            0.13 |
| scroll             |           0.64 |       0.00 |       0 |        0.64 |            0.01 |
| smie               |           1.59 |       0.04 |       2 |        1.63 |            0.03 |
|--------------------+----------------+------------+---------+-------------+-----------------|
| total              |          28.54 |       1.72 |      45 |       30.26 |            0.26 |

(after the patch)

| test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
|--------------------+----------------+------------+---------+-------------+-----------------|
| bubble             |           0.68 |       0.05 |       1 |        0.73 |            0.04 |
| bubble-no-cons     |           1.00 |       0.00 |       0 |        1.00 |            0.04 |
| bytecomp           |           1.60 |       0.23 |      13 |        1.82 |            0.16 |
| dhrystone          |           2.03 |       0.00 |       0 |        2.03 |            0.05 |
| eieio              |           1.08 |       0.12 |       7 |        1.20 |            0.07 |
| fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| fibn-named-let     |           1.44 |       0.00 |       0 |        1.44 |            0.12 |
| fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
| flet               |           1.36 |       0.00 |       0 |        1.36 |            0.09 |
| inclist            |           1.00 |       0.00 |       0 |        1.00 |            0.00 |
| inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
| listlen-tc         |           0.11 |       0.00 |       0 |        0.11 |            0.02 |
| map-closure        |           4.91 |       0.00 |       0 |        4.91 |            0.12 |
| nbody              |           1.47 |       0.17 |       1 |        1.64 |            0.08 |
| pack-unpack        |           0.29 |       0.02 |       1 |        0.31 |            0.01 |
| pack-unpack-old    |           0.43 |       0.05 |       3 |        0.48 |            0.01 |
| pcase              |           1.84 |       0.00 |       0 |        1.84 |            0.07 |
| pidigits           |           3.16 |       0.94 |      17 |        4.11 |            0.10 |
| scroll             |           0.58 |       0.00 |       0 |        0.58 |            0.00 |
| smie               |           1.40 |       0.04 |       2 |        1.44 |            0.06 |
|--------------------+----------------+------------+---------+-------------+-----------------|
| total              |          25.38 |       1.62 |      45 |       27.00 |            0.32 |

About ~10% improvement, with each individual benchmark being faster.

Note how the fibn test takes 0.00 seconds. It is limited to the fixnum range.

> Actually it is a bit of a bummer to the community since Emacs
> is like THE portal into Lisp. We should have the best Lisp in
> the business, and I don't see why not? Emacs + SBCL + CL +
> Elisp anyone?

This is a balancing act. Elisp is tailored for Emacs as an editor. So, trade-offs are inevitable. I am skeptical about Elisp outperforming CL. But that does not mean we should not try to improve things.

-- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-11 19:41 ` Ihor Radchenko
@ 2023-08-11 19:50 ` Emanuel Berg
2023-08-12 8:24 ` Ihor Radchenko
0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-11 19:50 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>> See if you can do my original test, which was 1-3 Elisp,
>> byte-compiled Elisp, and natively compiled Elisp, and the
>> Common Lisp execution (on your computer), if you'd like.
>
> As you wish:
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el  [5.783 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc [1.961 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln [1.901 s]
> $ SBCL_HOME=/usr/lib64/sbcl sbcl --load /tmp/fib.cl [0.007 s]
>
> without the patch (on my system)
>
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.el  [6.546 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.elc [2.498 s]
> $ ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln [2.518 s]

The stats seem to speak one language ...

> Also, the patch gives improvements for more than just
> bignums
>
> | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
> |--------------------+----------------+------------+---------+-------------+-----------------|
> | bubble             |           0.70 |       0.06 |       1 |        0.76 |            0.07 |
> | bubble-no-cons     |           1.17 |       0.00 |       0 |        1.17 |            0.02 |
> | bytecomp           |           1.74 |       0.29 |      13 |        2.03 |            0.12 |
> | dhrystone          |           2.30 |       0.00 |       0 |        2.30 |            0.07 |
> | eieio              |           1.25 |       0.13 |       7 |        1.38 |            0.03 |
> | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | fibn-named-let     |           1.53 |       0.00 |       0 |        1.53 |            0.03 |
> | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | flet               |           1.48 |       0.00 |       0 |        1.48 |            0.04 |
> | inclist            |           1.07 |       0.00 |       0 |        1.07 |            0.02 |
> | inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
> | listlen-tc         |           0.13 |       0.00 |       0 |        0.13 |            0.03 |
> | map-closure        |           5.26 |       0.00 |       0 |        5.26 |            0.09 |
> | nbody              |           1.61 |       0.17 |       1 |        1.78 |            0.06 |
> | pack-unpack        |           0.31 |       0.02 |       1 |        0.33 |            0.00 |
> | pack-unpack-old    |           0.50 |       0.05 |       3 |        0.55 |            0.02 |
> | pcase              |           1.85 |       0.00 |       0 |        1.85 |            0.05 |
> | pidigits           |           4.41 |       0.96 |      17 |        5.37 |            0.13 |
> | scroll             |           0.64 |       0.00 |       0 |        0.64 |            0.01 |
> | smie               |           1.59 |       0.04 |       2 |        1.63 |            0.03 |
> |--------------------+----------------+------------+---------+-------------+-----------------|
> | total              |          28.54 |       1.72 |      45 |       30.26 |            0.26 |
>
> (after the patch)
>
> | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
> |--------------------+----------------+------------+---------+-------------+-----------------|
> | bubble             |           0.68 |       0.05 |       1 |        0.73 |            0.04 |
> | bubble-no-cons     |           1.00 |       0.00 |       0 |        1.00 |            0.04 |
> | bytecomp           |           1.60 |       0.23 |      13 |        1.82 |            0.16 |
> | dhrystone          |           2.03 |       0.00 |       0 |        2.03 |            0.05 |
> | eieio              |           1.08 |       0.12 |       7 |        1.20 |            0.07 |
> | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | fibn-named-let     |           1.44 |       0.00 |       0 |        1.44 |            0.12 |
> | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
> | flet               |           1.36 |       0.00 |       0 |        1.36 |            0.09 |
> | inclist            |           1.00 |       0.00 |       0 |        1.00 |            0.00 |
> | inclist-type-hints |           1.00 |       0.00 |       0 |        1.00 |            0.07 |
> | listlen-tc         |           0.11 |       0.00 |       0 |        0.11 |            0.02 |
> | map-closure        |           4.91 |       0.00 |       0 |        4.91 |            0.12 |
> | nbody              |           1.47 |       0.17 |       1 |        1.64 |            0.08 |
> | pack-unpack        |           0.29 |       0.02 |       1 |        0.31 |            0.01 |
> | pack-unpack-old    |           0.43 |       0.05 |       3 |        0.48 |            0.01 |
> | pcase              |           1.84 |       0.00 |       0 |        1.84 |            0.07 |
> | pidigits           |           3.16 |       0.94 |      17 |        4.11 |            0.10 |
> | scroll             |           0.58 |       0.00 |       0 |        0.58 |            0.00 |
> | smie               |           1.40 |       0.04 |       2 |        1.44 |            0.06 |
> |--------------------+----------------+------------+---------+-------------+-----------------|
> | total              |          25.38 |       1.62 |      45 |       27.00 |            0.32 |
>
> About ~10% improvement, with each individual benchmark being
> faster.

Ten percent? We take it :)

> Note how fibn test takes 0.00 seconds. It is limited to
> fixnum range.

What does that say/mean?

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-11 19:50 ` Emanuel Berg
@ 2023-08-12 8:24 ` Ihor Radchenko
2023-08-12 16:03 ` Emanuel Berg
0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-12 8:24 UTC (permalink / raw)
To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> Note how fibn test takes 0.00 seconds. It is limited to
>> fixnum range.
>
> What does that say/mean?

It tells that normal int operations are much, much faster
compared to bigint. So, your benchmark is rather esoteric if we
consider normal usage patterns.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-12 8:24 ` Ihor Radchenko
@ 2023-08-12 16:03 ` Emanuel Berg
2023-08-13 9:09 ` Ihor Radchenko
0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-12 16:03 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>>> Note how fibn test takes 0.00 seconds. It is limited to
>>> fixnum range.
>>
>> What does that say/mean?
>
> It tells that normal int operations are much, much faster
> compared to bigint. So, your benchmark is rather esoteric if
> we consider normal usage patterns.

Didn't you provide a whole set of benchmarks with an
approximated gain of 10%? Maybe some esoteric nature is
built-in in the benchmark concept ...

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-12 16:03 ` Emanuel Berg
@ 2023-08-13 9:09 ` Ihor Radchenko
2023-08-13 9:49 ` Emanuel Berg
0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-13 9:09 UTC (permalink / raw)
To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> It tells that normal int operations are much, much faster
>> compared to bigint. So, your benchmark is rather esoteric if
>> we consider normal usage patterns.
>
> Didn't you provide a whole set of benchmarks with an
> approximated gain of 10%? Maybe some esoteric nature is
> built-in in the benchmark concept ...

The main problem your benchmark demonstrated is with bignum. By
accident, it also revealed slight inefficiency in vector
allocation, but this inefficiency is nowhere near SBCL 0.007 sec
vs. Elisp 2.5 sec. In practice, as more generic benchmarks
demonstrated, we only had 10% performance hit. Not something to
claim that Elisp is much slower compared to CL.

It would be more useful to compare CL with Elisp using less
specialized benchmarks that do not involve bignums. As Mattias
commented, we do not care much about bignum performance in
Elisp - it is a rarely used feature; we are content that it
simply works, even if not fast, and the core contributors (at
least, Mattias) are not seeing improving bignums as their
priority.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-13 9:09 ` Ihor Radchenko
@ 2023-08-13 9:49 ` Emanuel Berg
2023-08-13 10:21 ` Ihor Radchenko
0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-13 9:49 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

> The main problem your benchmark demonstrated is with bignum.
> By accident, it also revealed slight inefficiency in vector
> allocation, but this inefficiency is nowhere near SBCL 0.007
> sec vs. Elisp 2.5 sec.

Yeah, we can't have that.

> In practice, as more generic benchmarks demonstrated, we
> only had 10% performance hit. Not something to claim that
> Elisp is much slower compared to CL.

What do you mean, generic +10% is a huge improvement.

> It would be more useful to compare CL with Elisp using less
> specialized benchmarks that do not involve bignums.
> As Mattias commented, we do not care much about bignum
> performance in Elisp - it is a rarely used feature; we are
> content that it simply works, even if not fast, and the core
> contributors (at least, Mattias) are not seeing improving
> bignums as their priority.

But didn't your patch do that already? It would indicate that
it is possible to do it all in/to Elisp, which would be the
best way to solve this problem _and_ not have any of the
integration, maybe portability issues described ...

So 1, the first explanation why CL is much faster is another
implementation of bignums handling which is faster in CL, if
that has already been solved here absolutely no reason not to
include it as 10% is a huge gain, even more so for a whole set
of benchmarks.

Instead of relying on a single benchmark, one should have a set
of benchmarks and every benchmark should have a purpose, this
doesn't have to be so involved tho, for example "bignums" could
be the purpose of my benchmark, so one would have several, say
a dozen, each with the purpose of slowing the computer down
with respect to some aspect or known situation that one would
try to provoke ... It can be well-known algorithms for that
matter.

One would then do the same thing in CL and see, where do CL
perform much better? The next question would be, why?

If it is just about piling up +10%, let's do it!

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-13 9:49 ` Emanuel Berg
@ 2023-08-13 10:21 ` Ihor Radchenko
2023-08-14 2:20 ` Emanuel Berg
0 siblings, 1 reply; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-13 10:21 UTC (permalink / raw)
To: Emanuel Berg; +Cc: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

>> In practice, as more generic benchmarks demonstrated, we
>> only had 10% performance hit. Not something to claim that
>> Elisp is much slower compared to CL.
>
> What do you mean, generic +10% is a huge improvement.

It is, but it is also tangent to comparison between Elisp and
CL. The main (AFAIU) difference between Elisp and CL is in how
the bignums are stored. Elisp uses its own internal object type
while CL uses GMP's native format. And we have huge overheads
converting things back-and-forth between GMP and Elisp formats.
It is by choice. And my patch did not do anything about this
difference.

Also, +10% is just on my machine. We need someone else to test
things before jumping to far-reaching conclusions. I plan to
submit the patch in a less ad-hoc state later, as a separate
ticket.

>> It would be more useful to compare CL with Elisp using less
>> specialized benchmarks that do not involve bignums.
>> As Mattias commented, we do not care much about bignum
>> performance in Elisp - it is a rarely used feature; we are
>> content that it simply works, even if not fast, and the core
>> contributors (at least, Mattias) are not seeing improving
>> bignums as their priority.
>
> But didn't your patch do that already?

No. The benchmark only compared between Elisp before/after the
patch. Not with CL.

> Instead of relying on a single benchmark, one should have
> a set of benchmarks and every benchmark should have a purpose,
> this doesn't have to be so involved tho, for example "bignums"
> could be the purpose of my benchmark, so one would have
> several, say a dozen, each with the purpose of slowing the
> computer down with respect to some aspect or known
> situation that one would try to provoke ... It can be
> well-known algorithms for that matter.
>
> One would then do the same thing in CL and see, where do CL
> perform much better? The next question would be, why?

Sure. Feel free to share such benchmark for Elisp vs. CL.
I only know the benchmark library for Elisp. No equivalent
comparable benchmark for CL.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance (was: Shrinking the C core)
2023-08-13 10:21 ` Ihor Radchenko
@ 2023-08-14 2:20 ` Emanuel Berg
2023-08-14 2:42 ` [PATCH] Re: Bignum performance Po Lu
0 siblings, 1 reply; 47+ messages in thread
From: Emanuel Berg @ 2023-08-14 2:20 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>>> In practice, as more generic benchmarks demonstrated, we
>>> only had 10% performance hit. Not something to claim that
>>> Elisp is much slower compared to CL.
>>
>> What do you mean, generic +10% is a huge improvement.
>
> It is, but it is also tangent to comparison between Elisp
> and CL. The main (AFAIU) difference between Elisp and CL is
> in how the bignums are stored. Elisp uses its own internal
> object type while CL uses GMP's native format.

GMP = GNU Multiple Precision Arithmetic Library.

https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library

> And we have huge overheads converting things back-and-forth
> between GMP and Elisp formats. It is by choice. And my patch
> did not do anything about this difference.

But that's all the better, your patch solved (very likely) the
problem and did so without causing havoc by trying to forcibly
merge opposing solutions.

And the method was: instead of reallocating new objects for
bignums, we are now reusing existing allocations for new data?

>>> It would be more useful to compare CL with Elisp using
>>> less specialized benchmarks that do not involve bignums.
>>> As Mattias commented, we do not care much about bignum
>>> performance in Elisp - it is a rarely used feature; we are
>>> content that it simply works, even if not fast, and the
>>> core contributors (at least, Mattias) are not seeing
>>> improving bignums as their priority.
>>
>> But didn't your patch do that already?
>
> No. The benchmark only compared between Elisp before/after
> the patch. Not with CL.

No, that much I understood. It was Elisp before and after the
patch, as you say.

Isn't before/after all the data you need? Nah, it can be useful
to have an external reference as well, and here we are also
hoping we can use the benchmarks to answer the question whether
CL is just so much faster in general, or if there are certain
areas where it excels - and if so - what those areas are and
what they contain to unlock all that speed.

>> Instead of relying on a single benchmark, one should have
>> a set of benchmarks and every benchmark should have
>> a purpose, this doesn't have to be so involved tho, for
>> example "bignums" could be the purpose of my benchmark, so
>> one would have several, say a dozen, each with the purpose
>> of slowing the computer down with respect to some aspect or
>> known situation that one would try to provoke ... It can be
>> well-known algorithms for that matter.
>>
>> One would then do the same thing in CL and see, where do CL
>> perform much better? The next question would be, why?
>
> Sure. Feel free to share such benchmark for Elisp vs. CL.
> I only know the benchmark library for Elisp. No equivalent
> comparable benchmark for CL.

I'm working on it! This will be very interesting, for sure.
The need for speed - but in a very methodical way ...

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 2:20 ` Emanuel Berg
@ 2023-08-14 2:42 ` Po Lu
2023-08-14 4:16 ` Emanuel Berg
2023-08-14 7:15 ` Ihor Radchenko
0 siblings, 2 replies; 47+ messages in thread
From: Po Lu @ 2023-08-14 2:42 UTC (permalink / raw)
To: emacs-devel

Emanuel Berg <incal@dataswamp.org> writes:

> Ihor Radchenko wrote:
>
>>>> In practice, as more generic benchmarks demonstrated, we
>>>> only had 10% performance hit. Not something to claim that
>>>> Elisp is much slower compared to CL.
>>>
>>> What do you mean, generic +10% is a huge improvement.
>>
>> It is, but it is also tangent to comparison between Elisp
>> and CL. The main (AFAIU) difference between Elisp and CL is
>> in how the bignums are stored. Elisp uses its own internal
>> object type while CL uses GMP's native format.
>
> GMP = GNU Multiple Precision Arithmetic Library.
>
> https://en.wikipedia.org/wiki/GNU_Multiple_Precision_Arithmetic_Library
>
>> And we have huge overheads converting things back-and-forth
>> between GMP and Elisp formats. It is by choice. And my patch
>> did not do anything about this difference.

AFAIU, no conversion takes place between ``Elisp formats'' and
GMP formats. Our bignums rely on GMP for all data storage and
memory allocation.

  struct Lisp_Bignum
  {
    union vectorlike_header header;
    mpz_t value;   /* <---- GMP type */
  } GCALIGNED_STRUCT;

and finally:

  INLINE mpz_t const *
  bignum_val (struct Lisp_Bignum const *i)
  {
    return &i->value;
  }

  INLINE mpz_t const *
  xbignum_val (Lisp_Object i)
  {
    return bignum_val (XBIGNUM (i));
  }

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 2:42 ` [PATCH] Re: Bignum performance Po Lu
@ 2023-08-14 4:16 ` Emanuel Berg
2023-08-14 7:15 ` Ihor Radchenko
1 sibling, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-14 4:16 UTC (permalink / raw)
To: emacs-devel

Po Lu wrote:

> AFAIU, no conversion takes place between ``Elisp formats''
> and GMP formats. Our bignums rely on GMP for all data
> storage and memory allocation.

There was a problem with that as likely indicated by the
Fibonacci benchmark, that situation has hopefully been patched
by now or will so be, so now it will be interesting to see if
we can identify other such areas, and if they can be solved as
effortlessly ...

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 2:42 ` [PATCH] Re: Bignum performance Po Lu
2023-08-14 4:16 ` Emanuel Berg
@ 2023-08-14 7:15 ` Ihor Radchenko
2023-08-14 7:50 ` Po Lu
2023-08-15 14:28 ` Emanuel Berg
1 sibling, 2 replies; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-14 7:15 UTC (permalink / raw)
To: Po Lu; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

>>> And we have huge overheads converting things back-and-forth
>>> between GMP and Elisp formats. It is by choice. And my patch
>>> did not do anything about this difference.
>
> AFAIU, no conversion takes place between ``Elisp formats'' and GMP
> formats. Our bignums rely on GMP for all data storage and memory
> allocation.

Thanks for the clarification!
So, GMP is not as fast as SBCL's implementation after all.
SBCL uses https://github.com/sbcl/sbcl/blob/master/src/code/bignum.lisp
- a custom bignum implementation, which is clearly faster compared to
GMP (in the provided benchmark):

perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
0.739 s
(0.007 s for SBCL)

  17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
   7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
           ^^ already >0.1 sec.
   6.51%  emacs  emacs             [.] arith_driver
   6.03%  emacs  libc.so.6         [.] malloc
   5.57%  emacs  emacs             [.] allocate_vectorlike
   5.20%  emacs  [unknown]         [k] 0xffffffffaae01857
   4.16%  emacs  libgmp.so.10.5.0  [.] __gmpn_add_n_coreisbr
   3.72%  emacs  emacs             [.] check_number_coerce_marker
   3.35%  emacs  fib.eln           [.] F666962_fib_0
   3.29%  emacs  emacs             [.] allocate_pseudovector
   2.30%  emacs  emacs             [.] Flss

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 7:15 ` Ihor Radchenko
@ 2023-08-14 7:50 ` Po Lu
2023-08-14 9:28 ` Ihor Radchenko
0 siblings, 1 reply; 47+ messages in thread
From: Po Lu @ 2023-08-14 7:50 UTC (permalink / raw)
To: Ihor Radchenko; +Cc: emacs-devel

Ihor Radchenko <yantar92@posteo.net> writes:

> Po Lu <luangruo@yahoo.com> writes:
>
>>>> And we have huge overheads converting things back-and-forth
>>>> between GMP and Elisp formats. It is by choice. And my patch
>>>> did not do anything about this difference.
>>
>> AFAIU, no conversion takes place between ``Elisp formats'' and GMP
>> formats. Our bignums rely on GMP for all data storage and memory
>> allocation.
>
> Thanks for the clarification!
> So, GMP is not as fast as SBCL's implementation after all.
> SBCL uses https://github.com/sbcl/sbcl/blob/master/src/code/bignum.lisp
> - a custom bignum implementation, which is clearly faster compared to
> GMP (in the provided benchmark):

GMP is significantly faster than all other known bignum
libraries. Bignums are not considered essential for Emacs's
performance, so the GMP library is utilized in an inefficient
fashion.

> perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
> (0.007 s for SBCL)
>
> 17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
>  7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
>
> ^^ already >0.1 sec.

The subroutine actually performing arithmetic is in fact
mpn_add_n_coreisbr. mpz_add and mpz_sizeinbase are ``mpz''
functions that perform memory allocation, and our bignum
functions frequently utilize mpz_sizeinbase to ascertain whether
a result can be represented as a fixnum. As such, they don't
constitute a fair comparison between the speed of the GMP
library itself and SBCL.

GMP provides low-level functions that place responsibility for
memory management and input verification in the hands of the
programmer. These are usually implemented in CPU-specific
assembler, and are very fast.

That being said, they're not available within mini-gmp, and the
primary bottleneck is in fact mpz_sizeinbase.

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 7:50 ` Po Lu
@ 2023-08-14 9:28 ` Ihor Radchenko
0 siblings, 0 replies; 47+ messages in thread
From: Ihor Radchenko @ 2023-08-14 9:28 UTC (permalink / raw)
To: Po Lu; +Cc: emacs-devel

Po Lu <luangruo@yahoo.com> writes:

> GMP is significantly faster than all other known bignum libraries.
> Bignums are not considered essential for Emacs's performance, so the GMP
> library is utilized in an inefficient fashion.

I understand. It adds to the Mattias' opinion that we should
not care too much about bignum performance in practice.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 47+ messages in thread
* Re: [PATCH] Re: Bignum performance
2023-08-14 7:15 ` Ihor Radchenko
2023-08-14 7:50 ` Po Lu
@ 2023-08-15 14:28 ` Emanuel Berg
1 sibling, 0 replies; 47+ messages in thread
From: Emanuel Berg @ 2023-08-15 14:28 UTC (permalink / raw)
To: emacs-devel

Ihor Radchenko wrote:

>> AFAIU, no conversion takes place between ``Elisp formats''
>> and GMP formats. Our bignums rely on GMP for all data
>> storage and memory allocation.
>
> Thanks for the clarification! So, GMP is not as fast as
> SBCL's implementation after all. SBCL uses
> https://github.com/sbcl/sbcl/blob/master/src/code/bignum.lisp
> - a custom bignum implementation, which is clearly faster
> compared to GMP (in the provided benchmark):
>
> perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln
> 0.739 s
> (0.007 s for SBCL)
>
> 17.11%  emacs  libgmp.so.10.5.0  [.] __gmpz_sizeinbase
>  7.35%  emacs  libgmp.so.10.5.0  [.] __gmpz_add
>
> ^^ already >0.1 sec.

And we are not the only ones using GMP, right? So maybe this
issue in particular would be solved even more broadly by
a SBCL -> GMP transition ...

BTW, here are a bunch of new benchmarks from elisp-benchmarks
brought to CL. Ironically, as some of them were from CL in the
first place. But it is the way it goes.

https://dataswamp.org/~incal/cl/bench/

Several benchmarks are too Emacs-specific to make sense (work)
anywhere else, but there are still a few left to do. Note that
in fib.cl there are now, also brought from elisp-benchmarks,
a bunch of new implementations to try.

--
underground experts united
https://dataswamp.org/~incal

^ permalink raw reply	[flat|nested] 47+ messages in thread