* Re: master fe042e9: Speed up (+ 2 2) by a factor of 10
2018-09-04 14:43 ` master fe042e9: Speed up (+ 2 2) by a factor of 10 Pip Cet
@ 2018-09-04 17:28 ` Paul Eggert
0 siblings, 0 replies; 2+ messages in thread
From: Paul Eggert @ 2018-09-04 17:28 UTC (permalink / raw)
To: Pip Cet, emacs-devel; +Cc: emacs-diffs
[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]
Pip Cet wrote:
>> - num += mpz_mod_ui (z, XBIGNUM (n)->value, cycle_length);
>> + num += mpz_mod_ui (mpz[0], XBIGNUM (n)->value, cycle_length);
>
> I think it would be best to use mpz_div_ui here (despite its name, it
> returns the remainder of the division specified by its arguments).
I assume you meant mpz_tdiv_ui. Yes, thanks, that should be a bit faster.
Likewise for mpz_tdiv_r instead of mpz_mod, for --with-wide-int platforms. I
installed the attached.
> Are you sure it wouldn't be possible to get the performance gains your
> patch gives us without introducing global temporaries, such as by
> using mpz_swap efficiently? Is this worth investigating, maybe, or do
> you prefer global temporaries for another reason?
I'm no fan of these temps, but the GMP mpz API does push hard in the direction
of preinitialized temps if we want efficiency, as the overhead for small
operands is the first thing listed in
<https://gmplib.org/manual/Efficiency.html>. And in the context of Emacs,
doesn't mpz_swap rely on having preinitialized temps? How would we use mpz_swap
efficiently without having preinitialized temps? Unfortunately in C I don't see
how to have preinitialized temps without making them "global" in some sense; we
could easily make them Emacs-thread-local if necessary, but they'd still be
"global" to the thread.
I mainly introduced global temporaries to fix memory leaks when a bignum
calculation overflows past integer-width or when memory is exhausted. It would
also be possible to avoid these leaks by putting a record_unwind_protect of some
sort around every bignum calculation, but that would slow things down significantly.
As a side effect, somewhat to my surprise, I found that global temps made the
code simpler and easier to understand. I still don't like the "global" part of
them, though, and perhaps with further hacking we can find some way to make them
local while keeping the code simple and efficient.
[-- Attachment #2: 0001-Tweak-nthcdr-for-bignum-efficiency.patch --]
[-- Type: text/x-patch, Size: 1133 bytes --]
From 628f6a2c7a9fe476b7e71efed3a8f8784a00cc54 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 4 Sep 2018 10:24:51 -0700
Subject: [PATCH] Tweak nthcdr for bignum efficiency
* src/fns.c (Fnthcdr): Use mpz_tdiv_ui and mpz_tdiv_r
instead of mpz_mod_ui and mpz_mod, as they are more efficient.
Suggested by Pip Cet in:
https://lists.gnu.org/r/emacs-devel/2018-09/msg00073.html
---
src/fns.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/fns.c b/src/fns.c
index 8b25492eae..5a98f14881 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -1470,11 +1470,11 @@ DEFUN ("nthcdr", Fnthcdr, Snthcdr, 2, 2, 0,
CYCLE_LENGTH. */
/* Add N mod CYCLE_LENGTH to NUM. */
if (cycle_length <= ULONG_MAX)
- num += mpz_mod_ui (mpz[0], XBIGNUM (n)->value, cycle_length);
+ num += mpz_tdiv_ui (XBIGNUM (n)->value, cycle_length);
else
{
mpz_set_intmax (mpz[0], cycle_length);
- mpz_mod (mpz[0], XBIGNUM (n)->value, mpz[0]);
+ mpz_tdiv_r (mpz[0], XBIGNUM (n)->value, mpz[0]);
intptr_t iz;
mpz_export (&iz, NULL, -1, sizeof iz, 0, 0, mpz[0]);
num += iz;
--
2.17.1
^ permalink raw reply related [flat|nested] 2+ messages in thread