From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Emanuel Berg Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] Re: Bignum performance (was: Shrinking the C core) Date: Fri, 11 Aug 2023 20:06:50 +0200 Message-ID: <87v8dlihc5.fsf@dataswamp.org> References: <20230809094655.793FC18A4654@snark.thyrsus.com> <87il9owg0f.fsf@yahoo.com> <83fs4rjq9j.fsf@gnu.org> <87jzu2tvfc.fsf@dataswamp.org> <87y1ih3mc1.fsf@localhost> <87h6p5kcek.fsf@dataswamp.org> <87msyxoj2t.fsf@localhost> <875y5lkb4b.fsf@dataswamp.org> <87bkfdsmde.fsf@localhost> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="38329"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) To: emacs-devel@gnu.org Cancel-Lock: sha1:vYABWEtlygNa/oaFscYXfc8nuFk= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Aug 11 20:41:49 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qUX52-0009ni-Bn for ged-emacs-devel@m.gmane-mx.org; Fri, 11 Aug 2023 20:41:48 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qUX4D-0000mu-6n; Fri, 11 Aug 2023 14:40:57 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qUWXO-0003y2-MT for emacs-devel@gnu.org; Fri, 11 Aug 2023 14:07:02 -0400 Original-Received: from ciao.gmane.io ([116.202.254.214]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qUWXM-00009z-Se for emacs-devel@gnu.org; Fri, 11 Aug 2023 14:07:02 -0400 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1qUWXL-00068w-5j for emacs-devel@gnu.org; Fri, 11 Aug 2023 20:06:59 +0200 X-Injected-Via-Gmane: http://gmane.org/ Mail-Followup-To: emacs-devel@gnu.org Mail-Copies-To: never Received-SPF: pass client-ip=116.202.254.214; envelope-from=ged-emacs-devel@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: -15 X-Spam_score: -1.6 X-Spam_bar: - X-Spam_report: (-1.6 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Fri, 11 Aug 2023 14:40:54 -0400 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:308588 Archived-At: Ihor Radchenko wrote: >>> Maybe we could somehow re-use the already allocated bignum >>> objects, similar to what is done for cons cells (see >>> src/alloc.c:Fcons). >> >> Sounds reasonable :) > > And... is has been already done, actually. > allocate_vectorlike calls allocate_vector_from_block, which > re-uses pre-allocated objects. > > And looking into the call graph, this exact branch calling > allocate_vector_from_block is indeed called for the bignums [...] Are we talking a list of Emacs C functions executing with the corresponding times they have been in execution in a tree data structure? :O E.g. where do we find allocate_vectorlike ? > With the patch: > > perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln > 1.968 s > > 33.17% emacs emacs [.] process_mark_stack > 5.51% emacs libgmp.so.10.5.0 [.] __gmpz_sizeinbase > 5.05% emacs emacs [.] mark_char_table > 4.88% emacs emacs [.] pdumper_marked_p_impl > 3.30% emacs emacs [.] pdumper_set_marked_impl > ... > 2.52% emacs emacs [.] allocate_vectorlike > > allocate_vectorlike clearly takes a lot less time by not trying to loop > over all the ~500 empty elements of vector_free_lists. > > We can further get rid of the GC by temporarily disabling it (just for > demonstration): > > (let ((beg (float-time))) > (setq gc-cons-threshold most-positive-fixnum) > (fib 10000 1000) > (message "%.3f s" (- (float-time) beg)) ) > > perf record ~/Git/emacs/src/emacs -Q -batch -l /tmp/fib.eln > 0.739 s > > 17.11% emacs libgmp.so.10.5.0 [.] __gmpz_sizeinbase > 7.35% emacs libgmp.so.10.5.0 [.] __gmpz_add > 6.51% emacs emacs [.] arith_driver > 6.03% emacs libc.so.6 [.] malloc > 5.57% emacs emacs [.] allocate_vectorlike > 5.20% emacs [unknown] [k] 0xffffffffaae01857 > 4.16% emacs libgmp.so.10.5.0 [.] __gmpn_add_n_coreisbr > 3.72% emacs emacs [.] check_number_coerce_marker > 3.35% emacs fib.eln [.] F666962_fib_0 > 3.29% emacs emacs [.] allocate_pseudovector > 2.30% emacs emacs [.] Flss > > Now, the actual bignum arithmetics (lisp/gmp.c) takes most of the CPU time. > > I am not sure what differs between Elisp gmp bindings and analogous SBCL > binding so that SBCL is so much faster. > > diff --git a/src/alloc.c b/src/alloc.c > index 17ca5c725d0..62e96b4c9de 100644 > --- a/src/alloc.c > +++ b/src/alloc.c > @@ -3140,6 +3140,7 @@ large_vector_vec (struct large_vector *p) > vectors of the same NBYTES size, so NTH == VINDEX (NBYTES). */ > > static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX]; > +static int vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX; > > /* Singly-linked list of large vectors. */ > > @@ -3176,6 +3177,8 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t > nbytes) > set_next_vector (v, vector_free_lists[vindex]); > ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size); > vector_free_lists[vindex] = v; > + if ( vindex < vector_free_lists_min_idx ) > + vector_free_lists_min_idx = vindex; > } > > /* Get a new vector block. */ > @@ -3230,8 +3233,8 @@ allocate_vector_from_block (ptrdiff_t nbytes) > /* Next, check free lists containing larger vectors. Since > we will split the result, we should have remaining space > large enough to use for one-slot vector at least. */ > - for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN); > - index < VECTOR_MAX_FREE_LIST_INDEX; index++) > + for (index = max ( VINDEX (nbytes + VBLOCK_BYTES_MIN), > vector_free_lists_min_idx ); > + index < VECTOR_MAX_FREE_LIST_INDEX; index++, > vector_free_lists_min_idx++) > if (vector_free_lists[index]) > { > /* This vector is larger than requested. */ > @@ -3413,6 +3416,7 @@ sweep_vectors (void) > gcstat.total_vectors = 0; > gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0; > memset (vector_free_lists, 0, sizeof (vector_free_lists)); > + vector_free_lists_min_idx = VECTOR_MAX_FREE_LIST_INDEX; > > /* Looking through vector blocks. */ Amazing! :O See if you can do my original test, which was 1-3 Elisp, byte-compiled Elisp, and natively compiled Elisp, and the Common Lisp execution (on your computer), if you'd like. Actually it is a bit of a bummer to the community since Emacs is like THE portal into Lisp. We should have the best Lisp in the business, and I don't see why not? Emacs + SBCL + CL + Elisp anyone? I.e. real CL not the cl- which is actually in Elisp. Not that there is anything wrong with that! On the contrary ;) -- underground experts united https://dataswamp.org/~incal