From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Andrew Cohen Newsgroups: gmane.emacs.devel Subject: Re: sorting in C Date: Fri, 04 Mar 2022 08:13:33 +0800 Organization: Hong Kong University of Science and Technology Message-ID: <8735jy7exe.fsf@ust.hk> References: <87ilt7bokp.fsf@ust.hk> <83tucrt75y.fsf@gnu.org> <8735kakymb.fsf@ust.hk> <835yp5u5h7.fsf@gnu.org> <87ee3thhh3.fsf@ust.hk> <83zgmhsp13.fsf@gnu.org> <8735k97k4b.fsf@ust.hk> <83wnhlsm3m.fsf@gnu.org> <87pmn9gp5q.fsf@ust.hk> <87h78l9h7x.fsf@ust.hk> <83zgmcojj5.fsf@gnu.org> <87o82sfzc6.fsf@ust.hk> <83r17oodxi.fsf@gnu.org> <87ilt0fv5o.fsf@ust.hk> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="40591"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) To: emacs-devel@gnu.org Cancel-Lock: sha1:g7+Ld0KI7ibg1Y/rxt4ybp4Ey3E= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Mar 04 01:14:25 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nPvaS-000AP4-Um for ged-emacs-devel@m.gmane-mx.org; Fri, 04 Mar 2022 01:14:24 +0100 Original-Received: from localhost ([::1]:33840 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nPvaR-0001qP-Vm for ged-emacs-devel@m.gmane-mx.org; Thu, 03 Mar 2022 19:14:24 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:47578) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nPvZq-0000ZI-4k for emacs-devel@gnu.org; Thu, 03 Mar 2022 19:13:46 -0500 Original-Received: from ciao.gmane.io ([116.202.254.214]:42504) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nPvZo-0006kI-5P for emacs-devel@gnu.org; Thu, 03 Mar 2022 19:13:45 -0500 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1nPvZm-0009eh-D7 for emacs-devel@gnu.org; Fri, 04 Mar 2022 01:13:42 +0100 X-Injected-Via-Gmane: http://gmane.org/ Received-SPF: pass client-ip=116.202.254.214; envelope-from=ged-emacs-devel@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.249, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:286797 Archived-At: So I think I am nearing the end of the sorting saga. Porting the cpython code for TIMSORT was easier than I expected, and with patient help from Eli and especially Mattias E. (who explained to me exactly how to manage the dynamic storage allocation) I have what should be close to the finished form. Here are some (final?) benchmarks. The three columns are the time (in microseconds) for sorting lists of length 10K using the current list sort, the current vector sort (by first converting the list to a vector) and the new timsort. (Note that there is very little overhead for the conversion so these just reflect the sorting algorithms and their implementation). I have used 10 different lists that are often used for comparing sorting algorithms: random, reverse sorted, sorted, sorted with 3 random binary swaps, sorted with 10 random elements at the end, sorted with a random 1% of the elements replaced with random numbers, constant, evil (that is (logxor i 1), which swaps every pair), sorted blocks of 100, sorted blocks of 10. The final column is the percentage improvement over the current list sort. | | oldlist | oldvec | tim | | |-------------------------------------+---------+--------+------+------| | (make-random-list 10000) | 2842 | 2146 | 2233 | 27% | | (nreverse (make-sorted-list 10000)) | 1431 | 1004 | 174 | 722% | | (make-sorted-list 10000) | 1333 | 924 | 170 | 684% | | (make-swapped-list 10000 3) | 1512 | 1055 | 179 | 745% | | (make-plus-list 10000) | 1346 | 915 | 174 | 674% | | (make-onepercent-list 10000) | 1792 | 1308 | 274 | 554% | | (make-constant-list 10000) | 1328 | 917 | 169 | 686% | | (make-evil-list 10000) | 1398 | 969 | 609 | 130% | | (make-block-list 10000 100) | 2244 | 1651 | 1304 | 72% | | (make-block-list 10000 10) | 2641 | 1990 | 2034 | 30% | As you can see the worst case is purely random for which the new algorithm is still more than 25% faster than the current one (albeit 4% slower than for vectors in this case), and short blocks is a close second. Everything else is clearly much faster, with almost everything else being factors of 6 to 8 times faster. (I have been running with timsort as a full replacement for sorting for a few days now, and while I can't say it has transformed my life, it is certainly a noticeable improvement in some circumstances). I'll post again in awhile with final questions and some suggestions for pushing it to git. Please let me know if I should post the code here (its about 1K lines including lots of comments.) -- Andrew Cohen