From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Andrew Cohen Newsgroups: gmane.emacs.devel Subject: Re: sorting in C Date: Sun, 27 Feb 2022 07:54:09 +0800 Message-ID: <87pmn9gp5q.fsf@ust.hk> References: <87ilt7bokp.fsf@ust.hk> <83tucrt75y.fsf@gnu.org> <8735kakymb.fsf@ust.hk> <835yp5u5h7.fsf@gnu.org> <87ee3thhh3.fsf@ust.hk> <83zgmhsp13.fsf@gnu.org> <8735k97k4b.fsf@ust.hk> <83wnhlsm3m.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="12178"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) To: emacs-devel@gnu.org Cancel-Lock: sha1:yQAH/ASI/Er9KhKNTZS+/vf/xX4= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Feb 27 00:55:25 2022 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nO6uK-0002za-3M for ged-emacs-devel@m.gmane-mx.org; Sun, 27 Feb 2022 00:55:24 +0100 Original-Received: from localhost ([::1]:37768 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nO6uI-0007it-Hi for ged-emacs-devel@m.gmane-mx.org; Sat, 26 Feb 2022 18:55:22 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:39010) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nO6tM-00070f-5y for emacs-devel@gnu.org; Sat, 26 Feb 2022 18:54:24 -0500 Original-Received: from ciao.gmane.io ([116.202.254.214]:51448) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nO6tK-0001oS-Dd for emacs-devel@gnu.org; Sat, 26 Feb 2022 18:54:23 -0500 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from ) id 1nO6tG-0001jR-0q for emacs-devel@gnu.org; Sun, 27 Feb 2022 00:54:18 +0100 X-Injected-Via-Gmane: http://gmane.org/ Received-SPF: pass client-ip=116.202.254.214; envelope-from=ged-emacs-devel@m.gmane-mx.org; helo=ciao.gmane.io X-Spam_score_int: -16 X-Spam_score: -1.7 X-Spam_bar: - X-Spam_report: (-1.7 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.249, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.io gmane.emacs.devel:286707 Archived-At: Turns out porting the cpython version of timsort was easier than I expected :) There remain a few things to do but mostly finished. Here are some (slightly) improved benchmarks (orig = original mergesort; vec = convert to vector and use existing vector sort; tim = convert to vector and use timsort). The numbers are microseconds. Key: all lists are length=1000; in the first two rows the lists are blocks of sorted sublists of length 100 and 10 respectively; next is a list with the ith element= (logxor i 1); then a constant list, random list, a sorted list with 3 random elements swapping position, reverse sorted list, and finally a sorted list. | | orig | vec | tim | |------------------------------------+------+-----+-----| | (make-block-list 1000 100) | 157 | 118 | 78 | | (make-block-list 1000 10) | 200 | 153 | 160 | | (make-evil-list 1000) | 117 | 82 | 89 | | (make-constant-list 1000) | 107 | 75 | 17 | | (make-random-list 1000) | 211 | 164 | 167 | | (make-swapped-list 1000 3) | 123 | 89 | 23 | | (nreverse (make-sorted-list 1000)) | 109 | 77 | 17 | | (make-sorted-list 1000) | 107 | 73 | 17 | You can see that its always significantly better than the existing list mergesort. Its comparable to the existing vector sort for the more random cases, and hugely better on the more sorted cases. It is exactly the cpython algorithm so no surprise that it tracks the benchmarks that others have performed on the cpython code. I have a few questions that I'll ask in a subsequent post. Best, Andy -- Andrew Cohen