unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Andrew Cohen <acohen@ust.hk>
To: emacs-devel@gnu.org
Subject: Re: sorting in C
Date: Fri, 04 Mar 2022 08:13:33 +0800	[thread overview]
Message-ID: <8735jy7exe.fsf@ust.hk> (raw)
In-Reply-To: 87ilt0fv5o.fsf@ust.hk

So I think I am nearing the end of the sorting saga. Porting the cpython
code for TIMSORT was easier than I expected, and with patient help from
Eli and especially Mattias E. (who explained to me exactly how to manage
the dynamic storage allocation) I have what should be close to the
finished form.

Here are some (final?) benchmarks. The three columns are the time (in
microseconds) for sorting lists of length 10K using the current list
sort, the current vector sort (by first converting the list to a vector)
and the new timsort. (Note that there is very little overhead for the
conversion so these just reflect the sorting algorithms and their
implementation). I have used 10 different lists that are often used for
comparing sorting algorithms: random, reverse sorted, sorted, sorted
with 3 random binary swaps, sorted with 10 random elements at the end,
sorted with a random 1% of the elements replaced with random numbers,
constant, evil (that is (logxor i 1), which swaps every pair), sorted
blocks of 100, sorted blocks of 10. The final column is the percentage
improvement over the current list sort.

|                                     | oldlist | oldvec |  tim |       |
|-------------------------------------+---------+--------+------+------|
| (make-random-list 10000)            |    2842 |   2146 | 2233 |  27% |
| (nreverse (make-sorted-list 10000)) |    1431 |   1004 |  174 | 722% |
| (make-sorted-list 10000)            |    1333 |    924 |  170 | 684% |
| (make-swapped-list 10000 3)         |    1512 |   1055 |  179 | 745% |
| (make-plus-list 10000)              |    1346 |    915 |  174 | 674% |
| (make-onepercent-list 10000)        |    1792 |   1308 |  274 | 554% |
| (make-constant-list 10000)          |    1328 |    917 |  169 | 686% |
| (make-evil-list 10000)              |    1398 |    969 |  609 | 130% |
| (make-block-list 10000 100)         |    2244 |   1651 | 1304 |  72% |
| (make-block-list 10000 10)          |    2641 |   1990 | 2034 |  30% |

As you can see the worst case is purely random for which the new
algorithm is still more than 25% faster than the current one (albeit 4%
slower than for vectors in this case), and short blocks is a close
second. Everything else is clearly much faster, with almost everything
else being factors of 6 to 8 times faster.

(I have been running with timsort as a full replacement for sorting for
a few days now, and while I can't say it has transformed my life, it is
certainly a noticeable improvement in some circumstances).

I'll post again in awhile with final questions and some suggestions for
pushing it to git. Please let me know if I should post the code here
(its about 1K lines including lots of comments.)

-- 
Andrew Cohen




  reply	other threads:[~2022-03-04  0:13 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22  2:52 sorting in C Andrew Cohen
2022-02-22 12:30 ` Eli Zaretskii
2022-02-22 12:54   ` Andrew Cohen
2022-02-22 13:11     ` Eli Zaretskii
2022-02-23  4:14   ` Andrew Cohen
2022-02-23 12:34     ` Eli Zaretskii
2022-02-23 12:53       ` Andrew Cohen
2022-02-23 13:14         ` Eli Zaretskii
2022-02-23 13:52           ` Andrew Cohen
2022-02-23 14:06           ` Andrew Cohen
2022-02-23 14:18             ` Eli Zaretskii
2022-02-26 23:54               ` Andrew Cohen
2022-02-27  2:27                 ` Andrew Cohen
2022-02-27  7:28                   ` Eli Zaretskii
2022-02-27  9:11                     ` Andrew Cohen
2022-02-27  9:29                       ` Eli Zaretskii
2022-02-27 10:42                         ` Andrew Cohen
2022-03-04  0:13                           ` Andrew Cohen [this message]
2022-03-04  7:05                             ` Eli Zaretskii
2022-02-23 13:19         ` Yuri Khan
2022-02-23 14:12           ` Andrew Cohen
2022-02-22 13:12 ` Lars Ingebrigtsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735jy7exe.fsf@ust.hk \
    --to=acohen@ust.hk \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).