unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Andrew Cohen <acohen@ust.hk>
To: emacs-devel@gnu.org
Subject: Re: sorting in C
Date: Wed, 23 Feb 2022 12:14:52 +0800	[thread overview]
Message-ID: <8735kakymb.fsf@ust.hk> (raw)
In-Reply-To: 83tucrt75y.fsf@gnu.org

Sorry for the long post. 

I wanted to give some benchmarking update on the patch I posted to the
list earlier. These are far from comprehensive but I hope are a useful
starting point.

Background: these benchmarks compare the current ("oldsort") with the
modified ("newsort"). The modified is described in points 1 and 2 from
my earlier email: replacing the current list sort (a recursive merge
sort) with an iterative in-place insertion sort for short (n < 64)
lists; and converting to a vector, passing off to the current vector
sorting routine, and converting back to a list for longer lists.

To generate them I compile a version of emacs that has BOTH routines,
run =emacs -Q=, and use =benchmark-call= to time the results over
multiple repititions. I compute the time as =(total time - GC time)=
averaged over the repetitions (the resuls I am reporting have no GC, so
this is just the average time). If I increase the repetitions I get GC,
but don't see much difference in how often GC happens.

There are really two cases: length < 64 which uses the insertion sort;
and longer lengths which uses the vector sorting. I compare with similar
sized lists (63 to call the insertion sort, and 65 to call the vector
sort.) I use 4 lists for each case: sorted (s63, s65), reverse sorted
(r63, r65), random (rand63, rand65), and "worst-case" (worst63,worst65:
a sawtooth, which maximizes the number of comparisons for insertion
sort), all with 1000 repetitions. Finally I try many random list with
100K elements (only one is included in the table---the others are all
similar). These are in milliseconds.

|            |   old |   new |
|------------+-------+-------|
| s63        | 1.485 |  .376 |
| s65        | 1.211 |  .990 |
| r63        | 1.323 |  .609 |
| r65        | 1.422 | 1.091 |
| rand63     | 1.236 | 2.581 |
| rand65     | 1.627 | 1.238 |
| worst63    | 1.109 | 6.019 |
| worst65    | 1.359 | 1.067 |
| random100K |   160 |   101 |


Observations: It all looks as expected. Its compatible with everything
I have seen in similar comparisons you can find in the ether.

Insertion sort: great for ordered lists (factor of 2 to 4 better);
poor for random data (factor of 2 worse); worst case is 5.4 times
worse (theoretical it should be about 32/6 = 5.333 :) ).

Vector sort: its typically 30% faster, but does even better for longer
lists.

Is the insertion sort for small lists worth it compared to the vector
sort? For partially ordered lists its between 2 and 4 times faster than
the vector sort. For random data its a factor of 2 slower. The worst
case is a factor of 6. I think its worth it (my prejudice is the data is
more likely to be partially ordered than not.)  Also the advantage of
the insertion sort shown here is really the worst case---its more
advantageous for even shorter lists where the overhead of converting to
and from the list becomes more important. We could lower the threshold
to a list size less than 64, but the TIMSORT testing suggests 64 is the
sweet spot.

And implementing TIMSORT will be even better---similar or better speedup
for the best cases, and only marginal degradation of the worst.  But I
think even this small change here is a win.

-- 
Andrew Cohen




  parent reply	other threads:[~2022-02-23  4:14 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-22  2:52 sorting in C Andrew Cohen
2022-02-22 12:30 ` Eli Zaretskii
2022-02-22 12:54   ` Andrew Cohen
2022-02-22 13:11     ` Eli Zaretskii
2022-02-23  4:14   ` Andrew Cohen [this message]
2022-02-23 12:34     ` Eli Zaretskii
2022-02-23 12:53       ` Andrew Cohen
2022-02-23 13:14         ` Eli Zaretskii
2022-02-23 13:52           ` Andrew Cohen
2022-02-23 14:06           ` Andrew Cohen
2022-02-23 14:18             ` Eli Zaretskii
2022-02-26 23:54               ` Andrew Cohen
2022-02-27  2:27                 ` Andrew Cohen
2022-02-27  7:28                   ` Eli Zaretskii
2022-02-27  9:11                     ` Andrew Cohen
2022-02-27  9:29                       ` Eli Zaretskii
2022-02-27 10:42                         ` Andrew Cohen
2022-03-04  0:13                           ` Andrew Cohen
2022-03-04  7:05                             ` Eli Zaretskii
2022-02-23 13:19         ` Yuri Khan
2022-02-23 14:12           ` Andrew Cohen
2022-02-22 13:12 ` Lars Ingebrigtsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735kakymb.fsf@ust.hk \
    --to=acohen@ust.hk \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).