* sorting in C
From: Andrew Cohen @ 2022-02-22 2:52 UTC (permalink / raw)
To: emacs-devel
I have been trying to speed up some things in gnus, with some
success. But it inspired me to take a look at the sorting algorithm
in C. I have done some simple tests and have made some (simple)
improvements which I hope to push. But I am a novice at the C parts of
emacs (and C in general) so am sharing here.
Currently, lists and vectors have different code paths, but both use a
recursive merge sort (I haven't actually looked at the vector code, I'm
just guessing from the function names). There are some minor
inefficiencies in the list code (it computes the length of the list on
each recursion, for example, rather than passing it as an
argument---changing this alone gives a 10% speedup) which I played
with. I also tried an iterative rather than recursive merge (no real
improvement). But in the end I found that the simplest large improvement
was to:
1. Use an iterative insertion sort for small lists (length < 64). I
wrote one that is in-place and does minimal work when the list is
sorted or reverse sorted, which is the case for much of the sorting
in gnus.
2. Convert longer lists to a vector, hand it off to the vector code, and
convert back to a list. This gives a 35% improvement in all cases I
was able to test.
These are the parts I propose pushing (patch attached). But then I:
3. Replaced the current vector sorting algorithm with TIMSORT. This did
exactly what TIMSORT is supposed to do: on random data it is
marginally slower than the current mergesort. But on partially
ordered data it is 5 times faster.
I plan to continue playing with changes to the vector algorithm to see
what other changes might be worthwhile.
Best,
Andy
[-- Attachment #2: sorting improvements (text/x-patch) --]
diff --git a/src/fns.c b/src/fns.c
index ea8428fd98..913c3f3489 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -2166,29 +2166,73 @@ DEFUN ("reverse", Freverse, Sreverse, 1, 1, 0,
return new;
}
+
+/* Using PRED to compare, return whether A and B are in order.
+ Compare stably when A appeared before B in the input. */
+static bool
+inorder (Lisp_Object pred, Lisp_Object a, Lisp_Object b)
+{
+ return NILP (call2 (pred, b, a));
+}
+
/* Sort LIST using PREDICATE, preserving original order of elements
considered as equal. */
+/* This converts the list to a vector, sorts the vector, and converts
+ back to a list. In simple benchmarks this is 35% faster. It
+ allocates a new vector so it uses space of order the length */
+
static Lisp_Object
sort_list (Lisp_Object list, Lisp_Object predicate)
{
ptrdiff_t length = list_length (list);
- if (length < 2)
+ if (length < 2) {
return list;
+ } else if (length < 64) {
+ /* use an iterative insertion sort for short lists */
+ Lisp_Object old = Qnil, new = list, before, after;
+ while (CONSP (new)) {
+ Lisp_Object next = Fcdr (new);
+ if (NILP (old) || inorder (predicate, Fcar (old), Fcar (new))) {
+ old = new;
+ new = next;
+ continue;
+ }
+ XSETCDR(old, next);
+ before = Qnil, after = list;
+ while (CONSP (after)) {
+ if (inorder (predicate, Fcar (after), Fcar (new))) {
+ before = after; after = XCDR (after); continue;
+ } else if (NILP (before)) {
+ XSETCDR(new, list); list = new; break;}
+ XSETCDR(before, new);
+ XSETCDR(new, after);
+ break;
+ }
+ new = next;
+ }
+ return list;
+ } else {
+ Lisp_Object result = make_nil_vector (length), tail = list;
+ int i;
+ /* convert list to vector */
+ for (i=0; i < length; i++) {
+ ASET (result, i, Fcar (tail));
+ tail = XCDR (tail);
+ }
- Lisp_Object tem = Fnthcdr (make_fixnum (length / 2 - 1), list);
- Lisp_Object back = Fcdr (tem);
- Fsetcdr (tem, Qnil);
-
- return merge (Fsort (list, predicate), Fsort (back, predicate), predicate);
-}
+ result = Fsort (result, predicate);
-/* Using PRED to compare, return whether A and B are in order.
- Compare stably when A appeared before B in the input. */
-static bool
-inorder (Lisp_Object pred, Lisp_Object a, Lisp_Object b)
-{
- return NILP (call2 (pred, b, a));
+ /* convert vector to list */
+ i = 0;
+ tail = list;
+ while (CONSP (tail)) {
+ XSETCAR (tail, AREF (result, i));
+ tail = XCDR (tail);
+ i++;
+ }
+ return list;
+ }
}
/* Using PRED to compare, merge from ALEN-length A and BLEN-length B
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-22 12:30 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Tue, 22 Feb 2022 10:52:06 +0800
>
> I have been trying to speed up some things in gnus, with some
> success. But it inspired me to take a look at the sorting algorithm
> in C. I have done some simple tests and have made some (simple)
> improvements which I hope to push. But I am a novice at the C parts of
> emacs (and C in general) so am sharing here.
>
> Currently, lists and vectors have different code paths, but both use a
> recursive merge sort (I haven't actually looked at the vector code, I'm
> just guessing from the function names). There are some minor
> inefficiencies in the list code (it computes the length of the list on
> each recursion, for example, rather than passing it as an
> argument---changing this alone gives a 10% speedup) which I played
> with. I also tried an iterative rather than recursive merge (no real
> improvement). But in the end I found that the simplest large improvement
> was to:
>
> 1. Use an iterative insertion sort for small lists (length < 64). I
> wrote one that is in-place and does minimal work when the list is
> sorted or reverse sorted, which is the case for much of the sorting
> in gnus.
> 2. Convert longer lists to a vector, hand it off to the vector code, and
> convert back to a list. This gives a 35% improvement in all cases I
> was able to test.
>
> These are the parts I propose pushing (patch attached). But then I:
>
> 3. Replaced the current vector sorting algorithm with TIMSORT. This did
> exactly what TIMSORT is supposed to do: on random data it is
> marginally slower than the current mergesort. But on partially
> ordered data it is 5 times faster.
>
> I plan to continue playing with changes to the vector algorithm to see
> what other changes might be worthwhile.
Thank you for working on this.
Could you please work on benchmarks for this functionality, and post
those benchmarks (and their code if relevant) together with the code
changes? AFAICT, our test suite only includes tests for very small
lists and vectors, which is not enough for considering these changes.
> - if (length < 2)
> + if (length < 2) {
> return list;
> + } else if (length < 64) {
> + /* use an iterative insertion sort for short lists */
> + Lisp_Object old = Qnil, new = list, before, after;
> + while (CONSP (new)) {
> + Lisp_Object next = Fcdr (new);
> + if (NILP (old) || inorder (predicate, Fcar (old), Fcar (new))) {
> + old = new;
> + new = next;
> + continue;
> + }
Stylistic comment: the above is not our style of using braces. Please
look around in the existing code to see what style we use, and if you
have questions, ask them here.
> + before = Qnil, after = list;
Another stylistic nit: please don't use "foo, bar" where not
necessary. (It basically is only necessary inside parenthesized
expressions.)
> + XSETCDR(new, list); list = new; break;}
^^
Please leave one space between the name of a function/macro and the
following open parenthesis.
> + XSETCDR(before, new);
> + XSETCDR(new, after);
Likewise.
> + Lisp_Object result = make_nil_vector (length), tail = list;
Why not make_uninit_vector? It's marginally faster, and I don't think
you really need to initialize the members to nil, do you?
> + int i;
> + /* convert list to vector */
> + for (i=0; i < length; i++) {
"i = 0", with spaces.
> + /* convert vector to list */
Text in comments should be complete sentences, and end in 2 spaces:
/* Convert vector back to list. */
> + i = 0;
> + tail = list;
> + while (CONSP (tail)) {
> + XSETCAR (tail, AREF (result, i));
> + tail = XCDR (tail);
> + i++;
> + }
> + return list;
> + }
Btw, did you also compare the number of GC cycles in the two
implementations? If not, I suggest to include that, since excess GC
also slows down real-life Lisp programs.
Last, but not least: it seems like your copyright assignment was only
about changes to Gnus? If so, would you like to start your assignment
paperwork at this time, so that by the time you have the code ready,
we could accept it without limitations? If you agree, I will send you
the form to fill and the instructions to go with it.
Thanks.
* Re: sorting in C
From: Andrew Cohen @ 2022-02-22 12:54 UTC (permalink / raw)
To: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
[...]
>> I plan to continue playing with changes to the vector algorithm
>> to see what other changes might be worthwhile.
EZ> Thank you for working on this.
EZ> Could you please work on benchmarks for this functionality, and
EZ> post those benchmarks (and their code if relevant) together with
EZ> the code changes? AFAICT, our test suite only includes tests
EZ> for very small lists and vectors, which is not enough for
EZ> considering these changes.
Yes. I have been doing only rudimentary benchmarks so far, but will work
on something more substantial.
>> - if (length < 2) + if (length < 2) { return list; + } else if
>> (length < 64) { + /* use an iterative insertion sort for short
>> lists */ + Lisp_Object old = Qnil, new = list, before, after; +
>> while (CONSP (new)) { + Lisp_Object next = Fcdr (new); + if (NILP
>> (old) || inorder (predicate, Fcar (old), Fcar (new))) { + old =
>> new; + new = next; + continue; + }
EZ> Stylistic comment: the above is not our style of using braces.
EZ> Please look around in the existing code to see what style we
EZ> use, and if you have questions, ask them here.
Sure.
>> + before = Qnil, after = list;
EZ> Another stylistic nit: please don't use "foo, bar" where not
EZ> necessary. (It basically is only necessary inside parenthesized
EZ> expressions.)
Sure.
>> + XSETCDR(new, list); list = new; break;}
EZ> ^^ Please leave one space between the name of
EZ> a function/macro and the following open parenthesis.
>> + XSETCDR(before, new); + XSETCDR(new, after);
EZ> Likewise.
Sure.
>> + Lisp_Object result = make_nil_vector (length), tail = list;
EZ> Why not make_uninit_vector? It's marginally faster, and I don't
EZ> think you really need to initialize the members to nil, do you?
Yes. (I only looked at the C code for the first time a few days ago :))
>> + int i; + /* convert list to vector */ + for (i=0; i < length;
>> i++) {
EZ> "i = 0", with spaces.
Yes.
>> + /* convert vector to list */
EZ> Text in comments should be complete sentences, and end in 2
EZ> spaces:
EZ> /* Convert vector back to list. */
All these comments will disappear (or be improved :))
>> + i = 0; + tail = list; + while (CONSP (tail)) { + XSETCAR (tail,
>> AREF (result, i)); + tail = XCDR (tail); + i++; + } + return
>> list; + }
EZ> Btw, did you also compare the number of GC cycles in the two
EZ> implementations? If not, I suggest to include that, since
EZ> excess GC also slows down real-life Lisp programs.
Again, only rudimentary, so not yet adequate. In playing with lists with
1 million elements I see about the same amount of GC as with the
existing algorithm. But I'm doing this in a not very effective way (I
run each sort many times with benchmark and see what fraction GCs;
mostly it doesn't).
I may need some assistance for this part. I'll put together something
and post it to get started.
EZ> Last, but not least: it seems like your copyright assignment was
EZ> only about changes to Gnus? If so, would you like to start your
EZ> assignment paperwork at this time, so that by the time you have
EZ> the code ready, we could accept it without limitations? If you
EZ> agree, I will send you the form to fill and the instructions to
EZ> go with it.
Sure, please do, and thanks in advance. Probably wise since I have made
some small changes in other parts already.
Best,
Andy
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-22 13:11 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Tue, 22 Feb 2022 20:54:24 +0800
>
> EZ> Last, but not least: it seems like your copyright assignment was
> EZ> only about changes to Gnus? If so, would you like to start your
> EZ> assignment paperwork at this time, so that by the time you have
> EZ> the code ready, we could accept it without limitations? If you
> EZ> agree, I will send you the form to fill and the instructions to
> EZ> go with it.
>
> Sure, please do, and thanks in advance. Probably wise since I have made
> some small changes in other parts already.
Thanks, form sent off-list.
* Re: sorting in C
From: Lars Ingebrigtsen @ 2022-02-22 13:12 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
Andrew Cohen <acohen@ust.hk> writes:
> 3. Replaced the current vector sorting algorithm with TIMSORT. This did
> exactly what TIMSORT is supposed to do: on random data it is
> marginally slower than the current mergesort. But on partially
> ordered data it is 5 times faster.
That's impressive -- my guess is that we do typically sort partially
ordered data a lot.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* Re: sorting in C
From: Andrew Cohen @ 2022-02-23 4:14 UTC (permalink / raw)
To: emacs-devel
Sorry for the long post.
I wanted to give a benchmarking update on the patch I posted to the
list earlier. These are far from comprehensive but I hope they are a
useful starting point.
Background: these benchmarks compare the current code ("oldsort") with
the modified code ("newsort"). The modification is described in points 1
and 2 from my earlier email: replacing the current list sort (a
recursive merge sort) with an iterative in-place insertion sort for
short (n < 64) lists; and, for longer lists, converting to a vector,
passing it off to the current vector sorting routine, and converting
back to a list.
To generate them I compile a version of emacs that has BOTH routines,
run =emacs -Q=, and use =benchmark-call= to time the results over
multiple repetitions. I compute the time as =(total time - GC time)=
averaged over the repetitions (the results I am reporting have no GC, so
this is just the average time). If I increase the repetitions I get GC,
but don't see much difference in how often GC happens.
There are really two cases: length < 64, which uses the insertion sort,
and longer lengths, which use the vector sort. I compare with
similar-sized lists (63 to exercise the insertion sort, and 65 to
exercise the vector sort). I use 4 lists for each case: sorted (s63,
s65), reverse sorted (r63, r65), random (rand63, rand65), and
"worst-case" (worst63, worst65: a sawtooth, which maximizes the number
of comparisons for insertion sort), all with 1000 repetitions. Finally I
try many random lists with 100K elements (only one is included in the
table---the others are all similar). These are in milliseconds.
| | old | new |
|------------+-------+-------|
| s63 | 1.485 | .376 |
| s65 | 1.211 | .990 |
| r63 | 1.323 | .609 |
| r65 | 1.422 | 1.091 |
| rand63 | 1.236 | 2.581 |
| rand65 | 1.627 | 1.238 |
| worst63 | 1.109 | 6.019 |
| worst65 | 1.359 | 1.067 |
| random100K | 160 | 101 |
Observations: It all looks as expected. It's compatible with everything
I have seen in similar comparisons you can find in the ether.
Insertion sort: great for ordered lists (factor of 2 to 4 better);
poor for random data (factor of 2 worse); worst case is 5.4 times
worse (theoretically it should be about 32/6 = 5.333, i.e. the ratio of
insertion sort's ~n^2/2 worst-case comparisons to merge sort's
~n log2 n at n = 64 :) ).
Vector sort: it's typically 30% faster, but does even better for longer
lists.
Is the insertion sort for small lists worth it compared to the vector
sort? For partially ordered lists it's between 2 and 4 times faster than
the vector sort. For random data it's a factor of 2 slower. The worst
case is a factor of 6. I think it's worth it (my prejudice is that the
data is more likely to be partially ordered than not). Also, the
advantage shown here for the insertion sort is really its worst
showing---it is even more advantageous for shorter lists, where the
overhead of converting the list to and from a vector becomes more
important. We could lower the threshold to a list size less than 64, but
the TIMSORT testing suggests 64 is the sweet spot.
And implementing TIMSORT will be even better---similar or better speedup
for the best cases, and only marginal degradation of the worst. But I
think even this small change here is a win.
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-23 12:34 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Wed, 23 Feb 2022 12:14:52 +0800
>
> | | old | new |
> |------------+-------+-------|
> | s63 | 1.485 | .376 |
> | s65 | 1.211 | .990 |
> | r63 | 1.323 | .609 |
> | r65 | 1.422 | 1.091 |
> | rand63 | 1.236 | 2.581 |
> | rand65 | 1.627 | 1.238 |
> | worst63 | 1.109 | 6.019 |
> | worst65 | 1.359 | 1.067 |
> | random100K | 160 | 101 |
>
>
> Observations: It all looks as expected. Its compatible with everything
> I have seen in similar comparisons you can find in the ether.
>
> Insertion sort: great for ordered lists (factor of 2 to 4 better);
> poor for random data (factor of 2 worse); worst case is 5.4 times
> worse (theoretical it should be about 32/6 = 5.333 :) ).
>
> Vector sort: its typically 30% faster, but does even better for longer
> lists.
>
> Is the insertion sort for small lists worth it compared to the vector
> sort? For partially ordered lists its between 2 and 4 times faster than
> the vector sort. For random data its a factor of 2 slower. The worst
> case is a factor of 6. I think its worth it (my prejudice is the data is
> more likely to be partially ordered than not.) Also the advantage of
> the insertion sort shown here is really the worst case---its more
> advantageous for even shorter lists where the overhead of converting to
> and from the list becomes more important.
The 5-fold slowdown is a pain, IMO. Can we do better in the worst
case?
> And implementing TIMSORT will be even better---similar or better speedup
> for the best cases, and only marginal degradation of the worst. But I
> think even this small change here is a win.
Can we see the numbers? This sounds like a net win to me.
Thanks.
* Re: sorting in C
From: Andrew Cohen @ 2022-02-23 12:53 UTC (permalink / raw)
To: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
[...]
EZ> The 5-fold slowdown is a pain, IMO. Can we do better in the
EZ> worst case?
Maybe, but probably not too much. That is the point of these
hybrids---they improve on some cases and do worse on others. It's a win
only if the real-world data favors the improved case.
I played around a bit more and lowering the length threshold to 40
might be a better compromise. I am working on a more thorough set of
tests first.
We can always dispense with the insertion sort bit (or set a very low
threshold) and just keep the vector routine. This is a consistent 30%
win across the board.
>> And implementing TIMSORT will be even better---similar or better
>> speedup for the best cases, and only marginal degradation of the
>> worst. But I think even this small change here is a win.
EZ> Can we see the numbers? This sounds like a net win to me.
Err, no? I have it all working but I'm using a C version of TIMSORT from
the web. I don't think the license is acceptable so it probably needs to
be written from scratch. Shouldn't be that difficult (the algorithm
itself is well-documented) but I'm not sure how much time I have to
finish it.
Best,
Andy
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-23 13:14 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Wed, 23 Feb 2022 20:53:28 +0800
>
> >>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
>
> [...]
>
> EZ> The 5-fold slowdown is a pain, IMO. Can we do better in the
> EZ> worst case?
>
> Maybe, but probably not too much. That is the point of these
> hybrids---they improve on some cases and do worse on others. Its a win
> only if the real-world data favors the improved case.
>
> I played around a bit more and lowering the length threshold to 40
> might be a better compromise. I am working on a more thorough set of
> tests first.
OK, thanks. Let's see the improved numbers, and can you also show
absolute times? If they are short enough, the slowdown might not be
significant, since if the data structure is larger, the slow method
will not be used, right? Or did I misunderstand?
> >> And implementing TIMSORT will be even better---similar or better
> >> speedup for the best cases, and only marginal degradation of the
> >> worst. But I think even this small change here is a win.
>
> EZ> Can we see the numbers? This sounds like a net win to me.
>
> Err, no? I have it all working but I'm using a C version of TIMSORT from
> the web. I don't think the license is acceptable so it probably needs to
> be written from scratch.
But for showing the performance, the license is not important, is it?
> Shouldn't be that difficult (the algorithm itself is
> well-documented) but I'm not sure how much time I have to finish it.
From my POV, take all the time you need. Emacs 29 is far from a
release.
* Re: sorting in C
From: Yuri Khan @ 2022-02-23 13:19 UTC (permalink / raw)
To: Andrew Cohen; +Cc: Emacs developers
On Wed, 23 Feb 2022 at 19:58, Andrew Cohen <acohen@ust.hk> wrote:
> Err, no? I have it all working but I'm using a C version of TIMSORT from
> the web. I don't think the license is acceptable so it probably needs to
> be written from scratch. Shouldn't be that difficult (the algorithm
> itself is well-documented) but I'm not sure how much time I have to
> finish it.
Python’s license has been GPL-compatible for a long time[1], and
Python has an implementation of timsort — in fact, Wikipedia says
timsort was developed *for* Python[2]. Could probably crib that.
[1]: https://docs.python.org/3/license.html
[2]: https://en.wikipedia.org/wiki/Timsort
* Re: sorting in C
From: Andrew Cohen @ 2022-02-23 13:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
[...]
EZ> The 5-fold slowdown is a pain, IMO. Can we do better in the
EZ> worst case?
>>
>> Maybe, but probably not too much. That is the point of these
>> hybrids---they improve on some cases and do worse on others. Its
>> a win only if the real-world data favors the improved case.
>>
>> I played around a bit more and lowering the length threshold to
>> 40 might be a better compromise. I am working on a more thorough
>> set of tests first.
EZ> OK, thanks. Let's see the improved numbers, and can you also
EZ> show absolute times? If they are short enough, the slowdown
EZ> might not be significant, since if the data structure is larger,
EZ> the slow method will not be used, right? Or did I
EZ> misunderstand?
Yes, I will show absolute numbers in the future (for reference, it is
microseconds). And you did not misunderstand---the "slow" method (which
is much faster for sorted lists :)) isn't used for longer cases.
[...]
EZ> But for showing the performance, the license is not important,
EZ> is it?
Right, that was sort of a joke---just buying time until I can produce
some nice looking data. I'm trying to automate their production a bit
more.
>> Shouldn't be that difficult (the algorithm itself is
>> well-documented) but I'm not sure how much time I have to finish
>> it.
EZ> From my POV, take all the time you need. Emacs 29 is far from a
EZ> release.
Great, thanks.
--
Andrew Cohen
* Re: sorting in C
From: Andrew Cohen @ 2022-02-23 14:06 UTC (permalink / raw)
To: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
[...]
EZ> But for showing the performance, the license is not important,
EZ> is it?
OK, I couldn't resist :) Here are some simple numbers for timsort. These
are in microseconds, and there is no special-casing of small lists. I
have lots more data but this should be enough to whet your appetite.
Note that "worst" isn't the worst case for timsort (it was only the
worst case for insertion sort, which isn't used in these benchmarks).
For longer random lists it's comparable to the existing /vector/ sort
routine, which as I have said is 30% faster than the recursive merge
used for lists. For partially sorted lists it's very, very fast.
| | oldsort | timsort |
|----------+-----------+-----------|
| s65 | 6.001 | 1.808 |
| r65 | 6.626 | 2.982 |
| rand65 | 11.062 | 10.371 |
| worst65 | 8.524 | 6.751 |
| rand100k | 33722.263 | 30913.107 |
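For reference, the reason mostly-sorted input does so well is timsort's
run detection. Here is a rough sketch of just that piece (an
illustration only, not the code I will submit; it reuses the =inorder=
helper from my earlier patch):
static ptrdiff_t
count_run (Lisp_Object pred, Lisp_Object *v, ptrdiff_t lo, ptrdiff_t hi)
{
  /* Return the length of the run starting at LO: the longest ascending
     prefix, or a strictly descending prefix which is reversed in place
     so that every run ends up ascending.  Sorted or reverse-sorted
     input collapses into a few long runs, so little merging remains.  */
  ptrdiff_t i = lo + 1;
  if (i == hi)
    return 1;
  if (inorder (pred, v[lo], v[i]))
    while (++i < hi && inorder (pred, v[i - 1], v[i]))
      ;
  else
    {
      while (++i < hi && !inorder (pred, v[i - 1], v[i]))
	;
      for (ptrdiff_t a = lo, b = i - 1; a < b; a++, b--)
	{
	  /* Reverse the strictly descending run to make it ascending;
	     strictness preserves stability.  */
	  Lisp_Object tmp = v[a];
	  v[a] = v[b];
	  v[b] = tmp;
	}
    }
  return i - lo;
}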
--
Andrew Cohen
* Re: sorting in C
From: Andrew Cohen @ 2022-02-23 14:12 UTC (permalink / raw)
To: emacs-devel
>>>>> "YK" == Yuri Khan <yuri.v.khan@gmail.com> writes:
YK> On Wed, 23 Feb 2022 at 19:58, Andrew Cohen <acohen@ust.hk> wrote:
>> Err, no? I have it all working but I'm using a C version of
>> TIMSORT from the web. I don't think the license is acceptable so
>> it probably needs to be written from scratch. Shouldn't be that
>> difficult (the algorithm itself is well-documented) but I'm not
>> sure how much time I have to finish it.
YK> Python’s license has been GPL-compatible for a long time[1], and
YK> Python has an implementation of timsort — in fact, Wikipedia
YK> says timsort was developed *for* Python[2]. Could probably crib
YK> that.
Thanks for the suggestion. I started with the python code (which is
where I learned about timsort in the first place :)). The description of
the algorithm is excellent, but the actual code is pretty specific to
the python case. But it is still a very useful guide.
I don't want to make a big deal of it---it shouldn't be too hard to
write an unencumbered implementation.
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-23 14:18 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Wed, 23 Feb 2022 22:06:28 +0800
>
> | | oldsort | timsort |
> |----------+-----------+-----------|
> | s65 | 6.001 | 1.808 |
> | r65 | 6.626 | 2.982 |
> | rand65 | 11.062 | 10.371 |
> | worst65 | 8.524 | 6.751 |
> | rand100k | 33722.263 | 30913.107 |
That looks very promising, thanks.
* Re: sorting in C
From: Andrew Cohen @ 2022-02-26 23:54 UTC (permalink / raw)
To: emacs-devel
Turns out porting the cpython version of timsort was easier than I
expected :) There remain a few things to do, but it is mostly finished.
Here are some (slightly) improved benchmarks (orig = original mergesort;
vec = convert to vector and use existing vector sort; tim = convert to
vector and use timsort). The numbers are microseconds.
Key: all lists are of length 1000; in the first two rows the lists are
blocks of sorted sublists of length 100 and 10 respectively; next is a
list whose ith element is (logxor i 1); then a constant list, a random
list, a sorted list with 3 random elements swapping position, a reverse
sorted list, and finally a sorted list.
| | orig | vec | tim |
|------------------------------------+------+-----+-----|
| (make-block-list 1000 100) | 157 | 118 | 78 |
| (make-block-list 1000 10) | 200 | 153 | 160 |
| (make-evil-list 1000) | 117 | 82 | 89 |
| (make-constant-list 1000) | 107 | 75 | 17 |
| (make-random-list 1000) | 211 | 164 | 167 |
| (make-swapped-list 1000 3) | 123 | 89 | 23 |
| (nreverse (make-sorted-list 1000)) | 109 | 77 | 17 |
| (make-sorted-list 1000) | 107 | 73 | 17 |
You can see that it's always significantly better than the existing list
mergesort. It's comparable to the existing vector sort for the more
random cases, and hugely better on the more sorted cases. It is exactly
the cpython algorithm, so no surprise that it tracks the benchmarks that
others have performed on the cpython code.
I have a few questions that I'll ask in a subsequent post.
Best,
Andy
--
Andrew Cohen
* Re: sorting in C
From: Andrew Cohen @ 2022-02-27 2:27 UTC (permalink / raw)
To: emacs-devel; +Cc: Mattias Engdegård
Mattias E. reminds me (in a private email):
>Since Timsort may allocate temporary scratch space it is important to
>make sure it's freed if the comparison predicate throws. A specbind
>may be needed for the clean-up, but I haven't looked at your timsort
>code -- perhaps you have already solved that problem.
Dealing with the tmp space is my one remaining question. I note that
when sorting a list of length L, the current (vector) sorting routine
requires space for a tmp array of length L/2. It uses SAFE_ALLOCA_LISP
(and SAFE_FREE) outside the sorting routine and passes a pointer to the
storage as an argument to =sort_vector_inplace=. This way memory
management is easy.
TIMSORT /also/ requires space for a tmp array of length L/2, but only in
the worst case (random lists). For partially sorted lists it can make do
with less. So it takes a dynamic approach: it allocates a small amount
of storage (enough for an array of length 256) which can handle all
short lists and longer partially sorted lists; and then allocates
additional storage on the fly as needed for other cases.
Right now my routine accepts a pointer to tmp space as an argument; if
this is null, it uses the dynamic allocation, and otherwise just uses
the pre-allocated storage.
Clearly the less memory required the better, and for mostly-sorted lists
the pre-allocated 256 is usually sufficient. This saves the space, and
any (minimal) time needed to allocate additional space.
But the early allocation of the maximum space required (as is done for
the current sorting routine) makes the memory management trivial (which
is good!) and avoids the (minimal) additional time for allocations on
random lists.
I'm inclined to just go with the current system: allocate the maximum
required before calling the routine, but welcome any advice or
expressions of preference for the dynamic allocation.
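For concreteness, by "the current system" I mean roughly this pattern
(a simplified sketch of what the existing vector path does, written
from memory---not the exact fns.c code, and the function name here is
just for illustration):
static void
sort_vector_sketch (Lisp_Object vector, Lisp_Object pred)
{
  ptrdiff_t len = ASIZE (vector);
  if (len < 2)
    return;
  /* Allocate the worst-case L/2 scratch up front; SAFE_FREE (via the
     specpdl machinery behind SAFE_ALLOCA_LISP) releases it even if
     PRED exits non-locally.  */
  USE_SAFE_ALLOCA;
  Lisp_Object *tmp;
  SAFE_ALLOCA_LISP (tmp, len / 2);
  /* Initialize the scratch so GC never sees garbage if PRED triggers
     a collection.  */
  for (ptrdiff_t i = 0; i < len / 2; i++)
    tmp[i] = Qnil;
  sort_vector_inplace (pred, len, XVECTOR (vector)->contents, tmp);
  SAFE_FREE ();
}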
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-27 7:28 UTC (permalink / raw)
To: Andrew Cohen; +Cc: mattiase, emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Sun, 27 Feb 2022 10:27:30 +0800
> Cc: Mattias Engdegård <mattiase@acm.org>
>
>
> Mattias E. reminds me (in a private email):
> >Since Timsort may allocate temporary scratch space it is important to
> >make sure it's freed if the comparison predicate throws. A specbind
> >may be needed for the clean-up, but I haven't looked at your timsort
> >code -- perhaps you have already solved that problem.
>
> Dealing with the tmp space is my one remaining question. I note that
> when sorting a list of length L, the current (vector) sorting routine
> requires space for a tmp array of length L/2. It uses SAFE_ALLOCA_LISP
> (and SAFE_FREE) outside the sorting routine and passes a pointer to the
> storage as an argument to =sort_vector_inplace=. This way memory
> management is easy.
>
> TIMSORT /also/ requires space for a tmp array of length L/2, but only in
> the worst case (random lists). For partially sorted lists it can make do
> with less. So it takes a dynamic approach: it allocates a small amount
> of storage (enough for an array of length 256) which can handle all
> short lists and longer partially sorted lists; and then allocates
> additional storage on the fly as needed for other cases.
>
> Right now my routine accepts a pointer to tmp space as an argument; if
> this is null, it uses the dynamic allocation, and otherwise just uses
> the pre-allocated storage.
>
> Clearly the less memory required the better, and for mostly-sorted lists
> the pre-allocated 256 is usually sufficient. This saves the space, and
> any (minimal) time needed to allocate additional space.
>
> But the early allocation of the maximum space required (as is done for
> the current sorting routine) makes the memory management trivial (which
> is good!) and avoids the (minmal) additional time for allocs for random
> lists.
>
> I'm inclined to just go with the current system: allocate the maximum
> required before calling the routine, but welcome any advice or
> expressions of preference for the dynamic allocation.
It is okay to use the existing scheme, since it will at worst have the
same limitation as the existing one: when the list or vector to sort
is very large, you might run out of memory trying to allocate L/2-size
array.
However, from your description, it doesn't sound like the more optimal
approach of allocating dynamically is much more complicated. In
particular, what Mattias said should be easy using the unwind-protect
machinery we already have (and use in many similar situations). See
the calls to record_unwind_protect_ptr whose first argument is
'xfree'. We also have reallocation routines ready to be used.
You could also start with the existing scheme and then add the dynamic
allocation as a followup patch.
Bottom line: I think it's up to you.
Thanks.
* Re: sorting in C
From: Andrew Cohen @ 2022-02-27 9:11 UTC (permalink / raw)
To: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
>> From: Andrew Cohen <acohen@ust.hk> Date: Sun, 27 Feb 2022
>> 10:27:30 +0800 Cc: Mattias Engdegård <mattiase@acm.org>
[...]
>> Dealing with the tmp space is my one remaining question. I note
>> that when sorting a list of length L, the current (vector)
>> sorting routine requires space for a tmp array of length L/2. It
>> uses SAFE_ALLOCA_LISP (and SAFE_FREE) outside the sorting routine
>> and passes a pointer to the storage as an argument to
>> =sort_vector_inplace=. This way memory management is easy.
>>
>> TIMSORT /also/ requires space for a tmp array of length L/2, but
>> only in the worst case (random lists). For partially sorted lists
>> it can make do with less. So it takes a dynamic approach: it
>> allocates a small amount of storage (enough for an array of
>> length 256) which can handle all short lists and longer partially
>> sorted lists; and then allocates additional storage on the fly as
>> needed for other cases.
>>
[...]
EZ> It is okay to use the existing scheme, since it will at worst
EZ> have the same limitation as the existing one: when the list or
EZ> vector to sort is very large, you might run out of memory trying
EZ> to allocate L/2-size array.
EZ> However, from your description, it doesn't sound like the more
EZ> optimal approach of allocating dynamically is much more
EZ> complicated. In particular, what Mattias said should be easy
EZ> using the unwind-protect machinery we already have (and use in
EZ> many similar situations). See the calls to
EZ> record_unwind_protect_ptr whose first argument is 'xfree'. We
EZ> also have reallocation routines ready to be used.
That is how I am handling it now, but I'm not sure if I have it right
(sorry for the naive question):
When I need new memory I call
: specpdl_ref sa_count = SPECPDL_INDEX ();
: a = (Lisp_Object *) record_xmalloc (need * sizeof (Lisp_Object));
and I save =sa_count=; I guess =record_xmalloc= handles freeing the
memory on exception. Later during the sorting process I free the memory
explicitly with
: safe_free (sa_count)
Does this seem right? (Probably, since I've been running this way for
a while and would have expected lots of problems if I weren't allocating
and freeing the memory :))
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-02-27 9:29 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Sun, 27 Feb 2022 17:11:53 +0800
>
> EZ> However, from your description, it doesn't sound like the more
> EZ> optimal approach of allocating dynamically is much more
> EZ> complicated. In particular, what Mattias said should be easy
> EZ> using the unwind-protect machinery we already have (and use in
> EZ> many similar situations). See the calls to
> EZ> record_unwind_protect_ptr whose first argument is 'xfree'. We
> EZ> also have reallocation routines ready to be used.
>
> That is how I am handling it now, but I'm not sure if I have it right
> (sorry for the naive question):
>
> When I need new memory I call
>
> : specpdl_ref sa_count = SPECPDL_INDEX ();
> : a = (Lisp_Object *) record_xmalloc (need * sizeof (Lisp_Object));
>
> and I save =sa_count=; I guess =record_xmalloc= handles freeing the
> memory on exception. Later during the sorting process I free the memory
> explicitly with
>
> : safe_free (sa_count)
>
> Does this seem right? (Probably, since I've been running this way for
> awhile and would have expected lots of problems if I weren't allocating
> and freeing the memory :))
I'd rather you didn't use safe_free, since that is for SAFE_ALLOCA
etc. Just use unbind_to directly, like we do elsewhere where
record_xmalloc is used.
* Re: sorting in C
From: Andrew Cohen @ 2022-02-27 10:42 UTC (permalink / raw)
To: emacs-devel
>>>>> "EZ" == Eli Zaretskii <eliz@gnu.org> writes:
[...]
>> When I need new memory I call
>>
>> : specpdl_ref sa_count = SPECPDL_INDEX (); : a = (Lisp_Object *)
>> record_xmalloc (need * sizeof (Lisp_Object));
>>
>> and I save =sa_count=; I guess =record_xmalloc= handles freeing
>> the memory on exception. Later during the sorting process I free
>> the memory explicitly with
>>
>> : safe_free (sa_count)
>>
>> Does this seem right? (Probably, since I've been running this way
>> for awhile and would have expected lots of problems if I weren't
>> allocating and freeing the memory :))
EZ> I'd rather you didn't use safe_free, since that is for
EZ> SAFE_ALLOCA etc. Just use unbind_to directly, like we do
EZ> elsewhere where record_xmalloc is used.
Ahh, I didn't know about that function. I replaced the safe_free with
unbind_to and it seems to work fine.
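So the allocation and clean-up now look roughly like this (a simplified
illustration only---the function name is made up and this is not the
final patch):
static void
merge_with_scratch_sketch (ptrdiff_t need)
{
  specpdl_ref count = SPECPDL_INDEX ();
  Lisp_Object *tmp = record_xmalloc (need * sizeof *tmp);

  /* ... merge runs using TMP here; the comparison predicate may signal
     or throw, in which case the specpdl entry that record_xmalloc
     registered frees TMP during the unwind ... */

  unbind_to (count, Qnil);	/* Normal path: free TMP now.  */
}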
Thanks,
Andy
--
Andrew Cohen
* Re: sorting in C
From: Andrew Cohen @ 2022-03-04 0:13 UTC (permalink / raw)
To: emacs-devel
So I think I am nearing the end of the sorting saga. Porting the cpython
code for TIMSORT was easier than I expected, and with patient help from
Eli and especially Mattias E. (who explained to me exactly how to manage
the dynamic storage allocation) I have what should be close to the
finished form.
Here are some (final?) benchmarks. The three columns are the time (in
microseconds) for sorting lists of length 10K using the current list
sort, the current vector sort (by first converting the list to a vector)
and the new timsort. (Note that there is very little overhead for the
conversion so these just reflect the sorting algorithms and their
implementation). I have used 10 different lists that are often used for
comparing sorting algorithms: random, reverse sorted, sorted, sorted
with 3 random binary swaps, sorted with 10 random elements at the end,
sorted with a random 1% of the elements replaced with random numbers,
constant, evil (that is (logxor i 1), which swaps every pair), sorted
blocks of 100, sorted blocks of 10. The final column is the percentage
improvement over the current list sort.
| | oldlist | oldvec | tim | |
|-------------------------------------+---------+--------+------+------|
| (make-random-list 10000) | 2842 | 2146 | 2233 | 27% |
| (nreverse (make-sorted-list 10000)) | 1431 | 1004 | 174 | 722% |
| (make-sorted-list 10000) | 1333 | 924 | 170 | 684% |
| (make-swapped-list 10000 3) | 1512 | 1055 | 179 | 745% |
| (make-plus-list 10000) | 1346 | 915 | 174 | 674% |
| (make-onepercent-list 10000) | 1792 | 1308 | 274 | 554% |
| (make-constant-list 10000) | 1328 | 917 | 169 | 686% |
| (make-evil-list 10000) | 1398 | 969 | 609 | 130% |
| (make-block-list 10000 100) | 2244 | 1651 | 1304 | 72% |
| (make-block-list 10000 10) | 2641 | 1990 | 2034 | 30% |
As you can see, the worst case is the purely random list, for which the
new algorithm is still more than 25% faster than the current one (albeit
4% slower than the vector sort in this case), with short blocks a close
second. Everything else is clearly much faster, mostly by factors of 6
to 8.
(I have been running with timsort as a full replacement for sorting for
a few days now, and while I can't say it has transformed my life, it is
certainly a noticeable improvement in some circumstances.)
I'll post again in a while with final questions and some suggestions for
pushing it to git. Please let me know if I should post the code here
(it's about 1K lines, including lots of comments).
--
Andrew Cohen
* Re: sorting in C
From: Eli Zaretskii @ 2022-03-04 7:05 UTC (permalink / raw)
To: Andrew Cohen; +Cc: emacs-devel
> From: Andrew Cohen <acohen@ust.hk>
> Date: Fri, 04 Mar 2022 08:13:33 +0800
>
> | | oldlist | oldvec | tim | |
> |-------------------------------------+---------+--------+------+------|
> | (make-random-list 10000) | 2842 | 2146 | 2233 | 27% |
> | (nreverse (make-sorted-list 10000)) | 1431 | 1004 | 174 | 722% |
> | (make-sorted-list 10000) | 1333 | 924 | 170 | 684% |
> | (make-swapped-list 10000 3) | 1512 | 1055 | 179 | 745% |
> | (make-plus-list 10000) | 1346 | 915 | 174 | 674% |
> | (make-onepercent-list 10000) | 1792 | 1308 | 274 | 554% |
> | (make-constant-list 10000) | 1328 | 917 | 169 | 686% |
> | (make-evil-list 10000) | 1398 | 969 | 609 | 130% |
> | (make-block-list 10000 100) | 2244 | 1651 | 1304 | 72% |
> | (make-block-list 10000 10) | 2641 | 1990 | 2034 | 30% |
>
> As you can see the worst case is purely random for which the new
> algorithm is still more than 25% faster than the current one (albeit 4%
> slower than for vectors in this case), and short blocks is a close
> second. Everything else is clearly much faster, with almost everything
> else being factors of 6 to 8 times faster.
Yes, this looks very promising, thanks.
> I'll post again in awhile with final questions and some suggestions for
> pushing it to git. Please let me know if I should post the code here
> (its about 1K lines including lots of comments.)
I think it is best to send your changes as "git format-patch" in a
message submitted to bug-gnu-emacs@gnu.org, which will then create an
issue on our issue tracker. That will ensure this is not forgotten by
some chance.
Then we will wait for the completion of your legal paperwork.
Thanks.