unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Sort on vectors slow
@ 2004-10-14 12:28 Roland Orre
  2004-10-19 17:52 ` Marius Vollmer
  0 siblings, 1 reply; 5+ messages in thread
From: Roland Orre @ 2004-10-14 12:28 UTC (permalink / raw)


I noticed that the default sort routine on vectors is very
slow. I'm using the restricted-vector-sort! and compared
with a merge sort on vectors instead, the speed difference
is tremendous. In this example foo is a vector with 100000
elements (an often occuring size for me currently), here
with random numbers. Machine is a 2.2 GHz P4. Observe that
the restricted-vector-qsort I'm using takes inclusive endpos
according to the doc (I've mentioned this on guile-devel).

(begin
  (define vlen 1000000)
  (define foo (make-vector vlen 9999999))
  (array-map! foo random foo)
  (define bar (copy-vector bar))
)
(verbose 2)

(restricted-vector-qsort! foo < 0 (1- vlen))
;;; 249150  msec  (71790 msec in gc)

(array-copy! bar foo)

(restricted-vector-msort! foo < 0 (1- vlen))
;;; 1060  msec  (300 msec in gc)

I'm not sure if this huge difference makes sense, or if it indicates
a problem with quicksort, which I took from libc earlier (1998). I've
found in some text though that time complexity of quicksort is O(N^2)
in worst case, only O(N log N) in average, and  merge sort is O(N log N)
in all cases. An advantage is that merge sort is stable also, which
quicksort isn't. Anyone knowing anything about memory usage for the
two algorithms?

/Roland

PS. Code to the restricted-vector-msort! below. I suggest that this
should go into the distribution, alternatively we should replace the
current default quicksort for vectors with merge sort.


SCM_DEFINE (scm_restricted_vector_msort_x,
	    "restricted-vector-msort!", 4, 1, 0, 
            (SCM vec, SCM less, SCM startpos, SCM endpos, SCM tmpv),
	    "Sort the sequence @var{items}, which may be a list or a\n"
	    "vector. @var{less} is used for comparing the sequence elements.\n"
	    "The sorting is destructive, that means that the input sequence\n"
	    "is modified to produce the sorted result.\n"
	    "This is a stable sort.")
#define FUNC_NAME s_scm_restricted_vector_msort_x
{
  const scm_t_trampoline_2 cmp = compare_function (less, 2, FUNC_NAME);
  size_t  vlen, spos, len;

  if (SCM_VECTORP (vec))
    {
      SCM *temp;
      vlen = SCM_VECTOR_LENGTH (vec);

      SCM_VALIDATE_INUM_MIN_COPY (3, startpos, 0, spos);
      SCM_ASSERT_RANGE (3, startpos, spos <= vlen);
      SCM_VALIDATE_INUM_RANGE (4, endpos,0, vlen);
      len = SCM_INUM (endpos) - spos;

      /*
	 the following array does not contain any new references to
	 SCM objects, so we can get away with allocing it on the heap.
      */
      if (SCM_UNBNDP(tmpv))
	temp = scm_malloc (len * sizeof(SCM));
      else
	if (SCM_VECTORP(tmpv) && (SCM_VECTOR_LENGTH(tmpv) >= vlen))
	  {
	    temp = SCM_WRITABLE_VELTS(tmpv);
	  }
	else SCM_WRONG_TYPE_ARG (1, vec);

      scm_merge_vector_step (vec, temp, cmp, less, spos, len);
      if (SCM_UNBNDP(tmpv))
	free(temp);
      return SCM_UNSPECIFIED;
    }
  else
    SCM_WRONG_TYPE_ARG (1, vec);
}
#undef FUNC_NAME




_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Sort on vectors slow
@ 2004-10-14 14:22 Roland Orre
  0 siblings, 0 replies; 5+ messages in thread
From: Roland Orre @ 2004-10-14 14:22 UTC (permalink / raw)


On Thu, 2004-10-14 at 14:28, Roland Orre wrote:
> PS. Code to the restricted-vector-msort! below. I suggest that this
> should go into the distribution, alternatively we should replace the
> current default quicksort for vectors with merge sort.

Typical, the code I sent contained a little bug, which I hadn't
noticed as I'm usually using a temporary vector (optional arg)
supplied. So, here is a new version of restricted-vector-msort!
if anyone is interested or maybe consider it should be put into
the distribution.
/Roland

SCM_DEFINE (scm_restricted_vector_msort_x,
	    "restricted-vector-msort!", 4, 1, 0, 
            (SCM vec, SCM less, SCM startpos, SCM endpos, SCM tmpv),
	    "Sort the sequence @var{items}, which may be a list or a\n"
	    "vector. @var{less} is used for comparing the sequence elements.\n"
	    "The sorting is destructive, that means that the input sequence\n"
	    "is modified to produce the sorted result.\n"
	    "This is a stable sort.")
#define FUNC_NAME s_scm_restricted_vector_msort_x
{
  const scm_t_trampoline_2 cmp = compare_function (less, 2, FUNC_NAME);
  size_t  vlen, spos, high;

  if (SCM_VECTORP (vec))
    {
      SCM *temp;
      vlen = SCM_VECTOR_LENGTH (vec);

      SCM_VALIDATE_INUM_MIN_COPY (3, startpos, 0, spos);
      SCM_ASSERT_RANGE (3, startpos, spos <= vlen);
      SCM_VALIDATE_INUM_RANGE (4, endpos,0, vlen);
      high = SCM_INUM (endpos) - spos;

      /*
	 the following array does not contain any new references to
	 SCM objects, so we can get away with allocing it on the heap.
      */
      if (SCM_UNBNDP(tmpv)) {
	temp = scm_malloc (vlen * sizeof(SCM));
      }
      else
	if (SCM_VECTORP(tmpv) && (SCM_VECTOR_LENGTH(tmpv) >= vlen))
	  {
	    temp = SCM_WRITABLE_VELTS(tmpv);
	  }
	else SCM_WRONG_TYPE_ARG (5, tmpv);

      scm_merge_vector_step (vec, temp, cmp, less, spos, high);

      if (SCM_UNBNDP(tmpv))
	free(temp);

      return SCM_UNSPECIFIED;
    }
  else
    SCM_WRONG_TYPE_ARG (1, vec);
}
#undef FUNC_NAME




_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Sort on vectors slow
  2004-10-14 12:28 Sort on vectors slow Roland Orre
@ 2004-10-19 17:52 ` Marius Vollmer
  2004-10-19 21:47   ` Marius Vollmer
  0 siblings, 1 reply; 5+ messages in thread
From: Marius Vollmer @ 2004-10-19 17:52 UTC (permalink / raw)
  Cc: guile-user

Roland Orre <roland.orre@neurologic.se> writes:

> I'm not sure if this huge difference makes sense, or if it indicates
> a problem with quicksort, which I took from libc earlier (1998).

The problem is not with quicksort per se, but with the insertion sort
'optimizations' in our code.  That part takes most of the time, but it
shouldn't.  I can't say yet what is wrong exactly, yet...

> I've found in some text though that time complexity of quicksort is
> O(N^2) in worst case, only O(N log N) in average, and merge sort is
> O(N log N) in all cases.

The worst case being an already sorted vector, right?  This shouldn't
be the case with your random vector.

In any case, the qsort function in GNU libc is now implemented as a
merge sort.  From libc NEWS:

* Mike Haertel (of GNU e?grep and malloc fame) has written a new sorting
  function which uses the `merge sort' algorithm, and is said to be
  significantly faster than the old GNU `qsort' function.  Merge sort is
  now the standard `qsort' function.  The new algorithm can require a lot
  of temporary storage; so, the old sorting function is called when the
  required storage is not available.

> PS. Code to the restricted-vector-msort! below. I suggest that this
> should go into the distribution, alternatively we should replace the
> current default quicksort for vectors with merge sort.

Yep, I will probably do this, but according to the NEWS entry above,
we might need to fall back to quicksort; so fixing quicksort would be
good in any case.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Sort on vectors slow
  2004-10-19 17:52 ` Marius Vollmer
@ 2004-10-19 21:47   ` Marius Vollmer
  2004-10-19 22:28     ` Marius Vollmer
  0 siblings, 1 reply; 5+ messages in thread
From: Marius Vollmer @ 2004-10-19 21:47 UTC (permalink / raw)
  Cc: guile-user

Marius Vollmer <marius.vollmer@uni-dortmund.de> writes:

> Roland Orre <roland.orre@neurologic.se> writes:
>
>> I'm not sure if this huge difference makes sense, or if it indicates
>> a problem with quicksort, which I took from libc earlier (1998).
>
> The problem is not with quicksort per se, but with the insertion sort
> 'optimizations' in our code.  That part takes most of the time, but it
> shouldn't.  I can't say yet what is wrong exactly, yet...

Let me post a short update, in case you are working on this as well,
Roland: it looks like our quicksort isn't sorting correctly all the
time and the subsequent insertion sort covers up for that, but takes
its usual n^2 time for it.

Now, time to get out the Knuth...

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Sort on vectors slow
  2004-10-19 21:47   ` Marius Vollmer
@ 2004-10-19 22:28     ` Marius Vollmer
  0 siblings, 0 replies; 5+ messages in thread
From: Marius Vollmer @ 2004-10-19 22:28 UTC (permalink / raw)
  Cc: guile-user

Marius Vollmer <mvo@zagadka.de> writes:

> Now, time to get out the Knuth...

Wasn't needed after all... it turned out that our quicksort was always
using base_ptr[mid] as the pivot and did not take into account that
the "collaps the walls" loop might overwrite that location, thus
incorrectly changing the pivot during collapsing.  I have fixed this
by copying out the pivot into a local variable.  'sort' and 'sort!'
should now work reasonably also for large vectors.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://lists.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-10-19 22:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-10-14 12:28 Sort on vectors slow Roland Orre
2004-10-19 17:52 ` Marius Vollmer
2004-10-19 21:47   ` Marius Vollmer
2004-10-19 22:28     ` Marius Vollmer
  -- strict thread matches above, loose matches on Subject: below --
2004-10-14 14:22 Roland Orre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).