unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Add seq-shuffle
@ 2024-09-14  7:48 Hugo Thunnissen
  2024-09-14  8:42 ` Philip Kaludercic
  2024-09-15  1:44 ` Adam Porter
  0 siblings, 2 replies; 14+ messages in thread
From: Hugo Thunnissen @ 2024-09-14  7:48 UTC (permalink / raw)
  To: Emacs-devel@gnu.org


Hi,

I a function to randomize the order of a sequence. It is an
implementation of the Fisher-Yates shuffle algorithm (as I understand it
from wikipedia 🙃). I needed to randomize the order of a large list
(1000000 elements) and noticed that converting it to a vector made an
enormous difference in performance, so I opted to convert the sequence
to a vector before shuffling it.

Is there any interest to add this or a similar function to the seq
library?


(defun seq-shuffle (sequence)
  (cl-assert (sequencep sequence))

  (let* ((vec (seq-into sequence 'vector))
         (length (length vec))
         (n 0)
         (seq-type (type-of sequence)))
    (while (< n length)
      (let ((i (+ n (random (- length n))))
            (tmp (aref vec n)))
        (setf (aref vec n) (aref vec i)
              (aref vec i) tmp))
      (cl-incf n))

    (seq-into vec (if (eq 'cons seq-type) 'list seq-type))))

- Hugo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-14  7:48 Add seq-shuffle Hugo Thunnissen
@ 2024-09-14  8:42 ` Philip Kaludercic
  2024-09-16  5:08   ` Emanuel Berg
  2024-09-15  1:44 ` Adam Porter
  1 sibling, 1 reply; 14+ messages in thread
From: Philip Kaludercic @ 2024-09-14  8:42 UTC (permalink / raw)
  To: Hugo Thunnissen; +Cc: Emacs-devel@gnu.org

Hugo Thunnissen <devel@hugot.nl> writes:

> Hi,
>
> I a function to randomize the order of a sequence. It is an
> implementation of the Fisher-Yates shuffle algorithm (as I understand it
> from wikipedia 🙃). I needed to randomize the order of a large list
> (1000000 elements) and noticed that converting it to a vector made an
> enormous difference in performance, so I opted to convert the sequence
> to a vector before shuffling it.
>
> Is there any interest to add this or a similar function to the seq
> library?
>
>
> (defun seq-shuffle (sequence)
>   (cl-assert (sequencep sequence))
>
>   (let* ((vec (seq-into sequence 'vector))
>          (length (length vec))
>          (n 0)
>          (seq-type (type-of sequence)))
>     (while (< n length)
>       (let ((i (+ n (random (- length n))))
>             (tmp (aref vec n)))
>         (setf (aref vec n) (aref vec i)
>               (aref vec i) tmp))

If you are already using cl-lib, then you can simplify this with

  (cl-rotatef (aref vec n) (aref vec i))

>       (cl-incf n))

and use a `dotimes' instead of counting manually.

>     (seq-into vec (if (eq 'cons seq-type) 'list seq-type))))

I think this would be nice to have, but you should

1. Prepare a git patch and send it to bug-gnu-emacs@gnu.org
2. Probably prepare a destructive version, so that shuffling doesn't
   always have to cons new memory.  You can then just invoke the
   destructive seq-nshuffle in seq-shuffle.

> - Hugo
>
>

-- 
	Philip Kaludercic on siskin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-14  7:48 Add seq-shuffle Hugo Thunnissen
  2024-09-14  8:42 ` Philip Kaludercic
@ 2024-09-15  1:44 ` Adam Porter
  2024-09-15  6:27   ` Eli Zaretskii
  2024-09-15  7:48   ` Hugo Thunnissen
  1 sibling, 2 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-15  1:44 UTC (permalink / raw)
  To: devel; +Cc: Emacs-devel

Hi Hugo,

Thanks, this is a gap in functionality that should be filled.

FWIW, in my listen.el package on ELPA, I have this function, the 
algorithm of which I copied from Chris Wellons's elfeed-shuffle function 
in his Elfeed package (which he placed into the Public Domain).

(defun listen-queue-shuffle (queue)
   "Shuffle QUEUE."
   (interactive (list (listen-queue-complete)))
   ;; Copied from `elfeed-shuffle'.
   (let* ((tracks (listen-queue-tracks queue))
          (current-track (listen-queue-current queue))
          n)
     (when current-track
       (cl-callf2 delete current-track tracks))
     (setf n (length tracks))
     ;; Don't use dotimes result (bug#16206)
     (dotimes (i n)
       (cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i))))))
     (when current-track
       (push current-track tracks))
     (setf (listen-queue-tracks queue) tracks))
   (listen-queue--update-buffer queue))

I haven't benchmarked it against the version you posted, but I know that 
Chris did write it specifically for the sake of performance, so it would 
probably be worth considering.

--Adam



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-15  1:44 ` Adam Porter
@ 2024-09-15  6:27   ` Eli Zaretskii
  2024-09-15  6:45     ` Adam Porter
  2024-09-15  7:48   ` Hugo Thunnissen
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-09-15  6:27 UTC (permalink / raw)
  To: Adam Porter; +Cc: devel, Emacs-devel

> Date: Sat, 14 Sep 2024 20:44:16 -0500
> Cc: Emacs-devel@gnu.org
> From: Adam Porter <adam@alphapapa.net>
> 
> Thanks, this is a gap in functionality that should be filled.

Not every gap _should_ be filled, no.  Only those we think (and have
some basis to think) it is going to be missed by enough real-life Lisp
programs.

So saying "should be filled" without explaining why you think it
should be filled is less convincing than it could have been.  Thus, I
urge you to add motivation for your opinion.  TIA.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-15  6:27   ` Eli Zaretskii
@ 2024-09-15  6:45     ` Adam Porter
  2024-09-15 11:58       ` Stefan Kangas
  0 siblings, 1 reply; 14+ messages in thread
From: Adam Porter @ 2024-09-15  6:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: devel, Emacs-devel

On 9/15/24 01:27, Eli Zaretskii wrote:
>> Date: Sat, 14 Sep 2024 20:44:16 -0500
>> Cc: Emacs-devel@gnu.org
>> From: Adam Porter <adam@alphapapa.net>
>>
>> Thanks, this is a gap in functionality that should be filled.
> 
> Not every gap _should_ be filled, no.  Only those we think (and have
> some basis to think) it is going to be missed by enough real-life Lisp
> programs.
> 
> So saying "should be filled" without explaining why you think it
> should be filled is less convincing than it could have been.  Thus, I
> urge you to add motivation for your opinion.  TIA.

I think it should be filled because it's not an uncommon thing to want 
to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious 
how to do it from scratch, especially in a performant way.  A "naive" 
implementation, without regard for performance, can perform quite poorly.

That there are different sequence types, each with their own 
characteristics, further complicates the matter.  For newer Elisp 
programmers, who may be unfamiliar with the various types and 
performance issues in Elisp, having a built-in function to do it 
correctly and quickly would be helpful.

As it is, we have had various implementations in various third-party 
libraries for years, each with the same purpose but doing it slightly 
differently.  As an example from other languages, Python provides the 
random.shuffle function for sequences.  It would seem natural for 
Emacs's seq library to provide a similar function.

It's not a big deal, but if I had a vote, it would be to include such a 
function in seq.el.

--Adam



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-15  1:44 ` Adam Porter
  2024-09-15  6:27   ` Eli Zaretskii
@ 2024-09-15  7:48   ` Hugo Thunnissen
  1 sibling, 0 replies; 14+ messages in thread
From: Hugo Thunnissen @ 2024-09-15  7:48 UTC (permalink / raw)
  To: Adam Porter; +Cc: Emacs-devel

Adam Porter <adam@alphapapa.net> writes:

> Hi Hugo,
>
> Thanks, this is a gap in functionality that should be filled.
>
> FWIW, in my listen.el package on ELPA, I have this function, the algorithm of
> which I copied from Chris Wellons's elfeed-shuffle function in his Elfeed
> package (which he placed into the Public Domain).
>
> (defun listen-queue-shuffle (queue)
>   "Shuffle QUEUE."
>   (interactive (list (listen-queue-complete)))
>   ;; Copied from `elfeed-shuffle'.
>   (let* ((tracks (listen-queue-tracks queue))
>          (current-track (listen-queue-current queue))
>          n)
>     (when current-track
>       (cl-callf2 delete current-track tracks))
>     (setf n (length tracks))
>     ;; Don't use dotimes result (bug#16206)
>     (dotimes (i n)
>       (cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i))))))
>     (when current-track
>       (push current-track tracks))
>     (setf (listen-queue-tracks queue) tracks))
>   (listen-queue--update-buffer queue))
>
> I haven't benchmarked it against the version you posted, but I know that Chris
> did write it specifically for the sake of performance, so it would probably be
> worth considering.
>
> --Adam

Hah! Coincidentally, I already did a comparison benchmark yesterday. I
found a gist benchmarking different shuffling functions on github and
appended an earlier version of mine for comparison.

Link:
https://gist.github.com/purcell/34824f1b676e6188540cdf71c7cc9fc4?permalink_comment_id=5191301#gistcomment-5191301

It does pretty well performance-wise and it looks like something similar
to what I'd end up with if I follow Philips advice. The main difference
is that it operates directly on whatever sequence is passed to it as
opposed to always converting it to a vector.

I'm not sure which is the better approach: leave the used sequence type
up to the caller or copy it into a vector regardless of type.

- Hugo



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-15  6:45     ` Adam Porter
@ 2024-09-15 11:58       ` Stefan Kangas
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Kangas @ 2024-09-15 11:58 UTC (permalink / raw)
  To: Adam Porter, Eli Zaretskii; +Cc: devel, Emacs-devel

Adam Porter <adam@alphapapa.net> writes:

> I think it should be filled because it's not an uncommon thing to want
> to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious
> how to do it from scratch, especially in a performant way.  A "naive"
> implementation, without regard for performance, can perform quite poorly.
>
> That there are different sequence types, each with their own
> characteristics, further complicates the matter.  For newer Elisp
> programmers, who may be unfamiliar with the various types and
> performance issues in Elisp, having a built-in function to do it
> correctly and quickly would be helpful.
>
> As it is, we have had various implementations in various third-party
> libraries for years, each with the same purpose but doing it slightly
> differently.  As an example from other languages, Python provides the
> random.shuffle function for sequences.  It would seem natural for
> Emacs's seq library to provide a similar function.
>
> It's not a big deal, but if I had a vote, it would be to include such a
> function in seq.el.

Makes sense to me, FWIW.

I have had slow and naive implementations of it in my init file at
various times myself.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-14  8:42 ` Philip Kaludercic
@ 2024-09-16  5:08   ` Emanuel Berg
  2024-09-16 19:17     ` Adam Porter
  2024-09-17 18:53     ` Yuri Khan
  0 siblings, 2 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-16  5:08 UTC (permalink / raw)
  To: emacs-devel; +Cc: Philip Kaludercic

Philip Kaludercic wrote:

>> (defun seq-shuffle (sequence)
>>   (cl-assert (sequencep sequence))
>>
>>   (let* ((vec (seq-into sequence 'vector))
>>          (length (length vec))
>>          (n 0)
>>          (seq-type (type-of sequence)))
>>     (while (< n length)
>>       (let ((i (+ n (random (- length n))))
>>             (tmp (aref vec n)))
>>         (setf (aref vec n) (aref vec i)
>>               (aref vec i) tmp))
>
> If you are already using cl-lib, then you can simplify this with
>
>   (cl-rotatef (aref vec n) (aref vec i))

(seq-sort (lambda (_ __) (zerop (random 2))) seq)

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-16  5:08   ` Emanuel Berg
@ 2024-09-16 19:17     ` Adam Porter
  2024-09-17  4:09       ` Emanuel Berg
  2024-09-17 18:37       ` Philip Kaludercic
  2024-09-17 18:53     ` Yuri Khan
  1 sibling, 2 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-16 19:17 UTC (permalink / raw)
  To: incal; +Cc: emacs-devel

Hi Emanuel,

That's a neat and concise solution.  It seems to produce a decently 
random sorting, but I'd guess that it may not perform well for large 
sequences.  For example:

     (let (comparisons)
       (list :result (seq-sort (lambda (a b)
                                 (push (cons a b) comparisons)
                                 (zerop (random 2)))
                               (number-sequence 0 10))
             :num-comparisons (length comparisons)))
     ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)

In testing that expression repeatedly, I see that the number of 
comparisons varies between about 23 and 26.

Thanks for pointing it out, though.  It's good to know that it exists 
and in what circumstances it could be useful.

--Adam



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-16 19:17     ` Adam Porter
@ 2024-09-17  4:09       ` Emanuel Berg
  2024-09-17 18:37       ` Philip Kaludercic
  1 sibling, 0 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-17  4:09 UTC (permalink / raw)
  To: emacs-devel

Adam Porter wrote:

> That's a neat and concise solution.

Pretty!

> It seems to produce a decently random

For better random,

  https://dataswamp.org/~incal/emacs-init/random-urandom/

> sorting, but I'd guess that it may not perform well for
> large sequences. For example:
>
>     (let (comparisons)
>       (list :result (seq-sort (lambda (a b)
>                                 (push (cons a b) comparisons)
>                                 (zerop (random 2)))
>                               (number-sequence 0 10))
>             :num-comparisons (length comparisons)))
>     ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>
> In testing that expression repeatedly, I see that the number
> of comparisons varies between about 23 and 26.

Okay, but is that bad?

Isn't that linear plus logarithmic i.e. linear?

But I actually didn't know it varied, I thought it always
tested all, wouldn't that be the arithmetic sum, so for 11
items 1 + ... + 10 = 55 tests?

(defun arith-sum (n)
  (cl-loop
    with sum = 0
    for i from 1 to (1- n)
    do (cl-incf sum i)
    finally return sum) )

;; (arith-sum 11) ; 55

How long are long lists? Not 11 items, if we are talking
marshalling, maybe you want to get a CS algorithm book and see
what specific algorithms they have for the purpose that scale
the best for huge sets? If linear is too slow, you
want logarithmic.

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-16 19:17     ` Adam Porter
  2024-09-17  4:09       ` Emanuel Berg
@ 2024-09-17 18:37       ` Philip Kaludercic
  2024-09-17 22:26         ` Adam Porter
  2024-09-18  1:27         ` Emanuel Berg
  1 sibling, 2 replies; 14+ messages in thread
From: Philip Kaludercic @ 2024-09-17 18:37 UTC (permalink / raw)
  To: Adam Porter; +Cc: incal, emacs-devel

Adam Porter <adam@alphapapa.net> writes:

> Hi Emanuel,
>
> That's a neat and concise solution. It seems to produce a decently
> random sorting, but I'd guess that it may not perform well for large
> sequences.  For example:
>
>     (let (comparisons)
>       (list :result (seq-sort (lambda (a b)
>                                 (push (cons a b) comparisons)
>                                 (zerop (random 2)))
>                               (number-sequence 0 10))
>             :num-comparisons (length comparisons)))
>     ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>
> In testing that expression repeatedly, I see that the number of
> comparisons varies between about 23 and 26.
>
> Thanks for pointing it out, though.  It's good to know that it exists
> and in what circumstances it could be useful.

IIRC the approach is related to the "Naïve method" mentioned on
Wikipedia [0].  I think that this variation of your code demonstrates
that not all elements are considered equally often:

  (let ((comparisons '()))
    (list :result
          (seq-sort (lambda (a b)
                              (cl-incf (alist-get a comparisons 0))
                              (cl-incf (alist-get b comparisons 0))
                              (zerop (random 2)))
                            (number-sequence 0 10))
          :num-comparisons
          (seq-sort (lambda (x y)
                      (< (cdr x) (cdr y)))
                    comparisons)))
  ;; (:result
  ;;  (9 0 1 7 4 10 5 6 8 3 2)
  ;;  :num-comparisons
  ;;  ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8)))

While acceptable as a personal hack, it a proper solution should do
Fisher–Yates.  Adding a "shuffle" function to Emacs would also eliminate
the excuse to use suboptimal fallbacks.

[0] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Na%C3%AFve_method

>
> --Adam
>
>

-- 
	Philip Kaludercic on siskin



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-16  5:08   ` Emanuel Berg
  2024-09-16 19:17     ` Adam Porter
@ 2024-09-17 18:53     ` Yuri Khan
  1 sibling, 0 replies; 14+ messages in thread
From: Yuri Khan @ 2024-09-17 18:53 UTC (permalink / raw)
  To: emacs-devel; +Cc: Philip Kaludercic

On Mon, 16 Sept 2024 at 18:07, Emanuel Berg <incal@dataswamp.org> wrote:

> (seq-sort (lambda (_ __) (zerop (random 2))) seq)

Is seq-sort guaranteed to behave well with a comparison predicate that
is not a linear order though?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-17 18:37       ` Philip Kaludercic
@ 2024-09-17 22:26         ` Adam Porter
  2024-09-18  1:27         ` Emanuel Berg
  1 sibling, 0 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-17 22:26 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: incal, emacs-devel

On 9/17/24 13:37, Philip Kaludercic wrote:
> While acceptable as a personal hack, it a proper solution should do
> Fisher–Yates.  Adding a "shuffle" function to Emacs would also eliminate
> the excuse to use suboptimal fallbacks.

Agreed, FWIW.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Add seq-shuffle
  2024-09-17 18:37       ` Philip Kaludercic
  2024-09-17 22:26         ` Adam Porter
@ 2024-09-18  1:27         ` Emanuel Berg
  1 sibling, 0 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-18  1:27 UTC (permalink / raw)
  To: emacs-devel

Philip Kaludercic wrote:

>> That's a neat and concise solution. It seems to produce
>> a decently random sorting, but I'd guess that it may not
>> perform well for large sequences. For example:
>>
>>     (let (comparisons)
>>       (list :result (seq-sort (lambda (a b)
>>                                 (push (cons a b) comparisons)
>>                                 (zerop (random 2)))
>>                               (number-sequence 0 10))
>>             :num-comparisons (length comparisons)))
>>     ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>>
>> In testing that expression repeatedly, I see that the
>> number of comparisons varies between about 23 and 26.
>>
>> Thanks for pointing it out, though. It's good to know that
>> it exists and in what circumstances it could be useful.
>
> IIRC the approach is related to the "Naive method" mentioned
> on Wikipedia [0]. I think that this variation of your code
> demonstrates that not all elements are considered equally
> often:
>
>   (let ((comparisons '()))
>     (list :result
>           (seq-sort (lambda (a b)
>                               (cl-incf (alist-get a comparisons 0))
>                               (cl-incf (alist-get b comparisons 0))
>                               (zerop (random 2)))
>                             (number-sequence 0 10))
>           :num-comparisons
>           (seq-sort (lambda (x y)
>                       (< (cdr x) (cdr y)))
>                     comparisons)))
>   ;; (:result
>   ;;  (9 0 1 7 4 10 5 6 8 3 2)
>   ;;  :num-comparisons
>   ;;  ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8)))

I think it is as random as `random', as for the exact
execution of the algorithm one should examine 'seq-sort'.

Natural thing would just be to go from one end to the other,
but maybe they do some optimizations.

> While acceptable as a personal hack

Very generous of you, noble sir!

-- 
underground experts united
https://dataswamp.org/~incal




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-09-18  1:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-14  7:48 Add seq-shuffle Hugo Thunnissen
2024-09-14  8:42 ` Philip Kaludercic
2024-09-16  5:08   ` Emanuel Berg
2024-09-16 19:17     ` Adam Porter
2024-09-17  4:09       ` Emanuel Berg
2024-09-17 18:37       ` Philip Kaludercic
2024-09-17 22:26         ` Adam Porter
2024-09-18  1:27         ` Emanuel Berg
2024-09-17 18:53     ` Yuri Khan
2024-09-15  1:44 ` Adam Porter
2024-09-15  6:27   ` Eli Zaretskii
2024-09-15  6:45     ` Adam Porter
2024-09-15 11:58       ` Stefan Kangas
2024-09-15  7:48   ` Hugo Thunnissen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).