* Add seq-shuffle
@ 2024-09-14 7:48 Hugo Thunnissen
2024-09-14 8:42 ` Philip Kaludercic
2024-09-15 1:44 ` Adam Porter
0 siblings, 2 replies; 14+ messages in thread
From: Hugo Thunnissen @ 2024-09-14 7:48 UTC (permalink / raw)
To: Emacs-devel@gnu.org
Hi,
I a function to randomize the order of a sequence. It is an
implementation of the Fisher-Yates shuffle algorithm (as I understand it
from wikipedia 🙃). I needed to randomize the order of a large list
(1000000 elements) and noticed that converting it to a vector made an
enormous difference in performance, so I opted to convert the sequence
to a vector before shuffling it.
Is there any interest to add this or a similar function to the seq
library?
(defun seq-shuffle (sequence)
(cl-assert (sequencep sequence))
(let* ((vec (seq-into sequence 'vector))
(length (length vec))
(n 0)
(seq-type (type-of sequence)))
(while (< n length)
(let ((i (+ n (random (- length n))))
(tmp (aref vec n)))
(setf (aref vec n) (aref vec i)
(aref vec i) tmp))
(cl-incf n))
(seq-into vec (if (eq 'cons seq-type) 'list seq-type))))
- Hugo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen
@ 2024-09-14 8:42 ` Philip Kaludercic
2024-09-16 5:08 ` Emanuel Berg
2024-09-15 1:44 ` Adam Porter
1 sibling, 1 reply; 14+ messages in thread
From: Philip Kaludercic @ 2024-09-14 8:42 UTC (permalink / raw)
To: Hugo Thunnissen; +Cc: Emacs-devel@gnu.org
Hugo Thunnissen <devel@hugot.nl> writes:
> Hi,
>
> I a function to randomize the order of a sequence. It is an
> implementation of the Fisher-Yates shuffle algorithm (as I understand it
> from wikipedia 🙃). I needed to randomize the order of a large list
> (1000000 elements) and noticed that converting it to a vector made an
> enormous difference in performance, so I opted to convert the sequence
> to a vector before shuffling it.
>
> Is there any interest to add this or a similar function to the seq
> library?
>
>
> (defun seq-shuffle (sequence)
> (cl-assert (sequencep sequence))
>
> (let* ((vec (seq-into sequence 'vector))
> (length (length vec))
> (n 0)
> (seq-type (type-of sequence)))
> (while (< n length)
> (let ((i (+ n (random (- length n))))
> (tmp (aref vec n)))
> (setf (aref vec n) (aref vec i)
> (aref vec i) tmp))
If you are already using cl-lib, then you can simplify this with
(cl-rotatef (aref vec n) (aref vec i))
> (cl-incf n))
and use a `dotimes' instead of counting manually.
> (seq-into vec (if (eq 'cons seq-type) 'list seq-type))))
I think this would be nice to have, but you should
1. Prepare a git patch and send it to bug-gnu-emacs@gnu.org
2. Probably prepare a destructive version, so that shuffling doesn't
always have to cons new memory. You can then just invoke the
destructive seq-nshuffle in seq-shuffle.
> - Hugo
>
>
--
Philip Kaludercic on siskin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen
2024-09-14 8:42 ` Philip Kaludercic
@ 2024-09-15 1:44 ` Adam Porter
2024-09-15 6:27 ` Eli Zaretskii
2024-09-15 7:48 ` Hugo Thunnissen
1 sibling, 2 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-15 1:44 UTC (permalink / raw)
To: devel; +Cc: Emacs-devel
Hi Hugo,
Thanks, this is a gap in functionality that should be filled.
FWIW, in my listen.el package on ELPA, I have this function, the
algorithm of which I copied from Chris Wellons's elfeed-shuffle function
in his Elfeed package (which he placed into the Public Domain).
(defun listen-queue-shuffle (queue)
"Shuffle QUEUE."
(interactive (list (listen-queue-complete)))
;; Copied from `elfeed-shuffle'.
(let* ((tracks (listen-queue-tracks queue))
(current-track (listen-queue-current queue))
n)
(when current-track
(cl-callf2 delete current-track tracks))
(setf n (length tracks))
;; Don't use dotimes result (bug#16206)
(dotimes (i n)
(cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i))))))
(when current-track
(push current-track tracks))
(setf (listen-queue-tracks queue) tracks))
(listen-queue--update-buffer queue))
I haven't benchmarked it against the version you posted, but I know that
Chris did write it specifically for the sake of performance, so it would
probably be worth considering.
--Adam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-15 1:44 ` Adam Porter
@ 2024-09-15 6:27 ` Eli Zaretskii
2024-09-15 6:45 ` Adam Porter
2024-09-15 7:48 ` Hugo Thunnissen
1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-09-15 6:27 UTC (permalink / raw)
To: Adam Porter; +Cc: devel, Emacs-devel
> Date: Sat, 14 Sep 2024 20:44:16 -0500
> Cc: Emacs-devel@gnu.org
> From: Adam Porter <adam@alphapapa.net>
>
> Thanks, this is a gap in functionality that should be filled.
Not every gap _should_ be filled, no. Only those we think (and have
some basis to think) it is going to be missed by enough real-life Lisp
programs.
So saying "should be filled" without explaining why you think it
should be filled is less convincing than it could have been. Thus, I
urge you to add motivation for your opinion. TIA.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-15 6:27 ` Eli Zaretskii
@ 2024-09-15 6:45 ` Adam Porter
2024-09-15 11:58 ` Stefan Kangas
0 siblings, 1 reply; 14+ messages in thread
From: Adam Porter @ 2024-09-15 6:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: devel, Emacs-devel
On 9/15/24 01:27, Eli Zaretskii wrote:
>> Date: Sat, 14 Sep 2024 20:44:16 -0500
>> Cc: Emacs-devel@gnu.org
>> From: Adam Porter <adam@alphapapa.net>
>>
>> Thanks, this is a gap in functionality that should be filled.
>
> Not every gap _should_ be filled, no. Only those we think (and have
> some basis to think) it is going to be missed by enough real-life Lisp
> programs.
>
> So saying "should be filled" without explaining why you think it
> should be filled is less convincing than it could have been. Thus, I
> urge you to add motivation for your opinion. TIA.
I think it should be filled because it's not an uncommon thing to want
to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious
how to do it from scratch, especially in a performant way. A "naive"
implementation, without regard for performance, can perform quite poorly.
That there are different sequence types, each with their own
characteristics, further complicates the matter. For newer Elisp
programmers, who may be unfamiliar with the various types and
performance issues in Elisp, having a built-in function to do it
correctly and quickly would be helpful.
As it is, we have had various implementations in various third-party
libraries for years, each with the same purpose but doing it slightly
differently. As an example from other languages, Python provides the
random.shuffle function for sequences. It would seem natural for
Emacs's seq library to provide a similar function.
It's not a big deal, but if I had a vote, it would be to include such a
function in seq.el.
--Adam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-15 1:44 ` Adam Porter
2024-09-15 6:27 ` Eli Zaretskii
@ 2024-09-15 7:48 ` Hugo Thunnissen
1 sibling, 0 replies; 14+ messages in thread
From: Hugo Thunnissen @ 2024-09-15 7:48 UTC (permalink / raw)
To: Adam Porter; +Cc: Emacs-devel
Adam Porter <adam@alphapapa.net> writes:
> Hi Hugo,
>
> Thanks, this is a gap in functionality that should be filled.
>
> FWIW, in my listen.el package on ELPA, I have this function, the algorithm of
> which I copied from Chris Wellons's elfeed-shuffle function in his Elfeed
> package (which he placed into the Public Domain).
>
> (defun listen-queue-shuffle (queue)
> "Shuffle QUEUE."
> (interactive (list (listen-queue-complete)))
> ;; Copied from `elfeed-shuffle'.
> (let* ((tracks (listen-queue-tracks queue))
> (current-track (listen-queue-current queue))
> n)
> (when current-track
> (cl-callf2 delete current-track tracks))
> (setf n (length tracks))
> ;; Don't use dotimes result (bug#16206)
> (dotimes (i n)
> (cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i))))))
> (when current-track
> (push current-track tracks))
> (setf (listen-queue-tracks queue) tracks))
> (listen-queue--update-buffer queue))
>
> I haven't benchmarked it against the version you posted, but I know that Chris
> did write it specifically for the sake of performance, so it would probably be
> worth considering.
>
> --Adam
Hah! Coincidentally, I already did a comparison benchmark yesterday. I
found a gist benchmarking different shuffling functions on github and
appended an earlier version of mine for comparison.
Link:
https://gist.github.com/purcell/34824f1b676e6188540cdf71c7cc9fc4?permalink_comment_id=5191301#gistcomment-5191301
It does pretty well performance-wise and it looks like something similar
to what I'd end up with if I follow Philips advice. The main difference
is that it operates directly on whatever sequence is passed to it as
opposed to always converting it to a vector.
I'm not sure which is the better approach: leave the used sequence type
up to the caller or copy it into a vector regardless of type.
- Hugo
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-15 6:45 ` Adam Porter
@ 2024-09-15 11:58 ` Stefan Kangas
0 siblings, 0 replies; 14+ messages in thread
From: Stefan Kangas @ 2024-09-15 11:58 UTC (permalink / raw)
To: Adam Porter, Eli Zaretskii; +Cc: devel, Emacs-devel
Adam Porter <adam@alphapapa.net> writes:
> I think it should be filled because it's not an uncommon thing to want
> to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious
> how to do it from scratch, especially in a performant way. A "naive"
> implementation, without regard for performance, can perform quite poorly.
>
> That there are different sequence types, each with their own
> characteristics, further complicates the matter. For newer Elisp
> programmers, who may be unfamiliar with the various types and
> performance issues in Elisp, having a built-in function to do it
> correctly and quickly would be helpful.
>
> As it is, we have had various implementations in various third-party
> libraries for years, each with the same purpose but doing it slightly
> differently. As an example from other languages, Python provides the
> random.shuffle function for sequences. It would seem natural for
> Emacs's seq library to provide a similar function.
>
> It's not a big deal, but if I had a vote, it would be to include such a
> function in seq.el.
Makes sense to me, FWIW.
I have had slow and naive implementations of it in my init file at
various times myself.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-14 8:42 ` Philip Kaludercic
@ 2024-09-16 5:08 ` Emanuel Berg
2024-09-16 19:17 ` Adam Porter
2024-09-17 18:53 ` Yuri Khan
0 siblings, 2 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-16 5:08 UTC (permalink / raw)
To: emacs-devel; +Cc: Philip Kaludercic
Philip Kaludercic wrote:
>> (defun seq-shuffle (sequence)
>> (cl-assert (sequencep sequence))
>>
>> (let* ((vec (seq-into sequence 'vector))
>> (length (length vec))
>> (n 0)
>> (seq-type (type-of sequence)))
>> (while (< n length)
>> (let ((i (+ n (random (- length n))))
>> (tmp (aref vec n)))
>> (setf (aref vec n) (aref vec i)
>> (aref vec i) tmp))
>
> If you are already using cl-lib, then you can simplify this with
>
> (cl-rotatef (aref vec n) (aref vec i))
(seq-sort (lambda (_ __) (zerop (random 2))) seq)
--
underground experts united
https://dataswamp.org/~incal
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-16 5:08 ` Emanuel Berg
@ 2024-09-16 19:17 ` Adam Porter
2024-09-17 4:09 ` Emanuel Berg
2024-09-17 18:37 ` Philip Kaludercic
2024-09-17 18:53 ` Yuri Khan
1 sibling, 2 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-16 19:17 UTC (permalink / raw)
To: incal; +Cc: emacs-devel
Hi Emanuel,
That's a neat and concise solution. It seems to produce a decently
random sorting, but I'd guess that it may not perform well for large
sequences. For example:
(let (comparisons)
(list :result (seq-sort (lambda (a b)
(push (cons a b) comparisons)
(zerop (random 2)))
(number-sequence 0 10))
:num-comparisons (length comparisons)))
;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
In testing that expression repeatedly, I see that the number of
comparisons varies between about 23 and 26.
Thanks for pointing it out, though. It's good to know that it exists
and in what circumstances it could be useful.
--Adam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-16 19:17 ` Adam Porter
@ 2024-09-17 4:09 ` Emanuel Berg
2024-09-17 18:37 ` Philip Kaludercic
1 sibling, 0 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-17 4:09 UTC (permalink / raw)
To: emacs-devel
Adam Porter wrote:
> That's a neat and concise solution.
Pretty!
> It seems to produce a decently random
For better random,
https://dataswamp.org/~incal/emacs-init/random-urandom/
> sorting, but I'd guess that it may not perform well for
> large sequences. For example:
>
> (let (comparisons)
> (list :result (seq-sort (lambda (a b)
> (push (cons a b) comparisons)
> (zerop (random 2)))
> (number-sequence 0 10))
> :num-comparisons (length comparisons)))
> ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>
> In testing that expression repeatedly, I see that the number
> of comparisons varies between about 23 and 26.
Okay, but is that bad?
Isn't that linear plus logarithmic i.e. linear?
But I actually didn't know it varied, I thought it always
tested all, wouldn't that be the arithmetic sum, so for 11
items 1 + ... + 10 = 55 tests?
(defun arith-sum (n)
(cl-loop
with sum = 0
for i from 1 to (1- n)
do (cl-incf sum i)
finally return sum) )
;; (arith-sum 11) ; 55
How long are long lists? Not 11 items, if we are talking
marshalling, maybe you want to get a CS algorithm book and see
what specific algorithms they have for the purpose that scale
the best for huge sets? If linear is too slow, you
want logarithmic.
--
underground experts united
https://dataswamp.org/~incal
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-16 19:17 ` Adam Porter
2024-09-17 4:09 ` Emanuel Berg
@ 2024-09-17 18:37 ` Philip Kaludercic
2024-09-17 22:26 ` Adam Porter
2024-09-18 1:27 ` Emanuel Berg
1 sibling, 2 replies; 14+ messages in thread
From: Philip Kaludercic @ 2024-09-17 18:37 UTC (permalink / raw)
To: Adam Porter; +Cc: incal, emacs-devel
Adam Porter <adam@alphapapa.net> writes:
> Hi Emanuel,
>
> That's a neat and concise solution. It seems to produce a decently
> random sorting, but I'd guess that it may not perform well for large
> sequences. For example:
>
> (let (comparisons)
> (list :result (seq-sort (lambda (a b)
> (push (cons a b) comparisons)
> (zerop (random 2)))
> (number-sequence 0 10))
> :num-comparisons (length comparisons)))
> ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>
> In testing that expression repeatedly, I see that the number of
> comparisons varies between about 23 and 26.
>
> Thanks for pointing it out, though. It's good to know that it exists
> and in what circumstances it could be useful.
IIRC the approach is related to the "Naïve method" mentioned on
Wikipedia [0]. I think that this variation of your code demonstrates
that not all elements are considered equally often:
(let ((comparisons '()))
(list :result
(seq-sort (lambda (a b)
(cl-incf (alist-get a comparisons 0))
(cl-incf (alist-get b comparisons 0))
(zerop (random 2)))
(number-sequence 0 10))
:num-comparisons
(seq-sort (lambda (x y)
(< (cdr x) (cdr y)))
comparisons)))
;; (:result
;; (9 0 1 7 4 10 5 6 8 3 2)
;; :num-comparisons
;; ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8)))
While acceptable as a personal hack, it a proper solution should do
Fisher–Yates. Adding a "shuffle" function to Emacs would also eliminate
the excuse to use suboptimal fallbacks.
[0] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Na%C3%AFve_method
>
> --Adam
>
>
--
Philip Kaludercic on siskin
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-16 5:08 ` Emanuel Berg
2024-09-16 19:17 ` Adam Porter
@ 2024-09-17 18:53 ` Yuri Khan
1 sibling, 0 replies; 14+ messages in thread
From: Yuri Khan @ 2024-09-17 18:53 UTC (permalink / raw)
To: emacs-devel; +Cc: Philip Kaludercic
On Mon, 16 Sept 2024 at 18:07, Emanuel Berg <incal@dataswamp.org> wrote:
> (seq-sort (lambda (_ __) (zerop (random 2))) seq)
Is seq-sort guaranteed to behave well with a comparison predicate that
is not a linear order though?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-17 18:37 ` Philip Kaludercic
@ 2024-09-17 22:26 ` Adam Porter
2024-09-18 1:27 ` Emanuel Berg
1 sibling, 0 replies; 14+ messages in thread
From: Adam Porter @ 2024-09-17 22:26 UTC (permalink / raw)
To: Philip Kaludercic; +Cc: incal, emacs-devel
On 9/17/24 13:37, Philip Kaludercic wrote:
> While acceptable as a personal hack, it a proper solution should do
> Fisher–Yates. Adding a "shuffle" function to Emacs would also eliminate
> the excuse to use suboptimal fallbacks.
Agreed, FWIW.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle
2024-09-17 18:37 ` Philip Kaludercic
2024-09-17 22:26 ` Adam Porter
@ 2024-09-18 1:27 ` Emanuel Berg
1 sibling, 0 replies; 14+ messages in thread
From: Emanuel Berg @ 2024-09-18 1:27 UTC (permalink / raw)
To: emacs-devel
Philip Kaludercic wrote:
>> That's a neat and concise solution. It seems to produce
>> a decently random sorting, but I'd guess that it may not
>> perform well for large sequences. For example:
>>
>> (let (comparisons)
>> (list :result (seq-sort (lambda (a b)
>> (push (cons a b) comparisons)
>> (zerop (random 2)))
>> (number-sequence 0 10))
>> :num-comparisons (length comparisons)))
>> ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26)
>>
>> In testing that expression repeatedly, I see that the
>> number of comparisons varies between about 23 and 26.
>>
>> Thanks for pointing it out, though. It's good to know that
>> it exists and in what circumstances it could be useful.
>
> IIRC the approach is related to the "Naive method" mentioned
> on Wikipedia [0]. I think that this variation of your code
> demonstrates that not all elements are considered equally
> often:
>
> (let ((comparisons '()))
> (list :result
> (seq-sort (lambda (a b)
> (cl-incf (alist-get a comparisons 0))
> (cl-incf (alist-get b comparisons 0))
> (zerop (random 2)))
> (number-sequence 0 10))
> :num-comparisons
> (seq-sort (lambda (x y)
> (< (cdr x) (cdr y)))
> comparisons)))
> ;; (:result
> ;; (9 0 1 7 4 10 5 6 8 3 2)
> ;; :num-comparisons
> ;; ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8)))
I think it is as random as `random', as for the exact
execution of the algorithm one should examine 'seq-sort'.
Natural thing would just be to go from one end to the other,
but maybe they do some optimizations.
> While acceptable as a personal hack
Very generous of you, noble sir!
--
underground experts united
https://dataswamp.org/~incal
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-09-18 1:27 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen
2024-09-14 8:42 ` Philip Kaludercic
2024-09-16 5:08 ` Emanuel Berg
2024-09-16 19:17 ` Adam Porter
2024-09-17 4:09 ` Emanuel Berg
2024-09-17 18:37 ` Philip Kaludercic
2024-09-17 22:26 ` Adam Porter
2024-09-18 1:27 ` Emanuel Berg
2024-09-17 18:53 ` Yuri Khan
2024-09-15 1:44 ` Adam Porter
2024-09-15 6:27 ` Eli Zaretskii
2024-09-15 6:45 ` Adam Porter
2024-09-15 11:58 ` Stefan Kangas
2024-09-15 7:48 ` Hugo Thunnissen
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).