* Add seq-shuffle @ 2024-09-14 7:48 Hugo Thunnissen 2024-09-14 8:42 ` Philip Kaludercic 2024-09-15 1:44 ` Adam Porter 0 siblings, 2 replies; 14+ messages in thread From: Hugo Thunnissen @ 2024-09-14 7:48 UTC (permalink / raw) To: Emacs-devel@gnu.org Hi, I a function to randomize the order of a sequence. It is an implementation of the Fisher-Yates shuffle algorithm (as I understand it from wikipedia 🙃). I needed to randomize the order of a large list (1000000 elements) and noticed that converting it to a vector made an enormous difference in performance, so I opted to convert the sequence to a vector before shuffling it. Is there any interest to add this or a similar function to the seq library? (defun seq-shuffle (sequence) (cl-assert (sequencep sequence)) (let* ((vec (seq-into sequence 'vector)) (length (length vec)) (n 0) (seq-type (type-of sequence))) (while (< n length) (let ((i (+ n (random (- length n)))) (tmp (aref vec n))) (setf (aref vec n) (aref vec i) (aref vec i) tmp)) (cl-incf n)) (seq-into vec (if (eq 'cons seq-type) 'list seq-type)))) - Hugo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen @ 2024-09-14 8:42 ` Philip Kaludercic 2024-09-16 5:08 ` Emanuel Berg 2024-09-15 1:44 ` Adam Porter 1 sibling, 1 reply; 14+ messages in thread From: Philip Kaludercic @ 2024-09-14 8:42 UTC (permalink / raw) To: Hugo Thunnissen; +Cc: Emacs-devel@gnu.org Hugo Thunnissen <devel@hugot.nl> writes: > Hi, > > I a function to randomize the order of a sequence. It is an > implementation of the Fisher-Yates shuffle algorithm (as I understand it > from wikipedia 🙃). I needed to randomize the order of a large list > (1000000 elements) and noticed that converting it to a vector made an > enormous difference in performance, so I opted to convert the sequence > to a vector before shuffling it. > > Is there any interest to add this or a similar function to the seq > library? > > > (defun seq-shuffle (sequence) > (cl-assert (sequencep sequence)) > > (let* ((vec (seq-into sequence 'vector)) > (length (length vec)) > (n 0) > (seq-type (type-of sequence))) > (while (< n length) > (let ((i (+ n (random (- length n)))) > (tmp (aref vec n))) > (setf (aref vec n) (aref vec i) > (aref vec i) tmp)) If you are already using cl-lib, then you can simplify this with (cl-rotatef (aref vec n) (aref vec i)) > (cl-incf n)) and use a `dotimes' instead of counting manually. > (seq-into vec (if (eq 'cons seq-type) 'list seq-type)))) I think this would be nice to have, but you should 1. Prepare a git patch and send it to bug-gnu-emacs@gnu.org 2. Probably prepare a destructive version, so that shuffling doesn't always have to cons new memory. You can then just invoke the destructive seq-nshuffle in seq-shuffle. > - Hugo > > -- Philip Kaludercic on siskin ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-14 8:42 ` Philip Kaludercic @ 2024-09-16 5:08 ` Emanuel Berg 2024-09-16 19:17 ` Adam Porter 2024-09-17 18:53 ` Yuri Khan 0 siblings, 2 replies; 14+ messages in thread From: Emanuel Berg @ 2024-09-16 5:08 UTC (permalink / raw) To: emacs-devel; +Cc: Philip Kaludercic Philip Kaludercic wrote: >> (defun seq-shuffle (sequence) >> (cl-assert (sequencep sequence)) >> >> (let* ((vec (seq-into sequence 'vector)) >> (length (length vec)) >> (n 0) >> (seq-type (type-of sequence))) >> (while (< n length) >> (let ((i (+ n (random (- length n)))) >> (tmp (aref vec n))) >> (setf (aref vec n) (aref vec i) >> (aref vec i) tmp)) > > If you are already using cl-lib, then you can simplify this with > > (cl-rotatef (aref vec n) (aref vec i)) (seq-sort (lambda (_ __) (zerop (random 2))) seq) -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-16 5:08 ` Emanuel Berg @ 2024-09-16 19:17 ` Adam Porter 2024-09-17 4:09 ` Emanuel Berg 2024-09-17 18:37 ` Philip Kaludercic 2024-09-17 18:53 ` Yuri Khan 1 sibling, 2 replies; 14+ messages in thread From: Adam Porter @ 2024-09-16 19:17 UTC (permalink / raw) To: incal; +Cc: emacs-devel Hi Emanuel, That's a neat and concise solution. It seems to produce a decently random sorting, but I'd guess that it may not perform well for large sequences. For example: (let (comparisons) (list :result (seq-sort (lambda (a b) (push (cons a b) comparisons) (zerop (random 2))) (number-sequence 0 10)) :num-comparisons (length comparisons))) ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26) In testing that expression repeatedly, I see that the number of comparisons varies between about 23 and 26. Thanks for pointing it out, though. It's good to know that it exists and in what circumstances it could be useful. --Adam ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-16 19:17 ` Adam Porter @ 2024-09-17 4:09 ` Emanuel Berg 2024-09-17 18:37 ` Philip Kaludercic 1 sibling, 0 replies; 14+ messages in thread From: Emanuel Berg @ 2024-09-17 4:09 UTC (permalink / raw) To: emacs-devel Adam Porter wrote: > That's a neat and concise solution. Pretty! > It seems to produce a decently random For better random, https://dataswamp.org/~incal/emacs-init/random-urandom/ > sorting, but I'd guess that it may not perform well for > large sequences. For example: > > (let (comparisons) > (list :result (seq-sort (lambda (a b) > (push (cons a b) comparisons) > (zerop (random 2))) > (number-sequence 0 10)) > :num-comparisons (length comparisons))) > ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26) > > In testing that expression repeatedly, I see that the number > of comparisons varies between about 23 and 26. Okay, but is that bad? Isn't that linear plus logarithmic i.e. linear? But I actually didn't know it varied, I thought it always tested all, wouldn't that be the arithmetic sum, so for 11 items 1 + ... + 10 = 55 tests? (defun arith-sum (n) (cl-loop with sum = 0 for i from 1 to (1- n) do (cl-incf sum i) finally return sum) ) ;; (arith-sum 11) ; 55 How long are long lists? Not 11 items, if we are talking marshalling, maybe you want to get a CS algorithm book and see what specific algorithms they have for the purpose that scale the best for huge sets? If linear is too slow, you want logarithmic. -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-16 19:17 ` Adam Porter 2024-09-17 4:09 ` Emanuel Berg @ 2024-09-17 18:37 ` Philip Kaludercic 2024-09-17 22:26 ` Adam Porter 2024-09-18 1:27 ` Emanuel Berg 1 sibling, 2 replies; 14+ messages in thread From: Philip Kaludercic @ 2024-09-17 18:37 UTC (permalink / raw) To: Adam Porter; +Cc: incal, emacs-devel Adam Porter <adam@alphapapa.net> writes: > Hi Emanuel, > > That's a neat and concise solution. It seems to produce a decently > random sorting, but I'd guess that it may not perform well for large > sequences. For example: > > (let (comparisons) > (list :result (seq-sort (lambda (a b) > (push (cons a b) comparisons) > (zerop (random 2))) > (number-sequence 0 10)) > :num-comparisons (length comparisons))) > ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26) > > In testing that expression repeatedly, I see that the number of > comparisons varies between about 23 and 26. > > Thanks for pointing it out, though. It's good to know that it exists > and in what circumstances it could be useful. IIRC the approach is related to the "Naïve method" mentioned on Wikipedia [0]. I think that this variation of your code demonstrates that not all elements are considered equally often: (let ((comparisons '())) (list :result (seq-sort (lambda (a b) (cl-incf (alist-get a comparisons 0)) (cl-incf (alist-get b comparisons 0)) (zerop (random 2))) (number-sequence 0 10)) :num-comparisons (seq-sort (lambda (x y) (< (cdr x) (cdr y))) comparisons))) ;; (:result ;; (9 0 1 7 4 10 5 6 8 3 2) ;; :num-comparisons ;; ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8))) While acceptable as a personal hack, it a proper solution should do Fisher–Yates. Adding a "shuffle" function to Emacs would also eliminate the excuse to use suboptimal fallbacks. [0] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Na%C3%AFve_method > > --Adam > > -- Philip Kaludercic on siskin ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-17 18:37 ` Philip Kaludercic @ 2024-09-17 22:26 ` Adam Porter 2024-09-18 1:27 ` Emanuel Berg 1 sibling, 0 replies; 14+ messages in thread From: Adam Porter @ 2024-09-17 22:26 UTC (permalink / raw) To: Philip Kaludercic; +Cc: incal, emacs-devel On 9/17/24 13:37, Philip Kaludercic wrote: > While acceptable as a personal hack, it a proper solution should do > Fisher–Yates. Adding a "shuffle" function to Emacs would also eliminate > the excuse to use suboptimal fallbacks. Agreed, FWIW. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-17 18:37 ` Philip Kaludercic 2024-09-17 22:26 ` Adam Porter @ 2024-09-18 1:27 ` Emanuel Berg 1 sibling, 0 replies; 14+ messages in thread From: Emanuel Berg @ 2024-09-18 1:27 UTC (permalink / raw) To: emacs-devel Philip Kaludercic wrote: >> That's a neat and concise solution. It seems to produce >> a decently random sorting, but I'd guess that it may not >> perform well for large sequences. For example: >> >> (let (comparisons) >> (list :result (seq-sort (lambda (a b) >> (push (cons a b) comparisons) >> (zerop (random 2))) >> (number-sequence 0 10)) >> :num-comparisons (length comparisons))) >> ;; (:result (6 0 9 1 5 10 8 2 7 3 4) :num-comparisons 26) >> >> In testing that expression repeatedly, I see that the >> number of comparisons varies between about 23 and 26. >> >> Thanks for pointing it out, though. It's good to know that >> it exists and in what circumstances it could be useful. > > IIRC the approach is related to the "Naive method" mentioned > on Wikipedia [0]. I think that this variation of your code > demonstrates that not all elements are considered equally > often: > > (let ((comparisons '())) > (list :result > (seq-sort (lambda (a b) > (cl-incf (alist-get a comparisons 0)) > (cl-incf (alist-get b comparisons 0)) > (zerop (random 2))) > (number-sequence 0 10)) > :num-comparisons > (seq-sort (lambda (x y) > (< (cdr x) (cdr y))) > comparisons))) > ;; (:result > ;; (9 0 1 7 4 10 5 6 8 3 2) > ;; :num-comparisons > ;; ((0 . 2) (10 . 3) (8 . 3) (9 . 4) (7 . 4) (6 . 4) (4 . 5) (2 . 5) (3 . 7) (1 . 7) (5 . 8))) I think it is as random as `random', as for the exact execution of the algorithm one should examine 'seq-sort'. Natural thing would just be to go from one end to the other, but maybe they do some optimizations. > While acceptable as a personal hack Very generous of you, noble sir! -- underground experts united https://dataswamp.org/~incal ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-16 5:08 ` Emanuel Berg 2024-09-16 19:17 ` Adam Porter @ 2024-09-17 18:53 ` Yuri Khan 1 sibling, 0 replies; 14+ messages in thread From: Yuri Khan @ 2024-09-17 18:53 UTC (permalink / raw) To: emacs-devel; +Cc: Philip Kaludercic On Mon, 16 Sept 2024 at 18:07, Emanuel Berg <incal@dataswamp.org> wrote: > (seq-sort (lambda (_ __) (zerop (random 2))) seq) Is seq-sort guaranteed to behave well with a comparison predicate that is not a linear order though? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen 2024-09-14 8:42 ` Philip Kaludercic @ 2024-09-15 1:44 ` Adam Porter 2024-09-15 6:27 ` Eli Zaretskii 2024-09-15 7:48 ` Hugo Thunnissen 1 sibling, 2 replies; 14+ messages in thread From: Adam Porter @ 2024-09-15 1:44 UTC (permalink / raw) To: devel; +Cc: Emacs-devel Hi Hugo, Thanks, this is a gap in functionality that should be filled. FWIW, in my listen.el package on ELPA, I have this function, the algorithm of which I copied from Chris Wellons's elfeed-shuffle function in his Elfeed package (which he placed into the Public Domain). (defun listen-queue-shuffle (queue) "Shuffle QUEUE." (interactive (list (listen-queue-complete))) ;; Copied from `elfeed-shuffle'. (let* ((tracks (listen-queue-tracks queue)) (current-track (listen-queue-current queue)) n) (when current-track (cl-callf2 delete current-track tracks)) (setf n (length tracks)) ;; Don't use dotimes result (bug#16206) (dotimes (i n) (cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i)))))) (when current-track (push current-track tracks)) (setf (listen-queue-tracks queue) tracks)) (listen-queue--update-buffer queue)) I haven't benchmarked it against the version you posted, but I know that Chris did write it specifically for the sake of performance, so it would probably be worth considering. --Adam ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-15 1:44 ` Adam Porter @ 2024-09-15 6:27 ` Eli Zaretskii 2024-09-15 6:45 ` Adam Porter 2024-09-15 7:48 ` Hugo Thunnissen 1 sibling, 1 reply; 14+ messages in thread From: Eli Zaretskii @ 2024-09-15 6:27 UTC (permalink / raw) To: Adam Porter; +Cc: devel, Emacs-devel > Date: Sat, 14 Sep 2024 20:44:16 -0500 > Cc: Emacs-devel@gnu.org > From: Adam Porter <adam@alphapapa.net> > > Thanks, this is a gap in functionality that should be filled. Not every gap _should_ be filled, no. Only those we think (and have some basis to think) it is going to be missed by enough real-life Lisp programs. So saying "should be filled" without explaining why you think it should be filled is less convincing than it could have been. Thus, I urge you to add motivation for your opinion. TIA. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-15 6:27 ` Eli Zaretskii @ 2024-09-15 6:45 ` Adam Porter 2024-09-15 11:58 ` Stefan Kangas 0 siblings, 1 reply; 14+ messages in thread From: Adam Porter @ 2024-09-15 6:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: devel, Emacs-devel On 9/15/24 01:27, Eli Zaretskii wrote: >> Date: Sat, 14 Sep 2024 20:44:16 -0500 >> Cc: Emacs-devel@gnu.org >> From: Adam Porter <adam@alphapapa.net> >> >> Thanks, this is a gap in functionality that should be filled. > > Not every gap _should_ be filled, no. Only those we think (and have > some basis to think) it is going to be missed by enough real-life Lisp > programs. > > So saying "should be filled" without explaining why you think it > should be filled is less convincing than it could have been. Thus, I > urge you to add motivation for your opinion. TIA. I think it should be filled because it's not an uncommon thing to want to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious how to do it from scratch, especially in a performant way. A "naive" implementation, without regard for performance, can perform quite poorly. That there are different sequence types, each with their own characteristics, further complicates the matter. For newer Elisp programmers, who may be unfamiliar with the various types and performance issues in Elisp, having a built-in function to do it correctly and quickly would be helpful. As it is, we have had various implementations in various third-party libraries for years, each with the same purpose but doing it slightly differently. As an example from other languages, Python provides the random.shuffle function for sequences. It would seem natural for Emacs's seq library to provide a similar function. It's not a big deal, but if I had a vote, it would be to include such a function in seq.el. --Adam ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-15 6:45 ` Adam Porter @ 2024-09-15 11:58 ` Stefan Kangas 0 siblings, 0 replies; 14+ messages in thread From: Stefan Kangas @ 2024-09-15 11:58 UTC (permalink / raw) To: Adam Porter, Eli Zaretskii; +Cc: devel, Emacs-devel Adam Porter <adam@alphapapa.net> writes: > I think it should be filled because it's not an uncommon thing to want > to do (e.g. it's needed in Listen.el and Elfeed), and it's not obvious > how to do it from scratch, especially in a performant way. A "naive" > implementation, without regard for performance, can perform quite poorly. > > That there are different sequence types, each with their own > characteristics, further complicates the matter. For newer Elisp > programmers, who may be unfamiliar with the various types and > performance issues in Elisp, having a built-in function to do it > correctly and quickly would be helpful. > > As it is, we have had various implementations in various third-party > libraries for years, each with the same purpose but doing it slightly > differently. As an example from other languages, Python provides the > random.shuffle function for sequences. It would seem natural for > Emacs's seq library to provide a similar function. > > It's not a big deal, but if I had a vote, it would be to include such a > function in seq.el. Makes sense to me, FWIW. I have had slow and naive implementations of it in my init file at various times myself. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Add seq-shuffle 2024-09-15 1:44 ` Adam Porter 2024-09-15 6:27 ` Eli Zaretskii @ 2024-09-15 7:48 ` Hugo Thunnissen 1 sibling, 0 replies; 14+ messages in thread From: Hugo Thunnissen @ 2024-09-15 7:48 UTC (permalink / raw) To: Adam Porter; +Cc: Emacs-devel Adam Porter <adam@alphapapa.net> writes: > Hi Hugo, > > Thanks, this is a gap in functionality that should be filled. > > FWIW, in my listen.el package on ELPA, I have this function, the algorithm of > which I copied from Chris Wellons's elfeed-shuffle function in his Elfeed > package (which he placed into the Public Domain). > > (defun listen-queue-shuffle (queue) > "Shuffle QUEUE." > (interactive (list (listen-queue-complete))) > ;; Copied from `elfeed-shuffle'. > (let* ((tracks (listen-queue-tracks queue)) > (current-track (listen-queue-current queue)) > n) > (when current-track > (cl-callf2 delete current-track tracks)) > (setf n (length tracks)) > ;; Don't use dotimes result (bug#16206) > (dotimes (i n) > (cl-rotatef (elt tracks i) (elt tracks (+ i (cl-random (- n i)))))) > (when current-track > (push current-track tracks)) > (setf (listen-queue-tracks queue) tracks)) > (listen-queue--update-buffer queue)) > > I haven't benchmarked it against the version you posted, but I know that Chris > did write it specifically for the sake of performance, so it would probably be > worth considering. > > --Adam Hah! Coincidentally, I already did a comparison benchmark yesterday. I found a gist benchmarking different shuffling functions on github and appended an earlier version of mine for comparison. Link: https://gist.github.com/purcell/34824f1b676e6188540cdf71c7cc9fc4?permalink_comment_id=5191301#gistcomment-5191301 It does pretty well performance-wise and it looks like something similar to what I'd end up with if I follow Philips advice. The main difference is that it operates directly on whatever sequence is passed to it as opposed to always converting it to a vector. I'm not sure which is the better approach: leave the used sequence type up to the caller or copy it into a vector regardless of type. - Hugo ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-09-18 1:27 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-09-14 7:48 Add seq-shuffle Hugo Thunnissen 2024-09-14 8:42 ` Philip Kaludercic 2024-09-16 5:08 ` Emanuel Berg 2024-09-16 19:17 ` Adam Porter 2024-09-17 4:09 ` Emanuel Berg 2024-09-17 18:37 ` Philip Kaludercic 2024-09-17 22:26 ` Adam Porter 2024-09-18 1:27 ` Emanuel Berg 2024-09-17 18:53 ` Yuri Khan 2024-09-15 1:44 ` Adam Porter 2024-09-15 6:27 ` Eli Zaretskii 2024-09-15 6:45 ` Adam Porter 2024-09-15 11:58 ` Stefan Kangas 2024-09-15 7:48 ` Hugo Thunnissen
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).