* bug#56521: Add 'take' list operation [PATCH]
@ 2022-07-12 17:05 Mattias Engdegård
2022-07-12 22:51 ` Lars Ingebrigtsen
0 siblings, 1 reply; 5+ messages in thread
From: Mattias Engdegård @ 2022-07-12 17:05 UTC (permalink / raw)
To: 56521
[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]
A basic list operation that is often useful is 'take', that extracts the N first elements of a list. The attached patch adds 'take' as a built-in function.
We already have its complement, `drop`, under the name `nthcdr`. (I wouldn't mind adding `drop` as an alias but it's not very important.)
How would it compare to existing alternatives, or to a pure Lisp implementation? Here are relative timings (smaller is better), sans GC.
N/M means taking N elements from a list of M.
| | 5/10 | 9999/10000 | 1/10 | 1/0 |
|------------+------+------------+------+------|
| take | 1.00 | 1.00 | 1.00 | 1.00 |
|------------+------+------------+------+------|
| seq-take | 4.90 | 3.45 | 5.82 | 4.95 |
| -take | 4.01 | 4.92 | 2.86 | 1.54 |
|------------+------+------------+------+------|
| fwd | 2.89 | 3.76 | 1.77 | 1.25 |
| rev | 2.90 | 3.45 | 2.06 | 1.38 |
| cl-loop | 3.18 | 3.82 | 2.21 | 1.65 |
| butlast | 3.58 | 1.73 | 6.59 | 2.33 |
|------------+------+------------+------+------|
| ntake (el) | 0.80 | 0.20 | 1.51 | 1.40 |
| ntake (C) | 0.48 | 0.41 | 0.83 | 0.99 |
`seq-take` and `-take` are existing implementations from seq.el and dash.el respectively. `fwd`, `rev`, `cl-loop` and `butlast` are
alternative implementations in elisp. `ntake` are elisp and C implementations of a destructive take operation
A common replacement is using `butlast` but as we see that is very inefficient (worse than the table suggests since GC costs are not included). It also doesn't work for circular or dotted lists.
The C implementation would, of course, be used to implement `seq-take` for lists and speed up existing code.
Adding `ntake` as a C primitive is not quite as beneficial since the elisp implementation does fairly well. (I'm not sure why Elisp is actually faster than C in the 9999/10000 case; maybe I made a mistake, or it has something to do with `nthcdr` having its own bytecode). I included `ntake` in the patch for reference and because it's Lisp tradition to have faster destructive 'n-' variants of list functions.
The patch is just a draft; there will be tests and manual text.
There are details that can be debated. For example, we don't error on negative N. It might be useful to allow the compiler to partially evaluate calls where N<2 or LIST=nil.
[-- Attachment #2: take.diff --]
[-- Type: application/octet-stream, Size: 2057 bytes --]
diff --git a/src/fns.c b/src/fns.c
index 1f57e675b1..2f507573a1 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -1557,6 +1557,61 @@ substring_both (Lisp_Object string, ptrdiff_t from, ptrdiff_t from_byte,
return res;
}
\f
+DEFUN ("take", Ftake, Stake, 2, 2, 0,
+ doc: /* Return the first N elements of LIST.
+If N is zero or negative, return nil.
+If LIST is not more than N elements long, return it (or a copy). */)
+ (Lisp_Object n, Lisp_Object list)
+{
+ CHECK_FIXNUM (n);
+ EMACS_INT m = XFIXNUM (n);
+ if (m <= 0)
+ return Qnil;
+ CHECK_LIST (list);
+ if (NILP (list))
+ return Qnil;
+ Lisp_Object ret = Fcons (XCAR (list), Qnil);
+ Lisp_Object prev = ret;
+ m--;
+ list = XCDR (list);
+ while (m > 0 && CONSP (list))
+ {
+ Lisp_Object p = Fcons (XCAR (list), Qnil);
+ XSETCDR (prev, p);
+ prev = p;
+ m--;
+ list = XCDR (list);
+ }
+ if (m > 0 && !NILP (list))
+ wrong_type_argument (Qlistp, list);
+ return ret;
+}
+
+DEFUN ("ntake", Fntake, Sntake, 2, 2, 0,
+ doc: /* Return the first N elements of LIST, destructively.
+If N is zero or negative, return nil.
+If LIST is not more than N elements long, return it. */)
+ (Lisp_Object n, Lisp_Object list)
+{
+ CHECK_FIXNUM (n);
+ EMACS_INT m = XFIXNUM (n);
+ if (m <= 0)
+ return Qnil;
+ CHECK_LIST (list);
+ Lisp_Object tail = list;
+ --m;
+ while (m > 0 && CONSP (tail))
+ {
+ tail = XCDR (tail);
+ m--;
+ }
+ if (CONSP (tail))
+ XSETCDR (tail, Qnil);
+ else if (!NILP (tail))
+ wrong_type_argument (Qlistp, list);
+ return list;
+}
+
DEFUN ("nthcdr", Fnthcdr, Snthcdr, 2, 2, 0,
doc: /* Take cdr N times on LIST, return the result. */)
(Lisp_Object n, Lisp_Object list)
@@ -6082,6 +6137,8 @@ syms_of_fns (void)
defsubr (&Scopy_alist);
defsubr (&Ssubstring);
defsubr (&Ssubstring_no_properties);
+ defsubr (&Stake);
+ defsubr (&Sntake);
defsubr (&Snthcdr);
defsubr (&Snth);
defsubr (&Selt);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* bug#56521: Add 'take' list operation [PATCH]
2022-07-12 17:05 bug#56521: Add 'take' list operation [PATCH] Mattias Engdegård
@ 2022-07-12 22:51 ` Lars Ingebrigtsen
2022-07-13 13:18 ` Mattias Engdegård
0 siblings, 1 reply; 5+ messages in thread
From: Lars Ingebrigtsen @ 2022-07-12 22:51 UTC (permalink / raw)
To: Mattias Engdegård; +Cc: 56521
Mattias Engdegård <mattiase@acm.org> writes:
> A basic list operation that is often useful is 'take', that extracts
> the N first elements of a list. The attached patch adds 'take' as a
> built-in function.
I'm not against adding this, but I'm not sure of the utility. That is,
if you have performance critical code, you don't use `take', because it
generates much garbage -- instead you loop over a list, taking the
elements a few at a time explicitly. And if you don't have performance
critical code, then why not use `seq-take'?
So a fast `take' seems to me to land in that uncanny zone of "you don't
really need that", but there may, of course, be usage scenarios that I'm
not aware of.
> I included `ntake` in the patch for reference and because it's Lisp
> tradition to have faster destructive 'n-' variants of list functions.
I think `ntake' would be a more popular function than `take', really --
because shortening a list is efficient, and something that's not
uncommon to do, and it's a bother to write
(setcdr (nthcdr 10 list) nil)
especially since that may bug out if the list is shorter. But as you
say, the Lisp version of ntake may well be faster than the C version, so
perhaps just stick it in subr.el instead?
But I'm not sure about the `ntake' name. 😀 It's natural in one way,
but since the main point is that the list is shortened, then perhaps
`list-shorten' (or something like that) is more appropriate.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#56521: Add 'take' list operation [PATCH]
2022-07-12 22:51 ` Lars Ingebrigtsen
@ 2022-07-13 13:18 ` Mattias Engdegård
2022-07-17 16:00 ` Mattias Engdegård
0 siblings, 1 reply; 5+ messages in thread
From: Mattias Engdegård @ 2022-07-13 13:18 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 56521
[-- Attachment #1: Type: text/plain, Size: 2239 bytes --]
13 juli 2022 kl. 00.51 skrev Lars Ingebrigtsen <larsi@gnus.org>:
> I'm not against adding this, but I'm not sure of the utility. That is,
> if you have performance critical code, you don't use `take', because it
> generates much garbage -- instead you loop over a list, taking the
> elements a few at a time explicitly. And if you don't have performance
> critical code, then why not use `seq-take'?
Thank you, but it is definitely not the case that performance is either of utmost importance or not at all. In fact, most code falls between those extremes. In particular, for elaborate packages such as Magit, Org, Calc, the byte compiler, many programming modes etc, managing complexity and correctness matter most but performance isn't unimportant.
Another way of looking at it is that we add `take` for convenience, and implement it in C for performance.
And don't underestimate the utility of `take` itself -- it's far from always possible or desired to do immediate work on the extracted prefix. I keep seeing it reimplemented (badly) and have done so several times myself (also badly).
It's also useful in conjunction with drop (nthcdr) since (append (take N L) (drop N L)) = L for all N and L.
For example, (take N (drop M L)) returns a sublist; (append (take N L) (drop M L)) removes a sublist. The primitives are easy to reason about.
> But as you
> say, the Lisp version of ntake may well be faster than the C version, so
> perhaps just stick it in subr.el instead?
After a more careful look, that anomaly was probably just a measurement fluke. On the whole the C implementation is reliably faster, although naturally the difference isn't nearly as big as for `take`.
> But I'm not sure about the `ntake' name.
It follows an established convention and makes it easy for the user to see the relationship with `take` as well as understand how they differ.
(We have practically no list primitive starting with 'list-'; those are commands that list something.)
If you prefer we could name it `destructive-take` or `take-destructively` but these are annoyingly long, lack precedence, and are not really better.
Attached is a more complete patch, now with tests and documentation updates.
[-- Attachment #2: 0001-Add-take-and-ntake-bug-56521.patch --]
[-- Type: application/octet-stream, Size: 7787 bytes --]
From dd18df350bb496aaa06d240b1ae71862a048d82d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Wed, 13 Jul 2022 13:46:52 +0200
Subject: [PATCH] Add `take` and `ntake` (bug#56521)
These are useful list primitives, complementary to `nthcdr`.
* src/fns.c (Ftake, Fntake): New.
(syms_of_fns): Defsubr them.
* doc/lispref/lists.texi (List Elements):
* lisp/emacs-lisp/shortdoc.el (list): Document
* lisp/emacs-lisp/byte-opt.el (side-effect-free-fns, pure-fns):
Declare `take` pure and side-effect-free.
* test/src/fns-tests.el (fns-tests--take-ref, fns--take-ntake):
New test.
---
doc/lispref/lists.texi | 29 +++++++++++++++++++
lisp/emacs-lisp/byte-opt.el | 4 +--
lisp/emacs-lisp/shortdoc.el | 4 +++
src/fns.c | 57 +++++++++++++++++++++++++++++++++++++
test/src/fns-tests.el | 49 +++++++++++++++++++++++++++++++
5 files changed, 141 insertions(+), 2 deletions(-)
diff --git a/doc/lispref/lists.texi b/doc/lispref/lists.texi
index a4f0ba815b..2a9ad1d5e0 100644
--- a/doc/lispref/lists.texi
+++ b/doc/lispref/lists.texi
@@ -340,6 +340,35 @@ List Elements
@end example
@end defun
+@defun take n list
+This function returns the @var{n} first elements of @var{list}. Essentially,
+it returns the part of @var{list} that @code{nthcdr} skips.
+
+@code{take} returns @var{list} if it is shorter than @var{n} elements;
+it returns @code{nil} if @var{n} is zero or negative.
+
+@example
+@group
+(take 3 '(a b c d))
+ @result{} (a b c)
+@end group
+@group
+(take 10 '(a b c d))
+ @result{} (a b c d)
+@end group
+@group
+(take 0 '(a b c d))
+ @result{} nil
+@end group
+@end example
+@end defun
+
+@defun ntake n list
+This is a version of @code{take} that works by destructively modifying
+the list structure of the argument. That makes it faster, but the
+original value of @var{list} is lost.
+@end defun
+
@defun last list &optional n
This function returns the last link of @var{list}. The @code{car} of
this link is the list's last element. If @var{list} is null,
diff --git a/lisp/emacs-lisp/byte-opt.el b/lisp/emacs-lisp/byte-opt.el
index b7147a7f50..04f766446c 100644
--- a/lisp/emacs-lisp/byte-opt.el
+++ b/lisp/emacs-lisp/byte-opt.el
@@ -1373,7 +1373,7 @@ byte-optimize-set
symbol-function symbol-name symbol-plist symbol-value string-make-unibyte
string-make-multibyte string-as-multibyte string-as-unibyte
string-to-multibyte
- tan time-convert truncate
+ take tan time-convert truncate
unibyte-char-to-multibyte upcase user-full-name
user-login-name user-original-login-name custom-variable-p
vconcat
@@ -1449,7 +1449,7 @@ byte-optimize-set
;; values if a marker is moved.
(let ((pure-fns
- '(concat regexp-opt regexp-quote
+ '(concat take regexp-opt regexp-quote
string-to-char string-to-syntax symbol-name
eq eql
= /= < <= >= > min max
diff --git a/lisp/emacs-lisp/shortdoc.el b/lisp/emacs-lisp/shortdoc.el
index a2d954cadb..1514ece6d1 100644
--- a/lisp/emacs-lisp/shortdoc.el
+++ b/lisp/emacs-lisp/shortdoc.el
@@ -595,6 +595,10 @@ list
:eval (nth 1 '(one two three)))
(nthcdr
:eval (nthcdr 1 '(one two three)))
+ (take
+ :eval (take 3 '(one two three four)))
+ (ntake
+ :eval (ntake 3 (list 'one 'two 'three 'four)))
(elt
:eval (elt '(one two three) 1))
(car-safe
diff --git a/src/fns.c b/src/fns.c
index 1f57e675b1..84cfec6c3f 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -1557,6 +1557,61 @@ substring_both (Lisp_Object string, ptrdiff_t from, ptrdiff_t from_byte,
return res;
}
\f
+DEFUN ("take", Ftake, Stake, 2, 2, 0,
+ doc: /* Return the first N elements of LIST.
+If N is zero or negative, return nil.
+If LIST is no more than N elements long, return it (or a copy). */)
+ (Lisp_Object n, Lisp_Object list)
+{
+ CHECK_FIXNUM (n);
+ EMACS_INT m = XFIXNUM (n);
+ if (m <= 0)
+ return Qnil;
+ CHECK_LIST (list);
+ if (NILP (list))
+ return Qnil;
+ Lisp_Object ret = Fcons (XCAR (list), Qnil);
+ Lisp_Object prev = ret;
+ m--;
+ list = XCDR (list);
+ while (m > 0 && CONSP (list))
+ {
+ Lisp_Object p = Fcons (XCAR (list), Qnil);
+ XSETCDR (prev, p);
+ prev = p;
+ m--;
+ list = XCDR (list);
+ }
+ if (m > 0 && !NILP (list))
+ wrong_type_argument (Qlistp, list);
+ return ret;
+}
+
+DEFUN ("ntake", Fntake, Sntake, 2, 2, 0,
+ doc: /* Modify LIST to keep only the first N elements.
+If N is zero or negative, return nil.
+If LIST is no more than N elements long, return it. */)
+ (Lisp_Object n, Lisp_Object list)
+{
+ CHECK_FIXNUM (n);
+ EMACS_INT m = XFIXNUM (n);
+ if (m <= 0)
+ return Qnil;
+ CHECK_LIST (list);
+ Lisp_Object tail = list;
+ --m;
+ while (m > 0 && CONSP (tail))
+ {
+ tail = XCDR (tail);
+ m--;
+ }
+ if (CONSP (tail))
+ XSETCDR (tail, Qnil);
+ else if (!NILP (tail))
+ wrong_type_argument (Qlistp, list);
+ return list;
+}
+
DEFUN ("nthcdr", Fnthcdr, Snthcdr, 2, 2, 0,
doc: /* Take cdr N times on LIST, return the result. */)
(Lisp_Object n, Lisp_Object list)
@@ -6082,6 +6137,8 @@ syms_of_fns (void)
defsubr (&Scopy_alist);
defsubr (&Ssubstring);
defsubr (&Ssubstring_no_properties);
+ defsubr (&Stake);
+ defsubr (&Sntake);
defsubr (&Snthcdr);
defsubr (&Snth);
defsubr (&Selt);
diff --git a/test/src/fns-tests.el b/test/src/fns-tests.el
index 20074ca0d2..a84cce3ad4 100644
--- a/test/src/fns-tests.el
+++ b/test/src/fns-tests.el
@@ -1365,4 +1365,53 @@ fns--string-to-unibyte-multibyte
(should-error (string-to-unibyte "å"))
(should-error (string-to-unibyte "ABC∀BC")))
+(defun fns-tests--take-ref (n list)
+ "Reference implementation of `take'."
+ (named-let loop ((m n) (tail list) (ac nil))
+ (if (and (> m 0) tail)
+ (loop (1- m) (cdr tail) (cons (car tail) ac))
+ (nreverse ac))))
+
+(ert-deftest fns--take-ntake ()
+ "Test `take' and `ntake'."
+ ;; Check errors and edge cases.
+ (should-error (take 'x '(a)))
+ (should-error (ntake 'x '(a)))
+ (should-error (take 1 'a))
+ (should-error (ntake 1 'a))
+ (should-error (take 2 '(a . b)))
+ (should-error (ntake 2 '(a . b)))
+ ;; Tolerate non-lists for a count of zero.
+ (should (equal (take 0 'a) nil))
+ (should (equal (ntake 0 'a) nil))
+ ;; But not non-numbers for empty lists.
+ (should-error (take 'x nil))
+ (should-error (ntake 'x nil))
+
+ (dolist (list '(nil (a) (a b) (a b c) (a b c d) (a . b) (a b . c)))
+ (ert-info ((prin1-to-string list) :prefix "list: ")
+ (let ((max (if (proper-list-p list)
+ (+ 2 (length list))
+ (safe-length list))))
+ (dolist (n (number-sequence -1 max))
+ (ert-info ((prin1-to-string n) :prefix "n: ")
+ (let* ((l (copy-tree list))
+ (ref (fns-tests--take-ref n l)))
+ (should (equal (take n l) ref))
+ (should (equal l list))
+ (should (equal (ntake n l) ref))))))))
+
+ ;; Circular list.
+ (let ((list (list 'a 'b 'c)))
+ (setcdr (nthcdr 2 list) (cdr list)) ; list now (a b c b c b c ...)
+ (should (equal (take 0 list) nil))
+ (should (equal (take 1 list) '(a)))
+ (should (equal (take 2 list) '(a b)))
+ (should (equal (take 3 list) '(a b c)))
+ (should (equal (take 4 list) '(a b c b)))
+ (should (equal (take 5 list) '(a b c b c)))
+ (should (equal (take 10 list) '(a b c b c b c b c b)))
+
+ (should (equal (ntake 10 list) '(a b)))))
+
;;; fns-tests.el ends here
--
2.32.0 (Apple Git-132)
^ permalink raw reply related [flat|nested] 5+ messages in thread
* bug#56521: Add 'take' list operation [PATCH]
2022-07-13 13:18 ` Mattias Engdegård
@ 2022-07-17 16:00 ` Mattias Engdegård
2022-07-18 12:50 ` Mattias Engdegård
0 siblings, 1 reply; 5+ messages in thread
From: Mattias Engdegård @ 2022-07-17 16:00 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 56521
Not much argument here; pushed to master. A couple of notes:
I went for the C implementation of `ntake` because I already had the code, and because an elisp implementation would be slower than `take` for short lists or small N, and there very point of `ntake` is speed.
`nthcdr` has some elaborate code for handling bignum arguments which makes some kind of sense since it's defined as an iteration of `cdr`. Currently, `take` and `ntake` require fixnums because bignums didn't seem to be worth the trouble.
Before closing the bug, I'll make the new code pay their way by adding some calls. `seq-take` is first in line for a makeover.
^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#56521: Add 'take' list operation [PATCH]
2022-07-17 16:00 ` Mattias Engdegård
@ 2022-07-18 12:50 ` Mattias Engdegård
0 siblings, 0 replies; 5+ messages in thread
From: Mattias Engdegård @ 2022-07-18 12:50 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 56521-done
> `seq-take` is first in line for a makeover.
Done. Naturally there are many more instances of code implementing `take` and `ntake` in creative ways but those will have to wait.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-07-18 12:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-12 17:05 bug#56521: Add 'take' list operation [PATCH] Mattias Engdegård
2022-07-12 22:51 ` Lars Ingebrigtsen
2022-07-13 13:18 ` Mattias Engdegård
2022-07-17 16:00 ` Mattias Engdegård
2022-07-18 12:50 ` Mattias Engdegård
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).