unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#69709: `sort` interface improvement and universal ordering predicate
@ 2024-03-10 13:28 Mattias Engdegård
  2024-03-10 14:09 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-10 13:28 UTC (permalink / raw)
  To: 69709

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

The existing `sort` interface suffers from some usability problems:

1. Writing an ordering predicate is often difficult and error-prone, even for very basic tasks such as selecting the field to sort on. It's not uncommon to botch a predicate as simple as

  (lambda (a b) (< (f a) (f b)))

which I've managed to do myself more than once. It gets particularly messy when sorting on multiple fields.
Having to write a custom comparison function for almost every occasion also means that performance suffers.

2. Mutability by default is a bug magnet as well.

To deal with the first problem, we could:

* Add a universal ordering predicate that will compare two values of the same type for many built-in types: numbers, strings, symbols, markers, lists, vectors, records, and a few more.
* Make this ordering the default.
* Add a key (accessor) function argument, like that in the recent `sort-on` interface, but built-in. This is important.

These work very well together: the user does not need to write or even choose an ordering predicate in most cases. Key functions are much less error-prone to write, and with the lexicographic ordering of lists, vectors and records, multi-key sorting is made much easier.

A key function combined with a standard ordering can also be used to optimise comparisons since we have all key values up front along with how they should be compared. The original timsort code that we stole from Python did this.

As a starting point, a patch implementing a universal ordering predicate is attached below.

The proposed sorting function interface would be

  (new-sort seq &key key lessp destructive)

because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.

Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both

  (sort seq lessp)                       ; old-style
  (sort seq &key key lessp destructive)  ; new-style

since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.


[-- Attachment #2: 0001-value-less-p.patch --]
[-- Type: application/octet-stream, Size: 14046 bytes --]

From 6980c8cf54a23eddf97ecf4881e43f9a49a18a55 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sun, 10 Mar 2024 13:18:22 +0100
Subject: [PATCH] value-less-p

---
 src/data.c            |   2 +
 src/fns.c             | 226 ++++++++++++++++++++++++++++++++++++++++++
 test/src/fns-tests.el | 122 +++++++++++++++++++++++
 3 files changed, 350 insertions(+)

diff --git a/src/data.c b/src/data.c
index df08eaf8102..8f3ba6438b9 100644
--- a/src/data.c
+++ b/src/data.c
@@ -4039,6 +4039,7 @@ syms_of_data (void)
   DEFSYM (Qminibuffer_quit, "minibuffer-quit");
   DEFSYM (Qwrong_length_argument, "wrong-length-argument");
   DEFSYM (Qwrong_type_argument, "wrong-type-argument");
+  DEFSYM (Qtype_mismatch, "type-mismatch")
   DEFSYM (Qargs_out_of_range, "args-out-of-range");
   DEFSYM (Qvoid_function, "void-function");
   DEFSYM (Qcyclic_function_indirection, "cyclic-function-indirection");
@@ -4130,6 +4131,7 @@ #define PUT_ERROR(sym, tail, msg)			\
   PUT_ERROR (Quser_error, error_tail, "");
   PUT_ERROR (Qwrong_length_argument, error_tail, "Wrong length argument");
   PUT_ERROR (Qwrong_type_argument, error_tail, "Wrong type argument");
+  PUT_ERROR (Qtype_mismatch, error_tail, "Types do not match");
   PUT_ERROR (Qargs_out_of_range, error_tail, "Args out of range");
   PUT_ERROR (Qvoid_function, error_tail,
 	     "Symbol's function definition is void");
diff --git a/src/fns.c b/src/fns.c
index 0a64e515402..cc017839996 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -27,6 +27,7 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include <vla.h>
 #include <errno.h>
 #include <ctype.h>
+#include <math.h>
 
 #include "lisp.h"
 #include "bignum.h"
@@ -2908,6 +2909,230 @@ internal_equal (Lisp_Object o1, Lisp_Object o2, enum equal_kind equal_kind,
 
   return false;
 }
+
+
+/* Return -1, 0 or 1 to indicate whether a<b, a=b or a>b in the sense
+   of value-less-p.
+   In particular 0 does not mean equality in the sense of Fequal, only
+   that the arguments cannot be ordered yet they can be compared (same
+   type).
+
+   If lessp_only is true, then we may return 0 instead of 1 when a>b,
+   if this is faster.  */
+static int
+value_less_p (Lisp_Object a, Lisp_Object b, int maxdepth, bool lessp_only)
+{
+  if (maxdepth < 0)
+    error ("Maximum depth exceeded in comparison");
+
+ tail_recurse:
+  /* Shortcut for a common case.  */
+  if (BASE_EQ (a, b))
+    return 0;
+
+  switch (XTYPE (a))
+    {
+    case_Lisp_Int:
+      {
+	EMACS_INT ia = XFIXNUM (a);
+	if (FIXNUMP (b))
+	  return ia < XFIXNUM (b) ? -1 : ia > XFIXNUM (b);
+	if (FLOATP (b))
+	  return ia < XFLOAT_DATA (b) ? -1 : ia > XFLOAT_DATA (b);
+	if (BIGNUMP (b))
+	  return -mpz_sgn (*xbignum_val (b));
+      }
+      goto type_mismatch;
+
+    case Lisp_Symbol:
+      if (BARE_SYMBOL_P (b))
+	{
+	  struct Lisp_Symbol *sa = XBARE_SYMBOL (a);
+	  struct Lisp_Symbol *sb = XBARE_SYMBOL (b);
+	  if (!NILP (Fstring_lessp (sa->u.s.name, sb->u.s.name)))
+	    return -1;
+	  if (lessp_only)
+	    return 0;
+	  if (sa->u.s.interned == SYMBOL_INTERNED_IN_INITIAL_OBARRAY
+	      && sb->u.s.interned == SYMBOL_INTERNED_IN_INITIAL_OBARRAY)
+	    /* Both symbols are interned in the initial obarray, so cannot have
+	       equal names.  */
+	    return 1;
+	  return NILP (Fequal (sa->u.s.name, sb->u.s.name));
+	}
+      if (CONSP (b) && NILP (a))
+	return -1;
+      if (SYMBOLP (b))
+	{
+	  /* Slow-path branch when B is a symbol-with-pos.  */
+	  if (!NILP (Fstring_lessp (a, b)))
+	    return -1;
+	  if (lessp_only)
+	    return 0;
+	  return NILP (Fequal (a, b));
+	}
+      goto type_mismatch;
+
+    case Lisp_String:
+      if (STRINGP (b))
+	{
+	  if (!NILP (Fstring_lessp (a, b)))
+	    return -1;
+	  /* FIXME: We would go even faster, and wouldn't need the
+	     lessp_only hack, if we had a string comparison with -1/0/1 result.
+	     Generalise the code in Fstring_lessp for internal use?  */
+	  if (lessp_only)
+	    return 0;
+	  return NILP (Fequal (a, b));
+	}
+      goto type_mismatch;
+
+    case Lisp_Cons:
+      while (CONSP (b))
+	{
+	  int cmp = value_less_p (XCAR (a), XCAR (b), maxdepth - 1, false);
+	  if (cmp != 0)
+	    return cmp;
+	  a = XCDR (a);
+	  b = XCDR (b);
+	  if (!CONSP (a))
+	    break;
+	}
+      if (CONSP (a))
+	{
+	  if (NILP (b))
+	    return 1;
+	  else
+	    goto type_mismatch;
+	}
+      goto tail_recurse;
+
+    case Lisp_Vectorlike:
+      if (VECTORLIKEP (b))
+	{
+	  enum pvec_type ta = PSEUDOVECTOR_TYPE (XVECTOR (a));
+	  enum pvec_type tb = PSEUDOVECTOR_TYPE (XVECTOR (b));
+	  if (ta == tb)
+	    switch (ta)
+	      {
+	      case PVEC_NORMAL_VECTOR:
+	      case PVEC_RECORD:
+		{
+		  ptrdiff_t len_a = ASIZE (a);
+		  ptrdiff_t len_b = ASIZE (b);
+		  if (ta == PVEC_RECORD)
+		    {
+		      len_a &= PSEUDOVECTOR_SIZE_MASK;
+		      len_b &= PSEUDOVECTOR_SIZE_MASK;
+		    }
+		  ptrdiff_t len_min = min (len_a, len_b);
+		  for (ptrdiff_t i = 0; i < len_min; i++)
+		    {
+		      int cmp = value_less_p (AREF (a, i), AREF (b, i),
+					      maxdepth - 1, false);
+		      if (cmp != 0)
+			return cmp;
+		    }
+		  return len_a < len_b ? -1 : len_a > len_b;
+		}
+
+	      case PVEC_BOOL_VECTOR:
+		{
+		  ptrdiff_t len_a = bool_vector_size (a);
+		  ptrdiff_t len_b = bool_vector_size (b);
+		  ptrdiff_t len_min = min (len_a, len_b);
+		  /* FIXME: very inefficient, we could compare words.  */
+		  for (ptrdiff_t i = 0; i < len_min; i++)
+		    {
+		      bool ai = bool_vector_bitref (a, i);
+		      bool bi = bool_vector_bitref (b, i);
+		      if (ai != bi)
+			return bi ? -1 : ai;
+		    }
+		  return len_a < len_b ? -1 : len_a > len_b;
+		}
+
+	      case PVEC_MARKER:
+		{
+		  Lisp_Object buf_a = Fmarker_buffer (a);
+		  Lisp_Object buf_b = Fmarker_buffer (b);
+		  if (NILP (buf_a))
+		    return NILP (buf_b) ? 0 : -1;
+		  if (NILP (buf_b))
+		    return 1;
+		  int cmp = value_less_p (buf_a, buf_b, maxdepth - 1, false);
+		  if (cmp != 0)
+		    return cmp;
+		  ptrdiff_t pa = XMARKER (a)->charpos;
+		  ptrdiff_t pb = XMARKER (b)->charpos;
+		  return pa < pb ? -1 : pa > pb;
+		}
+
+	      case PVEC_PROCESS:
+		return value_less_p (Fprocess_name (a), Fprocess_name (b),
+				     maxdepth - 1, lessp_only);
+	      case PVEC_BUFFER:
+		{
+		  /* Killed buffers lack names and sort before those alive.  */
+		  Lisp_Object na = Fbuffer_name (a);
+		  Lisp_Object nb = Fbuffer_name (b);
+		  if (NILP (na))
+		    return NILP (nb) ? 0 : -1;
+		  if (NILP (nb))
+		    return 1;
+		  return value_less_p (na, nb, maxdepth - 1, lessp_only);
+		}
+
+	      case PVEC_BIGNUM:
+		return mpz_cmp (*xbignum_val (a), *xbignum_val (b));
+
+	      default:
+		/* Treat other types as unordered.  */
+		return 0;
+	      }
+	}
+      else if (BIGNUMP (a))
+	return -value_less_p (b, a, maxdepth, false);
+      goto type_mismatch;
+
+    case Lisp_Float:
+      {
+	double fa = XFLOAT_DATA (a);
+	if (FLOATP (b))
+	  return fa < XFLOAT_DATA (b) ? -1 : fa > XFLOAT_DATA (b);
+	if (FIXNUMP (b))
+	  return fa < XFIXNUM (b) ? -1 : fa > XFIXNUM (b);
+	if (BIGNUMP (b))
+	  {
+	    if (isnan (fa))
+	      return 0;
+	    return -mpz_cmp_d (*xbignum_val (b), fa);
+	  }
+      }
+      goto type_mismatch;
+
+    default:
+      eassume (0);
+    }
+ type_mismatch:
+  xsignal2 (Qtype_mismatch, a, b);
+}
+
+DEFUN ("value-less-p", Fvalue_less_p, Svalue_less_p, 2, 2, 0,
+       doc: /* Return non-nil if A precedes B in standard value order.
+A and B must have the same basic type.
+Numbers are compared with `<'.
+Strings and symbols are compared with `string-lessp'.
+Lists, vectors, bool-vectors and records are compared lexicographically.
+Markers are compared lexicographically by buffer and position.
+Buffers and processes are compared by name.
+Other types are considered unordered and the return value will be `nil'.  */)
+  (Lisp_Object a, Lisp_Object b)
+{
+  int maxdepth = 20;		  /* FIXME: arbitrary value */
+  return value_less_p (a, b, maxdepth, true) < 0 ? Qt : Qnil;
+}
+
 \f
 
 DEFUN ("fillarray", Ffillarray, Sfillarray, 2, 2, 0,
@@ -6589,6 +6814,7 @@ syms_of_fns (void)
   defsubr (&Seql);
   defsubr (&Sequal);
   defsubr (&Sequal_including_properties);
+  defsubr (&Svalue_less_p);
   defsubr (&Sfillarray);
   defsubr (&Sclear_string);
   defsubr (&Snconc);
diff --git a/test/src/fns-tests.el b/test/src/fns-tests.el
index 7437c07f156..f81b1eadd09 100644
--- a/test/src/fns-tests.el
+++ b/test/src/fns-tests.el
@@ -1513,4 +1513,126 @@ fns--copy-alist
   (should-error (copy-alist "abc")
                 :type 'wrong-type-argument))
 
+(ert-deftest fns-value-less-p-ordered ()
+  ;; values (X . Y) where X<Y
+  (let* ((big (* 10 most-positive-fixnum))
+         (buf1 (get-buffer-create " *one*"))
+         (buf2 (get-buffer-create " *two*"))
+         (_ (progn (with-current-buffer buf1 (insert (make-string 20 ?a)))
+                   (with-current-buffer buf2 (insert (make-string 20 ?b)))))
+         (mark1 (set-marker (make-marker) 12 buf1))
+         (mark2 (set-marker (make-marker) 13 buf1))
+         (mark3 (set-marker (make-marker) 12 buf2))
+         (mark4 (set-marker (make-marker) 13 buf2))
+         (proc1 (make-pipe-process :name " *proc one*"))
+         (proc2 (make-pipe-process :name " *proc two*")))
+    (unwind-protect
+        (dolist (c
+                 `(
+                   ;; fixnums
+                   (1 . 2)  (-2 . -1) (-2 . 1) (-1 . 2)
+                   ;; bignums
+                   (,big . ,(1+ big)) (,(- big) . ,big)
+                   (,(- -1 big) . ,(- big))
+                   ;; fixnums/bignums
+                   (1 . ,big) (-1 . ,big) (,(- big) . -1) (,(- big) . 1)
+                   ;; floats
+                   (1.5 . 1.6) (-1.3 . -1.2) (-13.0 . 12.0)
+                   ;; floats/fixnums
+                   (1 . 1.1) (1.9 . 2) (-2.0 . 1) (-2 . 1.0)
+                   ;; floats/bignums
+                   (,big . ,(float (* 2 big))) (,(float big) . ,(* 2 big))
+                   ;; symbols
+                   (a . b) (nil . nix) (b . ba) (## . a) (A . a)
+                   (#:a . #:b) (a . #:b) (#:a . b)
+                   ;; strings
+                   ("" . "a") ("a" . "b") ("A" . "a") ("abc" . "abd")
+                   ("b" . "ba")
+
+                   ;; lists
+                   ((1 2 3) . (2 3 4)) ((2) . (2 1)) (() . (0))
+                   ((1 2 3) . (1 3)) ((1 2 3) . (1 3 2))
+                   (((b a) (c d) e) . ((b a) (c d) f))
+                   (((b a) (c D) e) . ((b a) (c d) e))
+                   (((b a) (c d () x) e) . ((b a) (c d (1) x) e))
+                   ((1 . 2) . (1 . 3)) ((1 2 . 3) . (1 2 . 4))
+
+                   ;; vectors
+                   ([1 2 3] . [2 3 4]) ([2] . [2 1]) ([] . [0])
+                   ([1 2 3] . [1 3]) ([1 2 3] . [1 3 2])
+                   ([[b a] [c d] e] . [[b a] [c d] f])
+                   ([[b a] [c D] e] . [[b a] [c d] e])
+                   ([[b a] [c d [] x] e] . [[b a] [c d [1] x] e])
+
+                   ;; bool-vectors
+                   (,(bool-vector) . ,(bool-vector nil))
+                   (,(bool-vector nil) . ,(bool-vector t))
+                   (,(bool-vector t nil t nil) . ,(bool-vector t nil t t))
+                   (,(bool-vector t nil t) . ,(bool-vector t nil t nil))
+
+                   ;; records
+                   (#s(a 2 3) . #s(b 3 4)) (#s(b) . #s(b a))
+                   (#s(a 2 3) . #s(a 3)) (#s(a 2 3) . #s(a 3 2))
+                   (#s(#s(b a) #s(c d) e) . #s(#s(b a) #s(c d) f))
+                   (#s(#s(b a) #s(c D) e) . #s(#s(b a) #s(c d) e))
+                   (#s(#s(b a) #s(c d #s(u) x) e)
+                    . #s(#s(b a) #s(c d #s(v) x) e))
+
+                   ;; markers
+                   (,mark1 . ,mark2) (,mark1 . ,mark3) (,mark1 . ,mark4)
+                   (,mark2 . ,mark3) (,mark2 . ,mark4) (,mark3 . ,mark4)
+
+                   ;; buffers
+                   (,buf1 . ,buf2)
+
+                   ;; processes
+                   (,proc1 . ,proc2)
+                   ))
+          (let ((x (car c))
+                (y (cdr c)))
+            (should (value-less-p x y))
+            (should-not (value-less-p y x))
+            (should-not (value-less-p x x))
+            (should-not (value-less-p y y))))
+
+      (delete-process proc2)
+      (delete-process proc1)
+      (kill-buffer buf2)
+      (kill-buffer buf1))))
+
+(ert-deftest fns-value-less-p-unordered ()
+  ;; values (X . Y) where neither X<Y nor Y<X
+  (dolist (c `(
+               ;; numbers
+               (0 . 0.0) (0 . -0.0) (0.0 . -0.0)
+               ;; symbols
+               (a . #:a)
+
+               ;; unordered types
+               (,(make-hash-table) . ,(make-hash-table))
+               (,(obarray-make) . ,(obarray-make))
+               ;; FIXME: more?
+               ))
+    (let ((x (car c))
+          (y (cdr c)))
+      (should-not (value-less-p x y))
+      (should-not (value-less-p y x)))))
+
+(ert-deftest fns-value-less-p-type-mismatch ()
+  ;; values of disjoint (incomparable) types
+  (let ((incomparable
+         `( 1 a "a" (a b) [a b] ,(bool-vector nil t) #s(a b)
+            ,(make-char-table 'test)
+            ,(make-hash-table)
+            ,(obarray-make)
+            ;; FIXME: more?
+            )))
+    (let ((tail incomparable))
+      (while tail
+        (let ((x (car tail)))
+          (dolist (y (cdr tail))
+            (should-error (value-less-p x y) :type 'type-mismatch)
+            (should-error (value-less-p y x) :type 'type-mismatch)))
+        (setq tail (cdr tail))))))
+
 ;;; fns-tests.el ends here
-- 
2.32.0 (Apple Git-132)


[-- Attachment #3: Type: text/plain, Size: 3 bytes --]





^ permalink raw reply related	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-05-17 17:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
2024-03-10 14:09 ` Eli Zaretskii
2024-03-10 14:56   ` Mattias Engdegård
2024-03-20 19:01     ` Mattias Engdegård
2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-21 14:55         ` Mattias Engdegård
2024-03-21 14:54       ` Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-22 20:55       ` Dmitry Gutov
2024-03-23 14:58         ` Mattias Engdegård
2024-03-23 17:39           ` Dmitry Gutov
2024-03-23 20:09             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-23 23:19               ` Dmitry Gutov
2024-03-23 23:25                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-25 11:11                   ` Mattias Engdegård
2024-03-29 10:59                     ` Mattias Engdegård
2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-29 11:52                         ` Mattias Engdegård
2024-05-17 12:29                           ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-05-17 17:49                             ` Mattias Engdegård
2024-03-29 12:06                       ` Eli Zaretskii
2024-03-29 15:02                         ` Mattias Engdegård
2024-03-29 15:35                           ` Eli Zaretskii
2024-03-29 16:13                             ` Mattias Engdegård
2024-03-29 18:09                               ` Eli Zaretskii
2024-03-10 15:48 ` Dmitry Gutov
2024-03-10 15:56   ` Mattias Engdegård
2024-03-10 16:03     ` Dmitry Gutov
2024-03-10 16:46       ` Mattias Engdegård
2024-03-10 16:55         ` Eli Zaretskii
2024-03-10 17:54           ` Dmitry Gutov
2024-03-11  7:01 ` Gerd Möllmann
2024-04-14 14:03 ` Aris Spathis
2024-04-14 16:26   ` Eli Zaretskii
2024-04-14 16:33     ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).