bug#69709: `sort` interface improvement and universal ordering predicate

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#69709: `sort` interface improvement and universal ordering predicate
@ 2024-03-10 13:28 Mattias Engdegård
  2024-03-10 14:09 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-10 13:28 UTC (permalink / raw)
  To: 69709

[-- Attachment #1: Type: text/plain, Size: 2276 bytes --]

The existing `sort` interface suffers from some usability problems:

1. Writing an ordering predicate is often difficult and error-prone, even for very basic tasks such as selecting the field to sort on. It's not uncommon to botch a predicate as simple as

  (lambda (a b) (< (f a) (f b)))

which I've managed to do myself more than once. It gets particularly messy when sorting on multiple fields.
Having to write a custom comparison function for almost every occasion also means that performance suffers.

2. Mutability by default is a bug magnet as well.

To deal with the first problem, we could:

* Add a universal ordering predicate that will compare two values of the same type for many built-in types: numbers, strings, symbols, markers, lists, vectors, records, and a few more.
* Make this ordering the default.
* Add a key (accessor) function argument, like that in the recent `sort-on` interface, but built-in. This is important.

These work very well together: the user does not need to write or even choose an ordering predicate in most cases. Key functions are much less error-prone to write, and with the lexicographic ordering of lists, vectors and records, multi-key sorting is made much easier.

A key function combined with a standard ordering can also be used to optimise comparisons since we have all key values up front along with how they should be compared. The original timsort code that we stole from Python did this.

As a starting point, a patch implementing a universal ordering predicate is attached below.

The proposed sorting function interface would be

  (new-sort seq &key key lessp destructive)

because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.

Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both

  (sort seq lessp)                       ; old-style
  (sort seq &key key lessp destructive)  ; new-style

since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.


[-- Attachment #2: 0001-value-less-p.patch --]
[-- Type: application/octet-stream, Size: 14046 bytes --]

From 6980c8cf54a23eddf97ecf4881e43f9a49a18a55 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sun, 10 Mar 2024 13:18:22 +0100
Subject: [PATCH] value-less-p

---
 src/data.c            |   2 +
 src/fns.c             | 226 ++++++++++++++++++++++++++++++++++++++++++
 test/src/fns-tests.el | 122 +++++++++++++++++++++++
 3 files changed, 350 insertions(+)

diff --git a/src/data.c b/src/data.c
index df08eaf8102..8f3ba6438b9 100644
--- a/src/data.c
+++ b/src/data.c
@@ -4039,6 +4039,7 @@ syms_of_data (void)
   DEFSYM (Qminibuffer_quit, "minibuffer-quit");
   DEFSYM (Qwrong_length_argument, "wrong-length-argument");
   DEFSYM (Qwrong_type_argument, "wrong-type-argument");
+  DEFSYM (Qtype_mismatch, "type-mismatch")
   DEFSYM (Qargs_out_of_range, "args-out-of-range");
   DEFSYM (Qvoid_function, "void-function");
   DEFSYM (Qcyclic_function_indirection, "cyclic-function-indirection");
@@ -4130,6 +4131,7 @@ #define PUT_ERROR(sym, tail, msg)			\
   PUT_ERROR (Quser_error, error_tail, "");
   PUT_ERROR (Qwrong_length_argument, error_tail, "Wrong length argument");
   PUT_ERROR (Qwrong_type_argument, error_tail, "Wrong type argument");
+  PUT_ERROR (Qtype_mismatch, error_tail, "Types do not match");
   PUT_ERROR (Qargs_out_of_range, error_tail, "Args out of range");
   PUT_ERROR (Qvoid_function, error_tail,
 	     "Symbol's function definition is void");
diff --git a/src/fns.c b/src/fns.c
index 0a64e515402..cc017839996 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -27,6 +27,7 @@ Copyright (C) 1985-2024 Free Software Foundation, Inc.
 #include <vla.h>
 #include <errno.h>
 #include <ctype.h>
+#include <math.h>
 
 #include "lisp.h"
 #include "bignum.h"
@@ -2908,6 +2909,230 @@ internal_equal (Lisp_Object o1, Lisp_Object o2, enum equal_kind equal_kind,
 
   return false;
 }
+
+
+/* Return -1, 0 or 1 to indicate whether a<b, a=b or a>b in the sense
+   of value-less-p.
+   In particular 0 does not mean equality in the sense of Fequal, only
+   that the arguments cannot be ordered yet they can be compared (same
+   type).
+
+   If lessp_only is true, then we may return 0 instead of 1 when a>b,
+   if this is faster.  */
+static int
+value_less_p (Lisp_Object a, Lisp_Object b, int maxdepth, bool lessp_only)
+{
+  if (maxdepth < 0)
+    error ("Maximum depth exceeded in comparison");
+
+ tail_recurse:
+  /* Shortcut for a common case.  */
+  if (BASE_EQ (a, b))
+    return 0;
+
+  switch (XTYPE (a))
+    {
+    case_Lisp_Int:
+      {
+	EMACS_INT ia = XFIXNUM (a);
+	if (FIXNUMP (b))
+	  return ia < XFIXNUM (b) ? -1 : ia > XFIXNUM (b);
+	if (FLOATP (b))
+	  return ia < XFLOAT_DATA (b) ? -1 : ia > XFLOAT_DATA (b);
+	if (BIGNUMP (b))
+	  return -mpz_sgn (*xbignum_val (b));
+      }
+      goto type_mismatch;
+
+    case Lisp_Symbol:
+      if (BARE_SYMBOL_P (b))
+	{
+	  struct Lisp_Symbol *sa = XBARE_SYMBOL (a);
+	  struct Lisp_Symbol *sb = XBARE_SYMBOL (b);
+	  if (!NILP (Fstring_lessp (sa->u.s.name, sb->u.s.name)))
+	    return -1;
+	  if (lessp_only)
+	    return 0;
+	  if (sa->u.s.interned == SYMBOL_INTERNED_IN_INITIAL_OBARRAY
+	      && sb->u.s.interned == SYMBOL_INTERNED_IN_INITIAL_OBARRAY)
+	    /* Both symbols are interned in the initial obarray, so cannot have
+	       equal names.  */
+	    return 1;
+	  return NILP (Fequal (sa->u.s.name, sb->u.s.name));
+	}
+      if (CONSP (b) && NILP (a))
+	return -1;
+      if (SYMBOLP (b))
+	{
+	  /* Slow-path branch when B is a symbol-with-pos.  */
+	  if (!NILP (Fstring_lessp (a, b)))
+	    return -1;
+	  if (lessp_only)
+	    return 0;
+	  return NILP (Fequal (a, b));
+	}
+      goto type_mismatch;
+
+    case Lisp_String:
+      if (STRINGP (b))
+	{
+	  if (!NILP (Fstring_lessp (a, b)))
+	    return -1;
+	  /* FIXME: We would go even faster, and wouldn't need the
+	     lessp_only hack, if we had a string comparison with -1/0/1 result.
+	     Generalise the code in Fstring_lessp for internal use?  */
+	  if (lessp_only)
+	    return 0;
+	  return NILP (Fequal (a, b));
+	}
+      goto type_mismatch;
+
+    case Lisp_Cons:
+      while (CONSP (b))
+	{
+	  int cmp = value_less_p (XCAR (a), XCAR (b), maxdepth - 1, false);
+	  if (cmp != 0)
+	    return cmp;
+	  a = XCDR (a);
+	  b = XCDR (b);
+	  if (!CONSP (a))
+	    break;
+	}
+      if (CONSP (a))
+	{
+	  if (NILP (b))
+	    return 1;
+	  else
+	    goto type_mismatch;
+	}
+      goto tail_recurse;
+
+    case Lisp_Vectorlike:
+      if (VECTORLIKEP (b))
+	{
+	  enum pvec_type ta = PSEUDOVECTOR_TYPE (XVECTOR (a));
+	  enum pvec_type tb = PSEUDOVECTOR_TYPE (XVECTOR (b));
+	  if (ta == tb)
+	    switch (ta)
+	      {
+	      case PVEC_NORMAL_VECTOR:
+	      case PVEC_RECORD:
+		{
+		  ptrdiff_t len_a = ASIZE (a);
+		  ptrdiff_t len_b = ASIZE (b);
+		  if (ta == PVEC_RECORD)
+		    {
+		      len_a &= PSEUDOVECTOR_SIZE_MASK;
+		      len_b &= PSEUDOVECTOR_SIZE_MASK;
+		    }
+		  ptrdiff_t len_min = min (len_a, len_b);
+		  for (ptrdiff_t i = 0; i < len_min; i++)
+		    {
+		      int cmp = value_less_p (AREF (a, i), AREF (b, i),
+					      maxdepth - 1, false);
+		      if (cmp != 0)
+			return cmp;
+		    }
+		  return len_a < len_b ? -1 : len_a > len_b;
+		}
+
+	      case PVEC_BOOL_VECTOR:
+		{
+		  ptrdiff_t len_a = bool_vector_size (a);
+		  ptrdiff_t len_b = bool_vector_size (b);
+		  ptrdiff_t len_min = min (len_a, len_b);
+		  /* FIXME: very inefficient, we could compare words.  */
+		  for (ptrdiff_t i = 0; i < len_min; i++)
+		    {
+		      bool ai = bool_vector_bitref (a, i);
+		      bool bi = bool_vector_bitref (b, i);
+		      if (ai != bi)
+			return bi ? -1 : ai;
+		    }
+		  return len_a < len_b ? -1 : len_a > len_b;
+		}
+
+	      case PVEC_MARKER:
+		{
+		  Lisp_Object buf_a = Fmarker_buffer (a);
+		  Lisp_Object buf_b = Fmarker_buffer (b);
+		  if (NILP (buf_a))
+		    return NILP (buf_b) ? 0 : -1;
+		  if (NILP (buf_b))
+		    return 1;
+		  int cmp = value_less_p (buf_a, buf_b, maxdepth - 1, false);
+		  if (cmp != 0)
+		    return cmp;
+		  ptrdiff_t pa = XMARKER (a)->charpos;
+		  ptrdiff_t pb = XMARKER (b)->charpos;
+		  return pa < pb ? -1 : pa > pb;
+		}
+
+	      case PVEC_PROCESS:
+		return value_less_p (Fprocess_name (a), Fprocess_name (b),
+				     maxdepth - 1, lessp_only);
+	      case PVEC_BUFFER:
+		{
+		  /* Killed buffers lack names and sort before those alive.  */
+		  Lisp_Object na = Fbuffer_name (a);
+		  Lisp_Object nb = Fbuffer_name (b);
+		  if (NILP (na))
+		    return NILP (nb) ? 0 : -1;
+		  if (NILP (nb))
+		    return 1;
+		  return value_less_p (na, nb, maxdepth - 1, lessp_only);
+		}
+
+	      case PVEC_BIGNUM:
+		return mpz_cmp (*xbignum_val (a), *xbignum_val (b));
+
+	      default:
+		/* Treat other types as unordered.  */
+		return 0;
+	      }
+	}
+      else if (BIGNUMP (a))
+	return -value_less_p (b, a, maxdepth, false);
+      goto type_mismatch;
+
+    case Lisp_Float:
+      {
+	double fa = XFLOAT_DATA (a);
+	if (FLOATP (b))
+	  return fa < XFLOAT_DATA (b) ? -1 : fa > XFLOAT_DATA (b);
+	if (FIXNUMP (b))
+	  return fa < XFIXNUM (b) ? -1 : fa > XFIXNUM (b);
+	if (BIGNUMP (b))
+	  {
+	    if (isnan (fa))
+	      return 0;
+	    return -mpz_cmp_d (*xbignum_val (b), fa);
+	  }
+      }
+      goto type_mismatch;
+
+    default:
+      eassume (0);
+    }
+ type_mismatch:
+  xsignal2 (Qtype_mismatch, a, b);
+}
+
+DEFUN ("value-less-p", Fvalue_less_p, Svalue_less_p, 2, 2, 0,
+       doc: /* Return non-nil if A precedes B in standard value order.
+A and B must have the same basic type.
+Numbers are compared with `<'.
+Strings and symbols are compared with `string-lessp'.
+Lists, vectors, bool-vectors and records are compared lexicographically.
+Markers are compared lexicographically by buffer and position.
+Buffers and processes are compared by name.
+Other types are considered unordered and the return value will be `nil'.  */)
+  (Lisp_Object a, Lisp_Object b)
+{
+  int maxdepth = 20;		  /* FIXME: arbitrary value */
+  return value_less_p (a, b, maxdepth, true) < 0 ? Qt : Qnil;
+}
+
 \f
 
 DEFUN ("fillarray", Ffillarray, Sfillarray, 2, 2, 0,
@@ -6589,6 +6814,7 @@ syms_of_fns (void)
   defsubr (&Seql);
   defsubr (&Sequal);
   defsubr (&Sequal_including_properties);
+  defsubr (&Svalue_less_p);
   defsubr (&Sfillarray);
   defsubr (&Sclear_string);
   defsubr (&Snconc);
diff --git a/test/src/fns-tests.el b/test/src/fns-tests.el
index 7437c07f156..f81b1eadd09 100644
--- a/test/src/fns-tests.el
+++ b/test/src/fns-tests.el
@@ -1513,4 +1513,126 @@ fns--copy-alist
   (should-error (copy-alist "abc")
                 :type 'wrong-type-argument))
 
+(ert-deftest fns-value-less-p-ordered ()
+  ;; values (X . Y) where X<Y
+  (let* ((big (* 10 most-positive-fixnum))
+         (buf1 (get-buffer-create " *one*"))
+         (buf2 (get-buffer-create " *two*"))
+         (_ (progn (with-current-buffer buf1 (insert (make-string 20 ?a)))
+                   (with-current-buffer buf2 (insert (make-string 20 ?b)))))
+         (mark1 (set-marker (make-marker) 12 buf1))
+         (mark2 (set-marker (make-marker) 13 buf1))
+         (mark3 (set-marker (make-marker) 12 buf2))
+         (mark4 (set-marker (make-marker) 13 buf2))
+         (proc1 (make-pipe-process :name " *proc one*"))
+         (proc2 (make-pipe-process :name " *proc two*")))
+    (unwind-protect
+        (dolist (c
+                 `(
+                   ;; fixnums
+                   (1 . 2)  (-2 . -1) (-2 . 1) (-1 . 2)
+                   ;; bignums
+                   (,big . ,(1+ big)) (,(- big) . ,big)
+                   (,(- -1 big) . ,(- big))
+                   ;; fixnums/bignums
+                   (1 . ,big) (-1 . ,big) (,(- big) . -1) (,(- big) . 1)
+                   ;; floats
+                   (1.5 . 1.6) (-1.3 . -1.2) (-13.0 . 12.0)
+                   ;; floats/fixnums
+                   (1 . 1.1) (1.9 . 2) (-2.0 . 1) (-2 . 1.0)
+                   ;; floats/bignums
+                   (,big . ,(float (* 2 big))) (,(float big) . ,(* 2 big))
+                   ;; symbols
+                   (a . b) (nil . nix) (b . ba) (## . a) (A . a)
+                   (#:a . #:b) (a . #:b) (#:a . b)
+                   ;; strings
+                   ("" . "a") ("a" . "b") ("A" . "a") ("abc" . "abd")
+                   ("b" . "ba")
+
+                   ;; lists
+                   ((1 2 3) . (2 3 4)) ((2) . (2 1)) (() . (0))
+                   ((1 2 3) . (1 3)) ((1 2 3) . (1 3 2))
+                   (((b a) (c d) e) . ((b a) (c d) f))
+                   (((b a) (c D) e) . ((b a) (c d) e))
+                   (((b a) (c d () x) e) . ((b a) (c d (1) x) e))
+                   ((1 . 2) . (1 . 3)) ((1 2 . 3) . (1 2 . 4))
+
+                   ;; vectors
+                   ([1 2 3] . [2 3 4]) ([2] . [2 1]) ([] . [0])
+                   ([1 2 3] . [1 3]) ([1 2 3] . [1 3 2])
+                   ([[b a] [c d] e] . [[b a] [c d] f])
+                   ([[b a] [c D] e] . [[b a] [c d] e])
+                   ([[b a] [c d [] x] e] . [[b a] [c d [1] x] e])
+
+                   ;; bool-vectors
+                   (,(bool-vector) . ,(bool-vector nil))
+                   (,(bool-vector nil) . ,(bool-vector t))
+                   (,(bool-vector t nil t nil) . ,(bool-vector t nil t t))
+                   (,(bool-vector t nil t) . ,(bool-vector t nil t nil))
+
+                   ;; records
+                   (#s(a 2 3) . #s(b 3 4)) (#s(b) . #s(b a))
+                   (#s(a 2 3) . #s(a 3)) (#s(a 2 3) . #s(a 3 2))
+                   (#s(#s(b a) #s(c d) e) . #s(#s(b a) #s(c d) f))
+                   (#s(#s(b a) #s(c D) e) . #s(#s(b a) #s(c d) e))
+                   (#s(#s(b a) #s(c d #s(u) x) e)
+                    . #s(#s(b a) #s(c d #s(v) x) e))
+
+                   ;; markers
+                   (,mark1 . ,mark2) (,mark1 . ,mark3) (,mark1 . ,mark4)
+                   (,mark2 . ,mark3) (,mark2 . ,mark4) (,mark3 . ,mark4)
+
+                   ;; buffers
+                   (,buf1 . ,buf2)
+
+                   ;; processes
+                   (,proc1 . ,proc2)
+                   ))
+          (let ((x (car c))
+                (y (cdr c)))
+            (should (value-less-p x y))
+            (should-not (value-less-p y x))
+            (should-not (value-less-p x x))
+            (should-not (value-less-p y y))))
+
+      (delete-process proc2)
+      (delete-process proc1)
+      (kill-buffer buf2)
+      (kill-buffer buf1))))
+
+(ert-deftest fns-value-less-p-unordered ()
+  ;; values (X . Y) where neither X<Y nor Y<X
+  (dolist (c `(
+               ;; numbers
+               (0 . 0.0) (0 . -0.0) (0.0 . -0.0)
+               ;; symbols
+               (a . #:a)
+
+               ;; unordered types
+               (,(make-hash-table) . ,(make-hash-table))
+               (,(obarray-make) . ,(obarray-make))
+               ;; FIXME: more?
+               ))
+    (let ((x (car c))
+          (y (cdr c)))
+      (should-not (value-less-p x y))
+      (should-not (value-less-p y x)))))
+
+(ert-deftest fns-value-less-p-type-mismatch ()
+  ;; values of disjoint (incomparable) types
+  (let ((incomparable
+         `( 1 a "a" (a b) [a b] ,(bool-vector nil t) #s(a b)
+            ,(make-char-table 'test)
+            ,(make-hash-table)
+            ,(obarray-make)
+            ;; FIXME: more?
+            )))
+    (let ((tail incomparable))
+      (while tail
+        (let ((x (car tail)))
+          (dolist (y (cdr tail))
+            (should-error (value-less-p x y) :type 'type-mismatch)
+            (should-error (value-less-p y x) :type 'type-mismatch)))
+        (setq tail (cdr tail))))))
+
 ;;; fns-tests.el ends here
-- 
2.32.0 (Apple Git-132)


[-- Attachment #3: Type: text/plain, Size: 3 bytes --]





^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
@ 2024-03-10 14:09 ` Eli Zaretskii
  2024-03-10 14:56   ` Mattias Engdegård
  2024-03-10 15:48 ` Dmitry Gutov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-10 14:09 UTC (permalink / raw)
  To: Mattias Engdegård, Stefan Monnier; +Cc: 69709

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sun, 10 Mar 2024 14:28:02 +0100
> 
> The proposed sorting function interface would be
> 
>   (new-sort seq &key key lessp destructive)

A nit: let's go with a name that doesn't have "new" as part of it.
Something like "lsort" or "xsort" or somesuch.  (I don't suggest
"nsort" because 'n' as the first character has a special meaning in
Emacs Lisp, so I'd like to avoid confusion.)

> because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.
> 
> Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both
> 
>   (sort seq lessp)                       ; old-style
>   (sort seq &key key lessp destructive)  ; new-style
> 
> since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.

Do you intend to present an implementation that replaces sort-on as
well?

And what about performance?

Thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 14:09 ` Eli Zaretskii
@ 2024-03-10 14:56   ` Mattias Engdegård
  2024-03-20 19:01     ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-10 14:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69709, Stefan Monnier

10 mars 2024 kl. 15.09 skrev Eli Zaretskii <eliz@gnu.org>:

> A nit: let's go with a name that doesn't have "new" as part of it.

Sorry, that was meant as a placeholder. I should have used a more obvious non-name.

> Do you intend to present an implementation that replaces sort-on as
> well?

Yes, `sort-on` would be completely subsumed by the new interface.

> And what about performance?

Existing code should see little or no change. Users of the new interface would probably see performance improvements. You should trust me on neither of these points until there is actual code.

However the performance of the universal predicate seems fine, and is definitely better than hand-writing a predicate in Lisp when applicable.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
  2024-03-10 14:09 ` Eli Zaretskii
@ 2024-03-10 15:48 ` Dmitry Gutov
  2024-03-10 15:56   ` Mattias Engdegård
  2024-03-11  7:01 ` Gerd Möllmann
  2024-04-14 14:03 ` Aris Spathis
  3 siblings, 1 reply; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-10 15:48 UTC (permalink / raw)
  To: Mattias Engdegård, 69709

On 10/03/2024 15:28, Mattias Engdegård wrote:
> 2. Mutability by default is a bug magnet as well.
> 
> To deal with the first problem, we could:
> 
> * Add a universal ordering predicate that will compare two values of the
>   same type for many built-in types: numbers, strings, symbols, markers,
> lists, vectors, records, and a few more.
> * Make this ordering the default.
> * Add a key (accessor) function argument, like that in the recent
> `sort-on` interface, but built-in. This is important.
> 
> These work very well together: the user does not need to write or even
> choose an ordering predicate in most cases. Key functions are much less
> error-prone to write, and with the lexicographic ordering of lists,
> vectors and records, multi-key sorting is made much easier.
> 
> A key function combined with a standard ordering can also be used to
> optimise comparisons since we have all key values up front along with
> how they should be compared. The original timsort code that we stole
> from Python did this.
> 
> As a starting point, a patch implementing a universal ordering predicate
>   is attached below.
> 
> The proposed sorting function interface would be
> 
>    (new-sort seq &key key lessp destructive)

Here's a concern: a lot of existing code is written with either 
mutability in mind (the source list is a throwaway one, owned by the 
caller), or coupled with copy-sequence already.

If the new 'sort' default to non-destructive, wouldn't that make those 
existing callsites slower? Or at least some of them.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 15:48 ` Dmitry Gutov
@ 2024-03-10 15:56   ` Mattias Engdegård
  2024-03-10 16:03     ` Dmitry Gutov
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-10 15:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 69709

10 mars 2024 kl. 16.48 skrev Dmitry Gutov <dmitry@gutov.dev>:

> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
> 
> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.

No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 15:56   ` Mattias Engdegård
@ 2024-03-10 16:03     ` Dmitry Gutov
  2024-03-10 16:46       ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-10 16:03 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709

On 10/03/2024 17:56, Mattias Engdegård wrote:
> 10 mars 2024 kl. 16.48 skrev Dmitry Gutov<dmitry@gutov.dev>:
> 
>> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
>>
>> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.
> No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.

I see. That sounds more confusing, though (you drop the 'pred' argument, 
and the function changes behavior, possibly becoming slower too).





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 16:03     ` Dmitry Gutov
@ 2024-03-10 16:46       ` Mattias Engdegård
  2024-03-10 16:55         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-10 16:46 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 69709

10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry@gutov.dev>:

> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).

Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 16:46       ` Mattias Engdegård
@ 2024-03-10 16:55         ` Eli Zaretskii
  2024-03-10 17:54           ` Dmitry Gutov
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-10 16:55 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709, dmitry

> Cc: 69709@debbugs.gnu.org
> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sun, 10 Mar 2024 17:46:59 +0100
> 
> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry@gutov.dev>:
> 
> > That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
> 
> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.

I think we should delay this discussion until we see the first draft
of the code and the documentation to go with it.  As long as the issue
is in Mattias's sights, I'm sure the result will be acceptable, to say
the least.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 16:55         ` Eli Zaretskii
@ 2024-03-10 17:54           ` Dmitry Gutov
  0 siblings, 0 replies; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-10 17:54 UTC (permalink / raw)
  To: Eli Zaretskii, Mattias Engdegård; +Cc: 69709

On 10/03/2024 18:55, Eli Zaretskii wrote:
>> Cc:69709@debbugs.gnu.org
>> From: Mattias Engdegård<mattias.engdegard@gmail.com>
>> Date: Sun, 10 Mar 2024 17:46:59 +0100
>>
>> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov<dmitry@gutov.dev>:
>>
>>> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
>> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.
> I think we should delay this discussion until we see the first draft
> of the code and the documentation to go with it.  As long as the issue
> is in Mattias's sights, I'm sure the result will be acceptable, to say
> the least.

I do think it's problematic, but it's not a deal-breaker, so if everyone 
else is fine with it, let's proceed.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
  2024-03-10 14:09 ` Eli Zaretskii
  2024-03-10 15:48 ` Dmitry Gutov
@ 2024-03-11  7:01 ` Gerd Möllmann
  2024-04-14 14:03 ` Aris Spathis
  3 siblings, 0 replies; 34+ messages in thread
From: Gerd Möllmann @ 2024-03-11  7:01 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> Instead of inventing a new and rather meaningless function name, I
> suggest we re-use `sort` and allow both
>
>   (sort seq lessp)                       ; old-style
>   (sort seq &key key lessp destructive)  ; new-style

FWIW, I wouldn't like overloading the name 'sort', because it feels like
magic happening. A new name would make things explicit - no magic.

Otherwise, I can't say much. The destructiveness of sort/stable-sort has
actually never been a problem for me. It has instead often been just
right, when consing a sequence and sorting it in the end.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 14:56   ` Mattias Engdegård
@ 2024-03-20 19:01     ` Mattias Engdegård
  2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
                         ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-20 19:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69709, Gerd Möllmann, Stefan Monnier, Dmitry Gutov

Now the origin/scratch/sort-key branch contains a draft proposal. Summary:

* Our timsort has now the key function handling from the original code (ported to Emacs).
* New keyword-based calling convention for `sort`. The old one is still there and works as before.
* New `value-less-p` universal ordering predicate.
* No manual updates yet.
* NEWS entries are there.
* `sort-on` is now completely superfluous (slower, less convenient) and should be removed.
* Performance seems fine from initial tests. More comprehensive benchmarking will be done.

Some things that I haven't made up my mind about:

* Better name for `value-less-p`:
    value<
    {generic,universal,standard,lisp}{-less-p,<}

* Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.

* Should the :reverse keyword be called :reversed or even :descending ?

* Internally, `value-less-p` computes a 3-way result; it would be easy to expose that to Lisp as `value-compare`, could be useful.

* The design space for `value-less-p` is vast: the current code is an attempt at intuitive semantics without too much complexity. There are also good (and bad) arguments for:

- heterogeneous comparisons (compare objects of different types)
- total order on all values
- strict agreement with `equal`
- automatic call-out to user-defined comparison functions for classes that have them

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-20 19:01     ` Mattias Engdegård
@ 2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-21 14:55         ` Mattias Engdegård
  2024-03-21 14:54       ` Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-22 20:55       ` Dmitry Gutov
  2 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-20 19:37 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 69709, Gerd Möllmann, Eli Zaretskii, Dmitry Gutov

> * Better name for `value-less-p`:
>     value<
>     {generic,universal,standard,lisp}{-less-p,<}
                                     ^^
                                     ,
:-)

IOW, I vote for `<`!


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-20 19:01     ` Mattias Engdegård
  2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-21 14:54       ` Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-22 20:55       ` Dmitry Gutov
  2 siblings, 0 replies; 34+ messages in thread
From: Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-21 14:54 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 69709, Gerd Möllmann, Eli Zaretskii, Stefan Monnier,
	Dmitry Gutov

Hi,

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> Now the origin/scratch/sort-key branch contains a draft proposal. Summary:
>
> * Our timsort has now the key function handling from the original code (ported to Emacs).
> * New keyword-based calling convention for `sort`. The old one is still there and works as before.
> * New `value-less-p` universal ordering predicate.
> * No manual updates yet.
> * NEWS entries are there.
> * `sort-on` is now completely superfluous (slower, less convenient) and should be removed.

In case it hasn't already been mentioned, AFAICT this also applies to
`minibuffer--sort-by-key` (which is very similar to `sort-on`).

> * Performance seems fine from initial tests. More comprehensive benchmarking will be done.
>
> Some things that I haven't made up my mind about:
>
> * Better name for `value-less-p`:
>     value<
>     {generic,universal,standard,lisp}{-less-p,<}
>
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.
>
> * Should the :reverse keyword be called :reversed or even :descending ?
>
> * Internally, `value-less-p` computes a 3-way result; it would be easy
> to expose that to Lisp as `value-compare`, could be useful.
>
> * The design space for `value-less-p` is vast: the current code is an
> attempt at intuitive semantics without too much complexity.

Perhaps the "standard order" of Prolog terms programs could be an
interesting reference:
https://www.swi-prolog.org/pldoc/man?section=standardorder


Best,

Eshel





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-21 14:55         ` Mattias Engdegård
  0 siblings, 0 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-21 14:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 69709, Gerd Möllmann, Eli Zaretskii, Dmitry Gutov

20 mars 2024 kl. 20.37 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> IOW, I vote for `<`!

That may actually be within the realms of possibility. Some snags:

* Not sure if code using `<` and expect a type error for non-numbers would be confused (probably not)
* Making `value<` n-ary would slow it down a bit, unclear how much
* `value<` compares markers by buffer then position, `<` by position only
* `<` can compare markers with numbers
* We expect consistency and shared code for all of < > <= >= = /= but it's unclear how `value<` would be generalised in that direction (at least if we care about NaN correctness)

Currently, `value<` treats NaN as if it were equal to any number, which is terrible and useless but consistent with `<`. Other options would include order NaN distinct from any number, perhaps just consider all NaNs to be equal and put them before or after all numbers.

`value<` is actually quite a bit faster than `<` which has a terribly cumbersome implementation, except for pairs of fixnums which apparently got special-cased in `<` and the other inequalities fairly recently.

By the way, I ran some more benchmarks: it turns out that `car-less-than-car`, the secret shortcut that you just had to know, now gives roughly the same speed as :key #'car which is a lot more discoverable (and works for many other types).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-20 19:01     ` Mattias Engdegård
  2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-21 14:54       ` Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-22 20:55       ` Dmitry Gutov
  2024-03-23 14:58         ` Mattias Engdegård
  2 siblings, 1 reply; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-22 20:55 UTC (permalink / raw)
  To: Mattias Engdegård, Eli Zaretskii
  Cc: 69709, Gerd Möllmann, Stefan Monnier

On 20/03/2024 21:01, Mattias Engdegård wrote:
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.

If the function default to destructive, perhaps the argument could be 
called :copy.

:in-place is not too bad. But I've looked around and don't see many 
sorting functions in other stdlibs that do it non-destructively.

Even 'sort' in Clojure might mutate: 
https://clojuredocs.org/clojure.core/sort





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-22 20:55       ` Dmitry Gutov
@ 2024-03-23 14:58         ` Mattias Engdegård
  2024-03-23 17:39           ` Dmitry Gutov
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-23 14:58 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 69709, Gerd Möllmann, Eli Zaretskii, Stefan Monnier

22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry@gutov.dev>:

> :in-place is not too bad.

Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.

But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.

The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
`value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)

A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.

An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-23 14:58         ` Mattias Engdegård
@ 2024-03-23 17:39           ` Dmitry Gutov
  2024-03-23 20:09             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-23 17:39 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 69709, Gerd Möllmann, Eli Zaretskii, Stefan Monnier

On 23/03/2024 16:58, Mattias Engdegård wrote:
> 22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry@gutov.dev>:
> 
>> :in-place is not too bad.
> 
> Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.

Okay.

> But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.

My concern is that it will make people write less performant code by 
default. Which will degrade unnecessarily on larger inputs (often not 
anticipated by the author in advance).

So others will have to go around each such project, ensure the original 
list is "owned" and add the :in-place argument, to reach the optimum 
performance. That's why, I think, 'sort' is mutating in most languages.

I suppose an extra copy-sequence is not that expensive, compared to most 
things. It's still extra garbage (meaning GC pauses sooner or later). 
Maybe it'll become less important with a better GC algorithm.

> The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
> `value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)
> 
> A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.
> 
> An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.

Very nice.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-23 17:39           ` Dmitry Gutov
@ 2024-03-23 20:09             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-23 23:19               ` Dmitry Gutov
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-23 20:09 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: 69709, Gerd Möllmann, Mattias Engdegård, Eli Zaretskii

> So others will have to go around each such project, ensure the original list
> is "owned" and add the :in-place argument, to reach the optimum
> performance. That's why, I think, 'sort' is mutating in most languages.

I think it's mutating in "most languages" because "most languages" are
imperative and because those primitives usually operate on arrays (a
"naturally imperative" data structure) rather than on singly-linked
lists (a "naturally more immutable" data structure).


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-23 20:09             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-23 23:19               ` Dmitry Gutov
  2024-03-23 23:25                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 34+ messages in thread
From: Dmitry Gutov @ 2024-03-23 23:19 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: 69709, Gerd Möllmann, Mattias Engdegård, Eli Zaretskii

On 23/03/2024 22:09, Stefan Monnier wrote:
>> So others will have to go around each such project, ensure the original list
>> is "owned" and add the :in-place argument, to reach the optimum
>> performance. That's why, I think, 'sort' is mutating in most languages.
> I think it's mutating in "most languages" because "most languages" are
> imperative and because those primitives usually operate on arrays (a
> "naturally imperative" data structure) rather than on singly-linked
> lists (a "naturally more immutable" data structure).

There aren't many general purpose languages around which use cons cells 
and have 'sort' in stdlib for them, to compare.

To get back to the example of Clojure, which touts immutability most of 
the time, the 'sort' routine first copies a sequence into an array 
(apparently for performance - fewer indirections and better locality 
when subsequently swapping values around), and then returns that array 
with a thin wrapper called ArraySeq (which it 1 extra object, not N).

https://github.com/clojure/clojure/blob/ce55092f2b2f5481d25cff6205470c1335760ef6/src/clj/clojure/core.clj#L3103-L3118

IIUC we allocate an array internally as well during the sorting phrase, 
but then we can't return it as-is or with a thin wrapper (if only 
because the user expects back a list, not a vector), so we'll need to 
allocate a whole new list to avoid mutating the original. And our GC is 
worse than JVM's.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-23 23:19               ` Dmitry Gutov
@ 2024-03-23 23:25                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-25 11:11                   ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-23 23:25 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: 69709, Gerd Möllmann, Mattias Engdegård, Eli Zaretskii

> IIUC we allocate an array internally as well during the sorting phrase, but
> then we can't return it as-is or with a thin wrapper (if only because the
> user expects back a list, not a vector), so we'll need to allocate a whole
> new list to avoid mutating the original. And our GC is worse than JVM's.

When it matters, people can use `:in-place`.


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-23 23:25                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-25 11:11                   ` Mattias Engdegård
  2024-03-29 10:59                     ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-25 11:11 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 69709, Dmitry Gutov, Eli Zaretskii, Gerd Möllmann

24 mars 2024 kl. 00.25 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> When it matters, people can use `:in-place`.

Indeed, we follow the policy of being safe by default with a faster, more dangerous alternative on explicit request: `reverse` vs `nreverse`, for instance.

I'm quite happy with the result overall: safer, much easier to use, and faster, all at the same time. Many of the uses of `sort` in the Emacs tree would have benefitted a lot from the new design, especially the hand-written lambda monstrosities that try to implement multi-key comparisons.

It should also be quite feasible to supply a working (but slower) all-Lisp implementation for the `compat` package; I have some basic proof-of-concept code.

The plan is to add scratch/sort-key to master in a few days if no serious issues turn up. There are a few minor adjustments to the documentation that haven't been pushed yet.

Oh, and the embarrassing code in `value<` for comparing bool-vectors has now been sped up ~150x in my tree, if you were worried about that.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-25 11:11                   ` Mattias Engdegård
@ 2024-03-29 10:59                     ` Mattias Engdegård
  2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 12:06                       ` Eli Zaretskii
  0 siblings, 2 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-29 10:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 69709, Dmitry Gutov, Eli Zaretskii, Gerd Möllmann

25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard@gmail.com>:

> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.

Plan executed.






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 10:59                     ` Mattias Engdegård
@ 2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 11:52                         ` Mattias Engdegård
  2024-03-29 12:06                       ` Eli Zaretskii
  1 sibling, 1 reply; 34+ messages in thread
From: Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-29 11:38 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 69709, Dmitry Gutov, Eli Zaretskii, Gerd Möllmann,
	Stefan Monnier

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard@gmail.com>:
>
>> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
>
> Plan executed.

Thank you for this excellent improvement of the sort function! I would
like to back-port the new feature in the Compat library, specifically in
the emacs-30 branch of the Compat repository. I've seen you mentioned
some pure Lisp code, implementing the new feature, in a slower, but
backward compatible way, which we could perhaps use?

Daniel





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-29 11:52                         ` Mattias Engdegård
  2024-05-17 12:29                           ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-29 11:52 UTC (permalink / raw)
  To: Daniel Mendler
  Cc: 69709, Dmitry Gutov, Eli Zaretskii, Gerd Möllmann,
	Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 588 bytes --]

29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail@daniel-mendler.de>:

> I would
> like to back-port the new feature in the Compat library, specifically in
> the emacs-30 branch of the Compat repository. I've seen you mentioned
> some pure Lisp code, implementing the new feature, in a slower, but
> backward compatible way, which we could perhaps use?

Here is my proof-of-concept hack that maybe you could use as starting point. It's incomplete but I think you can fill in the blanks. You could use the tests in fns-tests.el, perhaps adapted to fit. Let us know how it goes.


[-- Attachment #2: compatsort.el --]
[-- Type: application/octet-stream, Size: 2787 bytes --]

;;; -*- lexical-binding: t -*-

;; Quick-and-dirty compat backport of Emacs 30 `sort' keyword args to
;; older versions

(defun value< (a b)
  "Cheapo Lisp impl of Emacs 30 `value<'."
  (cond
   ((numberp a) (< a b))
   ((stringp a) (string< a b))
   ((symbolp a)
    (or (and (null a) (consp b))
        (string< a b)))
   ((consp a)
    (while (and (consp b) (equal (car a) (car b)))
      (setq a (cdr a))
      (setq b (cdr b)))
    (if (consp a)
        (if (consp b)
            (value< (car a) (car b))
          (if b
              (error "value< type mismatch: %S %S" a b)
            nil))
      (value< a b)))
   ((vectorp a)
    (if (vectorp b)
        (let* ((na (length a))
               (nb (length b))
               (n (min na nb))
               (i 0))
          (while (and (< i n) (equal (aref a i) (aref b i)))
            (setq i (1+ i)))
          (if (< i n)
              (value< (aref a i) (aref b i))
            (< n nb)))
      (error "value< type mismatch: %S %S" a b)))
   ;; FIXME: records, markers, buffers, processes, bool-vectors
   (t (if (eq (type-of a) (type-of b))
          nil
        (error "value< type mismatch: %S %S" a b)))))

(defun sort-30 (seq key lessp reverse in-place)
  (unless lessp
    (setq lessp #'value<))
  ;; careful, can't just assign to lessp here -- use a new binding
  (let ((new-lessp
         (if key
             ;; FIXME: this is a bit inefficient if `key' is expensive
             (lambda (a b) (funcall lessp (funcall key a) (funcall key b)))
           lessp)))
    (unless in-place
      (setq seq (copy-sequence seq)))
    ;; FIXME: this is actually not quite correct when sorting lists
    ;; in-place -- we should really update the original list.
    (if reverse
        (nreverse (sort (if in-place (nreverse seq) (reverse seq))
                        new-lessp))
      (sort seq new-lessp))))

(defun sort-compat-transformer (form &rest args)
  "Compiler macro for `sort' that takes Emacs 30 keyword args."
  (let ((nargs (length args)))
    (if (zerop (% nargs 2))
        form                    ; old syntax, or error -- no transform
      ;; new syntax
      (let ((seq (car args))
            (key nil)
            (in-place nil)
            (reverse nil)
            (lessp nil))
        (setq args (cdr args))
        (while (cdr args)
          (let ((kw (car args))
                (val (cadr args)))
            (pcase kw
              (:key (setq key val))
              (:lessp (setq lessp val))
              (:reverse (setq reverse val))
              (:in-place (setq in-place val))
              (_ (error "bad `sort' keyword `%s'" kw))))
          (setq args (cddr args)))
        `(sort-30 ,seq ,key ,lessp ,reverse ,in-place)))))
    
(put 'sort 'compiler-macro 'sort-compat-transformer)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 10:59                     ` Mattias Engdegård
  2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-29 12:06                       ` Eli Zaretskii
  2024-03-29 15:02                         ` Mattias Engdegård
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-29 12:06 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709, dmitry, gerd.moellmann, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 29 Mar 2024 11:59:39 +0100
> Cc: Dmitry Gutov <dmitry@gutov.dev>,
>  Eli Zaretskii <eliz@gnu.org>,
>  69709@debbugs.gnu.org,
>  Gerd Möllmann <gerd.moellmann@gmail.com>
> 
> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard@gmail.com>:
> 
> > The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
> 
> Plan executed.

Thanks.

I have a few comments/questions about value<:

 . The description of value< says "lexicographic order" about some
   types, but I think that is not clear enough, and we should tell
   explicitly how the comparison works in those cases.  AFAIU, the
   corresponding elements are compared by recursively calling value<
   for them? And what happens if one has more elements than the other?
   These are all questions that popped up when I read the new text,
   and I think they should be answered in the text.
 . AFAICT, no ordering is defined for overlays, and I wonder why.  I
   think this could be very useful; we certainly sort overlays in C in
   several places.
 . Various fine details about value< are never mentioned: the fact
   that there's a limit to recursion, the special treatment of nil in
   some cases, etc.  I think we should document them, at least in the
   doc string if not in the manual as well.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 12:06                       ` Eli Zaretskii
@ 2024-03-29 15:02                         ` Mattias Engdegård
  2024-03-29 15:35                           ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-29 15:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69709, dmitry, gerd.moellmann, monnier

29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz@gnu.org>:

> . The description of value< says "lexicographic order" about some
>   types, but I think that is not clear enough, and we should tell
>   explicitly how the comparison works in those cases.

That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?

> . AFAICT, no ordering is defined for overlays, and I wonder why.  I
>   think this could be very useful; we certainly sort overlays in C in
>   several places.

Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.

That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.

> . Various fine details about value< are never mentioned: the fact
>   that there's a limit to recursion, the special treatment of nil in
>   some cases, etc.  I think we should document them, at least in the
>   doc string if not in the manual as well.

Right -- the depth limit is very much like that of `equal` which is documented in the manual but not the doc string; I'll probably do the same for `value<`. `nil` deserves a note about its dual nature (symbol and list) as well.

Thank you for the markup fixes, by the way. One question:

> -@table @asis
> +@table @code

I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 15:02                         ` Mattias Engdegård
@ 2024-03-29 15:35                           ` Eli Zaretskii
  2024-03-29 16:13                             ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-29 15:35 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709, dmitry, gerd.moellmann, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 29 Mar 2024 16:02:01 +0100
> Cc: monnier@iro.umontreal.ca,
>  dmitry@gutov.dev,
>  69709@debbugs.gnu.org,
>  gerd.moellmann@gmail.com
> 
> 29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > . The description of value< says "lexicographic order" about some
> >   types, but I think that is not clear enough, and we should tell
> >   explicitly how the comparison works in those cases.
> 
> That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?

That'd be fine, thanks.

> > . AFAICT, no ordering is defined for overlays, and I wonder why.  I
> >   think this could be very useful; we certainly sort overlays in C in
> >   several places.
> 
> Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.
> 
> That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.

The canonical sorting order for overlays is in compare_overlays,
AFAIK.

> Thank you for the markup fixes, by the way. One question:
> 
> > -@table @asis
> > +@table @code
> 
> I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?

Makeinfo can handle @code{@var{..}} just fine, we have that in many
other places, so that's not a concern.  It is indeed possible to use
@asis in @table, and then mark each @item with @code, but that's just
too much trouble; "@table @code" exists to save us that...

I think the places where keywords don't have the @code markup are
simply wrong.  This should be most evident in the printed format of
the manual, where @code produces a distinct typeface, but doesn't have
the quotes around it.  Maybe you could produce the PDF manual and see
the difference by yourself.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 15:35                           ` Eli Zaretskii
@ 2024-03-29 16:13                             ` Mattias Engdegård
  2024-03-29 18:09                               ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Mattias Engdegård @ 2024-03-29 16:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69709, dmitry, gerd.moellmann, monnier

29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz@gnu.org>:

> The canonical sorting order for overlays is in compare_overlays,
> AFAIK.

Maybe, but I haven't see much Lisp code trying to use an order like that -- sorting overlays is common but is done with all kinds of orders. Perhaps I haven't looked very carefully.
In any case, maybe compare_overlays would work as a starting point. It seems to focus on vertical order (by priority), but much Lisp code sorts overlays horizontally (by position).

> Makeinfo can handle @code{@var{..}} just fine, we have that in many
> other places, so that's not a concern.  It is indeed possible to use
> @asis in @table, and then mark each @item with @code, but that's just
> too much trouble; "@table @code" exists to save us that...

Looking at the PDF makes me think that we should decide whether we prefer

  @code{:keyword @var{variable}}

or

  @code{:keyword} @var{variable}

in general. I can think of arguments for either but consistency would be even better.

> I think the places where keywords don't have the @code markup are
> simply wrong.

Yes, I agree.






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 16:13                             ` Mattias Engdegård
@ 2024-03-29 18:09                               ` Eli Zaretskii
  0 siblings, 0 replies; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-29 18:09 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 69709, dmitry, gerd.moellmann, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Fri, 29 Mar 2024 17:13:33 +0100
> Cc: monnier@iro.umontreal.ca,
>  dmitry@gutov.dev,
>  69709@debbugs.gnu.org,
>  gerd.moellmann@gmail.com
> 
> 29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Makeinfo can handle @code{@var{..}} just fine, we have that in many
> > other places, so that's not a concern.  It is indeed possible to use
> > @asis in @table, and then mark each @item with @code, but that's just
> > too much trouble; "@table @code" exists to save us that...
> 
> Looking at the PDF makes me think that we should decide whether we prefer
> 
>   @code{:keyword @var{variable}}
> 
> or
> 
>   @code{:keyword} @var{variable}
> 
> in general. I can think of arguments for either but consistency would be even better.

The former is the correct one.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
                   ` (2 preceding siblings ...)
  2024-03-11  7:01 ` Gerd Möllmann
@ 2024-04-14 14:03 ` Aris Spathis
  2024-04-14 16:26   ` Eli Zaretskii
  3 siblings, 1 reply; 34+ messages in thread
From: Aris Spathis @ 2024-04-14 14:03 UTC (permalink / raw)
  To: mattias.engdegard; +Cc: 69709


[-- Attachment #1.1: Type: text/plain, Size: 1170 bytes --]

Thank you for your excellent work on `sort` and related functionality!

Unfortunately, the new `sort` implementation occasionally crashes with a
segfault. The following code reproduces that in current master:

(dotimes (i 500)
  (sort (make-list 128 42)
        :key (lambda (n) (make-list i n))))

It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
currently) along with a non-NIL `:key` function. In such cases, a
`Lisp_Object` array is explicitly heap-allocated to store the keys, which is
never marked against GC. This would not be a problem if not for the fact
that
the `:key` function call may trigger GC.

I'm attaching a patch with a proposed solution, consisting of three changes:

1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
   ensures its objects can be marked properly.

2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
   `ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
   under this exact condition.

3. Move the computation of keys (which may trigger a GC) after `merge_init`,
   which registers `merge_markmem`.

I hope this is useful.

Cheers,
Aris

[-- Attachment #1.2: Type: text/html, Size: 1327 bytes --]

[-- Attachment #2: sort-segfault-fix.patch --]
[-- Type: text/x-patch, Size: 1664 bytes --]

diff --git a/src/sort.c b/src/sort.c
index 527d5550342..562f88ede3c 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -532,6 +532,8 @@ merge_markmem (void *arg)
   merge_state *ms = arg;
   eassume (ms != NULL);
 
+  mark_objects (ms->allocated_keys, ms->listlen);
+
   if (ms->reloc.size != NULL && *ms->reloc.size > 0)
     {
       Lisp_Object *src = (ms->reloc.src->values
@@ -1107,21 +1109,29 @@ tim_sort (Lisp_Object predicate, Lisp_Object keyfunc,
       if (length < MERGESTATE_TEMP_SIZE / 2)
 	keys = &ms.temparray[length + 1];
       else
-	keys = allocated_keys = xmalloc (length * word_size);
-
-      for (ptrdiff_t i = 0; i < length; i++)
-	keys[i] = call1 (keyfunc, seq[i]);
+	{
+	  /* Allocate with xzalloc to obtain an array of valid
+	     Lisp_Objects (Qnils), so that they can be marked. */
+	  verify (NIL_IS_ZERO);
+	  keys = allocated_keys = xzalloc (length * word_size);
+	}
 
       lo.keys = keys;
       lo.values = seq;
     }
 
+  merge_init (&ms, length, allocated_keys, &lo, predicate);
+
+  /* Compute keys after merge_markmem has been registered by merge_init
+     (any call to keyfunc might trigger a GC). */
+  if (!NILP (keyfunc))
+    for (ptrdiff_t i = 0; i < length; i++)
+      keys[i] = call1 (keyfunc, seq[i]);
+
   /* FIXME: This is where we would check the keys for interesting
      properties for more optimised comparison (such as all being fixnums
      etc).  */
 
-  merge_init (&ms, length, allocated_keys, &lo, predicate);
-
   /* March over the array once, left to right, finding natural runs,
      and extending short natural runs to minrun elements.  */
   const ptrdiff_t minrun = merge_compute_minrun (length);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-04-14 14:03 ` Aris Spathis
@ 2024-04-14 16:26   ` Eli Zaretskii
  2024-04-14 16:33     ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-04-14 16:26 UTC (permalink / raw)
  To: Aris Spathis, Mattias Engdegård; +Cc: 69709, mattias.engdegard

> Cc: 69709@debbugs.gnu.org
> From: Aris Spathis <agspathis@gmail.com>
> Date: Sun, 14 Apr 2024 17:03:11 +0300
> 
> Thank you for your excellent work on `sort` and related functionality!
> 
> Unfortunately, the new `sort` implementation occasionally crashes with a
> segfault. The following code reproduces that in current master:
> 
> (dotimes (i 500)
>   (sort (make-list 128 42)
>         :key (lambda (n) (make-list i n))))
> 
> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
> currently) along with a non-NIL `:key` function. In such cases, a
> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
> never marked against GC. This would not be a problem if not for the fact that
> the `:key` function call may trigger GC.
> 
> I'm attaching a patch with a proposed solution, consisting of three changes:
> 
> 1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
>    ensures its objects can be marked properly.
> 
> 2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
>    `ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
>    under this exact condition.
> 
> 3. Move the computation of keys (which may trigger a GC) after `merge_init`,
>    which registers `merge_markmem`.
> 
> I hope this is useful.

Thanks, I took the liberty of CC'ing Mattias using his "usual"
address, in the hope that it will help bring this to his attention
sooner.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-04-14 16:26   ` Eli Zaretskii
@ 2024-04-14 16:33     ` Mattias Engdegård
  0 siblings, 0 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-04-14 16:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69709, Aris Spathis

14 apr. 2024 kl. 18.26 skrev Eli Zaretskii <eliz@gnu.org>:

>> Thank you for your excellent work on `sort` and related functionality!

Apparently not excellent enough!

>> Unfortunately, the new `sort` implementation occasionally crashes with a
>> segfault. The following code reproduces that in current master:
>> 
>> (dotimes (i 500)
>>  (sort (make-list 128 42)
>>        :key (lambda (n) (make-list i n))))
>> 
>> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
>> currently) along with a non-NIL `:key` function. In such cases, a
>> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
>> never marked against GC. This would not be a problem if not for the fact that
>> the `:key` function call may trigger GC.

Oh dear, what a silly mistake. Many thanks for spotting this and providing the patch, which I applied with a few minor adjustments (and a regression test).






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-03-29 11:52                         ` Mattias Engdegård
@ 2024-05-17 12:29                           ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-05-17 17:49                             ` Mattias Engdegård
  0 siblings, 1 reply; 34+ messages in thread
From: Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-05-17 12:29 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 69709, ~pkal/compat-devel, Gerd Möllmann, Dmitry Gutov,
	Stefan Monnier, Eli Zaretskii

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail@daniel-mendler.de>:
>
>> I would
>> like to back-port the new feature in the Compat library, specifically in
>> the emacs-30 branch of the Compat repository. I've seen you mentioned
>> some pure Lisp code, implementing the new feature, in a slower, but
>> backward compatible way, which we could perhaps use?
>
> Here is my proof-of-concept hack that maybe you could use as starting point.
> It's incomplete but I think you can fill in the blanks. You could use the tests
> in fns-tests.el, perhaps adapted to fit. Let us know how it goes.

Just letting you know that I've implemented value< and sort in the
emacs-30 branch of Compat. It works well so far, but value< is not yet
completely implemented. See the following commit:
https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770

There is one thing I'd like to ask about value<. Would it make sense to
support comparing mixed types, e.g., numbers and markers or strings and
symbols? In particular I often like to mix numbers and markers, in
particular in cases where we can avoid creating a marker for performance
reasons if the pure position is sufficient and the location doesn't
change due to edits.

Daniel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#69709: `sort` interface improvement and universal ordering predicate
  2024-05-17 12:29                           ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-05-17 17:49                             ` Mattias Engdegård
  0 siblings, 0 replies; 34+ messages in thread
From: Mattias Engdegård @ 2024-05-17 17:49 UTC (permalink / raw)
  To: Daniel Mendler
  Cc: 69709, ~pkal/compat-devel, Gerd Möllmann, Dmitry Gutov,
	Stefan Monnier, Eli Zaretskii

17 maj 2024 kl. 14.29 skrev Daniel Mendler <mail@daniel-mendler.de>:

> Just letting you know that I've implemented value< and sort in the
> emacs-30 branch of Compat. It works well so far, but value< is not yet
> completely implemented. See the following commit:
> https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770

An excellent start! I'll post comments on the source site.

> There is one thing I'd like to ask about value<. Would it make sense to
> support comparing mixed types, e.g., numbers and markers or strings and
> symbols?

I went back and forth quite a bit, but decided to start small homogeneous comparisons which would at least give the option to extend to heterogeneous (or in your case, ad-hoc mixed) comparisons later on, without committing too much to any particular design until we have more experience.

When it comes to mixing numbers and markers, I'm still leaning against it. `<` allows mixed comparisons but it doesn't work well with markers from different buffers. `value<` orders markers by grouping them by buffer which feels more useful.

(I've never been a great fan of the way elisp allows comparison and arithmetic on markers and numbers; it probably seemed to be a good idea at the time. That feature seems to have caused more muddled than clear code.)

Same with strings and symbols: I can't remember having seen a collection where I would have wanted them to be sorted together, and `nil` being a symbol means that we would lose a useful error check.

There is a more tempting case for a truly heterogeneous comparisons, like a universal total order. Some other languages have this but I think it is more difficult to construct with so much legacy. I went with unifying all numbers for pragmatic reasons even though -0.0 and NaN are terrible.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-05-17 17:49 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-10 13:28 bug#69709: `sort` interface improvement and universal ordering predicate Mattias Engdegård
2024-03-10 14:09 ` Eli Zaretskii
2024-03-10 14:56   ` Mattias Engdegård
2024-03-20 19:01     ` Mattias Engdegård
2024-03-20 19:37       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-21 14:55         ` Mattias Engdegård
2024-03-21 14:54       ` Eshel Yaron via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-22 20:55       ` Dmitry Gutov
2024-03-23 14:58         ` Mattias Engdegård
2024-03-23 17:39           ` Dmitry Gutov
2024-03-23 20:09             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-23 23:19               ` Dmitry Gutov
2024-03-23 23:25                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-25 11:11                   ` Mattias Engdegård
2024-03-29 10:59                     ` Mattias Engdegård
2024-03-29 11:38                       ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-29 11:52                         ` Mattias Engdegård
2024-05-17 12:29                           ` Daniel Mendler via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-05-17 17:49                             ` Mattias Engdegård
2024-03-29 12:06                       ` Eli Zaretskii
2024-03-29 15:02                         ` Mattias Engdegård
2024-03-29 15:35                           ` Eli Zaretskii
2024-03-29 16:13                             ` Mattias Engdegård
2024-03-29 18:09                               ` Eli Zaretskii
2024-03-10 15:48 ` Dmitry Gutov
2024-03-10 15:56   ` Mattias Engdegård
2024-03-10 16:03     ` Dmitry Gutov
2024-03-10 16:46       ` Mattias Engdegård
2024-03-10 16:55         ` Eli Zaretskii
2024-03-10 17:54           ` Dmitry Gutov
2024-03-11  7:01 ` Gerd Möllmann
2024-04-14 14:03 ` Aris Spathis
2024-04-14 16:26   ` Eli Zaretskii
2024-04-14 16:33     ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).