unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#65491: [PATCH] Improve performance allocating vectors
@ 2023-08-24  9:59 Ihor Radchenko
  2023-08-26  7:14 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-24  9:59 UTC (permalink / raw)
  To: 65491

[-- Attachment #1: Type: text/plain, Size: 7044 bytes --]

Tags: patch

Hi,

Following up the bignum performance discussion in
https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost

This patch adds a heuristic that reduces the time spent searching
`vector_free_lists' when trying to allocate a new vector.

`vector_free_lists' is a rather long array with few hundreds of
elements. And it does not make sense to check the whole array in
`allocate_vector_from_block' if we can get information about free vector
that was recently made available of if we know for sure that no free
vectors are available (after GC).

In the patch, we start searching `vector_free_lists' no earlier than the
last known index of the available free vector, or skip the search
entirely when there is known index (after sweep).

The described approach may sometimes miss free vectors in
`vector_free_lists', especially if the allocation happens from larger
vector to smaller. The cost will be slightly higher memory consumption -
no larger than VECTOR_MAX_FREE_LIST_INDEX extra free vectors.

With the patch, CPU time spent allocating new vectors in the fib.eln
test from https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost drops 10x.

Also, the patch gives a noticeable improvement when running
elisp-benchmarks (ELPA package). The overall speedup is around 10%
(including unaffected tests).

No single test gets worse within error margins and the following tests
get a significant speedup:

- eieio        1.32±0.04 -> 1.04±0.03
- pack-unpack  0.40±0.00 -> 0.35±0.01
- pidigits     6.00±0.06 -> 4.08±0.12

pidigits is no surprise as it likely uses bignums.

* Results without the patch

  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | bubble             |           0.68 |       0.06 |       1 |        0.73 |            0.05 |
  | bubble-no-cons     |           1.17 |       0.00 |       0 |        1.17 |            0.07 |
  | bytecomp           |           1.64 |       0.32 |      13 |        1.95 |            0.03 |
  | dhrystone          |           2.13 |       0.00 |       0 |        2.13 |            0.02 |
  | eieio              |           1.19 |       0.13 |       7 |        1.32 |            0.04 |
  | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-named-let     |           1.47 |       0.00 |       0 |        1.47 |            0.04 |
  | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | flet               |           1.41 |       0.00 |       0 |        1.41 |            0.03 |
  | inclist            |           0.84 |       0.00 |       0 |        0.84 |            0.03 |
  | inclist-type-hints |           0.76 |       0.00 |       0 |        0.76 |            0.00 |
  | listlen-tc         |           0.12 |       0.00 |       0 |        0.12 |            0.01 |
  | map-closure        |           5.25 |       0.00 |       0 |        5.25 |            0.02 |
  | nbody              |           1.47 |       0.15 |       1 |        1.62 |            0.07 |
  | pack-unpack        |           0.38 |       0.02 |       1 |        0.40 |            0.00 |
  | pack-unpack-old    |           1.13 |       0.05 |       3 |        1.19 |            0.03 |
  | pcase              |           1.77 |       0.00 |       0 |        1.77 |            0.01 |
  | pidigits           |           5.04 |       0.97 |      17 |        6.00 |            0.06 |
  | scroll             |           0.58 |       0.00 |       0 |        0.58 |            0.02 |
  | smie               |           1.47 |       0.05 |       2 |        1.52 |            0.02 |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | total              |          28.49 |       1.74 |      45 |       30.23 |            0.16 |

* Results with the patch

  | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | bubble             |           0.69 |       0.05 |       1 |        0.74 |            0.07 |
  | bubble-no-cons     |           1.03 |       0.00 |       0 |        1.03 |            0.02 |
  | bytecomp           |           1.45 |       0.25 |      13 |        1.70 |            0.10 |
  | dhrystone          |           1.98 |       0.00 |       0 |        1.98 |            0.05 |
  | eieio              |           0.92 |       0.12 |       7 |        1.04 |            0.03 |
  | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-named-let     |           1.45 |       0.00 |       0 |        1.45 |            0.12 |
  | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
  | flet               |           1.39 |       0.00 |       0 |        1.39 |            0.07 |
  | inclist            |           0.83 |       0.00 |       0 |        0.83 |            0.03 |
  | inclist-type-hints |           0.77 |       0.00 |       0 |        0.77 |            0.03 |
  | listlen-tc         |           0.10 |       0.00 |       0 |        0.10 |            0.00 |
  | map-closure        |           5.25 |       0.00 |       0 |        5.25 |            0.37 |
  | nbody              |           1.45 |       0.16 |       1 |        1.60 |            0.05 |
  | pack-unpack        |           0.33 |       0.02 |       1 |        0.35 |            0.01 |
  | pack-unpack-old    |           1.07 |       0.06 |       3 |        1.13 |            0.08 |
  | pcase              |           1.78 |       0.00 |       0 |        1.78 |            0.05 |
  | pidigits           |           3.15 |       0.92 |      17 |        4.08 |            0.12 |
  | scroll             |           0.55 |       0.00 |       0 |        0.55 |            0.01 |
  | smie               |           1.45 |       0.04 |       2 |        1.49 |            0.06 |
  |--------------------+----------------+------------+---------+-------------+-----------------|
  | total              |          25.63 |       1.62 |      45 |       27.25 |            0.45 |


In GNU Emacs 30.0.50 (build 54, x86_64-pc-linux-gnu, GTK+ Version
 3.24.38, cairo version 1.17.8) of 2023-08-22 built on localhost
Repository revision: c09d78f3c0818d7391760e84f94a442e8beb22dd
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101008
System Description: Gentoo Linux

Configured using:
 'configure --with-native-compilation
 JAVAC=/etc/java-config-2/current-system-vm/bin/javac'


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Improve-performance-allocating-vectors.patch --]
[-- Type: text/patch, Size: 2613 bytes --]

From 476db65dc780d8c501d73dfaaf92c53eb9c73b2f Mon Sep 17 00:00:00 2001
Message-ID: <476db65dc780d8c501d73dfaaf92c53eb9c73b2f.1692869114.git.yantar92@posteo.net>
From: Ihor Radchenko <yantar92@posteo.net>
Date: Thu, 24 Aug 2023 11:05:18 +0300
Subject: [PATCH] Improve performance allocating vectors

* src/alloc.c (last_known_vector_free_idx): New global variable
holding the last known free allocated vector index in
`vector_free_lists'.
(setup_on_free_list): Record the vector index when adding a new
allocated vector to `vector_free_lists'.
(allocate_vector_from_block): Add heuristics searching for an existing
allocated vector no earlier than `last_known_vector_free_idx'.  This
heuristics may sometimes skip available allocated vectors, but reduces
the number of the `for' loop iterations significantly.
(sweep_vectors): Set `last_known_vector_free_idx' to large value - no
free allocated vectors remain available after sweep operation.

Link: https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost
---
 src/alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/alloc.c b/src/alloc.c
index c77bdc6372d..4293de2bdf3 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3145,6 +3145,7 @@ large_vector_vec (struct large_vector *p)
 
 static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
 
+static ptrdiff_t last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
 /* Singly-linked list of large vectors.  */
 
 static struct large_vector *large_vectors;
@@ -3180,6 +3181,7 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t nbytes)
   set_next_vector (v, vector_free_lists[vindex]);
   ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
   vector_free_lists[vindex] = v;
+  last_known_vector_free_idx = vindex;
 }
 
 /* Get a new vector block.  */
@@ -3234,7 +3236,7 @@ allocate_vector_from_block (ptrdiff_t nbytes)
   /* Next, check free lists containing larger vectors.  Since
      we will split the result, we should have remaining space
      large enough to use for one-slot vector at least.  */
-  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
+  for (index = max (VINDEX (nbytes + VBLOCK_BYTES_MIN), last_known_vector_free_idx);
        index < VECTOR_MAX_FREE_LIST_INDEX; index++)
     if (vector_free_lists[index])
       {
@@ -3426,6 +3428,7 @@ sweep_vectors (void)
   gcstat.total_vectors = 0;
   gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
   memset (vector_free_lists, 0, sizeof (vector_free_lists));
+  last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
 
   /* Looking through vector blocks.  */
 
-- 
2.41.0


[-- Attachment #3: Type: text/plain, Size: 224 bytes --]


-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-24  9:59 bug#65491: [PATCH] Improve performance allocating vectors Ihor Radchenko
@ 2023-08-26  7:14 ` Eli Zaretskii
  2023-08-26  7:27   ` Ihor Radchenko
  2023-08-26 12:01 ` Mattias Engdegård
  2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-08-26  7:14 UTC (permalink / raw)
  To: Ihor Radchenko, Stefan Monnier; +Cc: 65491

> From: Ihor Radchenko <yantar92@posteo.net>
> Date: Thu, 24 Aug 2023 09:59:33 +0000
> 
> Following up the bignum performance discussion in
> https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost
> 
> This patch adds a heuristic that reduces the time spent searching
> `vector_free_lists' when trying to allocate a new vector.
> 
> `vector_free_lists' is a rather long array with few hundreds of
> elements. And it does not make sense to check the whole array in
> `allocate_vector_from_block' if we can get information about free vector
> that was recently made available of if we know for sure that no free
> vectors are available (after GC).
> 
> In the patch, we start searching `vector_free_lists' no earlier than the
> last known index of the available free vector, or skip the search
> entirely when there is known index (after sweep).
> 
> The described approach may sometimes miss free vectors in
> `vector_free_lists', especially if the allocation happens from larger
> vector to smaller. The cost will be slightly higher memory consumption -
> no larger than VECTOR_MAX_FREE_LIST_INDEX extra free vectors.
> 
> With the patch, CPU time spent allocating new vectors in the fib.eln
> test from https://yhetil.org/emacs-devel/87bkfdsmde.fsf@localhost drops 10x.
> 
> Also, the patch gives a noticeable improvement when running
> elisp-benchmarks (ELPA package). The overall speedup is around 10%
> (including unaffected tests).
> 
> No single test gets worse within error margins and the following tests
> get a significant speedup:
> 
> - eieio        1.32±0.04 -> 1.04±0.03
> - pack-unpack  0.40±0.00 -> 0.35±0.01
> - pidigits     6.00±0.06 -> 4.08±0.12
> 
> pidigits is no surprise as it likely uses bignums.
> 
> * Results without the patch
> 
>   | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | bubble             |           0.68 |       0.06 |       1 |        0.73 |            0.05 |
>   | bubble-no-cons     |           1.17 |       0.00 |       0 |        1.17 |            0.07 |
>   | bytecomp           |           1.64 |       0.32 |      13 |        1.95 |            0.03 |
>   | dhrystone          |           2.13 |       0.00 |       0 |        2.13 |            0.02 |
>   | eieio              |           1.19 |       0.13 |       7 |        1.32 |            0.04 |
>   | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | fibn-named-let     |           1.47 |       0.00 |       0 |        1.47 |            0.04 |
>   | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | flet               |           1.41 |       0.00 |       0 |        1.41 |            0.03 |
>   | inclist            |           0.84 |       0.00 |       0 |        0.84 |            0.03 |
>   | inclist-type-hints |           0.76 |       0.00 |       0 |        0.76 |            0.00 |
>   | listlen-tc         |           0.12 |       0.00 |       0 |        0.12 |            0.01 |
>   | map-closure        |           5.25 |       0.00 |       0 |        5.25 |            0.02 |
>   | nbody              |           1.47 |       0.15 |       1 |        1.62 |            0.07 |
>   | pack-unpack        |           0.38 |       0.02 |       1 |        0.40 |            0.00 |
>   | pack-unpack-old    |           1.13 |       0.05 |       3 |        1.19 |            0.03 |
>   | pcase              |           1.77 |       0.00 |       0 |        1.77 |            0.01 |
>   | pidigits           |           5.04 |       0.97 |      17 |        6.00 |            0.06 |
>   | scroll             |           0.58 |       0.00 |       0 |        0.58 |            0.02 |
>   | smie               |           1.47 |       0.05 |       2 |        1.52 |            0.02 |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | total              |          28.49 |       1.74 |      45 |       30.23 |            0.16 |
> 
> * Results with the patch
> 
>   | test               | non-gc avg (s) | gc avg (s) | gcs avg | tot avg (s) | tot avg err (s) |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | bubble             |           0.69 |       0.05 |       1 |        0.74 |            0.07 |
>   | bubble-no-cons     |           1.03 |       0.00 |       0 |        1.03 |            0.02 |
>   | bytecomp           |           1.45 |       0.25 |      13 |        1.70 |            0.10 |
>   | dhrystone          |           1.98 |       0.00 |       0 |        1.98 |            0.05 |
>   | eieio              |           0.92 |       0.12 |       7 |        1.04 |            0.03 |
>   | fibn               |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | fibn-named-let     |           1.45 |       0.00 |       0 |        1.45 |            0.12 |
>   | fibn-rec           |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | fibn-tc            |           0.00 |       0.00 |       0 |        0.00 |            0.00 |
>   | flet               |           1.39 |       0.00 |       0 |        1.39 |            0.07 |
>   | inclist            |           0.83 |       0.00 |       0 |        0.83 |            0.03 |
>   | inclist-type-hints |           0.77 |       0.00 |       0 |        0.77 |            0.03 |
>   | listlen-tc         |           0.10 |       0.00 |       0 |        0.10 |            0.00 |
>   | map-closure        |           5.25 |       0.00 |       0 |        5.25 |            0.37 |
>   | nbody              |           1.45 |       0.16 |       1 |        1.60 |            0.05 |
>   | pack-unpack        |           0.33 |       0.02 |       1 |        0.35 |            0.01 |
>   | pack-unpack-old    |           1.07 |       0.06 |       3 |        1.13 |            0.08 |
>   | pcase              |           1.78 |       0.00 |       0 |        1.78 |            0.05 |
>   | pidigits           |           3.15 |       0.92 |      17 |        4.08 |            0.12 |
>   | scroll             |           0.55 |       0.00 |       0 |        0.55 |            0.01 |
>   | smie               |           1.45 |       0.04 |       2 |        1.49 |            0.06 |
>   |--------------------+----------------+------------+---------+-------------+-----------------|
>   | total              |          25.63 |       1.62 |      45 |       27.25 |            0.45 |
> 

Stefan, any comments?

my comment is that the savings are quite small, so it seems, so I'm
not sure we should install this.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:14 ` Eli Zaretskii
@ 2023-08-26  7:27   ` Ihor Radchenko
  2023-08-26  7:31     ` Eli Zaretskii
  2023-08-26  7:47     ` Ihor Radchenko
  0 siblings, 2 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-26  7:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, Stefan Monnier

Eli Zaretskii <eliz@gnu.org> writes:

> my comment is that the savings are quite small, so it seems, so I'm
> not sure we should install this.

Overall saving are small because the effect is blurred by the code that
does not allocate vectors - my patch has 0 effect on such code.

Calculations involving vectors are pidigits that is 30% faster and
bignum version of fib.el - over 50% faster.

If you know any useful code that makes heavy use of vector allocation, I
can also benchmark it.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:27   ` Ihor Radchenko
@ 2023-08-26  7:31     ` Eli Zaretskii
  2023-08-26  7:51       ` Ihor Radchenko
  2023-08-26  7:47     ` Ihor Radchenko
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-08-26  7:31 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65491, monnier

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, 65491@debbugs.gnu.org
> Date: Sat, 26 Aug 2023 07:27:27 +0000
> 
> If you know any useful code that makes heavy use of vector allocation, I
> can also benchmark it.

Look in the area of encoding/decoding and automatic compositions --
these tend to use vectors quite a lot.  For example, rendering text
that uses a script where most of characters are composed, such as
Arabic or Hangul (Korean) should allocate vectors.  Note that this is
the case where we call Lisp from C display code.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:27   ` Ihor Radchenko
  2023-08-26  7:31     ` Eli Zaretskii
@ 2023-08-26  7:47     ` Ihor Radchenko
  1 sibling, 0 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-26  7:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, Stefan Monnier

Ihor Radchenko <yantar92@posteo.net> writes:

> If you know any useful code that makes heavy use of vector allocation, I
> can also benchmark it.

One example is the newer versions of Org parser, where I know for sure
that vector allocation is used. Although the code is mostly doing regexp
search for parsing, so a better real world example would be still useful.

I ran a quick parser run on a large Org file using

(let ((large-file-warning-threshold nil)
      (gc-cons-threshold most-positive-fixnum)
      (org-inhibit-startup t)
      (org-element-cache-persistent nil))
  (with-current-buffer (find-file-noselect "~/Org/notes.org")
    (message "%s" (benchmark-call (lambda () (org-element-parse-buffer nil nil 'defer))))))

Without the patch:

$ perf record ./src/emacs -Q -batch -L ~/Git/org-mode/lisp -L ~/.emacs.d/eln-cache/30.0.50-17ebec90/ -l org-element -l /tmp/test.el
(15.724130802000001 1 0.429624333)

without the patch:

    26.60%  emacs    emacs                           [.] exec_byte_code
    12.40%  emacs    emacs                           [.] re_match_2_internal
     8.70%  emacs    emacs                           [.] re_search_2
     8.19%  emacs    emacs                           [.] re_compile_pattern
     3.73%  emacs    emacs                           [.] re_iswctype
     3.60%  emacs    emacs                           [.] funcall_subr
     3.08%  emacs    emacs                           [.] allocate_vectorlike
     2.70%  emacs    emacs                           [.] plist_get
     1.80%  emacs    emacs                           [.] buf_charpos_to_bytepos

Vector allocation takes about 3% of the time.

-----------
with the patch, vector allocation contributions drops to 0.42% - 6x
decrease.
-----------

$ perf record ./src/emacs -Q -batch -L ~/Git/org-mode/lisp -L ~/.emacs.d/eln-cache/30.0.50-17ebec90/ -l org-element -l /tmp/test.el
(15.100695088 1 0.43508134400000004)

    27.06%  emacs    emacs                                [.] exec_byte_code
    12.96%  emacs    emacs                                [.] re_match_2_internal
     8.90%  emacs    emacs                                [.] re_search_2
     8.55%  emacs    emacs                                [.] re_compile_pattern
     3.68%  emacs    emacs                                [.] re_iswctype
     3.57%  emacs    emacs                                [.] funcall_subr
     2.80%  emacs    emacs                                [.] plist_get
...
     0.42%  emacs    emacs                                [.] allocate_vectorlike

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:31     ` Eli Zaretskii
@ 2023-08-26  7:51       ` Ihor Radchenko
  2023-08-26  8:07         ` Ihor Radchenko
  2023-08-26  9:01         ` Eli Zaretskii
  0 siblings, 2 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-26  7:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, monnier

Eli Zaretskii <eliz@gnu.org> writes:

>> If you know any useful code that makes heavy use of vector allocation, I
>> can also benchmark it.
>
> Look in the area of encoding/decoding and automatic compositions --
> these tend to use vectors quite a lot.  For example, rendering text
> that uses a script where most of characters are composed, such as
> Arabic or Hangul (Korean) should allocate vectors.  Note that this is
> the case where we call Lisp from C display code.

Do you mean something like scrolling performance when scrolling a large
Arabic/Korean text file?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:51       ` Ihor Radchenko
@ 2023-08-26  8:07         ` Ihor Radchenko
  2023-08-26  9:01         ` Eli Zaretskii
  1 sibling, 0 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-26  8:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, monnier

Ihor Radchenko <yantar92@posteo.net> writes:

> Do you mean something like scrolling performance when scrolling a large
> Arabic/Korean text file?

I downloaded Quran from https://tanzil.net/download/, opened it Emacs
like

$ perf record ./src/emacs -Q ~/Downloads/quran-simple-plain.txt

and scrolled the buffer with PgDown all the way to the bottom.
It does not look like allocation vectors is all that heavy here
(compared to the edge case with fib.el with bignums).

without the patch:

    10.14%  emacs           emacs                            [.] process_mark_stack
     3.79%  emacs           emacs                            [.] sub_char_table_ref
     3.68%  emacs           emacs                            [.] x_draw_glyph_string
     2.75%  emacs           libc.so.6                        [.] pthread_mutex_lock
     1.75%  emacs           libc.so.6                        [.] 0x0000000000088b0a
->   1.41%  emacs           emacs                            [.] allocate_vectorlike
     1.30%  emacs           emacs                            [.] sweep_vectors
     1.27%  emacs           emacs                            [.] char_table_ref

with the patch

     9.97%  emacs           emacs                             [.] process_mark_stack
     4.30%  emacs           emacs                             [.] x_draw_glyph_string
     3.58%  emacs           emacs                             [.] sub_char_table_ref
     2.66%  emacs           libc.so.6                         [.] pthread_mutex_lock
     1.87%  emacs           libc.so.6                         [.] 0x0000000000088b0a
     1.29%  emacs           emacs                             [.] sweep_vectors
     1.22%  emacs           emacs                             [.] face_for_font
     1.17%  emacs           emacs                             [.] char_table_ref
...
->   0.26%  emacs           emacs                             [.] allocate_vectorlike

1.41% down to 0.26% ~5x speedup.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26  7:51       ` Ihor Radchenko
  2023-08-26  8:07         ` Ihor Radchenko
@ 2023-08-26  9:01         ` Eli Zaretskii
  1 sibling, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2023-08-26  9:01 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65491, monnier

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: monnier@iro.umontreal.ca, 65491@debbugs.gnu.org
> Date: Sat, 26 Aug 2023 07:51:50 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> If you know any useful code that makes heavy use of vector allocation, I
> >> can also benchmark it.
> >
> > Look in the area of encoding/decoding and automatic compositions --
> > these tend to use vectors quite a lot.  For example, rendering text
> > that uses a script where most of characters are composed, such as
> > Arabic or Hangul (Korean) should allocate vectors.  Note that this is
> > the case where we call Lisp from C display code.
> 
> Do you mean something like scrolling performance when scrolling a large
> Arabic/Korean text file?

Yes.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-24  9:59 bug#65491: [PATCH] Improve performance allocating vectors Ihor Radchenko
  2023-08-26  7:14 ` Eli Zaretskii
@ 2023-08-26 12:01 ` Mattias Engdegård
  2023-08-26 14:54   ` Ihor Radchenko
  2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 2 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-08-26 12:01 UTC (permalink / raw)
  To: 65491; +Cc: Eli Zaretskii, Stefan Monnier

First of all, let's leave bignums out of this entirely. They are not relevant here.

Second, please do not motivate any perceived performance problem from benchmarks found on the internet, especially anything derived from the Gabriel benchmarks. This includes the benchmarks in ELPA.

That said, vectorlike object allocation in general is definitely relevant and can certainly be improved but I'm not persuaded by the proposed patch. Please do not apply it right away.

However, the important part is not the patch but the problem it highlights, and here there is evidently plenty to do. 

For example:

- isn't vector_free_list twice as big as it needs to be?
- 4096 bytes for vector blocks seems a tad small
- to what extent are we duplicating the work done by modern libc allocators (very generously including glibc here)?
- not happy with the way struct vector_block co-allocates metadata with a big power-of-two data chunk
- next_vector is a dangerously unstable concoction of C undefined behaviour

and so on.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26 12:01 ` Mattias Engdegård
@ 2023-08-26 14:54   ` Ihor Radchenko
  2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-26 14:54 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, Eli Zaretskii, Stefan Monnier

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> First of all, let's leave bignums out of this entirely. They are not relevant here.
>
> Second, please do not motivate any perceived performance problem from
> benchmarks found on the internet, especially anything derived from the
> Gabriel benchmarks. This includes the benchmarks in ELPA.

> That said, vectorlike object allocation in general is definitely
> relevant and can certainly be improved but I'm not persuaded by the
> proposed patch. Please do not apply it right away.

I have no problem with this and I have supplied more relevant benchmarks
with Org and with composition (as suggested by Eli).

Of course, the problem is not with bignums - it just revealed the
inefficiency with array iteration I tried to address. If more can be
done with vectorlike allocation, it will be even better.

> However, the important part is not the patch but the problem it highlights, and here there is evidently plenty to do. 
>
> For example:
>
> - isn't vector_free_list twice as big as it needs to be?

AFAIR, trying to reduce this array size was the first thing I tried.
When I touched VECTOR_BLOCK_SIZE, I got segfaults and compilation
failures. (Do note that I am missing understanding about the motivation
behind this constant).

> - to what extent are we duplicating the work done by modern libc allocators (very generously including glibc here)?
> - next_vector is a dangerously unstable concoction of C undefined behaviour

Isn't vector_free_list following the pattern used across alloc.c? For
example, Fcons uses a similar idea with holding pre-allocated memory as
a chain of pointers.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26 12:01 ` Mattias Engdegård
  2023-08-26 14:54   ` Ihor Radchenko
@ 2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-27  9:54     ` Mattias Engdegård
  2023-09-16 14:58     ` Mattias Engdegård
  1 sibling, 2 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-08-26 14:55 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, Eli Zaretskii

> - isn't vector_free_list twice as big as it needs to be?
> - 4096 bytes for vector blocks seems a tad small

AFAIK this number was chosen arbitrarily and we never tried to tune it,
so there's probably room for improvement, indeed.
Note that many vectors will be small (they're really Lisp structs), tho.

> - to what extent are we duplicating the work done by modern libc allocators
> (very generously including glibc here)?

We are exactly duplicating that code (and poorly so, as is the rule),
but there's a reason for it: we need to allocate those vectors within
specifically marked/known address ranges so that our conservative
stack scanning can distinguish "potential valid pointer to a vector"
from "definitely not a pointer to a vector".

It would be nice if we could more or less literally copy the code of
a good malloc library.

[ The previous vector allocation code was even more costly because every
  vector allocation came together with allocation and insertion of a new
  node in the red-black tree where we keep track of which ranges of
  memory hold which kind of data.  ]

It's important to improve efficiency of vector allocation because it
allows us to get rid of specialized allocations we're using for other
Lisp types.  E.g. we got rid of the specialized `marker_block`s we used
to use for markers and overlays (and a handful of more esoteric data
types), and next in line are the `symbol_block`s and `interval_block`s.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-08-27  9:54     ` Mattias Engdegård
  2023-09-16 14:58     ` Mattias Engdegård
  1 sibling, 0 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-08-27  9:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491, Eli Zaretskii

26 aug. 2023 kl. 16.55 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

>> - 4096 bytes for vector blocks seems a tad small
> 
> AFAIK this number was chosen arbitrarily and we never tried to tune it,
> so there's probably room for improvement, indeed.
> Note that many vectors will be small (they're really Lisp structs), tho.

That's definitely the case, and the same allocation mechanism is used for many other fairly small objects.

Another observation is that most vectorlike allocations are fixed-size. This is mostly true even for nominally variable-sized objects (vectors and records) because as you say, they are often used as (fixed-sized) structs.

We might want to segregate allocations by size entirely and cease splitting longer vectors. That, with larger and N-bit aligned blocks, would provide rapid access to common metadata from any object pointer. This may reduce GC costs; will have to look at that.

Naturally, allocation and GC costs must be considered together (and run-time, in case we change representation which is occasionally warranted).

> It's important to improve efficiency of vector allocation because it
> allows us to get rid of specialized allocations we're using for other
> Lisp types.  E.g. we got rid of the specialized `marker_block`s we used
> to use for markers and overlays (and a handful of more esoteric data
> types), and next in line are the `symbol_block`s and `interval_block`s.

Yes, many important object types are fixed-sized vectorlike allocations. Allocations tend to be spikey in size; that's why vector_free_lists is so sparse and should probably be reformed.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-24  9:59 bug#65491: [PATCH] Improve performance allocating vectors Ihor Radchenko
  2023-08-26  7:14 ` Eli Zaretskii
  2023-08-26 12:01 ` Mattias Engdegård
@ 2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-28 10:14   ` Ihor Radchenko
  2023-08-28 12:47   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 2 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-08-27 16:21 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65491

> This patch adds a heuristic that reduces the time spent searching
> `vector_free_lists' when trying to allocate a new vector.

The microbenchmarks give surprisingly good performance improvements.

> `vector_free_lists' is a rather long array with few hundreds of
> elements. And it does not make sense to check the whole array in
> `allocate_vector_from_block' if we can get information about free vector
> that was recently made available of if we know for sure that no free
> vectors are available (after GC).

Hmm... after GC we should usually have a non-zero number of free vectors
available (the unused parts of vector blocks which could not be `free`d
because they still contain a live vector), no?
[ See comment below.  ]

> The described approach may sometimes miss free vectors in
> `vector_free_lists', especially if the allocation happens from larger
> vector to smaller.

Right, it could lead to an increase in fragmentation, tho it might tend
to allocate temporally related objects together, which might be beneficial.

> - pack-unpack  0.40±0.00 -> 0.35±0.01

Interesting.  I didn't expect such a large effect there.

> @@ -3145,6 +3145,7 @@ large_vector_vec (struct large_vector *p)
>  
>  static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
>  
> +static ptrdiff_t last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
>  /* Singly-linked list of large vectors.  */
>  
>  static struct large_vector *large_vectors;

There's clearly some spacing issue with the following comment but more
importantly the new var would need a good comment explaining what the
variable should hold and why it's useful, so we know when it's safe and
desirable to set or use the var.

> @@ -3180,6 +3181,7 @@ setup_on_free_list (struct Lisp_Vector *v, ptrdiff_t nbytes)
>    set_next_vector (v, vector_free_lists[vindex]);
>    ASAN_POISON_VECTOR_CONTENTS (v, nbytes - header_size);
>    vector_free_lists[vindex] = v;
> +  last_known_vector_free_idx = vindex;
>  }
>  
>  /* Get a new vector block.  */
> @@ -3234,7 +3236,7 @@ allocate_vector_from_block (ptrdiff_t nbytes)
>    /* Next, check free lists containing larger vectors.  Since
>       we will split the result, we should have remaining space
>       large enough to use for one-slot vector at least.  */
> -  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
> +  for (index = max (VINDEX (nbytes + VBLOCK_BYTES_MIN), last_known_vector_free_idx);
>         index < VECTOR_MAX_FREE_LIST_INDEX; index++)
>      if (vector_free_lists[index])
>        {

IIUC that's the core of your patch.  Nice.

> @@ -3426,6 +3428,7 @@ sweep_vectors (void)
>    gcstat.total_vectors = 0;
>    gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
>    memset (vector_free_lists, 0, sizeof (vector_free_lists));
> +  last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
>  
>    /* Looking through vector blocks.  */

Hmm... so I was wrong and after GC there are aren't any free vectors?
I need to go re-read that code, then, because it doesn't match my mental
model of how it work(s|ed).


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-08-28 10:14   ` Ihor Radchenko
  2023-08-28 16:32     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-28 12:47   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 46+ messages in thread
From: Ihor Radchenko @ 2023-08-28 10:14 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> +static ptrdiff_t last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
>>  /* Singly-linked list of large vectors.  */
>>  
>>  static struct large_vector *large_vectors;
>
> There's clearly some spacing issue with the following comment but more
> importantly the new var would need a good comment explaining what the
> variable should hold and why it's useful, so we know when it's safe and
> desirable to set or use the var.

Sure. I can add a comment. However, from previous answers, we may better
rewrite the vector allocation code more significantly.

>> -  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
>> +  for (index = max (VINDEX (nbytes + VBLOCK_BYTES_MIN), last_known_vector_free_idx);
>>         index < VECTOR_MAX_FREE_LIST_INDEX; index++)
>>      if (vector_free_lists[index])
>>        {
>
> IIUC that's the core of your patch.  Nice.

>>    memset (vector_free_lists, 0, sizeof (vector_free_lists));
>> +  last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
>>  
>>    /* Looking through vector blocks.  */
>
> Hmm... so I was wrong and after GC there are aren't any free vectors?
> I need to go re-read that code, then, because it doesn't match my mental
> model of how it work(s|ed).

Interestingly, when I just tried to skip searching vector_free_lists
when there are no vectors available there, it yielded to no significant
improvement.

Only that max(VINDEX (...), last_known_vector_free_idx) yielded a good
performance improvement.

The reason might be that vector sizes are distributed non-uniformly and
some segments of vector_free_lists are filled more than others.

It could be a good exercise to look into statistics of how
vector_free_lists is filled. Is there some standard way to print debug
information from the allocation code?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-28 10:14   ` Ihor Radchenko
@ 2023-08-28 12:47   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-08-28 12:47 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65491

>> @@ -3426,6 +3428,7 @@ sweep_vectors (void)
>>    gcstat.total_vectors = 0;
>>    gcstat.total_vector_slots = gcstat.total_free_vector_slots = 0;
>>    memset (vector_free_lists, 0, sizeof (vector_free_lists));
>> +  last_known_vector_free_idx = VECTOR_MAX_FREE_LIST_INDEX;
>>  
>>    /* Looking through vector blocks.  */
>
> Hmm... so I was wrong and after GC there are aren't any free vectors?
> I need to go re-read that code, then, because it doesn't match my mental
> model of how it work(s|ed).

Ah, no, I see what's going on: the above code is run at the *beginning*
of `sweep_vectors` before we discover which (free) vectors remain.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-28 10:14   ` Ihor Radchenko
@ 2023-08-28 16:32     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-08-28 16:32 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65491

> The reason might be that vector sizes are distributed non-uniformly and
> some segments of vector_free_lists are filled more than others.

Indeed, I'd expect that at any given time, many buckets of
`vector_free_lists` are empty.

Also, when ELisp code allocates many structs of the same size, the first
few allocations may be satisfied by free elements in the appropriate
bucket, but very quickly that bucket will become empty, so we'll look
for the next non-empty bucket, and the remaining slightly-smaller free
vector will then be put back into a lower bucket but the next allocation
will go through the same loop to find that same "slightly-smaller free
vector", to make it yet a bit smaller, ... until it's all consumed after
which we'll look for the next bigger free vector etc...

So your code makes a lot of sense from that point of view.

Many mallocs approach the problem by creating whole pages dedicated to
a given object size, so when the bucket is empty, N new free vectors of
that exact size are created (in a new vector-block) and put into the
appropriate bucket so the next N allocations of that same size will
quickly find a free vector.  Since all those vector-blocks are made of
vectors of the same size, the size info can be shared between them
instead of each one of them carrying its own size.
[ In our case, vectors need to carry their size for other purposes than
  memory management, so it's not clear we'd gain much there.
  But it could help make `live_small_vector_holding` faster.  ]


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-08-27  9:54     ` Mattias Engdegård
@ 2023-09-16 14:58     ` Mattias Engdegård
  2023-09-16 16:12       ` Eli Zaretskii
  2023-09-25 16:06       ` Mattias Engdegård
  1 sibling, 2 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 14:58 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491, Eli Zaretskii

Results of some further experiments:

- Reducing the `vector_free_lists` array to its actually used first half does improve performance a bit, even with subsequent scanning speed-ups.

- Enlarging `struct vector_block` from 4 KiB does not necessarily speed things up but more measurement is needed.

- Using a bitmap for faster `vector_free_lists` scanning is definitely beneficial.

- The previously mentioned hack where scanning begins in the most recently added bucket is surprisingly effective, often even more so than a bitmap, but I'm wary of it being brittle. Need more measurements.

- Our special-casing of the bool-vector representation is annoying. It would be good if we didn't have to pay the price on 64-bit platforms.

I also pushed two small changes that were essential to working with the code so we might just as well have them in now: b1881d7dab (fix a static assertion) and 056c99a34c (better Lisp_Object untagging).

I'd like to try size-homogeneous blocks as well, but this requires some GC updates. Just a matter of taking the time, of course.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 14:58     ` Mattias Engdegård
@ 2023-09-16 16:12       ` Eli Zaretskii
  2023-09-16 16:17         ` Eli Zaretskii
  2023-09-25 16:06       ` Mattias Engdegård
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 16:12 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 16 Sep 2023 16:58:01 +0200
> Cc: 65491@debbugs.gnu.org,
>  Eli Zaretskii <eliz@gnu.org>
> 
> I also pushed two small changes that were essential to working with the code so we might just as well have them in now: b1881d7dab (fix a static assertion) and 056c99a34c (better Lisp_Object untagging).

The latter one completely broke the 32-bit build --with-wide-int, most
probably because the last argument to XUNTAG is frequently a pointer
to a 64-bit type, where uintptr_t is only 32-bit wide.  As result, I
cannot even compile Emacs: each C file emits the same warnings, which
fill up my screen in a matter of milliseconds.

Please try to fix this ASAP, as it makes me unable to work on master.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:12       ` Eli Zaretskii
@ 2023-09-16 16:17         ` Eli Zaretskii
  2023-09-16 16:32           ` Mattias Engdegård
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 16:17 UTC (permalink / raw)
  To: mattias.engdegard; +Cc: 65491, monnier

> Cc: 65491@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Sat, 16 Sep 2023 19:12:40 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> The latter one completely broke the 32-bit build --with-wide-int, most
> probably because the last argument to XUNTAG is frequently a pointer
> to a 64-bit type, where uintptr_t is only 32-bit wide.

No, that's not it: the reason is that the _first_ argument is a 64-bit
data type, and then casting XLP(a) to uintptr_t causes the warning,
because uintptr_t is a 32-bit type.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:17         ` Eli Zaretskii
@ 2023-09-16 16:32           ` Mattias Engdegård
  2023-09-16 16:54             ` Eli Zaretskii
  2023-09-16 16:54             ` Mattias Engdegård
  0 siblings, 2 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 16:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, monnier

16 sep. 2023 kl. 18.17 skrev Eli Zaretskii <eliz@gnu.org>:

>> The latter one completely broke the 32-bit build --with-wide-int, most
>> probably because the last argument to XUNTAG is frequently a pointer
>> to a 64-bit type, where uintptr_t is only 32-bit wide.
> 
> No, that's not it: the reason is that the _first_ argument is a 64-bit
> data type, and then casting XLP(a) to uintptr_t causes the warning,
> because uintptr_t is a 32-bit type.

Let's see if I understand this correctly, as I can't try that configuration myself.

In your configuration, Lisp_Object is a 64-bit integer (or a struct containing one).
LISP_WORD_TAG(type) is also a 64-bit integer with tag bits at the top (USE_LSB_TAG=0).
The old XUNTAG was:

#define XUNTAG(a, type, ctype) ((ctype *) \
				((char *) XLP (a) - LISP_WORD_TAG (type)))

so that we get a subtraction of a pointer and a very large 64-bit number, which results in just the pointer.

The new XUNTAG is:

#define XUNTAG(a, type, ctype) ((ctype *) \
				((uintptr_t) XLP (a) - LISP_WORD_TAG (type)))

so you get a warning from what, conversion of a 64-bit number to (ctype *)?
Does changing the definition to

#define XUNTAG(a, type, ctype) \
  ((ctype *) ((uintptr_t) XLP (a) - (uintptr_t)LISP_WORD_TAG (type)))

help? (That is, cast the LISP_WORD_TAG return value to uintptr_t.)






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:32           ` Mattias Engdegård
@ 2023-09-16 16:54             ` Eli Zaretskii
  2023-09-16 17:03               ` Mattias Engdegård
  2023-09-16 16:54             ` Mattias Engdegård
  1 sibling, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 16:54 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 16 Sep 2023 18:32:31 +0200
> Cc: 65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> The new XUNTAG is:
> 
> #define XUNTAG(a, type, ctype) ((ctype *) \
> 				((uintptr_t) XLP (a) - LISP_WORD_TAG (type)))
> 
> so you get a warning from what, conversion of a 64-bit number to (ctype *)?

Yes, I think so:

  In file included from dispnew.c:27:
  lisp.h: In function 'PSEUDOVECTORP':
  lisp.h:813:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    813 | #define XUNTAG(a, type, ctype) ((ctype *) \
	|                                 ^
  lisp.h:374:6: note: in expansion of macro 'XUNTAG'
    374 |    ((XUNTAG ((a), Lisp_Vectorlike, union vectorlike_header)->size     \
	|      ^~~~~~
  lisp.h:1127:10: note: in expansion of macro 'lisp_h_PSEUDOVECTORP'
   1127 |   return lisp_h_PSEUDOVECTORP (a, code);
	|          ^~~~~~~~~~~~~~~~~~~~
  lisp.h: In function 'XSYMBOL_WITH_POS':
  lisp.h:813:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    813 | #define XUNTAG(a, type, ctype) ((ctype *) \
	|                                 ^
  lisp.h:1152:12: note: in expansion of macro 'XUNTAG'
   1152 |     return XUNTAG (a, Lisp_Vectorlike, struct Lisp_Symbol_With_Pos);
	|            ^~~~~~
  lisp.h: In function 'XBARE_SYMBOL':
  lisp.h:813:33: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    813 | #define XUNTAG(a, type, ctype) ((ctype *) \
	|                                 ^
  lisp.h:1159:27: note: in expansion of macro 'XUNTAG'
   1159 |   intptr_t i = (intptr_t) XUNTAG (a, Lisp_Symbol, struct Lisp_Symbol);
      |                           ^~~~~~

> Does changing the definition to
> 
> #define XUNTAG(a, type, ctype) \
>   ((ctype *) ((uintptr_t) XLP (a) - (uintptr_t)LISP_WORD_TAG (type)))
> 
> help? (That is, cast the LISP_WORD_TAG return value to uintptr_t.)

It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
set above 32 bit, so how casting it to uintptr_t is TRT?

Why did you need to change the original cast in the first place?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:32           ` Mattias Engdegård
  2023-09-16 16:54             ` Eli Zaretskii
@ 2023-09-16 16:54             ` Mattias Engdegård
  2023-09-16 17:09               ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 16:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, monnier

16 sep. 2023 kl. 18.32 skrev Mattias Engdegård <mattias.engdegard@gmail.com>:

> #define XUNTAG(a, type, ctype) \
>  ((ctype *) ((uintptr_t) XLP (a) - (uintptr_t)LISP_WORD_TAG (type)))

I pushed this to master after some experiments.

Interestingly, Clang didn't warn about the new code at all. On the other hand, it did probably warn about the old code (with your configuration), because

  char *g(char *p) {return p+(1ULL<<62);}

results in

<source>:8:26: warning: the pointer incremented by 4611686018427387904 refers past the last possible element for an array in 32-bit address space containing 8-bit (1-byte) elements (max possible 4294967296 elements) [-Warray-bounds]
char *g(char *p) {return p+(1ULL<<62);}
                         ^  ~~~~~~~~
<source>:8:9: note: array 'p' declared here
char *g(char *p) {return p+(1ULL<<62);}
        ^

but GCC thinks it's fine.







^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:54             ` Eli Zaretskii
@ 2023-09-16 17:03               ` Mattias Engdegård
  2023-09-16 17:11                 ` Eli Zaretskii
  2023-09-17  3:02                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 17:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, monnier

16 sep. 2023 kl. 18.54 skrev Eli Zaretskii <eliz@gnu.org>:

> It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
> set above 32 bit, so how casting it to uintptr_t is TRT?

Because XUNTAG is used to get the pointer part; we don't want the tag bits. 

> Why did you need to change the original cast in the first place?

The commit message tried to explain that, but in essence, the old code untagged a Lisp_Object value by casting it to char *, then do pointer arithmetic on that, and then cast the result to whatever pointer we want.

The C standard severely restricts pointer arithmetic: in particular, for P+X where X is an integer and P is a pointer, P cannot be null (nor will P+X, since both P and P+X must be pointers to objects in the same array).

This means that XUNTAG could never reliably untag a null pointer and this did cause mayhem in some places. We have just been lucky not to trigger it so far but I noticed when attempting to make some innocent-looking changes.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 16:54             ` Mattias Engdegård
@ 2023-09-16 17:09               ` Eli Zaretskii
  2023-09-16 17:22                 ` Mattias Engdegård
  2023-09-16 19:46                 ` Paul Eggert
  0 siblings, 2 replies; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 17:09 UTC (permalink / raw)
  To: Mattias Engdegård, Paul Eggert; +Cc: 65491, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 16 Sep 2023 18:54:27 +0200
> Cc: 65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> 16 sep. 2023 kl. 18.32 skrev Mattias Engdegård <mattias.engdegard@gmail.com>:
> 
> > #define XUNTAG(a, type, ctype) \
> >  ((ctype *) ((uintptr_t) XLP (a) - (uintptr_t)LISP_WORD_TAG (type)))
> 
> I pushed this to master after some experiments.

Sorry, I cannot accept this kind of "discussions" when such tricky
issues come up.  What's the rush of installing changes when you still
didn't answer my questions, and we still are not sure these changes
are correct?  We are touching the deepest bowels of the
implementation, so some caution, even trepidation, is in order.

So I've reverted the last two changes.  Let's finish discussing the
need for this change and why you think the last variant is TRT, and
install then whatever we decide is needed.

I also added Paul to the discussion, since he wrote most of the
involved macros.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 17:03               ` Mattias Engdegård
@ 2023-09-16 17:11                 ` Eli Zaretskii
  2023-09-17  3:02                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 17:11 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 16 Sep 2023 19:03:33 +0200
> Cc: 65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> 16 sep. 2023 kl. 18.54 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
> > set above 32 bit, so how casting it to uintptr_t is TRT?
> 
> Because XUNTAG is used to get the pointer part; we don't want the tag bits. 

Then just casting should do, no?  Why the subtraction?

> > Why did you need to change the original cast in the first place?
> 
> The commit message tried to explain that, but in essence, the old code untagged a Lisp_Object value by casting it to char *, then do pointer arithmetic on that, and then cast the result to whatever pointer we want.
> 
> The C standard severely restricts pointer arithmetic: in particular, for P+X where X is an integer and P is a pointer, P cannot be null (nor will P+X, since both P and P+X must be pointers to objects in the same array).
> 
> This means that XUNTAG could never reliably untag a null pointer and this did cause mayhem in some places. We have just been lucky not to trigger it so far but I noticed when attempting to make some innocent-looking changes.

Sorry, this is not a reason good enough to make changes in these
parts.  However "unreliable" the old code worked for many moons, luck
or no luck, so if there's no other reason, I prefer to leave it alone.

And I'd like to hear from Paul before we make any further changes in
that area, please.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 17:09               ` Eli Zaretskii
@ 2023-09-16 17:22                 ` Mattias Engdegård
  2023-09-16 18:19                   ` Eli Zaretskii
  2023-09-16 19:46                 ` Paul Eggert
  1 sibling, 1 reply; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 17:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, Paul Eggert, monnier

16 sep. 2023 kl. 19.09 skrev Eli Zaretskii <eliz@gnu.org>:

> Sorry, I cannot accept this kind of "discussions" when such tricky
> issues come up.  What's the rush of installing changes when you still
> didn't answer my questions, and we still are not sure these changes
> are correct?

I'm confident that they are correct. Moreover, I'm also confident that the old code was incorrect, which is why the change was carried out. Both the C standard and modern C compilers agree.

There's nothing strange or unusual that the 32-bit --with-wide-int configuration sees unexpected warnings when code is changed. You must have seen that many times before. It doesn't mean that there is anything wrong with the change; in this case it was just a somewhat pedantic GCC warning, quickly silenced.

>>> It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
>>> set above 32 bit, so how casting it to uintptr_t is TRT?
>> 
>> Because XUNTAG is used to get the pointer part; we don't want the tag bits. 
> 
> Then just casting should do, no?  Why the subtraction?

Because when Lisp_Object is pointer-sized we need to remove the tag bits from the word. Only in the special configuration with a Lisp_Object that is larger than pointers can we simply cast away the tag bits.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 17:22                 ` Mattias Engdegård
@ 2023-09-16 18:19                   ` Eli Zaretskii
  2023-09-16 19:04                     ` Mattias Engdegård
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-16 18:19 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, eggert, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 16 Sep 2023 19:22:54 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>,
>  65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> 16 sep. 2023 kl. 19.09 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Sorry, I cannot accept this kind of "discussions" when such tricky
> > issues come up.  What's the rush of installing changes when you still
> > didn't answer my questions, and we still are not sure these changes
> > are correct?
> 
> I'm confident that they are correct. Moreover, I'm also confident that the old code was incorrect, which is why the change was carried out. Both the C standard and modern C compilers agree.
> 
> There's nothing strange or unusual that the 32-bit --with-wide-int configuration sees unexpected warnings when code is changed. You must have seen that many times before. It doesn't mean that there is anything wrong with the change; in this case it was just a somewhat pedantic GCC warning, quickly silenced.

I get it that you are confident, but I want to be confident as well,
and I'm not there yet.  A discussion is not over until all of its
parties say it is.  By rushing to install changes while the discussion
is still not over you create an unhealthy atmosphere where some people
might conclude their opinions, questions, and doubts are ignored, and
that doesn't contribute to the sense of cooperation towards common
goals.

> >>> It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
> >>> set above 32 bit, so how casting it to uintptr_t is TRT?
> >> 
> >> Because XUNTAG is used to get the pointer part; we don't want the tag bits. 
> > 
> > Then just casting should do, no?  Why the subtraction?
> 
> Because when Lisp_Object is pointer-sized we need to remove the tag bits from the word. Only in the special configuration with a Lisp_Object that is larger than pointers can we simply cast away the tag bits.

Those special configurations have telltale traits that can be used in
cpp conditionals.  IOW, we could have different definitions of XUNTAG
for different configurations.  It isn't unheard off, and other macros,
including some that are involved in XUNTAG, do indeed have separate
definitions for several configurations.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 18:19                   ` Eli Zaretskii
@ 2023-09-16 19:04                     ` Mattias Engdegård
  0 siblings, 0 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-16 19:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, eggert, monnier

16 sep. 2023 kl. 20.19 skrev Eli Zaretskii <eliz@gnu.org>:

> I get it that you are confident, but I want to be confident as well,
> and I'm not there yet.

That's fine, I'm in no rush. I'll be happy to answer any questions you may have.

It's natural that our perspectives are different. From my point of view I have a well-researched and carefully tested change with everything indicating that it's safe and an improvement. But when the first thing you see is a flood of warnings, it's quite understandable that it is taken as a sign that something is wrong.

> Those special configurations have telltale traits that can be used in
> cpp conditionals.  IOW, we could have different definitions of XUNTAG
> for different configurations.  It isn't unheard off, and other macros,
> including some that are involved in XUNTAG, do indeed have separate
> definitions for several configurations.

Certainly, but we didn't need multiple definitions for XUNTAG before and nothing has changed in that respect.

However, if you think that it would make the code easier to understand we could separate the code into two or more #if clauses, although most of the time a single definition is easier to maintain.
In any case, that would be a separate change.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 17:09               ` Eli Zaretskii
  2023-09-16 17:22                 ` Mattias Engdegård
@ 2023-09-16 19:46                 ` Paul Eggert
  2023-09-17  5:18                   ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Paul Eggert @ 2023-09-16 19:46 UTC (permalink / raw)
  To: Eli Zaretskii, Mattias Engdegård; +Cc: 65491, monnier

On 2023-09-16 10:09, Eli Zaretskii wrote:

> I also added Paul to the discussion, since he wrote most of the
> involved macros.

Mattias and I had a private email discussion about XUNTAG last month, 
and he's correct that Emacs's current code, which #defines XUNTAG(a, 
type, ctype) to ((ctype *) ((char *) XLP (a) - LISP_WORD_TAG (type))), 
violates the C standard due to calculating a pointer outside its 
containing object's bounds, whereas the patch P that you just reverted, 
which defines the same macro to ((ctype *) ((uintptr_t) XLP (a) - 
(uintptr_t) LISP_WORD_TAG (type))), does not have that particular 
undefined behavior.

My own impression is that patch P is a net win, as it should make Emacs 
more reliable after likely future changes, for two reasons.

First, the unpatched version's undefined behavior caused Emacs to crash 
when Mattias tried out what appeared to be an obvious change to the GC's 
vector handling. Had patch P been present, Emacs would not have crashed.

Second, I vaguely suspect any attempt by Emacs to exploit recent memory 
safety improvements in Arm, Intel, etc. on GNU/Linux are more likely to 
work with patch P than without it. I suspect this because most other 
apps that tag pointers do it in the integer world (as patch P does), not 
in the pointer world (as Emacs currently does). The developments I have 
in mind are Arm MTE[1], Intel LAM[2], and (more speculatively) 
CHERI-RISC-V[3]. Although this is just a suspicion, I suspect I've 
thought about the issue more than anyone else has.

I would not favor a two-pronged approach, in which patch P is present 
but is used optionally. This would likely cause more trouble than it'd 
cure, due to lack of testing of the extra combinations. Let's use just 
one approach and keep it simple and more testable.

[1]: https://lwn.net/Articles/834289/
[2]: https://lwn.net/Articles/902094/
[3]: 
https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-risc-v.html





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 17:03               ` Mattias Engdegård
  2023-09-16 17:11                 ` Eli Zaretskii
@ 2023-09-17  3:02                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-09-17 17:02                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 46+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-17  3:02 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, Eli Zaretskii, monnier

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 16 sep. 2023 kl. 18.54 skrev Eli Zaretskii <eliz@gnu.org>:
>
>> It does, but LISP_WORD_TAG(type) is a 64=bit type with the only bits
>> set above 32 bit, so how casting it to uintptr_t is TRT?
>
> Because XUNTAG is used to get the pointer part; we don't want the tag bits. 
>
>> Why did you need to change the original cast in the first place?
>
> The commit message tried to explain that, but in essence, the old code
> untagged a Lisp_Object value by casting it to char *, then do pointer
> arithmetic on that, and then cast the result to whatever pointer we
> want.
>
> The C standard severely restricts pointer arithmetic: in particular,
> for P+X where X is an integer and P is a pointer, P cannot be null
> (nor will P+X, since both P and P+X must be pointers to objects in the
> same array).
>
> This means that XUNTAG could never reliably untag a null pointer and
> this did cause mayhem in some places. We have just been lucky not to
> trigger it so far but I noticed when attempting to make some
> innocent-looking changes.

Within Standard C, the result of converting a pointer value to an
integer and vice versa is also implementation defined behavior.  GCC
regards converting an integer derived from a pointer value back to a
pointer as undefined behavior, should the resulting pointer point to an
object outside that which the pointer value that gave rise to the
original integer references.  From `(gcc)Arrays and Pointers':

     When casting from pointer to integer and back again, the resulting
     pointer must reference the same object as the original pointer,
     otherwise the behavior is undefined.  That is, one may not use
     integer arithmetic to avoid the undefined behavior of pointer
     arithmetic as proscribed in C99 and C11 6.5.6/8.

So whether pointer or integer arithmetic is employed in XUNTAG makes no
real difference to us, where undefined behavior is concerned, inasmuch
as other compilers only afford us even more latitude.  For example,
Sun's C compiler documentation actively encourages using pointer
arithmetic in lieu of integer arithmetic, and speaks nothing of pointer
arithmetic between different objects being forbidden.  And all other
things being equal, I would rather see existing, time-tested, almost
primordial code preserved intact.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 19:46                 ` Paul Eggert
@ 2023-09-17  5:18                   ` Eli Zaretskii
  2023-09-17 15:22                     ` Paul Eggert
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-17  5:18 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 65491, mattias.engdegard, monnier

> Date: Sat, 16 Sep 2023 12:46:34 -0700
> Cc: 65491@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 2023-09-16 10:09, Eli Zaretskii wrote:
> 
> > I also added Paul to the discussion, since he wrote most of the
> > involved macros.
> 
> Mattias and I had a private email discussion about XUNTAG last month, 
> and he's correct that Emacs's current code, which #defines XUNTAG(a, 
> type, ctype) to ((ctype *) ((char *) XLP (a) - LISP_WORD_TAG (type))), 
> violates the C standard due to calculating a pointer outside its 
> containing object's bounds, whereas the patch P that you just reverted, 
> which defines the same macro to ((ctype *) ((uintptr_t) XLP (a) - 
> (uintptr_t) LISP_WORD_TAG (type))), does not have that particular 
> undefined behavior.

First, may I ask that in the future such discussions be held on the
list, or at least their summary be posted?  You can see right here and
now how this practice of discussing purely technical issues in private
causes unnecessary friction and aggravation, let alone breakage.

Next, even if this was discussed, due to our experience with changes
in this area, it would have been prudent to post the proposed patch
before installing it, so that it could be tested in all the supported
configuration before risking such a fundamental breakage of the master
branch.  You personally do that all the time, and for good reasons,
but Mattias has yet to learn that, it seems.

More to the point: can you explain how this part

   ((uintptr_t) XLP (a) - (uintptr_t) LISP_WORD_TAG (type))

works in a 32-bit build --with-wide-int?  AFAIU, XLP(a) yields a
32-bit value, and LISP_WORD_TAG has all of its low 32 bits zero, so
why do we need the subtraction at all?  Aren't we subtracting a zero
from what XLP yields?  It seems to me that in a 32-bit build with wide
ints just

  #define XUNTAG(a, type, ctype) ((ctype *) XLP (a))

should be enough, since XLP yields a 'void *', no?

> I would not favor a two-pronged approach, in which patch P is present 
> but is used optionally. This would likely cause more trouble than it'd 
> cure, due to lack of testing of the extra combinations. Let's use just 
> one approach and keep it simple and more testable.

Not sure what you are alluding to here, but if by "two-pronged
approach" you mean my suggestion to use cpp conditionals, then what I
meant was to have a separate definition of XUNTAG for 32-bit builds
with wide ints (which could still remove the undefined behavior), just
without all the juggling of integer-to-pointer-and-back conversions,
which completely obscure what should be a simple operation of
extracting a pointer from an integer by masking some of its bits.  Why
do I need to fight with these macros each time I read this code, when
the job it does is so simple?

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17  5:18                   ` Eli Zaretskii
@ 2023-09-17 15:22                     ` Paul Eggert
  2023-09-17 16:15                       ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Paul Eggert @ 2023-09-17 15:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, mattias.engdegard, monnier

On 2023-09-16 22:18, Eli Zaretskii wrote:

> It seems to me that in a 32-bit build with wide
> ints just
> 
>    #define XUNTAG(a, type, ctype) ((ctype *) XLP (a))
> 
> should be enough, since XLP yields a 'void *', no?

That would complicate the code unnecessarily. Let's not go there. Patch 
P works as-is, and it's simpler. Let's do that instead.


> what I
> meant was to have a separate definition of XUNTAG for 32-bit builds
> with wide ints (which could still remove the undefined behavior),

Yes, that's exactly what I suggest not to do. Why complicate the source 
code unnecessarily? And if we complicate it here, why not complicate it 
in similar ways in dozens of other places?

I went through a lot of this when adding support for --with-wide-int in 
the first place, years ago. When doing so, I strove to avoid having 
multiple copies of the code whenever I could. And I pretty much 
succeeded: there are only two WITH_WIDE_INT conditionals in lisp.h (and 
only three in other source files, all introduced relatively recently by 
others to work around compiler bugs, and all which should be rewritten 
without the #if).

It's an obvious win to have just one copy of the code instead of two, 
when one copy works and is just as efficient.  As much as possible, 
--with-wide-int should not be a special case.  We should not have 
"#ifdef WITH_WIDE_INT" scattered all over the place.  Keep it simple.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17 15:22                     ` Paul Eggert
@ 2023-09-17 16:15                       ` Eli Zaretskii
  2023-09-17 16:37                         ` Paul Eggert
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-17 16:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 65491, mattias.engdegard, monnier

> Date: Sun, 17 Sep 2023 08:22:01 -0700
> Cc: mattias.engdegard@gmail.com, 65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 2023-09-16 22:18, Eli Zaretskii wrote:
> 
> > It seems to me that in a 32-bit build with wide
> > ints just
> > 
> >    #define XUNTAG(a, type, ctype) ((ctype *) XLP (a))
> > 
> > should be enough, since XLP yields a 'void *', no?
> 
> That would complicate the code unnecessarily. Let's not go there. Patch 
> P works as-is, and it's simpler. Let's do that instead.

Even though the code is so completely obfuscated and hard to
understand?  Where do we draw the line between complications and code
readability (and thus maintainability)?  We don't want to depend on
you personally each time we need to make some change in this stuff.

> > what I
> > meant was to have a separate definition of XUNTAG for 32-bit builds
> > with wide ints (which could still remove the undefined behavior),
> 
> Yes, that's exactly what I suggest not to do. Why complicate the source 
> code unnecessarily?

It isn't "unnecessarily" in this case, IMO.

> And if we complicate it here, why not complicate it in similar ways
> in dozens of other places?

Because in general I agree: we should not have several different
implementations of a macro if a single one will do.  But not at a
price of such obfuscation, where even reasoning about the code
requires very high level of understanding of the intricacies of C and
what is and isn't UB (something that also changes with time).

> I went through a lot of this when adding support for --with-wide-int in 
> the first place, years ago. When doing so, I strove to avoid having 
> multiple copies of the code whenever I could. And I pretty much 
> succeeded: there are only two WITH_WIDE_INT conditionals in lisp.h (and 
> only three in other source files, all introduced relatively recently by 
> others to work around compiler bugs, and all which should be rewritten 
> without the #if).

I will be the last one to push for more WIDE_EMACS_INT conditions, but
let's not forget that they are not the only conditions we have, okay?
We also have USE_LSB_TAG and LISP_WORDS_ARE_POINTERS, and those are no
easier (and maybe even harder) to figure out in each and every
configuration.  At least with WIDE_EMACS_INT all you need is to look
at config.log, whereas with others you need to follow the convoluted
ways of their definition in lisp.h and maybe err a few times.  It will
maybe make you laugh, but I frequently find it easier to fire up GDB
and ask it to show me the values of these macros, to make sure I'm not
making a mistake.  This is a developer's plight we are better off
without, don't you agree?

> It's an obvious win to have just one copy of the code instead of two, 
> when one copy works and is just as efficient.  As much as possible, 
> --with-wide-int should not be a special case.  We should not have 
> "#ifdef WITH_WIDE_INT" scattered all over the place.  Keep it simple.

I agree in general, but the case of XUNTAG seems like this principle
taken ad absurdum: subtraction with no fewer than 3 type-casts (not
counting those in XLP and LISP_WORD_TAG) and a needless subtraction of
zero, when a single type-cast should suffice!

But if both you and Mattias still think a single definition of XUNTAG
is better, all the downsides notwithstanding, I won't mount the
barricades.  Just let it be known that I agree under protest.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17 16:15                       ` Eli Zaretskii
@ 2023-09-17 16:37                         ` Paul Eggert
  2023-09-17 16:44                           ` Eli Zaretskii
  0 siblings, 1 reply; 46+ messages in thread
From: Paul Eggert @ 2023-09-17 16:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, mattias.engdegard, monnier

On 2023-09-17 09:15, Eli Zaretskii wrote:
> Even though the code is so completely obfuscated and hard to
> understand?

If it's confusing, let's add a comment. That would be better than 
unnecessarily duplicating the actual code.


> We also have USE_LSB_TAG and LISP_WORDS_ARE_POINTERS

Yes, and we should minimize dependencies on those macros as well. The 
same principle applies to all these conditional flags - generally 
speaking the fewer uses of these macros, and the fewer distinct copies 
of the code, the better.

We could even dump those two macros entirely. But this would be a bigger 
project, one I lack time for (though perhaps Mattias has time...).


> I frequently find it easier to fire up GDB
> and ask it to show me the values

Oh, I do that all the time too, with both Emacs and other programs. It's 
OK. Why refuse help from GDB?





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17 16:37                         ` Paul Eggert
@ 2023-09-17 16:44                           ` Eli Zaretskii
  2023-09-18 16:10                             ` Mattias Engdegård
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-17 16:44 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 65491, mattias.engdegard, monnier

> Date: Sun, 17 Sep 2023 09:37:29 -0700
> Cc: mattias.engdegard@gmail.com, 65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> From: Paul Eggert <eggert@cs.ucla.edu>
> 
> On 2023-09-17 09:15, Eli Zaretskii wrote:
> > Even though the code is so completely obfuscated and hard to
> > understand?
> 
> If it's confusing, let's add a comment. That would be better than 
> unnecessarily duplicating the actual code.

Feel free to add any comments you find useful, that is always welcome.
But comments about complicated code have one downside: they need to be
updated when the code changes, and people forget to do that...

> > We also have USE_LSB_TAG and LISP_WORDS_ARE_POINTERS
> 
> Yes, and we should minimize dependencies on those macros as well.

My point is that XUNTAG depends on both of those as well.

> > I frequently find it easier to fire up GDB
> > and ask it to show me the values
> 
> Oh, I do that all the time too, with both Emacs and other programs. It's 
> OK. Why refuse help from GDB?

C code is supposed to be easily readable.  Going to GDB means it
isn't.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17  3:02                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-09-17 17:02                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-09-18  2:19                     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-17 17:02 UTC (permalink / raw)
  To: Po Lu; +Cc: 65491, Mattias Engdegård, Eli Zaretskii

> Within Standard C, the result of converting a pointer value to an
> integer and vice versa is also implementation defined behavior.

FWIW, last I checked it's literally impossible to implement our
conservative GC (or `malloc` for that matter) without relying on
undefined and implementation defined behaviors in C.

So the best we can do is to try and avoid those undefined behavior that
compilers *do* use to bite in the rear.

I still haven't seen any compiler that tries to make use of the
implementation defined behavior of conversion from pointer to integer as
a basis for optimization, so AFAIK we're still safe using those.

Even converting them back to their original pointer (which is what we do
with tag/untag pairs) is documented to be well-defined if you compile
using GCC.

In contrast the pointer arithmetic on NULL pointers appears to be
something which compilers have started to (ab)use as an assumption for
their optimizations.  Hence the need to update our code.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17 17:02                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-09-18  2:19                     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-09-18  2:27                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 46+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-18  2:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491, Mattias Engdegård, Eli Zaretskii

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Even converting them back to their original pointer (which is what we do
> with tag/untag pairs) is documented to be well-defined if you compile
> using GCC.

Only if the consequent pointer designates the same object as the initial
pointer (of which I see no scrutable definition within GCC's
documentation.)  But in practice, this doesn't matter, or so I thought.

> In contrast the pointer arithmetic on NULL pointers appears to be
> something which compilers have started to (ab)use as an assumption for
> their optimizations.  Hence the need to update our code.

Hmm?  Where in this thread was that fact established?

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-18  2:19                     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-09-18  2:27                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-09-18  3:08                         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-18  2:27 UTC (permalink / raw)
  To: Po Lu; +Cc: 65491, Mattias Engdegård, Eli Zaretskii

>> Even converting them back to their original pointer (which is what we do
>> with tag/untag pairs) is documented to be well-defined if you compile
>> using GCC.
> Only if the consequent pointer designates the same object as the initial
> pointer (of which I see no scrutable definition within GCC's
> documentation.)

IIUC the "if" here is talking about the fact that if the object is
deallocated between the two casts, then you're on your own, of course.

>> In contrast the pointer arithmetic on NULL pointers appears to be
>> something which compilers have started to (ab)use as an assumption for
>> their optimizations.  Hence the need to update our code.
> Hmm?  Where in this thread was that fact established?

IIUC Mattias bumped into this after he made a few "innocent looking"
changes to the vector allocation code.


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-18  2:27                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-09-18  3:08                         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-09-18  4:10                           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 46+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-18  3:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491, Mattias Engdegård, Eli Zaretskii

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> Even converting them back to their original pointer (which is what we do
>>> with tag/untag pairs) is documented to be well-defined if you compile
>>> using GCC.
>> Only if the consequent pointer designates the same object as the initial
>> pointer (of which I see no scrutable definition within GCC's
>> documentation.)
>
> IIUC the "if" here is talking about the fact that if the object is
> deallocated between the two casts, then you're on your own, of course.

Nope, (gcc)Arrays and Pointers is quite unequivocal:

   * `The result of converting a pointer to an integer or vice versa
     (C90 6.3.4, C99 and C11 6.3.2.3).'

     A cast from pointer to integer discards most-significant bits if
     the pointer representation is larger than the integer type,
     sign-extends(1) if the pointer representation is smaller than the
     integer type, otherwise the bits are unchanged.

     A cast from integer to pointer discards most-significant bits if
     the pointer representation is smaller than the integer type,
     extends according to the signedness of the integer type if the
     pointer representation is larger than the integer type, otherwise
     the bits are unchanged.

     When casting from pointer to integer and back again, the resulting
     pointer must reference the same object as the original pointer,
     otherwise the behavior is undefined.  That is, one may not use
     integer arithmetic to avoid the undefined behavior of pointer
     arithmetic as proscribed in C99 and C11 6.5.6/8.

and directly cites the apposite portions of the Standard.

> IIUC Mattias bumped into this after he made a few "innocent looking"
> changes to the vector allocation code.

I'll let Mattias fill us in.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-18  3:08                         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-09-18  4:10                           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 46+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-18  4:10 UTC (permalink / raw)
  To: Po Lu; +Cc: 65491, Mattias Engdegård, Eli Zaretskii

>      When casting from pointer to integer and back again, the resulting
>      pointer must reference the same object as the original pointer,
>      otherwise the behavior is undefined.  That is, one may not use
>      integer arithmetic to avoid the undefined behavior of pointer
>      arithmetic as proscribed in C99 and C11 6.5.6/8.

Then they're (still) warning about a different situation than ours,
i.e. the case where the integer is not the same, e.g. you cast to
integer, do some arithmetic giving you a *different* number and then
cast it back to a pointer.  The way they write it means that even if the
number is different it can still be "well"-defined (in the case where
the pointer is still pointing to the same object).  I'd guess this can
happen if you the integer arithmetic ends up moving from one slot to
another inside an array, for example.

But in any case our TAG+UNTAG is simpler since (barring bugs) we always
cast back the exact same integer, so the condition that "the resulting
pointer must reference the same object as the original pointer" can only
fail if the object was deallocated in the mean time, or if the integer
was too small to contain the pointer (and both of those conditions
should be true in our case, barring bugs).


        Stefan






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-17 16:44                           ` Eli Zaretskii
@ 2023-09-18 16:10                             ` Mattias Engdegård
  2023-09-18 17:13                               ` Eli Zaretskii
  2023-09-19 13:28                               ` Mattias Engdegård
  0 siblings, 2 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-18 16:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, Paul Eggert, monnier

17 sep. 2023 kl. 18.44 skrev Eli Zaretskii <eliz@gnu.org>:

> Feel free to add any comments you find useful, that is always welcome.
> But comments about complicated code have one downside: they need to be
> updated when the code changes, and people forget to do that...

Eli and Paul, thank you very much. I'm reapplying the patches (that we now know shouldn't cause warnings), with added comments with respect to the vanishing subtrahend in -with-wide-int builds.

The conversion at the other end, TAG_PTR, suffers from the same potential problem but in reverse, which is perhaps less obvious to the casual C programmer but nevertheless a danger of the same kind, so we'd better do something about it as well. I'll be back with a proposed patch.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-18 16:10                             ` Mattias Engdegård
@ 2023-09-18 17:13                               ` Eli Zaretskii
  2023-09-19 13:28                               ` Mattias Engdegård
  1 sibling, 0 replies; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-18 17:13 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, eggert, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Mon, 18 Sep 2023 18:10:47 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>,
>  65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> 17 sep. 2023 kl. 18.44 skrev Eli Zaretskii <eliz@gnu.org>:
> 
> > Feel free to add any comments you find useful, that is always welcome.
> > But comments about complicated code have one downside: they need to be
> > updated when the code changes, and people forget to do that...
> 
> Eli and Paul, thank you very much. I'm reapplying the patches (that we now know shouldn't cause warnings), with added comments with respect to the vanishing subtrahend in -with-wide-int builds.

Fine with me, but I still hope Paul will do good on his proposal to
add comments...

> The conversion at the other end, TAG_PTR, suffers from the same potential problem but in reverse, which is perhaps less obvious to the casual C programmer but nevertheless a danger of the same kind, so we'd better do something about it as well. I'll be back with a proposed patch.

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-18 16:10                             ` Mattias Engdegård
  2023-09-18 17:13                               ` Eli Zaretskii
@ 2023-09-19 13:28                               ` Mattias Engdegård
  2023-09-19 14:04                                 ` Eli Zaretskii
  1 sibling, 1 reply; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-19 13:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, Paul Eggert, monnier

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]

> The conversion at the other end, TAG_PTR, suffers from the same potential problem but in reverse, which is perhaps less obvious to the casual C programmer but nevertheless a danger of the same kind, so we'd better do something about it as well. I'll be back with a proposed patch.

Here it is. Very straightforward: just flip a cast and remove unused definitions.


[-- Attachment #2: 0001-Don-t-use-pointer-arithmetic-for-pointer-tagging-bug.patch --]
[-- Type: application/octet-stream, Size: 2295 bytes --]

From 9072db6ca24df50dc8dc66cd50ef638b8aac8a1b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Mon, 18 Sep 2023 19:16:05 +0200
Subject: [PATCH] Don't use pointer arithmetic for pointer tagging (bug#65491)

This makes for safer code when tagging null pointers in particular,
since pointer arithmetic on NULL is undefined and therefore can be
assumed, by the compiler, not to occur.

* src/lisp.h (untagged_ptr): Remove.
(TAG_PTR): Cast to uintptr_t instead of untagged_ptr.
---
 src/lisp.h | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/src/lisp.h b/src/lisp.h
index de6746f1c07..f0eea0c1f28 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -920,20 +920,11 @@ #define DEFUN_ARGS_7	(Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, \
 #define DEFUN_ARGS_8	(Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object, \
 			 Lisp_Object, Lisp_Object, Lisp_Object, Lisp_Object)
 
-/* untagged_ptr represents a pointer before tagging, and Lisp_Word_tag
-   contains a possibly-shifted tag to be added to an untagged_ptr to
-   convert it to a Lisp_Word.  */
+/* Lisp_Word_tag is big enough for a possibly-shifted tag, to be
+   added to a pointer value for conversion to a Lisp_Word.  */
 #if LISP_WORDS_ARE_POINTERS
-/* untagged_ptr is a pointer so that the compiler knows that TAG_PTR
-   yields a pointer.  It is char * so that adding a tag uses simple
-   machine addition.  */
-typedef char *untagged_ptr;
 typedef uintptr_t Lisp_Word_tag;
 #else
-/* untagged_ptr is an unsigned integer instead of a pointer, so that
-   it can be added to the possibly-wider Lisp_Word_tag type without
-   losing information.  */
-typedef uintptr_t untagged_ptr;
 typedef EMACS_UINT Lisp_Word_tag;
 #endif
 
@@ -943,7 +934,7 @@ #define LISP_WORD_TAG(tag) \
 
 /* An initializer for a Lisp_Object that contains TAG along with PTR.  */
 #define TAG_PTR(tag, ptr) \
-  LISP_INITIALLY ((Lisp_Word) ((untagged_ptr) (ptr) + LISP_WORD_TAG (tag)))
+  LISP_INITIALLY ((Lisp_Word) ((uintptr_t) (ptr) + LISP_WORD_TAG (tag)))
 
 /* LISPSYM_INITIALLY (Qfoo) is equivalent to Qfoo except it is
    designed for use as an initializer, even for a constant initializer.  */
-- 
2.32.0 (Apple Git-132)


[-- Attachment #3: Type: text/plain, Size: 3 bytes --]





^ permalink raw reply related	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-19 13:28                               ` Mattias Engdegård
@ 2023-09-19 14:04                                 ` Eli Zaretskii
  2023-09-19 14:05                                   ` Mattias Engdegård
  0 siblings, 1 reply; 46+ messages in thread
From: Eli Zaretskii @ 2023-09-19 14:04 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 65491, eggert, monnier

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Tue, 19 Sep 2023 15:28:41 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>,
>  65491@debbugs.gnu.org,
>  monnier@iro.umontreal.ca
> 
> Here it is. Very straightforward: just flip a cast and remove unused definitions.

Builds okay on MS-Windows (with wide-ints) and on GNU/Linux (64-bit
build).

Thanks.





^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-19 14:04                                 ` Eli Zaretskii
@ 2023-09-19 14:05                                   ` Mattias Engdegård
  0 siblings, 0 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-19 14:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65491, eggert, monnier

19 sep. 2023 kl. 16.04 skrev Eli Zaretskii <eliz@gnu.org>:

> Builds okay on MS-Windows (with wide-ints) and on GNU/Linux (64-bit
> build).
> 
> Thanks.

Thanks for testing! Pushed to master.






^ permalink raw reply	[flat|nested] 46+ messages in thread

* bug#65491: [PATCH] Improve performance allocating vectors
  2023-09-16 14:58     ` Mattias Engdegård
  2023-09-16 16:12       ` Eli Zaretskii
@ 2023-09-25 16:06       ` Mattias Engdegård
  1 sibling, 0 replies; 46+ messages in thread
From: Mattias Engdegård @ 2023-09-25 16:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 65491, Eli Zaretskii

> - Reducing the `vector_free_lists` array to its actually used first half does improve performance a bit, even with subsequent scanning speed-ups.

> - The previously mentioned hack where scanning begins in the most recently added bucket is surprisingly effective, often even more so than a bitmap, but I'm wary of it being brittle. Need more measurements.

These two changes were simple, fairly effective, and found to be reasonably safe (in the sense that they are unlikely to make performance worse), so I pushed them to master. This way we should be better off even if no further progress is made.

As mentioned, a bitmap is essentially as good as or better than the last-used-index heuristic but I went with the latter because it involved a lot less code. This can be changed.

But it is possible to do better: size-homogeneous blocks, where each block only houses vectors of a single size, have the major benefit that live_small_vector_holding no longer needs to traverse half the block on average but becomes is done in constant time. Sweeping, always performance-critical since it traverses both living and dead objects, becomes slightly faster. There is also no free-list array scanning at all.

Benchmarks suggest that larger homogeneous blocks (32 KiB, say) would be beneficial, but we need to look at the potential extra fragmentation caused by allocating a big block for each odd vector size.







^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2023-09-25 16:06 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24  9:59 bug#65491: [PATCH] Improve performance allocating vectors Ihor Radchenko
2023-08-26  7:14 ` Eli Zaretskii
2023-08-26  7:27   ` Ihor Radchenko
2023-08-26  7:31     ` Eli Zaretskii
2023-08-26  7:51       ` Ihor Radchenko
2023-08-26  8:07         ` Ihor Radchenko
2023-08-26  9:01         ` Eli Zaretskii
2023-08-26  7:47     ` Ihor Radchenko
2023-08-26 12:01 ` Mattias Engdegård
2023-08-26 14:54   ` Ihor Radchenko
2023-08-26 14:55   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-08-27  9:54     ` Mattias Engdegård
2023-09-16 14:58     ` Mattias Engdegård
2023-09-16 16:12       ` Eli Zaretskii
2023-09-16 16:17         ` Eli Zaretskii
2023-09-16 16:32           ` Mattias Engdegård
2023-09-16 16:54             ` Eli Zaretskii
2023-09-16 17:03               ` Mattias Engdegård
2023-09-16 17:11                 ` Eli Zaretskii
2023-09-17  3:02                 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-17 17:02                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-18  2:19                     ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-18  2:27                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-18  3:08                         ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-18  4:10                           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-09-16 16:54             ` Mattias Engdegård
2023-09-16 17:09               ` Eli Zaretskii
2023-09-16 17:22                 ` Mattias Engdegård
2023-09-16 18:19                   ` Eli Zaretskii
2023-09-16 19:04                     ` Mattias Engdegård
2023-09-16 19:46                 ` Paul Eggert
2023-09-17  5:18                   ` Eli Zaretskii
2023-09-17 15:22                     ` Paul Eggert
2023-09-17 16:15                       ` Eli Zaretskii
2023-09-17 16:37                         ` Paul Eggert
2023-09-17 16:44                           ` Eli Zaretskii
2023-09-18 16:10                             ` Mattias Engdegård
2023-09-18 17:13                               ` Eli Zaretskii
2023-09-19 13:28                               ` Mattias Engdegård
2023-09-19 14:04                                 ` Eli Zaretskii
2023-09-19 14:05                                   ` Mattias Engdegård
2023-09-25 16:06       ` Mattias Engdegård
2023-08-27 16:21 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-08-28 10:14   ` Ihor Radchenko
2023-08-28 16:32     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-08-28 12:47   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).