After looking at and considering more of the undo code, here's my
current thinking about how to solve bug 2.

The Elisp manual states that undo elements of buffer-undo-list can be
of the form:

  (apply delta beg end funname . args)

Define a "redo element" as an element of buffer-undo-list created
because of an undo command. Suppose we change the way redo elements
are represented to:

  (apply delta beg end 'undo-redo . (undone-element undo-delta))

Where:
  • delta, beg, end are as documented.
  • undo-redo is a new function that copies undone-element, applies
    undo-delta, then calls primitive-undo on it.
  • undone-element is the undone element of buffer-undo-list.
  • undo-delta is (position . offset) indicating how undo-redo should
    modify a copy of undone-element in order to carry out the redo.

When executing undo-only:
  • Assume pending-undo-list can be lazily computed
  • Let undo-skip-set be an empty set of undo elements to skip
  • Iterate over pending-undo-list until prefix-arg undos complete:
    • If the element is in undo-skip-set:
      • Remove it from undo-skip-set
      • Continue loop
    • If the element is of the form for a redo element:
      • Insert its undone-element in the undo-skip-set
    • Else undo the element

Using a BST for the undo-skip-set would be appropriate, but I don't
see that one is available to core Elisp code. A skip list data
structure would be a simple option with similar performance
characteristics. Note: this is a different kind of "skip", you can
duckduckgo "skip lists".

In your earlier reply, you hinted that switching undo-only from using
the undo-equiv-table to calling undo-make-selective-list might create
a noticable performance regression. The latter is O(N*N) for N sized
undo history! I would expect undo-only will generally have an element
to find, in which case, being able to compute next elements of
pending-undo-list as needed instead of in full upfront may be a
sufficient solution.

If it isn't sufficient, I can also make the algorithm O(N*log(N)).
One way to do this is using another skip list to store the
undo-deltas: sort by position, and at each node store the sum of the
offsets over the skip interval following that node. The analogous
thing can be done with a BST: put the sum of a subtree's offsets at
each node. This is more complex in part because the sums need to be
updated during tree rotations.

There's another benefit to my proposed representation for redo
elements. Consider the use case:
  • Delete text X bytes large
  • Do Y cycles of undo and redo

As I understand the current code, the size of undo history grows by
approximately X*Y. With this new way to represent redo elements, undo
history grows approximately Y instead.

This proposal implies recursive calls to primitive-undo. The size of
the Lisp call stack would be a function of how deep redo records refer
to undone redo records. Maybe there are reasonable ways to deal with
that or maybe it's a non issue.

Are there examples of existing code putting (apply delta beg end
funname . args) into the buffer-undo-list? I want to see what the
options might be for pushing redo elements in that format. The best I
can think of right now is a dynamically bound variable set only during
an undo and examined in record functions in undo.c.

There's more I could say about other details, but I'll stop now to get
feedback on the overall idea.