Stefan, I read only Eli's reply this morning.  Got to yours just now.


Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I had a different idea of tightening the otree invariant to this:
>>
>>  1) A node's otick must be greater than or equal to its children's.
>
> That's already checked in the current code, right?

Not yet...until...just now.  See patches below.


>>  3) All downward tree traversal propagates offsets and otick.
>
> I think we already do that, but if there are places we missed, then yes,
> of course.

Yes, I think we do.  The wrinkle is that we don't always start
inheriting at the root, but otick is not updated in that case.


> Regarding `otick`, I can see 2 more options:
> - Get rid of it completely: its sole purpose is to try and keep
>   `overlay-start/end` O(1) in the usual case instead of O(log N), but
>   I'm not convinced it's worth the cost of propagating `otick` everywhere
>   all the time.
> - A halfway point is to keep `otick` but update it more lazily,
>   i.e. only update it when we do `overlay-start/end` (e.g. in
>   `interval_tree_validate`).

These ideas are simpler but similar in direction to my idea to use a
btree instead.  Then some of the "augmentation" could live in btree
arrays, which are a lot faster to traverse and modify in bulk.  (pie in
sky idea, maybe for Emacs 30 but more likely 40!).


>> Subject: [PATCH 2/2] Add ./configure --enable-checking=overlays
>
> I think this is going overboard.  I'd rather focus on providing good
> "everyday checks" (i.e. cheap checks which help document our invariants,
> controlled by the normal ENABLE_CHECKING) and leave the rest of the
> debugging code under "manual" control rather than expose it as an
> `--enable-checking=` option.

Yes, Eli said the same.  I've removed it.


> My assumption here is that once debugged, this overlay code will stay
> untouched for the next many years (based on the past history of our
> overlay code).

Yep, a great reason to have good ENABLE_CHECKING code.


>> +static struct check_subtree_result
>> +check_subtree (struct interval_node *node, uintmax_t tree_otick,
>> +               int max_depth, ptrdiff_t offset, ptrdiff_t min_begin,
>> +               ptrdiff_t max_begin)
>
> This max_depth also sounds to me like over-engineering.

I'd like to keep max_depth.

My reason for adding it is informed by experience on similar projects.
It is easier to understand an assertion that the tree is too deep than
it is to debug, say, stack overflow when there is a cycle in the link
structure of the tree.  It is easier to limit the height of checking if,
indeed, on some night I am diagnosing an unrelated bug but the perf cost
here is too much.  With max_depth in place I can easily manually hack up
a functon to verify checks only in a small subset of the tree
(e.g. around a rotation operation, etc.).  I've done and benefited from
this sort of thing in the past.

There is a small concrete benefit to it right now.  MAX_DEPTH is now
initialized from interval_tree_max_height(), which currently could wildy
under estimate the height and we wouldn't notice (because generator
stacks auto-grow).  Since we call that fucntion to make "big enough to
not need growing" stacks, it is nice to have test coverage for it.

I don't like over-engeneering, but I wouldn't call this engineering at
all.  It is just an arg and a bit of related code, with some utility,
and a tiny maintenance cost.  :-)


>> -  /* We don't check the RB invariants here (neither the absence of
>> -     red+red nor the equal-black-depth), so that we can use this check
>> -     even while the tree is temporarily breaking some of those invarints.  */
>> -  /* Red nodes cannot have red parents.  */
>> -  /* eassert (node->parent == ITREE_NULL
>> -              || !(node->red && node->parent->red)); */
>> +  /* We don't normally check the RB invariants here (neither the
>> +     absence of red+red nor the equal-black-depth), so that we can use
>> +     this check even while the tree is temporarily breaking some of
>> +     those invarints.  You can enable them if you want.  */
>> +  if (false)
>> +    {
>> +      /* If a node is red then both of its children are black.  Red
>> +         nodes cannot have red parents.  */
>> +      if (node->red)
>> +        {
>> +          eassert (node->left == ITREE_NULL
>> +                   || node->left->red == false);
>> +          eassert (node->right == ITREE_NULL
>> +                   || node->right->red == false);
>> +          eassert (node->parent == ITREE_NULL || !node->parent->red);
>> +        }
>> +    }
>
> Since we'll be recursing into the children, the first two easserts seem
> redundant with the third (IOW the new code just does the same as the old
> code, only more verbosely and less efficiently :-( ).
> What am I missing?

Good point, I reworked this for simplicity.


>> +  result.complete = left_result.complete && right_result.complete;
>> +  if (result.complete)
>
> I think all calls to `check_tree` should be complete traversals.
>
> Most of the invariants checked in it are already checked incrementally
> via sprinkled assertions elsewhere, so it's only really useful when
> debugging a concrete known issue where the "local" checks aren't good
> enough.

All `check_tree` calls are complete traversals.  If they aren't there is
a bug and an `eassert` will fire.


> We could also include a few "cheap" calls to `check_tree` via `eassert`
> (i.e. calls which we know shouldn't be algorithmically too expensive
> because we're already traversing the whole tree for some other reason,
> e.g. when killing a buffer (e.g. to verify that, at the end of the day,
> we still preversed the RB invariants :-) or maybe during GC).

...I'm not understanding something here.  All `check_tree` calls are in
`eassert` already.


>> +    {
>> +      /* Every path from a node to a descendent leaf contains the same
>> +         number of black nodes.  Often said this way: all nodes have
>> +         the same "black height".  */
>> +      eassert (left_result.black_height == right_result.black_height);
>
> IIUC this assertion should be conditionalized in the same way as the
> red+red check, since this invariant can be temporarily false.

Done, and now we have `check_tree_no_rb` for the one call site.

At https://git.sr.ht/~matta/emacs/log/feature/noverlay or below.