I recently posted about the high variability of Emacs 29’s tree-sitter navigation performance within a file.  I decided to run a simple test on a large Python file of about 8400 lines to see if I could learn more.  The test is as follows: at the start of each line, locate the current syntax node and, starting from it, navigate up to the root node via `treesit-node-parent'.
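To make the procedure concrete, here is a minimal sketch of such a test.  It is not the code from the gist linked below: the function name `my/treesit-time-parent-walk' is made up for illustration, and it assumes the current buffer already has a tree-sitter parser (e.g. it is in `python-ts-mode').  Only the upward walk is timed; the initial node lookup is excluded, which is an assumption about what is worth measuring.

```elisp
(require 'treesit)

(defun my/treesit-time-parent-walk ()
  "Return a list of (LINE . SECONDS) for the current buffer.
At the start of each line, find the syntax node at point and time
how long it takes to walk up to the root with `treesit-node-parent'.
Hypothetical helper; assumes the buffer has a tree-sitter parser."
  (let ((line 1)
        results)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; Look up the node first, so only the parent walk is timed.
        (let* ((node (treesit-node-at (point)))
               (start (float-time)))
          ;; `treesit-node-parent' returns nil once we pass the root.
          (while node
            (setq node (treesit-node-parent node)))
          (push (cons line (- (float-time) start)) results))
        (setq line (1+ line))
        (forward-line 1)))
    (nreverse results)))
```

Calling (my/treesit-time-parent-walk) in a large buffer and plotting SECONDS against LINE should show whether the same trend appears on another setup.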

I was surprised to find that the time this takes grows as sqrt(N), for line number N.  This leads to performance variability of >100x for code that needs to walk the local syntax tree in large files.  Such variability can make performance projections and optimizations for latency-sensitive uses of tree-sitter (e.g. via font-lock) tricky.  

I’m unclear whether this is fundamental to the tree-sitter parse/tree algorithm, or whether the scaling comes from Emacs’ tree-sitter implementation.  It does vaguely remind me of similar scaling with an old line-numbering algorithm, where lines were always counted from the beginning of the buffer, making it very fast near the start and very slow near the end.

Code and details here:

  https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472