I figured out the race condition in the tests: the previous test was
still running when the failing test started. Such are the joys of using
a shared Emacs instance to run all of the tests in one file.

The attached diff is split up according to the commits in my working
series that introduce the tests in question, but you should be able to
apply it directly on top of the posted series if you want.