18 seconds to show every frame
Fully extracting the images from that (cold cache) took ~30 seconds on my laptop. It contains one key frame, the very first, and everything else is a P-frame, so to decode frame 20 you need to run forward from the start, applying 19 frames' worth of data.
If the components were being cached or generated independently (highly likely, I suspect), then that time cost would be paid the first time each one is produced. A grand total of 18 seconds for that processing doesn't actually sound unreasonable to me, honestly, given that decoding all those frames in individual processes takes ~5.35 seconds total on my machine. It wouldn't take much inefficiency on top of that to bring it up to that figure.
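
To make the single-key-frame point concrete, here's a rough sketch of how the work grows when each frame is decoded independently rather than in one forward pass. It counts "frames processed", not seconds, and the function names and the 20-frame clip length are just illustrative assumptions of mine, not anything measured from the actual file or taken from a real decoder.

```python
from typing import Iterable


def p_frames_needed(target: int, key_frames: Iterable[int]) -> int:
    """P-frames that must be applied after the nearest earlier key frame
    to reconstruct frame `target` (frames are numbered from 1)."""
    start = max(k for k in key_frames if k <= target)
    return target - start


def independent_cost(n_frames: int, key_frames: Iterable[int]) -> int:
    """Frames processed if every frame is decoded from scratch:
    one key frame plus the P-frames leading up to each target."""
    keys = list(key_frames)
    return sum(1 + p_frames_needed(i, keys) for i in range(1, n_frames + 1))


def sequential_cost(n_frames: int) -> int:
    """Frames processed in a single forward pass that keeps decoder state."""
    return n_frames


if __name__ == "__main__":
    keys = [1]  # assumption: only the very first frame is a key frame
    print(p_frames_needed(20, keys))   # 19
    print(independent_cost(20, keys))  # 210
    print(sequential_cost(20))         # 20
```

So for a hypothetical 20-frame clip, decoding each frame on its own touches 210 frames' worth of data versus 20 for one straight run, which is the kind of overhead that could plausibly account for the gap.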