> Thanks for the detailed report.
You're welcome.
I feel a bit guilty, since I lived with that bug for such a long time and only started debugging it today... Shame on me.
> (...) About using LANG specifically, any chance that it might interfere with
> the system's configured encoding, e.g. UTF-8 vs other? In your example,
> does searching for accented characters work as well?
Yes. I added LANG=C, then successfully checked that the search works with the French guillemet «, the accented letters é and ç, and a non-breaking space.
> IIUC we can try LC_MESSAGES as the more specialized var. Does
> LC_MESSAGES=en work as well?
No. With LC_MESSAGES=en, matches in binary files still make the search fail.