>>> Then I think injecting LC_ALL=C into the environment when running Grep
>>> in this case makes the results more useful?  And we can then avoid
>>> using -a?
>>
>> I'm not so sure. LC_ALL=C seems more problematic than -a:
>>
>> $ grep ф test.txt
>> фыва
>> $ grep -a ф test.txt
>> фыва
>> $ LC_ALL=C grep ф test.txt
>> (nothing)
>
> I guess this regression in Grep happened when they "internationalized"
> the DFA code, sigh...
>

FWIW, I "bisected" this with various versions of grep, and this
regression happened in 2014, between versions 2.20 and 2.21:

echo -ne "premi\xE8re\n" > latin1.txt
echo -ne "premi\xC3\xA8re\n" > utf8.txt
echo -ne "premi\xE8re\npremi\xC3\xA8re\n" > both.txt

With 2.20 with rxvt (which is clever enough to display UTF-8 and Latin-1
at the same time):

$ grep prem *.txt
both.txt:première
both.txt:première
latin1.txt:première
utf8.txt:première

With 2.20 with M-x shell (the \350 is a single character):

both.txt:premi\350re
both.txt:première
latin1.txt:premi\350re
utf8.txt:première

With 2.21, with rxvt or M-x shell:

$ grep prem *.txt
Binary file both.txt matches
Binary file latin1.txt matches
utf8.txt:première
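
In case someone wants to repeat the test where "echo -ne" is not
available, here is a rough sketch of the same setup using printf with
octal escapes. The scratch directory and the od calls are my own
additions for illustration, not part of the original test. The final
command uses -a, as discussed above, which should make grep print the
matching lines instead of "Binary file ... matches"; how the Latin-1
byte is displayed still depends on the terminal.

```shell
#!/bin/sh
# Sketch only: recreate the three test files portably.
# The scratch directory is an assumption of this example.
dir=$(mktemp -d) && cd "$dir" || exit 1

# \350 is octal for 0xE8 (Latin-1 "è");
# \303\250 is octal for 0xC3 0xA8 (UTF-8 "è").
printf 'premi\350re\n' > latin1.txt
printf 'premi\303\250re\n' > utf8.txt
printf 'premi\350re\npremi\303\250re\n' > both.txt

# Show the raw bytes, to confirm the two encodings differ on disk.
od -An -tx1 latin1.txt
od -An -tx1 utf8.txt

# -a tells grep to process every file as if it were text.
grep -a prem ./*.txt
```

Dropping -a from the last line gives the plain "grep prem *.txt"
comparison between grep versions shown above.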