Is there already a function to turn these HTML character references into proper characters? "&#1071;" must come out as the Cyrillic "Я", etc. On the command line, recode can do the trick:

    echo "&#1071;" | recode html..utf-8

-- 
 |      ,__o    http://www.gnu.franken.de/ke/
 |    _-\_<,    ke@suse.de (work) / keichwa@gmx.net (home)
 |   (*)/'(*)
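
For comparison, the conversion asked about above can be sketched in Python. This is only an illustration of the expected behaviour, not the function the question is looking for; the helper name decode_char_refs is hypothetical, and it simply wraps the standard library's html.unescape.

```python
import html


def decode_char_refs(text: str) -> str:
    """Replace HTML character references in `text` with the characters they name.

    Hypothetical helper for illustration; it delegates to html.unescape,
    which handles decimal (&#1071;), hexadecimal (&#x42F;) and named
    (&amp;) references in a single pass.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Decimal reference for CYRILLIC CAPITAL LETTER YA (U+042F).
    print(decode_char_refs("&#1071;"))
```

Like the recode pipeline, this decodes each reference once, so an escaped ampersand such as "&amp;lt;" becomes "&lt;" rather than "<".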