On Tue, Feb 04, 2014 at 08:40:18PM +0200, Tomi Ollila wrote:
> On Tue, Feb 04 2014, W. Trevor King wrote:
> >
> > >>> from __future__ import unicode_literals
> > >>> import codecs
> > >>> import locale
> > >>> import sys
> > >>> print(locale.getpreferredencoding())  # same as yours
> > UTF-8
> > >>> print(sys.getdefaultencoding())  # same as yours
> > ascii
> > >>> _ENCODING = locale.getpreferredencoding() or sys.getdefaultencoding()
> > >>> print(_ENCODING)  # double-check default encodings
> > UTF-8
> > >>> byte_stream = sys.stdout  # copied from Page.write
> > >>> stream = codecs.getwriter(encoding=_ENCODING)(stream=byte_stream)
> > >>> data = {'from': '\u017b'}  # fake the troublesome data
> > >>> print(type(data['from']))  # double-check unicode_literals
> > <type 'unicode'>
> > >>> string = ' {from}\n'.format(**data)
> > >>> stream.write(string)
> >  Ż
> >
> > It looks like you'll have the same _ENCODING as I do (UTF-8).  That
> > means your stream should be wrapped in a UTF-8 StreamWriter, so I
> > don't understand why it's converting to ASCII.  Can you run through
> > the above on your troublesome machine and confirm that stream.write()
> > is still raising the exception?  If it doesn't work, can you just
> > paste that whole run in your next email?
>
> I don't know what to paste, so i paste this:
>
> $ python
> Python 2.6.6 (r266:84292, Nov 21 2013, 12:39:37)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.

It looks like you left out:

  from __future__ import unicode_literals

Can you try again with that line as the first command?

> >>> data = {'from': '\u017b'}
> >>> print(type(data['from']))
> <type 'str'>

which is why your data is a 'str' and not a 'unicode' instance.

> >>> string = ' {from}\n'.format(**data)
> >>> print string
>  \u017b
>
> and then:
>
> >>> data = {'from': u'\u017b'}

This works around the lack of unicode_literals with an explicit u''.
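
A minimal sketch of the rule at play, not part of the original session:
keep both the data and the format string as text, and only encode when
writing to a byte stream.  The io.BytesIO stand-in for sys.stdout and the
names ascii_ok and encoded are assumptions made for this illustration.
Explicit u'' prefixes are used so it runs without unicode_literals:

```python
import codecs
import io

# Text value, as in the session (explicit u'' instead of unicode_literals).
data = {'from': u'\u017b'}

# Text template + text value: formatting involves no implicit encoding.
string = u' {from}\n'.format(**data)

# Encoding the value as ASCII is what fails; U+017B has no ASCII form.
try:
    data['from'].encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Wrap a byte stream in a UTF-8 StreamWriter, as Page.write is described
# as doing.  BytesIO is a hypothetical stand-in for sys.stdout here.
byte_stream = io.BytesIO()
stream = codecs.getwriter('UTF-8')(byte_stream)
stream.write(string)
encoded = byte_stream.getvalue()
```

With a plain ' {from}\n' template on Python 2, the format() call itself
attempts that ASCII encode, so the failure happens before the StreamWriter
is ever involved.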

> >>> print(type(data['from']))
> <type 'unicode'>
> >>> string = ' {from}\n'.format(**data)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in

However, without unicode_literals or an explicit u'', your format
string '…{from}' is a str (it should be a 'unicode' instance with
unicode_literals).

> >>> import os
> >>> print os.environ['LANG']
> en_US.UTF-8

That's good anyway ;).

Thanks for digging into this :).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy