On Tue, Feb 11, 2014 at 04:14:45PM +0200, Tomi Ollila wrote:
> On Tue, Feb 11 2014, David Bremner wrote:
> > W. Trevor King writes:
> >> Instead of always writing UTF-8, allow the user to configure the
> >> output encoding using their locale.  This is useful for
> >> previewing output in the terminal, for poor souls that don't use
> >> UTF-8 locales ;).
> >
> > …
> > remote: UnicodeEncodeError: 'ascii' codec can't encode character
> > u'\u017b' in position 219: ordinal not in range(128)
> >
> > possibly because of
> >
> > LANG=C
> > …
> >
> > I think it's fine to _allow_ the user to configure the output
> > encoding. I'm less sure about _requiring_ it.

If a user has set LANG=C, I expect that's what we should use for
output (in which case dying with an encoding error is the right thing
to do).  If you want UTF-8 output, using a UTF-8 locale seems like a
reasonable requirement.  For the HTML case, we could fall back on
numeric character references (e.g. &#379;) if the requested locale
didn't support the required character directly, but I don't see an
easy solution for the text-mode output.

> That reminded me that yesterday (after review, of course) I thought
> that we probably want configuration file to be parsed as utf-8
> instead of any encoding user may have in their system...

The POSIX spec for LANG doesn't restrict its scope to terminal
input / output [1], so I feel like we should be using LANG to read
the config file as well.  I expect folks with UTF-8 LANGs will want
UTF-8 file contents.  In both cases (terminal output and config-file
input), it is easy for users to pick their preferred encoding:

  $ LANG=en_US.UTF-8 nmbug-status …

I think we should trust what they've chosen, rather than guessing that
they actually want UTF-8 ;).

Cheers,
Trevor

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
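
P.S. For illustration, the HTML fallback mentioned above could look
roughly like the sketch below.  This is only a sketch under
assumptions: encode_for_output and its parameters are hypothetical
names, not anything that exists in nmbug-status.  It takes the
encoding from the locale and only falls back to numeric character
references when the output is HTML; text-mode output still dies with
UnicodeEncodeError.

```python
import locale


def encode_for_output(text, encoding=None, html=False):
    """Encode text using the locale's encoding (hypothetical helper)."""
    if encoding is None:
        # Encoding implied by the user's locale settings (LANG etc.).
        encoding = locale.getpreferredencoding()
    # 'xmlcharrefreplace' replaces unencodable characters with &#NNN;
    # references, which is only meaningful in HTML.  'strict' raises
    # UnicodeEncodeError instead.
    errors = 'xmlcharrefreplace' if html else 'strict'
    return text.encode(encoding, errors)
```

With an ASCII encoding, the HTML case would emit b'&#379;' for
u'\u017b', while the text-mode case would raise the same error as in
the traceback quoted above.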