On Sun, Mar 29, 2015 at 07:10:53PM -0400, Sebastian Fischmeister wrote:
> > My first guess is that the file's encoding doesn't match your
> > locale.  Do you have a non-ASCII locale set?  You can check with:
>
> It seems to be more tricky than I thought. I didn't have a locale set.
>
> When I set one, I can parse some emails with this:
>
> export LANG=en_US.latin-1
>
> Others with this:
>
> export LANG=en_US.UTF-8
>
> Others fail with either of the two.

Hmm, that's surprising.  In hindsight, the locale should only be
affecting the *output* (e.g., a non-Unicode locale might cause a
UnicodeEncodeError).  However, you're getting your errors on input.
I'd expect the files to be loaded and parsed as byte-streams, but
maybe there's a bug in Python's email parser.  It wouldn't be the
first time it's had trouble with bytes-vs-Unicode (see these old bugs
with similar tracebacks from the initial transition to 3.0 [1,2], or
search “unicode email” on http://bugs.python.org/).

I'd try to reproduce this failure by calling
email.message_from_file(…) directly (getting notmuch out of the loop),
and then file a bug against Python once you have a pure-Python
reproduction.

Cheers,
Trevor

[1]: http://bugs.python.org/issue1086
[2]: http://bugs.python.org/issue1258#msg56470

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
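
P.S. For the pure-Python reproduction suggested above, a minimal
sketch might look like the following.  The helper name try_parse and
its encoding parameter are made up for illustration; only
email.message_from_file and email.message_from_binary_file come from
the standard library.  One thing worth checking with it: when open()
is called in text mode without an explicit encoding, Python decodes
the file using the locale's preferred encoding, so that may be where
the locale gets involved on input.

```python
import email
import sys


def try_parse(path, encoding=None):
    """Parse one message in text mode and in binary mode.

    Hypothetical helper for narrowing down the failure.  Returns a
    dict mapping 'text' and 'binary' to 'ok' or to the name of the
    Unicode exception that was raised.  encoding=None means "use the
    locale's preferred encoding", which is what open() does by default.
    """
    results = {}

    # Text mode: bytes are decoded before the email parser sees them.
    try:
        with open(path, encoding=encoding) as f:
            email.message_from_file(f)
        results['text'] = 'ok'
    except UnicodeError as e:
        results['text'] = type(e).__name__

    # Binary mode: the parser receives the raw bytes.
    try:
        with open(path, 'rb') as f:
            email.message_from_binary_file(f)
        results['binary'] = 'ok'
    except UnicodeError as e:
        results['binary'] = type(e).__name__

    return results


if __name__ == '__main__':
    # Usage: python repro.py message1 message2 ...
    for path in sys.argv[1:]:
        print(path, try_parse(path))
```

If the text-mode parse fails while the binary-mode parse succeeds for
the same file, the problem is probably in how the file is opened
rather than in the parser itself, and that would be useful to note in
any bug report.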