From: Tomas Nordin
To: Matthew Lear, Brian Sniffen
Cc: notmuch@notmuchmail.org, Vladimir Panteleev, Daniel Kahn Gillmor
Subject: Re: web interface to notmuch
Date: Tue, 31 Oct 2017 19:47:34 +0100
Message-ID: <87vaiv40jd.fsf@fliptop>
List-Id: "Use and development of the notmuch mail system."

Hi Matthew

Sorry for chiming in out of the blue. I don't really know the code you are discussing, but I have some experience with Python.

Matthew Lear writes:

> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in process
>     return self.handle()
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in handle
>     return self._delegate(fn, self.fvars, args)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in _delegate
>     return handle_class(cls)
>   File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in handle_class
>     return tocall(*args)
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153, in GET
>     sprefix=webprefix)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989, in render
>     return self.environment.handle_exception(exc_info, True)
>   File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754, in handle_exception
>     reraise(exc_type, exc_value, tb)
>   File "templates/show.html", line 1, in top-level template code
>     {% extends "base.html" %}
>   File "templates/base.html", line 32, in top-level template code
>     {% block content %}
>   File "templates/show.html", line 12, in block "content"
>     {% for part in format_message(m.get_filename(),mid): %}{{ part|safe }}{% endfor %}
>   File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245, in format_message_walk
>     tags=safe_tags).encode(part.get_content_charset('ascii')))

My guess is that part.get_content_charset('ascii') asks for the charset declared for that message part, with 'ascii' as the fallback if none is declared. In this case it is getting 'latin-1', which is therefore the codec tried for encoding the output.
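
To illustrate that guess, here is a minimal sketch using only the standard library email package. The two raw messages are made up for the example (they are not the message from the traceback), and the variable names are mine. It shows that the argument to get_content_charset is only a fallback: a declared charset wins, and ISO-8859-1 (which Python treats as an alias of latin-1) cannot represent U+201C.

```python
from email import message_from_string

# Made-up messages for illustration; not the message from the traceback.
WITH_CHARSET = "Content-Type: text/plain; charset=ISO-8859-1\n\nhello\n"
WITHOUT_CHARSET = "Content-Type: text/plain\n\nhello\n"

# A declared charset is returned (lowercased); the argument is ignored.
declared = message_from_string(WITH_CHARSET).get_content_charset('ascii')

# With no charset parameter, the fallback argument is returned instead.
default = message_from_string(WITHOUT_CHARSET).get_content_charset('ascii')

# Encoding a left double quotation mark with the declared charset fails,
# which is the same kind of error as in the traceback above.
try:
    u'\u201c'.encode(declared)
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

So the declared charset being correct for the message body does not guarantee that everything the template renders fits into it.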

> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 1141: ordinal not in range(256)

Here is an interactive Python session that reproduces it:

    >>> u = u'\u201c'
    >>> u
    u'\u201c'
    >>> type(u)
    <type 'unicode'>               # (un-encoded)
    >>> u.encode('utf-8')
    '\xe2\x80\x9c'                 # encoding with utf-8 works fine
    >>> print u.encode('utf-8')
    “
    >>> print u.encode('latin-1')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

The character cannot be encoded with latin-1. So one should check whether the function that returns the encoding is doing its job properly and, if it is, blame the charset information in the message.

Just my 2 cents

Best regards
--
Tomas
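
PS: if the charset information turns out to be what the message really declares, one possible way to guard the encode call is to fall back to UTF-8 when the declared charset cannot represent the text. This is only a sketch under that assumption: encode_for_output is a hypothetical helper of mine, not something in nmweb.py, and I have not tried it against the web interface.

```python
def encode_for_output(text, charset, fallback='utf-8'):
    """Encode text with charset, or with fallback if that is not possible.

    Hypothetical helper, not part of nmweb.py.
    Returns a tuple (encoded_bytes, charset_actually_used).
    """
    try:
        return text.encode(charset), charset
    except (UnicodeEncodeError, LookupError):
        # UnicodeEncodeError: charset cannot represent some character.
        # LookupError: the charset name is unknown to Python.
        return text.encode(fallback), fallback


# The character from the traceback does not fit in latin-1,
# so the helper falls back to UTF-8.
quoted, used = encode_for_output(u'\u201c', 'latin-1')

# Plain ASCII text fits in latin-1, so the declared charset is kept.
plain, plain_used = encode_for_output(u'abc', 'latin-1')
```

The helper returns the charset it actually used because the page would then have to declare that charset rather than the one from the message, otherwise a browser would misread the bytes.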