From: Florian Klink
To: notmuch@notmuchmail.org
Cc: David Bremner, Andreas Rammhold, Florian Klink
Subject: [PATCH v2 1/2] python: open messages in binary mode
Date: Sun, 24 Sep 2017 14:36:11 +0200
Message-Id: <20170924123612.26679-1-flokli@flokli.de>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <87bmn47h0b.fsf@tethera.net>
References: <87bmn47h0b.fsf@tethera.net>
List-Id: "Use and development of the notmuch mail system."

Currently, notmuch's get_message_parts() opens the file in text mode and
passes the file object to email.message_from_file(fp).
If the email contains bytes that are not valid UTF-8, reading can fail
inside email.parser with the following exception:

  File "/usr/lib/python3.6/site-packages/notmuch/message.py", line 591, in get_message_parts
    email_msg = email.message_from_binary_file(fp)
  File "/usr/lib/python3.6/email/__init__.py", line 62, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.6/email/parser.py", line 110, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.6/email/parser.py", line 54, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1865: invalid continuation byte

To fix this, read the file in binary mode and pass it to
email.message_from_binary_file(fp).

Unfortunately, Python 2 doesn't support
email.message_from_binary_file(fp), so keep using
email.message_from_file(fp) there.

Signed-off-by: Florian Klink
---
 bindings/python/notmuch/message.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index cce377d0..d5b98e4f 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -41,6 +41,7 @@ from .tag import Tags
 from .filenames import Filenames
 
 import email
+import sys
 
 
 class Message(Python3StringMixIn):
@@ -587,8 +588,11 @@ class Message(Python3StringMixIn):
 
     def get_message_parts(self):
         """Output like notmuch show"""
-        fp = open(self.get_filename())
-        email_msg = email.message_from_file(fp)
+        fp = open(self.get_filename(), 'rb')
+        if sys.version_info[0] < 3:
+            email_msg = email.message_from_file(fp)
+        else:
+            email_msg = email.message_from_binary_file(fp)
         fp.close()
 
         out = []
-- 
2.14.1
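
For anyone who wants to reproduce the failure outside of notmuch, the
following standalone sketch may help. It is not part of the patch: the
sample message and the helper names (RAW_MESSAGE, parse_message,
text_mode_fails, demo) are made up for illustration, and only the
open/parse logic mirrors the change above. It assumes the text-mode
read uses UTF-8, as in the traceback.

```python
import email
import io
import os
import sys
import tempfile

# Hypothetical sample message: the body contains byte 0xe4 followed by
# 0xdf, which is not a valid UTF-8 sequence.
RAW_MESSAGE = b"Subject: test\n\nGr\xe4\xdfe\n"


def parse_message(path):
    """Open the file in binary mode and parse it, as the patch does."""
    with open(path, 'rb') as fp:
        if sys.version_info[0] < 3:
            # Python 2 has no message_from_binary_file().
            return email.message_from_file(fp)
        return email.message_from_binary_file(fp)


def text_mode_fails(path):
    """Return True if parsing via a UTF-8 text-mode handle raises."""
    try:
        with io.open(path, encoding='utf-8') as fp:
            email.message_from_file(fp)
    except UnicodeDecodeError:
        return True
    return False


def demo():
    """Write the sample to a temp file and try both ways of reading it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(RAW_MESSAGE)
        subject = parse_message(path)['Subject']
        return subject, text_mode_fails(path)
    finally:
        os.remove(path)
```

Calling demo() parses the sample through the binary path and reports
whether the text-mode path raised UnicodeDecodeError on the same file.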