unofficial mirror of notmuch@notmuchmail.org
* [PATCH] python: open messages in binary mode
@ 2017-08-24 21:30 Florian Klink
  2017-08-24 22:11 ` David Bremner
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Klink @ 2017-08-24 21:30 UTC (permalink / raw)
  To: notmuch

Currently, notmuch's get_message_parts() opens the file in text mode and passes
the file object to email.message_from_file(fp). If the email contains 8-bit
characters that are not valid UTF-8, reading may fail inside email.parser with
the following exception:

  File "/usr/lib/python3.6/site-packages/notmuch/message.py", line 591, in get_message_parts
    email_msg = email.message_from_binary_file(fp)
  File "/usr/lib/python3.6/email/__init__.py", line 62, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.6/email/parser.py", line 110, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.6/email/parser.py", line 54, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1865: invalid continuation byte

To fix this, open the file in binary mode and pass it to
email.message_from_binary_file(fp).
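
The failure can be reproduced outside notmuch with a small sketch. The
message content and the temporary file below are made up for illustration,
and the text-mode open passes encoding="utf-8" explicitly so the result does
not depend on the locale. This assumes Python 3.

```python
import email
import os
import tempfile

# Illustrative message: the body contains the byte 0xE4 ("ä" in ISO-8859-1),
# which is not valid UTF-8 when followed by an ASCII byte.
raw = (b"Subject: test\n"
       b"Content-Type: text/plain; charset=iso-8859-1\n"
       b"Content-Transfer-Encoding: 8bit\n"
       b"\n"
       b"B\xe4r\n")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Text mode: the file object decodes bytes as UTF-8 while the parser reads.
    try:
        with open(path, encoding="utf-8") as fp:
            email.message_from_file(fp)
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode: the parser gets raw bytes; the payload is decoded later
    # using the charset declared in the Content-Type header.
    with open(path, "rb") as fp:
        msg = email.message_from_binary_file(fp)
    body = msg.get_payload(decode=True).decode(msg.get_content_charset())
finally:
    os.remove(path)
```

In binary mode no decoding happens at read time, so the declared charset can
be applied per part instead of one encoding being forced on the whole file.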

Signed-off-by: Florian Klink <flokli@flokli.de>
---
 bindings/python/notmuch/message.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index cce377d0..531b22d0 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -587,8 +587,8 @@ class Message(Python3StringMixIn):
 
     def get_message_parts(self):
         """Output like notmuch show"""
-        fp = open(self.get_filename())
-        email_msg = email.message_from_file(fp)
+        fp = open(self.get_filename(), 'rb')
+        email_msg = email.message_from_binary_file(fp)
         fp.close()
 
         out = []
-- 
2.14.1


* Re: [PATCH] python: open messages in binary mode
  2017-08-24 21:30 [PATCH] python: open messages in binary mode Florian Klink
@ 2017-08-24 22:11 ` David Bremner
  2017-08-25  6:08   ` Gaute Hope
  2017-09-24 12:36   ` [PATCH v2 1/2] " Florian Klink
  0 siblings, 2 replies; 9+ messages in thread
From: David Bremner @ 2017-08-24 22:11 UTC (permalink / raw)
  To: Florian Klink, notmuch

Florian Klink <flokli@flokli.de> writes:

> To fix this, read file in binary mode and pass to
> email.message_from_binary_file(fp).
>

Thanks for the patch, but notmuch is not (yet) Python 3 only. Apparently
that function has only been available since Python 3.2. I'm not sure if/when
we'll drop Python 2.7 support, but not without deprecating it for a few releases.

Also, since compatibility is a bit tricky here, it would be great to
have a test. See test/T390-python.sh for some examples.

d


* Re: [PATCH] python: open messages in binary mode
  2017-08-24 22:11 ` David Bremner
@ 2017-08-25  6:08   ` Gaute Hope
  2017-08-25 10:18     ` Florian Klink
  2017-09-24 12:36   ` [PATCH v2 1/2] " Florian Klink
  1 sibling, 1 reply; 9+ messages in thread
From: Gaute Hope @ 2017-08-25  6:08 UTC (permalink / raw)
  To: David Bremner, Florian Klink, notmuch

David Bremner writes on August 25, 2017 0:11:
> Florian Klink <flokli@flokli.de> writes:
> 
>> To fix this, read file in binary mode and pass to
>> email.message_from_binary_file(fp).
>>
> 
> Thanks for the patch, but notmuch is not (yet) python3 only. Apparently
> that function is only since python 3.2. I'm not sure if/when we'll drop
> python 2.7 support, but not without deprecating it for a few releases.

Is there anyone still exclusively on Python 2.7? Perhaps the time is 
ripe for starting that process? Encoding compatibility is an unholy mess 
to maintain for one Python distro.

Is any of alot, afew, etc still on Python 2 only?

Regards, Gaute



* Re: [PATCH] python: open messages in binary mode
  2017-08-25  6:08   ` Gaute Hope
@ 2017-08-25 10:18     ` Florian Klink
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Klink @ 2017-08-25 10:18 UTC (permalink / raw)
  To: Gaute Hope; +Cc: David Bremner, notmuch

>>that function is only since python 3.2. I'm not sure if/when we'll drop
>>python 2.7 support, but not without deprecating it for a few releases.

>Is there anyone still exclusively on Python 2.7? Perhaps the time is 
>ripe for starting that process? Encoding compatibility is an unholy 
>mess to maintain for one Python distro.

If Python 2 doesn't have email.message_from_binary_file(), it might be that the
bug I'm running into can't really be fixed in Python 2 anyway. Maybe it's
possible to open the file in binary mode on Python 2 and pass it to
email.message_from_file(), though. I will tinker around a bit this evening and
let you know.
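
One way this idea could look is sketched below. parse_message() is a
hypothetical helper, not part of notmuch, and it uses feature detection rather
than a version check, which is an assumption about style and not what was
eventually proposed.

```python
import email
import io


def parse_message(fp):
    """Parse an email from a file object opened in binary mode.

    Hypothetical helper for illustration only.
    """
    # Python 3.2+ provides message_from_binary_file(). On Python 2 the
    # attribute is missing, and a file opened with 'rb' yields byte strings
    # that message_from_file() can consume directly.
    parser = getattr(email, "message_from_binary_file", email.message_from_file)
    return parser(fp)


# Usage with an in-memory binary file object.
msg = parse_message(io.BytesIO(b"Subject: hello\n\nbody\n"))
subject = msg["Subject"]
```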

>Is any of alot, afew, etc still on Python 2 only?

afew works on both Python 2 and 3
alot seems to currently be Python 2 only (at least the Travis runs are), but it
looks like they are thinking about moving to Python 3 and dropping Python 2:
https://github.com/pazz/alot/issues/1047#issuecomment-300713819

Florian


* [PATCH v2 1/2] python: open messages in binary mode
  2017-08-24 22:11 ` David Bremner
  2017-08-25  6:08   ` Gaute Hope
@ 2017-09-24 12:36   ` Florian Klink
  2017-09-24 12:36     ` [PATCH v2 2/2] T390-python: add test for get_message_parts and special characters Florian Klink
  2017-10-02 11:04     ` [PATCH v2 1/2] python: open messages in binary mode David Bremner
  1 sibling, 2 replies; 9+ messages in thread
From: Florian Klink @ 2017-09-24 12:36 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner, Andreas Rammhold, Florian Klink

Currently, notmuch's get_message_parts() opens the file in text mode and passes
the file object to email.message_from_file(fp). If the email contains 8-bit
characters that are not valid UTF-8, reading may fail inside email.parser with
the following exception:

  File "/usr/lib/python3.6/site-packages/notmuch/message.py", line 591, in get_message_parts
    email_msg = email.message_from_binary_file(fp)
  File "/usr/lib/python3.6/email/__init__.py", line 62, in message_from_binary_file
    return BytesParser(*args, **kws).parse(fp)
  File "/usr/lib/python3.6/email/parser.py", line 110, in parse
    return self.parser.parse(fp, headersonly)
  File "/usr/lib/python3.6/email/parser.py", line 54, in parse
    data = fp.read(8192)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1865: invalid continuation byte

To fix this, open the file in binary mode and pass it to
email.message_from_binary_file(fp).

Unfortunately, Python 2 doesn't support
email.message_from_binary_file(fp), so keep using
email.message_from_file(fp) there.

Signed-off-by: Florian Klink <flokli@flokli.de>
---
 bindings/python/notmuch/message.py | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index cce377d0..d5b98e4f 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -41,6 +41,7 @@ from .tag import Tags
 from .filenames import Filenames
 
 import email
+import sys
 
 
 class Message(Python3StringMixIn):
@@ -587,8 +588,11 @@ class Message(Python3StringMixIn):
 
     def get_message_parts(self):
         """Output like notmuch show"""
-        fp = open(self.get_filename())
-        email_msg = email.message_from_file(fp)
+        fp = open(self.get_filename(), 'rb')
+        if sys.version_info[0] < 3:
+            email_msg = email.message_from_file(fp)
+        else:
+            email_msg = email.message_from_binary_file(fp)
         fp.close()
 
         out = []
-- 
2.14.1


* [PATCH v2 2/2] T390-python: add test for get_message_parts and special characters
  2017-09-24 12:36   ` [PATCH v2 1/2] " Florian Klink
@ 2017-09-24 12:36     ` Florian Klink
  2017-10-02 11:04     ` [PATCH v2 1/2] python: open messages in binary mode David Bremner
  1 sibling, 0 replies; 9+ messages in thread
From: Florian Klink @ 2017-09-24 12:36 UTC (permalink / raw)
  To: notmuch; +Cc: David Bremner, Andreas Rammhold, Florian Klink

This imports a message with ISO-8859-2 encoded characters, then opens
the database using the python bindings. We iterate over all message
parts, then print the message id.
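
For reference, a standalone sketch of what the message parts look like to
Python's email package, without notmuch involved. The raw message is built by
hand to resemble the test message, and walking non-multipart parts is assumed
to approximate what get_message_parts() returns.

```python
import email

# Hand-built message resembling the test fixture: 8bit body in ISO-8859-2.
raw = (b"Subject: ISO-8859-2 encoded message\n"
       b"Content-Type: text/plain; charset=iso-8859-2\n"
       b"Content-Transfer-Encoding: 8bit\n"
       b"\n"
       b"Czech word tu\xe8\xf2\xe1\xe8\xe8\xed means pinguin's.\n")

msg = email.message_from_bytes(raw)

decoded_parts = []
for part in msg.walk():
    if part.is_multipart():
        continue  # containers carry no payload of their own
    payload = part.get_payload(decode=True)           # raw bytes of the part
    charset = part.get_content_charset() or "ascii"   # declared charset
    decoded_parts.append(payload.decode(charset))
```

Decoded with the declared charset, the body contains the same Czech word that
the test searches for.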

Signed-off-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Andreas Rammhold <andreas@rammhold.de>
---
 test/T390-python.sh | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/test/T390-python.sh b/test/T390-python.sh
index a9a61145..5921cac9 100755
--- a/test/T390-python.sh
+++ b/test/T390-python.sh
@@ -56,5 +56,22 @@ grep '^[0-9a-f]' OUTPUT > INITIAL_OUTPUT
 test_begin_subtest "output of count matches test code"
 notmuch count --lastmod '*' | cut -f2-3 > OUTPUT
 test_expect_equal_file INITIAL_OUTPUT OUTPUT
+add_message '[content-type]="text/plain; charset=iso-8859-2"' \
+            '[content-transfer-encoding]=8bit' \
+            '[subject]="ISO-8859-2 encoded message"' \
+            "[body]=$'Czech word tu\350\362\341\350\350\355 means pinguin\'s.'" # ISO-8859-2 characters are generated by shell's escape sequences
+test_begin_subtest "Add ISO-8859-2 encoded message, call get_message_parts"
+test_python <<EOF
+import notmuch
+db = notmuch.Database(mode=notmuch.Database.MODE.READ_ONLY)
+q_new = notmuch.Query(db, 'ISO-8859-2 encoded message')
+for m in q_new.search_messages():
+    for mp in m.get_message_parts():
+      continue
+    print(m.get_message_id())
+EOF
+
+notmuch search --sort=oldest-first --output=messages "tučňáččí" | sed s/^id:// > EXPECTED
+test_expect_equal_file EXPECTED OUTPUT
 
 test_done
-- 
2.14.1


* Re: [PATCH v2 1/2] python: open messages in binary mode
  2017-09-24 12:36   ` [PATCH v2 1/2] " Florian Klink
  2017-09-24 12:36     ` [PATCH v2 2/2] T390-python: add test for get_message_parts and special characters Florian Klink
@ 2017-10-02 11:04     ` David Bremner
  2017-10-02 11:39       ` Florian Klink
  1 sibling, 1 reply; 9+ messages in thread
From: David Bremner @ 2017-10-02 11:04 UTC (permalink / raw)
  To: Florian Klink, notmuch; +Cc: Andreas Rammhold, Florian Klink

Florian Klink <flokli@flokli.de> writes:

> currently, notmuch's get_message_parts() opens the file in text mode and passes
> the file object to email.message_from_file(fp). In case the email contains
> UTF-8 characters, reading might fail inside email.parser with the following exception:
>

merged series to master. Thanks for the fix. BTW, I noticed the bug only
happens with python3.

d


* Re: [PATCH v2 1/2] python: open messages in binary mode
  2017-10-02 11:04     ` [PATCH v2 1/2] python: open messages in binary mode David Bremner
@ 2017-10-02 11:39       ` Florian Klink
  2017-10-05 15:50         ` Tomi Ollila
  0 siblings, 1 reply; 9+ messages in thread
From: Florian Klink @ 2017-10-02 11:39 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch, Andreas Rammhold

>merged series to master. Thanks for the fix. BTW, I noticed the bug only
>happens with python3.

Thanks for merging :-)
Yes, most distributions still symlink /usr/bin/python to python2 - maybe that's
the reason why a lot of code still runs on python 2…


* Re: [PATCH v2 1/2] python: open messages in binary mode
  2017-10-02 11:39       ` Florian Klink
@ 2017-10-05 15:50         ` Tomi Ollila
  0 siblings, 0 replies; 9+ messages in thread
From: Tomi Ollila @ 2017-10-05 15:50 UTC (permalink / raw)
  To: Florian Klink, David Bremner; +Cc: notmuch, Andreas Rammhold

On Mon, Oct 02 2017, Florian Klink wrote:

>>merged series to master. Thanks for the fix. BTW, I noticed the bug only
>>happens with python3.
>
> Thanks for merging :-)
> Yes, most distributions still symlink /usr/bin/python to python2 - maybe that's
> the reason why a lot of code still runs on python 2…

In Windows environments one often sees just python2 :(

In macOS environments one often sees just python2 :(

In CentOS/RHEL one has to pick python3 (3.4) from EPEL, and python3
did not seem to work out of the box after installing (had to do 
ln -s python34 python3 ) :(


Tomi


