unofficial mirror of notmuch@notmuchmail.org
* python-notmuch decoding error on a message
@ 2011-11-06 22:16 Antoine Amarilli
  2011-11-24 16:13 ` David Bremner
  2012-11-06  1:50 ` David Bremner
  0 siblings, 2 replies; 6+ messages in thread
From: Antoine Amarilli @ 2011-11-06 22:16 UTC (permalink / raw)
  To: notmuch


[-- Attachment #1.1: Type: text/plain, Size: 355 bytes --]

Hello,

The attached message makes python-notmuch crash when trying to access it (see
attached log).

I don't know if the encoding of Subject is valid or not, but it would probably
be better anyway to ignore decoding errors and return some approximation of
Subject instead of failing like this.

Any ideas?

Thanks!

-- 
Antoine Amarilli


[-- Attachment #1.2: log --]
[-- Type: text/plain, Size: 902 bytes --]

$ python
Python 2.7.2+ (default, Aug 16 2011, 09:23:59) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import notmuch
>>> db = notmuch.Database()
>>> q = db.create_query("id:test20110928121705.GA3877@example.com")
>>> t = q.search_threads()
>>> for a in t:
...     print a
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/notmuch/thread.py", line 379, in __str__
    thread['subject'] = self.get_subject()
  File "/usr/local/lib/python2.7/dist-packages/notmuch/thread.py", line 311, in get_subject
    return subject.decode('UTF-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe8 in position 6: invalid continuation byte
>>> 

[-- Attachment #1.3: message --]
[-- Type: text/plain, Size: 462 bytes --]

Date: Wed, 28 Sep 2011 14:17:05 +0200
From: nobody@example.com
To: nobody@example.com
Subject: Re: Fwd: =?utf-8?B?M+ht?= =?utf-8?Q?e?= Salon du Livre juridique
Message-ID: <test20110928121705.GA3877@example.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ZGiS0Q5IWpPtfppv"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Content-Length: 865
Lines: 2

test


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
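
A minimal sketch (Python 3, standard library only) of what the attached Subject contains. The first encoded word, `=?utf-8?B?M+ht?=`, is labelled UTF-8, but its base64 payload includes the single byte 0xe8, which is "è" in Latin-1 and not valid UTF-8 on its own; this is the byte the traceback complains about. The helper name `tolerant_decode` is hypothetical and not part of python-notmuch; it only illustrates the "return some approximation" idea from the report.

```python
import base64

# Payload of the first encoded word in the Subject: =?utf-8?B?M+ht?=
payload = base64.b64decode("M+ht")


def tolerant_decode(raw, charset="utf-8"):
    """Decode raw bytes, substituting U+FFFD for undecodable bytes.

    Hypothetical helper, not part of python-notmuch.
    """
    return raw.decode(charset, errors="replace")


try:
    payload.decode("utf-8")  # strict decoding, as in the traceback
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(repr(payload))              # raw bytes of the payload
print(strict_ok)                  # whether strict UTF-8 decoding succeeded
print(tolerant_decode(payload))   # approximation with a replacement character
print(payload.decode("latin-1"))  # what the sender most likely meant
```

With `errors="replace"` the caller still gets a usable Subject, and the replacement character marks where information was lost.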


* Re: python-notmuch decoding error on a message
  2011-11-06 22:16 python-notmuch decoding error on a message Antoine Amarilli
@ 2011-11-24 16:13 ` David Bremner
  2011-11-25  9:04   ` Patrick Totzke
  2011-12-01 21:30   ` Sebastian Spaeth
  2012-11-06  1:50 ` David Bremner
  1 sibling, 2 replies; 6+ messages in thread
From: David Bremner @ 2011-11-24 16:13 UTC (permalink / raw)
  To: Antoine Amarilli, notmuch

On Sun, 6 Nov 2011 23:15:54 +0100, Antoine Amarilli <antoine.amarilli@ens.fr> wrote:
> Hello,
> 
> The attached message makes python-notmuch crash when trying to access it (see
> attached log).
> 
> I don't know if the encoding of Subject is valid or not, but it would probably
> be better anyway to ignore decoding errors and return some approximation of
> Subject instead of failing like this.
> 

I get a set of critical errors about forgetting to call g_type_init.

We actually call g_type_init in the CLI now, thanks to 
   
   id:"1311625989-97755-1-git-send-email-aaronecay@gmail.com"

but it sounds like this probably needs to be called either in libnotmuch
or in the bindings. 

For what it is worth, this message decodes fine in the CLI.

d


* Re: python-notmuch decoding error on a message
  2011-11-24 16:13 ` David Bremner
@ 2011-11-25  9:04   ` Patrick Totzke
  2011-11-25 12:15     ` David Bremner
  2011-12-01 21:30   ` Sebastian Spaeth
  1 sibling, 1 reply; 6+ messages in thread
From: Patrick Totzke @ 2011-11-25  9:04 UTC (permalink / raw)
  To: David Bremner, Antoine Amarilli, notmuch

Silly question: how do I get Antoine's message into notmuch? I tried
using Python's mailbox lib to add this string to one of my mailboxes, which works fine,
but upon `notmuch new` I get something along the lines of "skipped non-mail file $myfile"..

Back to the topic:
I find it highly surprising that this decode fails, because one can easily do something like:

```
>>>'=?utf-8?B?M+ht?= =?utf-8?Q?e?='.decode('UTF-8')
u'=?utf-8?B?M+ht?= =?utf-8?Q?e?='
```
So the actual string should not be the problem. Apparently,
the string as it's stored in the index is not plain ASCII anymore, which it was in the message.
I thought notmuch stores exactly what it gets?
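
A sketch of the distinction raised here, using Python 3's standard `email.header` module as a stand-in (an assumption; notmuch itself does not use this module). The raw header line is pure ASCII, so decoding it as UTF-8 cannot fail. Once the RFC 2047 encoded words are expanded, however, the resulting bytes include 0xe8, and presumably it is that expanded form that fails the strict UTF-8 decode in the traceback.

```python
from email.header import decode_header

raw_subject = "Re: Fwd: =?utf-8?B?M+ht?= =?utf-8?Q?e?= Salon du Livre juridique"

# The header as it appears in the message file is plain ASCII.
raw_is_ascii = all(ord(ch) < 128 for ch in raw_subject)

# Expand the encoded words; each part is a (bytes or str, charset) pair.
parts = decode_header(raw_subject)
expanded = b"".join(
    chunk if isinstance(chunk, bytes) else chunk.encode("ascii")
    for chunk, _charset in parts
)

try:
    expanded.decode("utf-8")
    expanded_is_utf8 = True
except UnicodeDecodeError:
    expanded_is_utf8 = False

print(raw_is_ascii)      # the raw header is ASCII-only
print(repr(expanded))    # bytes after expanding the encoded words
print(expanded_is_utf8)  # whether those bytes are valid UTF-8
```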

Apart from this, I'd recommend replacing all decodes to unicode objects
by a subroutine that does the following:
If a global property notmuch.DEBUG is set to true: decode as is,
which will raise these exceptions upon errors
else: use .decode('UTF-8', errors='ignore').

In case the mail is not malformed, it will not contain any non-ascii symbols whatsoever,
so both ways should work. If you happen to deal with a malformed mail, you'd get
the problematic symbols omitted (beware of this when doing cryptostuff).
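
A minimal sketch of the subroutine proposed above, written for Python 3. The module-level `DEBUG` flag stands in for the suggested `notmuch.DEBUG` property, and the function name `to_unicode` is made up for illustration; neither exists in python-notmuch as far as this thread shows.

```python
DEBUG = False  # hypothetical stand-in for the proposed notmuch.DEBUG property


def to_unicode(raw, debug=None):
    """Decode UTF-8 bytes; strict when debugging, lossy otherwise."""
    if debug is None:
        debug = DEBUG
    if debug:
        # Strict: raises UnicodeDecodeError on malformed input.
        return raw.decode("UTF-8")
    # Lenient: silently drops bytes that are not valid UTF-8.
    return raw.decode("UTF-8", errors="ignore")


# Subject bytes with the stray 0xe8 byte from the traceback.
subject = b"Fwd: 3\xe8me Salon du Livre juridique"
print(to_unicode(subject))
```

Note that `errors='ignore'` drops the offending byte entirely, whereas `errors='replace'` would leave a visible U+FFFD marker in its place.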

what do you think?
/p




* Re: python-notmuch decoding error on a message
  2011-11-25  9:04   ` Patrick Totzke
@ 2011-11-25 12:15     ` David Bremner
  0 siblings, 0 replies; 6+ messages in thread
From: David Bremner @ 2011-11-25 12:15 UTC (permalink / raw)
  To: Patrick Totzke, Antoine Amarilli, notmuch

On Fri, 25 Nov 2011 09:04:06 +0000, Patrick Totzke <patricktotzke@googlemail.com> wrote:

> Silly question: how do i get Antoine's msg stup into notmuch? i tried
> using pythons mailbox lib to add this string to one of my mailboxes,
> which works fine.  but upon `notmuch new` I get something along the
> lines of "skipped non-mail file $myfile"..

I saved the attachment using notmuch-emacs, and then ran `notmuch new`.

d


* Re: python-notmuch decoding error on a message
  2011-11-24 16:13 ` David Bremner
  2011-11-25  9:04   ` Patrick Totzke
@ 2011-12-01 21:30   ` Sebastian Spaeth
  1 sibling, 0 replies; 6+ messages in thread
From: Sebastian Spaeth @ 2011-12-01 21:30 UTC (permalink / raw)
  To: David Bremner, Antoine Amarilli, notmuch

[-- Attachment #1: Type: text/plain, Size: 506 bytes --]

On Thu, 24 Nov 2011 12:13:22 -0400, David Bremner <david@tethera.net> wrote:
> I get a set of critical errors about forgetting to call g_type_init.
> We actually call g_type_init in the CLI now, thanks to 

Oooh, ahh, I just saw these messages myself when running 'notmuch.py search "moo"'.
I would prefer if I (the binding) did not have to deal with the
g_type_init stuff myself; it would mean loading more C libraries and
doing stuff that no libnotmuch documentation has told me about :-).

Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]


* Re: python-notmuch decoding error on a message
  2011-11-06 22:16 python-notmuch decoding error on a message Antoine Amarilli
  2011-11-24 16:13 ` David Bremner
@ 2012-11-06  1:50 ` David Bremner
  1 sibling, 0 replies; 6+ messages in thread
From: David Bremner @ 2012-11-06  1:50 UTC (permalink / raw)
  To: Antoine Amarilli, notmuch

Antoine Amarilli <antoine.amarilli@ens.fr> writes:

> Hello,
>
> The attached message makes python-notmuch crash when trying to access it (see
> attached log).
>
> I don't know if the encoding of Subject is valid or not, but it would probably
> be better anyway to ignore decoding errors and return some approximation of
> Subject instead of failing like this.
>

It seems this bug was fixed a while ago, so I'm removing it from the bug
list (http://nmbug.tethera.net/status). If it still exists, another test
case would be appreciated.

d
