unofficial mirror of notmuch@notmuchmail.org
* Bug: problem decoding some non-ascii characters in subjects
@ 2013-02-08 10:22 Albin Stjerna
  2013-02-08 11:21 ` Jani Nikula
  0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-08 10:22 UTC (permalink / raw)
  To: notmuch

Hi,

I've been noticing that notmuch has some problems decoding certain strangely-encoded non-ascii characters in certain emails. For example, today I got this: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be rendered: »Digitaliseringsprojektens skadliga förkärlek för PDF-formatet«).

Apparently, some metadata is passed on to help the MUA decode the string, but notmuch doesn't seem to handle it. Entire emails can of course be supplied as needed.

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-08 10:22 Bug: problem decoding some non-ascii characters in subjects Albin Stjerna
@ 2013-02-08 11:21 ` Jani Nikula
  2013-02-09  9:04   ` Albin Stjerna
  0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-08 11:21 UTC (permalink / raw)
  To: Albin Stjerna, notmuch

On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> I've been noticing that notmuch has some problems decoding certain
> strangely-encoded non-ascii characters in certain emails. For example,
> today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> rendered: »Digitaliseringsprojektens skadliga förkärlek för
> PDF-formatet«).
>
> Apparently, some metadata is passed on to help the MUA decode the
> string, but notmuch doesn't seem to handle it. Entire emails can of
> course be supplied as needed.

Please copy paste the Subject: header directly from the message file.

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-08 11:21 ` Jani Nikula
@ 2013-02-09  9:04   ` Albin Stjerna
  2013-02-09 10:06     ` Jani Nikula
  0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-09  9:04 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula wrote:

> On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> > I've been noticing that notmuch has some problems decoding certain
> > strangely-encoded non-ascii characters in certain emails. For example,
> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
> > PDF-formatet«).
> >
> > Apparently, some metadata is passed on to help the MUA decode the
> > string, but notmuch doesn't seem to handle it. Entire emails can of
> > course be supplied as needed.

> Please copy paste the Subject: header directly from the message file.

The exact Subject: header (from the file, not notmuch) is:
Subject: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet

Other potentially interesting headers are:
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106
            Thunderbird/17.0.2
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

(formatted as they appeared in the mail file)
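
For anyone wanting to reproduce this, here is a minimal Python sketch of one way to pull a header out of a message file exactly as stored, folded continuation lines included and with no decoding applied. The helper name raw_header_lines and the sample text are hypothetical; the sample only reuses the headers quoted above.

```python
def raw_header_lines(message_text: str, name: str) -> list[str]:
    """Return the physical lines of the first `name` header, folds included."""
    wanted = name.lower() + ":"
    collected = []
    for line in message_text.splitlines():
        if line == "":
            break  # a blank line ends the header block
        if collected:
            if line[:1] in (" ", "\t"):
                collected.append(line)  # folded continuation line
                continue
            break  # the next header starts here, so we are done
        if line.lower().startswith(wanted):
            collected.append(line)
    return collected


# Hypothetical sample built from the headers quoted in this thread.
sample = (
    "Subject: [BIBLIST] Digitaliseringensprojektens skadliga "
    "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet\n"
    "User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106\n"
    "            Thunderbird/17.0.2\n"
    "MIME-Version: 1.0\n"
    "\n"
    "body\n"
)

for header in ("Subject", "User-Agent"):
    lines = raw_header_lines(sample, header)
    print(header, "spans", len(lines), "line(s)")
    for physical_line in lines:
        print("  " + repr(physical_line))
```

Printing with repr() makes any fold points and odd whitespace visible, which is what matters when checking where a header is split.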

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-09  9:04   ` Albin Stjerna
@ 2013-02-09 10:06     ` Jani Nikula
  2013-02-10  8:30       ` Albin Stjerna
  0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-09 10:06 UTC (permalink / raw)
  To: Albin Stjerna, notmuch

On Sat, 09 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> Jani Nikula wrote:
>
>> On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
>> > I've been noticing that notmuch has some problems decoding certain
>> > strangely-encoded non-ascii characters in certain emails. For example,
>> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
>> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
>> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
>> > PDF-formatet«).
>> >
>> > Apparently, some metadata is passed on to help the MUA decode the
>> > string, but notmuch doesn't seem to handle it. Entire emails can of
>> > course be supplied as needed.
>
>> Please copy paste the Subject: header directly from the message file.
>
> The exact Subject: header (from the file, not notmuch) is:
> Subject: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet

Is that entirely on one line in the original message file? If not, where
exactly is it split?

Either way, at a glance, the encoding looks malformed. I think the
encoded-word ("=?" charset "?" encoding "?" encoded-text "?=") has to be
separated from the surrounding text by whitespace to make it an atom
[RFC 2047, RFC 2822].

If you manually move the leading 'f' after the "?Q?" bit, it works as
expected. It looks like the bug is in the sender's user agent.
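
To illustrate the point, a minimal Python sketch, assuming a strict reading in which a whitespace-delimited token only counts when the whole token is an encoded-word. This is not gmime's code; the names ENCODED_WORD, is_standalone_encoded_word and decode_q are made up for the example.

```python
import re

# Shape from RFC 2047: "=?" charset "?" encoding "?" encoded-text "?="
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def is_standalone_encoded_word(token: str) -> bool:
    """True only if the entire whitespace-delimited token is an encoded-word."""
    return ENCODED_WORD.fullmatch(token) is not None


def decode_q(charset: str, text: str) -> str:
    """Decode the "Q" encoding: "_" means space, "=XX" is one hex byte."""
    raw = re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.replace("_", " ").encode("ascii"),
    )
    return raw.decode(charset)


as_sent = "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?="  # token from the Subject
moved_f = "=?ISO-8859-1?Q?f=F6rk=E4rlek_f=F6r_?="  # 'f' moved after "?Q?"

print(is_standalone_encoded_word(as_sent))  # False: the leading 'f' glues on
print(is_standalone_encoded_word(moved_f))  # True

match = ENCODED_WORD.fullmatch(moved_f)
print(repr(decode_q(match.group(1), match.group(3))))
```

Under that reading, the token as sent is just ordinary text and is left alone, while the variant with the 'f' moved inside decodes to "förkärlek för " (with a trailing space from the final "_").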


BR,
Jani.

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-09 10:06     ` Jani Nikula
@ 2013-02-10  8:30       ` Albin Stjerna
  2013-02-10 12:21         ` Jani Nikula
  0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-10  8:30 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula wrote:

> Is that entirely on one line in the original message file? If not, where
> exactly is it split?

It's all on one line.

> Either way, at a glance, the encoding looks malformed. I think the
> encoded-word ("=?" charset "?" encoding "?" encoded-text "?=") has to be
> separated from the surrounding text by whitespace to make it an atom
> [RFC 2047, RFC 2822].

> If you manually move the leading 'f' after the "?Q?" bit, it works as
> expected. It looks like the bug is in the sender's user agent.

Hm. So I should report this to Thunderbird? I tried searching through
their bug reports but didn't find anything.

I didn't think it was a bug, since Gmail rendered it just fine.

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-10  8:30       ` Albin Stjerna
@ 2013-02-10 12:21         ` Jani Nikula
  2013-02-10 12:56           ` Albin Stjerna
  0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-10 12:21 UTC (permalink / raw)
  To: Albin Stjerna, notmuch

On Sun, 10 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> Hm. So I should report this to Thunderbird? I tried searching through
> their bug reports but didn't find anything.

I tried that too; there were other RFC 2047 related header bugs, but I
could not find this one. The other bugs were old and already fixed.
Judging by the User-Agent header, this is a fairly up-to-date Thunderbird.

It seems to be list mail. It would not surprise me if some ill-advised
mailing list manager decoded and re-encoded the subject. One could
try sending the same message directly and through the list, and see if
there's a difference.

> I didn't think it was a bug, since Gmail rendered it just fine.

It's possible they interpret the RFC more leniently. AFAICS notmuch
relies on gmime to handle this, so I think we would have to go out of
our way to work around it in notmuch. A quick search did not bring up
anything gmime-related about this, so I don't know whether it has been
discussed in the gmime context.
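
For illustration only, a minimal Python sketch of what a more relaxed reading could look like: decode anything shaped like an encoded-word wherever it appears, even when it is glued to ordinary text. This is an assumption about relaxed behaviour in general, not a description of how Gmail or gmime actually work, and the names EMBEDDED_WORD and decode_relaxed are hypothetical.

```python
import base64
import re

# Same shape as an RFC 2047 encoded-word, but matched anywhere in the value.
EMBEDDED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def _decode_word(match: re.Match) -> str:
    charset, encoding, payload = match.groups()
    if encoding in "bB":
        raw = base64.b64decode(payload)
    else:
        # "Q" encoding: "_" means space, "=XX" is one hex byte.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset, errors="replace")


def decode_relaxed(header_value: str) -> str:
    """Decode every embedded encoded-word, ignoring the whitespace rule."""
    return EMBEDDED_WORD.sub(_decode_word, header_value)


subject = (
    "[BIBLIST] Digitaliseringensprojektens skadliga "
    "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet"
)
print(decode_relaxed(subject))
```

With the Subject from this thread, the result reads "... skadliga förkärlek för  PDF-formatet"; there are two spaces before "PDF-formatet" because the final "_" already decodes to a space. The sketch also skips the RFC 2047 rule about dropping whitespace between adjacent encoded-words, and unknown charset names would raise LookupError.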


BR,
Jani.

* Re: Bug: problem decoding some non-ascii characters in subjects
  2013-02-10 12:21         ` Jani Nikula
@ 2013-02-10 12:56           ` Albin Stjerna
  0 siblings, 0 replies; 7+ messages in thread
From: Albin Stjerna @ 2013-02-10 12:56 UTC (permalink / raw)
  To: Jani Nikula, notmuch

Jani Nikula wrote:

> It seems to be list mail. It would not surprise me if some ill-advised
> mailing list manager decoded and re-encoded the subject. One could
> try sending the same message directly and through the list, and see if
> there's a difference.

Possibly, though I don't have the original message since I got it from a
list. It's actually from Biblist (a Swedish mailing list for librarians
and such), and it seems to run LISTSERV 15.5 (see
http://segate.sunet.se/cgi-bin/wa?A0=BIBLIST). According to itself, it's
»industry-standard«, which would indeed support your thesis.
