* Bug: problem decoding some non-ascii characters in subjects
@ 2013-02-08 10:22 Albin Stjerna
2013-02-08 11:21 ` Jani Nikula
0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-08 10:22 UTC
To: notmuch
Hi,
I've been noticing that notmuch has some problems decoding certain strangely-encoded non-ascii characters in certain emails. For example, today I got this: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be rendered: »Digitaliseringsprojektens skadliga förkärlek för PDF-formatet«).
Apparently, some metadata is passed on to help the MUA decode the string, but notmuch doesn't seem to handle it. Entire emails can of course be supplied as needed.
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-08 10:22 Bug: problem decoding some non-ascii characters in subjects Albin Stjerna
@ 2013-02-08 11:21 ` Jani Nikula
2013-02-09 9:04 ` Albin Stjerna
0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-08 11:21 UTC
To: Albin Stjerna, notmuch
On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> I've been noticing that notmuch has some problems decoding certain
> strangely-encoded non-ascii characters in certain emails. For example,
> today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> rendered: »Digitaliseringsprojektens skadliga förkärlek för
> PDF-formatet«).
>
> Apparently, some metadata is passed on to help the MUA decode the
> string, but notmuch doesn't seem to handle it. Entire emails can of
> course be supplied as needed.
Please copy paste the Subject: header directly from the message file.
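
For anyone following along, here is a minimal sketch of one way to pull the undecoded Subject: header out of a message file, using Python's standard `email` package. The helper name `raw_subject` and the sample bytes are illustrative assumptions, not anything taken from notmuch. The `compat32` policy is used because it hands back the header value as stored, without RFC 2047 decoding, and keeps the line breaks introduced by header folding.

```python
import email
from email import policy


def raw_subject(message_bytes: bytes) -> str:
    """Return the Subject: value as stored in the message, undecoded.

    Hypothetical helper for illustration. With the compat32 policy the
    value is not RFC 2047-decoded, and folded headers keep their line
    breaks, so the result shows exactly where a header was split.
    """
    msg = email.message_from_bytes(message_bytes, policy=policy.compat32)
    value = msg["Subject"]
    # A missing header yields None; headers containing raw 8-bit bytes
    # come back as a Header object, hence the str() call.
    return "" if value is None else str(value)


# Made-up sample with a folded Subject: header.
SAMPLE = b"Subject: one\n two\nTo: notmuch\n\nbody\n"

if __name__ == "__main__":
    # repr() makes any fold points visible as "\n".
    print(repr(raw_subject(SAMPLE)))
```

For a real message one would pass the bytes read from the mail file (opened in binary mode) instead of `SAMPLE`.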
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-08 11:21 ` Jani Nikula
@ 2013-02-09 9:04 ` Albin Stjerna
2013-02-09 10:06 ` Jani Nikula
0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-09 9:04 UTC
To: Jani Nikula, notmuch
Jani Nikula wrote:
> On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> > I've been noticing that notmuch has some problems decoding certain
> > strangely-encoded non-ascii characters in certain emails. For example,
> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
> > PDF-formatet«).
> >
> > Apparently, some metadata is passed on to help the MUA decode the
> > string, but notmuch doesn't seem to handle it. Entire emails can of
> > course be supplied as needed.
> Please copy paste the Subject: header directly from the message file.
The exact Subject: header (from the file, not notmuch) is:
Subject: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet
Other potentially interesting headers are:
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/20130106
Thunderbird/17.0.2
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
(formatted as they appeared in the mail file)
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-09 9:04 ` Albin Stjerna
@ 2013-02-09 10:06 ` Jani Nikula
2013-02-10 8:30 ` Albin Stjerna
0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-09 10:06 UTC
To: Albin Stjerna, notmuch
On Sat, 09 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> Jani Nikula wrote:
>
>> On Fri, 08 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
>> > I've been noticing that notmuch has some problems decoding certain
>> > strangely-encoded non-ascii characters in certain emails. For example,
>> > today I got this: [BIBLIST] Digitaliseringensprojektens skadliga
>> > f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet (should be
>> > rendered: »Digitaliseringsprojektens skadliga förkärlek för
>> > PDF-formatet«).
>> >
>> > Apparently, some metadata is passed on to help the MUA decode the
>> > string, but notmuch doesn't seem to handle it. Entire emails can of
>> > course be supplied as needed.
>
>> Please copy paste the Subject: header directly from the message file.
>
> The exact Subject: header (from the file, not notmuch) is:
> Subject: [BIBLIST] Digitaliseringensprojektens skadliga f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet
Is that entirely on one line in the original message file? If not, where
exactly is it split?
Either way, at a glance, it seems like the encoding is malformed. I
think the encoded-word ("=?" charset "?" encoding "?" encoded-text "?=")
should be separated by space to make it an atom. [RFC 2047, RFC 2822].
If you manually move the leading 'f' after the "?Q?" bit, it works as
expected. It looks like the bug is in the sender's user agent.
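
To illustrate the difference between a strict and a relaxed reading of the encoded-word rule, here is a rough sketch in Python. It is not how notmuch or gmime decode headers; `decode_strict` and `decode_lenient` are made-up names, and the tokenizer is deliberately simplified (it splits on single spaces and does not drop whitespace between adjacent encoded-words as RFC 2047 requires). An unknown charset would raise `LookupError`.

```python
import base64
import re

# encoded-word = "=?" charset "?" encoding "?" encoded-text "?="  (RFC 2047)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def _decode_match(match: re.Match) -> str:
    charset, encoding, text = match.groups()
    if encoding.lower() == "q":
        # Q encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            text.replace("_", " ").encode("ascii"),
        )
    else:
        raw = base64.b64decode(text)
    return raw.decode(charset, errors="replace")


def decode_strict(value: str) -> str:
    """Decode only space-separated tokens that are entirely an encoded-word."""
    parts = []
    for token in value.split(" "):
        match = ENCODED_WORD.fullmatch(token)
        parts.append(_decode_match(match) if match else token)
    return " ".join(parts)


def decode_lenient(value: str) -> str:
    """Decode encoded-words wherever they occur, even glued to other text."""
    return ENCODED_WORD.sub(_decode_match, value)


SUBJECT = ("[BIBLIST] Digitaliseringensprojektens skadliga "
           "f=?ISO-8859-1?Q?=F6rk=E4rlek_f=F6r_?= PDF-formatet")
# The manual fix described above: move the leading 'f' after the "?Q?" bit.
FIXED = SUBJECT.replace("f=?ISO-8859-1?Q?", "=?ISO-8859-1?Q?f")

if __name__ == "__main__":
    print(decode_strict(SUBJECT))
    print(decode_lenient(SUBJECT))
    print(decode_strict(FIXED))
```

In this sketch the strict decoder leaves the subject as received untouched, because the token starts with a stray 'f', while the lenient decoder and the strict decoder on the fixed header both recover the intended words.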
BR,
Jani.
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-09 10:06 ` Jani Nikula
@ 2013-02-10 8:30 ` Albin Stjerna
2013-02-10 12:21 ` Jani Nikula
0 siblings, 1 reply; 7+ messages in thread
From: Albin Stjerna @ 2013-02-10 8:30 UTC
To: Jani Nikula, notmuch
Jani Nikula wrote:
> Is that entirely on one line in the original message file? If not, where
> exactly is it split?
It's in one line.
> Either way, at a glance, it seems like the encoding is malformed. I
> think the encoded-word ("=?" charset "?" encoding "?" encoded-text "?=")
> should be separated by space to make it an atom. [RFC 2047, RFC 2822].
> If you manually move the leading 'f' after the "?Q?" bit, it works as
> expected. It looks like the bug is in the sender's user agent.
Hm. So I should report this to Thunderbird? I tried searching through
their bug reports but didn't find anything.
I didn't think it was a bug, since Gmail rendered it just fine.
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-10 8:30 ` Albin Stjerna
@ 2013-02-10 12:21 ` Jani Nikula
2013-02-10 12:56 ` Albin Stjerna
0 siblings, 1 reply; 7+ messages in thread
From: Jani Nikula @ 2013-02-10 12:21 UTC
To: Albin Stjerna, notmuch
On Sun, 10 Feb 2013, Albin Stjerna <albin.stjerna@gmail.com> wrote:
> Hm. So I should report this to Thunderbird? I tried searching through
> their bug reports but didn't find anything.
I tried that too; there were other RFC 2047 related header bugs, but I
could not find this one. And the other bugs were old and fixed. Judging
by the User-Agent header, this is a fairly up-to-date Thunderbird.
It seems to be list mail. It would not surprise me that some ill-advised
mailing list manager would decode and re-encode the subject. One could
try sending the same message directly and through the list, and see if
there's a difference.
> I didn't think it was a bug, since Gmail rendered it just fine.
It's possible they interpret the RFC in a more relaxed way. AFAICS
notmuch relies on gmime to handle this, so I think we would have to go
out of our way to work around this in notmuch. A quick search did not
bring up anything gmime related about this, so I don't know if this has
been discussed in the gmime context.
BR,
Jani.
* Re: Bug: problem decoding some non-ascii characters in subjects
2013-02-10 12:21 ` Jani Nikula
@ 2013-02-10 12:56 ` Albin Stjerna
0 siblings, 0 replies; 7+ messages in thread
From: Albin Stjerna @ 2013-02-10 12:56 UTC
To: Jani Nikula, notmuch
Jani Nikula wrote:
> It seems to be list mail. It would not surprise me that some ill-advised
> mailing list manager would decode and re-encode the subject. One could
> try sending the same message directly and through the list, and see if
> there's a difference.
Possibly, though I don't have the original message since I got it from a
list. It's actually from Biblist (a Swedish mailing list for librarians
and such), and it seems to run LISTSERV 15.5 (see
http://segate.sunet.se/cgi-bin/wa?A0=BIBLIST). According to itself, it's
»industry-standard«, which would indeed support your thesis.