unofficial mirror of notmuch@notmuchmail.org
* UTF-8 in mail headers (namely FROM) sent by bugzilla
@ 2013-07-23  8:55 Franz Fellner
  2013-07-23  9:30 ` Eric Abrahamsen
  2013-10-05 10:43 ` Jani Nikula
  0 siblings, 2 replies; 12+ messages in thread
From: Franz Fellner @ 2013-07-23  8:55 UTC (permalink / raw)
  To: notmuch

Hi,

I have a problem with notmuch-vim (now: git master from 10 min. ago) (also with alot and ner, not with 'notmuch show' or notmuch-emacs). UTF-8-encoded From: (at least) does not show Umlauts but a weird encoded-string.
Example:
"Thomas Lübking" as one of the KWin devs comes in replies as
=?UTF-8?Q?Thomas=20L=C3=BCbking=20?=<thomas.luebking@gmail.com>
In private conversations with Thomas everything is fine, so it seems to depend on the way bugzilla encodes the "From:".
Subject is absolutely fine, btw.

Thx
Franz


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23  8:55 UTF-8 in mail headers (namely FROM) sent by bugzilla Franz Fellner
@ 2013-07-23  9:30 ` Eric Abrahamsen
  2013-07-23 10:55   ` Franz Fellner
  2013-10-05 10:43 ` Jani Nikula
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Abrahamsen @ 2013-07-23  9:30 UTC (permalink / raw)
  To: notmuch

Franz Fellner <alpine.art.de@gmail.com> writes:

> Hi,
>
> I have a problem with notmuch-vim (now: git master from 10 min. ago)
> (also with alot and ner, not with 'notmuch show' or notmuch-emacs).
> UTF-8-encoded From: (at least) does not show Umlauts but a weird
> encoded-string.
> Example:
> "Thomas Lübking" as one of the KWin devs comes in replies as
> =?UTF-8?Q?Thomas=20L=C3=BCbking=20?=<thomas.luebking@gmail.com>
> In private conversations with Thomas everything is fine, so it seems to depend on the way bugzilla encodes the "From:".
> Subject is absolutely fine, btw.

Looks like rfc 2047, which is a way of encoding non-ASCII characters in
message headers. Gmail does the same thing, and I've had to work around
that in emacs/gnus.

http://www.ietf.org/rfc/rfc2047.txt
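
As a rough illustration of what such a header value contains, here is a
minimal Python sketch that decodes a single RFC 2047 encoded-word by hand.
The helper name `decode_encoded_word` and the regular expression are
assumptions made for this example; they are not taken from notmuch, gnus,
or any other project mentioned in this thread, and the sketch handles only
one encoded-word, not a whole header.

```python
import base64
import re

# Shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def decode_encoded_word(word):
    """Decode one encoded-word (hypothetical helper for this example)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not an RFC 2047 encoded-word: %r" % word)
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        # "B" encoding is plain base64.
        raw = base64.b64decode(payload)
    else:
        # "Q" encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


if __name__ == "__main__":
    # The encoded-word from the From: header quoted above.
    print(decode_encoded_word("=?UTF-8?Q?Thomas=20L=C3=BCbking=20?="))
```

Note that the helper rejects the complete From: value from the report,
because the address follows the encoded-word without any separating space;
only the encoded-word itself is accepted.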


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23  9:30 ` Eric Abrahamsen
@ 2013-07-23 10:55   ` Franz Fellner
  2013-07-23 11:39     ` David Bremner
  2013-07-24  6:44     ` Eric Abrahamsen
  0 siblings, 2 replies; 12+ messages in thread
From: Franz Fellner @ 2013-07-23 10:55 UTC (permalink / raw)
  To: notmuch

On Tuesday, 23 July 2013 11:30:28 CEST, Eric Abrahamsen wrote:
>> I have a problem with notmuch-vim (now: git master from 10 min. ago)
>> (also with alot and ner, not with 'notmuch show' or notmuch-emacs).
>> UTF-8-encoded From: (at least) does not show Umlauts but a weird
>> encoded-string. ...
>
> Looks like rfc 2047, which is a way of encoding non-ASCII characters in
> message headers. Gmail does the same thing, and I've had to work around
> that in emacs/gnus.
>
> http://www.ietf.org/rfc/rfc2047.txt

OK, thx. So every app needs to get patched to display those strings properly? Any chance this could be done directly in libnotmuch?
I grepped for "2047" inside the "emacs" subtree, but found nothing (I had hoped for a comment on the workaround). It would be interesting to see how this is done, so I can at least try to create a patch (though my ruby is quite basic).


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23 10:55   ` Franz Fellner
@ 2013-07-23 11:39     ` David Bremner
  2013-07-26 10:16       ` Jani Nikula
  2013-07-29 19:46       ` Daniel Kahn Gillmor
  2013-07-24  6:44     ` Eric Abrahamsen
  1 sibling, 2 replies; 12+ messages in thread
From: David Bremner @ 2013-07-23 11:39 UTC (permalink / raw)
  To: Franz Fellner, notmuch

Franz Fellner <alpine.art.de@gmail.com> writes:

>
> OK, thx. So every app needs to get patched to display those strings
> properly? Any chance this could be done directly in libnotmuch?  I
> grepped for "2047" inside the "emacs" subtree, but found nothing (had
> the hope for a comment for the workaround). Would be interesting to
> see how this is done, so I can at least try to create a patch (though
> my ruby is quite basic).

In general notmuch relies on libgmime for rfc2047 parsing.  I'm not sure
of all the details now, but some of the filtering does happen in the
CLI, not the lib.  You could start by looking at
gmime-filter-headers.[ch] in the top directory.

d


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23 10:55   ` Franz Fellner
  2013-07-23 11:39     ` David Bremner
@ 2013-07-24  6:44     ` Eric Abrahamsen
  1 sibling, 0 replies; 12+ messages in thread
From: Eric Abrahamsen @ 2013-07-24  6:44 UTC (permalink / raw)
  To: notmuch

Franz Fellner <alpine.art.de@gmail.com> writes:

> On Tuesday, 23 July 2013 11:30:28 CEST, Eric Abrahamsen wrote:
>>> I have a problem with notmuch-vim (now: git master from 10 min. ago)
>>> (also with alot and ner, not with 'notmuch show' or notmuch-emacs).
>>> UTF-8-encoded From: (at least) does not show Umlauts but a weird
>>> encoded-string. ...
>>
>> Looks like rfc 2047, which is a way of encoding non-ASCII characters in
>> message headers. Gmail does the same thing, and I've had to work around
>> that in emacs/gnus.
>>
>> http://www.ietf.org/rfc/rfc2047.txt
>
> OK, thx. So every app needs to get patched to display those strings properly? Any chance this could be done directly in libnotmuch?
> I grepped for "2047" inside the "emacs" subtree, but found nothing (had
> the hope for a comment for the workaround). Would be interesting to
> see how this is done, so I can at least try to create a patch (though
> my ruby is quite basic).

The version of gnus I'm using (git) comes with a rfc2047.el file, with
all the appropriate functions. That might be of interest, even if the
solution ends up being in the basic library...

http://git.gnus.org/cgit/gnus.git/tree/lisp/rfc2047.el


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23 11:39     ` David Bremner
@ 2013-07-26 10:16       ` Jani Nikula
  2013-07-29 19:46         ` Daniel Kahn Gillmor
  2013-07-29 19:46       ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 12+ messages in thread
From: Jani Nikula @ 2013-07-26 10:16 UTC (permalink / raw)
  To: David Bremner, Franz Fellner, notmuch

On Tue, 23 Jul 2013, David Bremner <david@tethera.net> wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
>
>>
>> OK, thx. So every app needs to get patched to display those strings
>> properly? Any chance this could be done directly in libnotmuch?  I
>> grepped for "2047" inside the "emacs" subtree, but found nothing (had
>> the hope for a comment for the workaround). Would be interesting to
>> see how this is done, so I can at least try to create a patch (though
>> my ruby is quite basic).
>
> In general notmuch relies on libgmime for rfc2047 parsing.  I'm not sure
> of all the details now, but some of the filtering does happen in the
> CLI, not the lib.  You could start by looking at
> gmime-filter-headers.[ch] in the top directory.

I'm experiencing a similar problem with the Subject: headers in bugzilla
mail. Per RFC 2047,

    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field.  However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.

In the problematic mails, the encoded-word begins immediately after
preceding text, i.e. without linear-white-space. Manually adding that
space in the message file makes the subject display as expected.

The decoding is done in the cli using g_mime_message_get_subject(). I'm
not sure if there's much that can be done about it within notmuch.
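
To make the difference concrete, the following Python sketch contrasts a
strict reading of that rule with a lenient one. Everything in it is an
assumption for illustration: the function name `decode_header_text`, the
two regular expressions, and the sample subject are hypothetical, and this
is not how GMime implements its decoder. It also skips the rule that
whitespace between two adjacent encoded-words is dropped.

```python
import base64
import re

_WORD = r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?="
# Strict: an encoded-word must be a whole whitespace-delimited token.
STRICT = re.compile(r"(?<!\S)" + _WORD + r"(?!\S)")
# Lenient: accept an encoded-word wherever it appears in the value.
LENIENT = re.compile(_WORD)


def _decode(match):
    """Turn one matched encoded-word into text."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # Q encoding: "_" stands for a space, "=XX" for the byte 0xXX.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def decode_header_text(value, lenient=False):
    """Replace encoded-words in value; strict mode leaves glued ones alone."""
    pattern = LENIENT if lenient else STRICT
    return pattern.sub(_decode, value)


if __name__ == "__main__":
    glued = "Re:=?UTF-8?Q?_Gr=C3=BC=C3=9Fe?="  # hypothetical subject
    print(decode_header_text(glued))                # strict: left untouched
    print(decode_header_text(glued, lenient=True))  # lenient: decoded
```

In strict mode the glued subject comes back unchanged, which matches the
symptom described above; inserting a space before the encoded-word, or
switching to the lenient pattern, lets it decode.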

BR,
Jani.


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-26 10:16       ` Jani Nikula
@ 2013-07-29 19:46         ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Kahn Gillmor @ 2013-07-29 19:46 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

On 07/26/2013 06:16 AM, Jani Nikula wrote:
> I'm experiencing a similar problem with the Subject: headers in bugzilla
> mail. Per RFC 2047,
> 
>     Ordinary ASCII text and 'encoded-word's may appear together in the
>     same header field.  However, an 'encoded-word' that appears in a
>     header field defined as '*text' MUST be separated from any adjacent
>     'encoded-word' or 'text' by 'linear-white-space'.
> 
> In the problematic mails, the encoded-word begins immediately after
> preceding text, i.e. without linear-white-space. Manually adding that
> space in the message file makes the subject display as expected.
> 
> The decoding is done in the cli using g_mime_message_get_subject(). I'm
> not sure if there's much that can be done about it within notmuch.

I think asking gmime to deliberately mis-parse the subject line would
probably be a bad idea, because that would cause some strings to be
impossible to represent in the subject.

So Jani's report here sounds like a bug in bugzilla itself.  Jani, have
you reported this problem to that project?  I don't see it on their
bugtracker [0], though maybe my scan of the list wasn't as thorough as
it should be.

Regards,

	--dkg

[0]
https://bugzilla.mozilla.org/buglist.cgi?product=Bugzilla&component=Email%20Notifications&resolution=---



* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23 11:39     ` David Bremner
  2013-07-26 10:16       ` Jani Nikula
@ 2013-07-29 19:46       ` Daniel Kahn Gillmor
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Kahn Gillmor @ 2013-07-29 19:46 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On 07/23/2013 07:39 AM, David Bremner wrote:
> Franz Fellner <alpine.art.de@gmail.com> writes:
> 
>>
>> OK, thx. So every app needs to get patched to display those strings
>> properly? Any chance this could be done directly in libnotmuch?  I
>> grepped for "2047" inside the "emacs" subtree, but found nothing (had
>> the hope for a comment for the workaround). Would be interesting to
>> see how this is done, so I can at least try to create a patch (though
>> my ruby is quite basic).
> 
> In general notmuch relies on libgmime for rfc2047 parsing.  I'm not sure
> of all the details now, but some of the filtering does happen in the
> CLI, not the lib.  You could start by looking at
> gmime-filter-headers.[ch] in the top directory.

I agree this should be handled properly by gmime.  If it turns out that
the library is misbehaving (i.e. that notmuch is using it sensibly and
we're still getting bad data out of well-formed strings), it should be
reported and fixed there.

Just a note that other MUAs are struggling with this sort of thing too:

http://blog.steve.org.uk/international_character_sets_and_encodings_are_hard_.html

Steve Kemp (author of lumail) has good engineering skills and instincts;
anyone actively working on trying to get this fixed "right" within
notmuch (or underlying libraries) could probably drop him an e-mail and
collaborate.  With a decent diagnostic of the specific problems and use
cases, plus a recommendation for where the fix should be and how it
should be done, the two projects together could probably exert
sufficient influence on underlying libraries and toolchains to get them
to address any issues.

sorry to just provide links and not any actual analysis and code.

	--dkg



* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
       [not found] <289881190.1977918.1376058260231.JavaMail.root@sz0152a.westchester.pa.mail.comcast.net>
@ 2013-08-09 18:04 ` Jani Nikula
  2013-08-09 18:38   ` Jeffrey Stedfast
  0 siblings, 1 reply; 12+ messages in thread
From: Jani Nikula @ 2013-08-09 18:04 UTC (permalink / raw)
  To: stedfast, Daniel Kahn Gillmor; +Cc: Eric Abrahamsen, Notmuch Mail

On Fri, 09 Aug 2013, stedfast@comcast.net wrote:
> Hi guys, 
>
> ( I'm the author of GMime for those that don't know) 
>
> I just came across the notmuch thread (with the referenced Subject)
> but unfortunately am not subscribed to the mailing list and so am
> unable to reply to the list (hopefully no one minds me emailing them
> directly!). I wanted to reach out and offer a possible solution to the
> problem being discussed.

Thanks for your mail; hopefully you don't mind me replying to the list!

> Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
> *should* solve the decoding problem mentioned in the thread. This flag
> should be safe to pass into g_mime_init() without any bad side effects
> and my unit tests do test that code-path.

Many thanks, this solves my issue with the subject lines.

This is the quick patch I tried:

diff --git a/notmuch.c b/notmuch.c
index 78d29a8..7300c21 100644
--- a/notmuch.c
+++ b/notmuch.c
@@ -264,7 +264,7 @@ main (int argc, char *argv[])
 
     local = talloc_new (NULL);
 
-    g_mime_init (0);
+    g_mime_init (GMIME_ENABLE_RFC2047_WORKAROUNDS);
 #if !GLIB_CHECK_VERSION(2, 35, 1)
     g_type_init ();
 #endif

We'll need to look into using this in the lib too.

BR,
Jani.


> I took a look at gmime-filter-headers.[c,h] as well and I suspect that
> it was written back when GMime brokenly did not guarantee UTF-8
> decoded strings from functions like g_mime_message_get_subject() and
> the like. This was fixed a while back. From a quick grep of the
> ChangeLog it looks like this was probably fixed in 2.5.9 or so (but
> possibly as late as 2.6.3 as there were some other charset rfc2047
> decoder fixes around then).
>
> I know for sure that the 2.4.x series didn't guarantee UTF-8-safe
> strings, but it's been the goal of 2.6.x to make that guarantee (minus
> any bugs that may exist, but if you find any cases of that, let me
> know!)
>
> (Note: raw header values from g_mime_object_get_header() are not
> guaranteed to be UTF-8 but if you call
> g_mime_utils_header_decode_text/phrase() on them, the results are
> guaranteed to be valid UTF-8)
>
> Hope that helps, 
>
> Jeff 


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-08-09 18:04 ` Jani Nikula
@ 2013-08-09 18:38   ` Jeffrey Stedfast
  0 siblings, 0 replies; 12+ messages in thread
From: Jeffrey Stedfast @ 2013-08-09 18:38 UTC (permalink / raw)
  To: Jani Nikula; +Cc: Eric Abrahamsen, Notmuch Mail, Daniel Kahn Gillmor

On 8/9/2013 2:04 PM, Jani Nikula wrote:
> On Fri, 09 Aug 2013, stedfast@comcast.net wrote:
>> Hi guys,
>>
>> ( I'm the author of GMime for those that don't know)
>>
>> I just came across the notmuch thread (with the referenced Subject)
>> but unfortunately am not subscribed to the mailing list and so am
>> unable to reply to the list (hopefully no one minds me emailing them
>> directly!). I wanted to reach out and offer a possible solution to the
>> problem being discussed.
> Thanks for your mail; hopefully you don't mind me replying to the list!

Don't mind at all!

>
>> Passing the GMIME_ENABLE_RFC2047_WORKAROUNDS flag to g_mime_init()
>> *should* solve the decoding problem mentioned in the thread. This flag
>> should be safe to pass into g_mime_init() without any bad side effects
>> and my unit tests do test that code-path.
> Many thanks, this solves my issue with the subject lines.

No problem! Glad it worked!

Jeff


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-07-23  8:55 UTF-8 in mail headers (namely FROM) sent by bugzilla Franz Fellner
  2013-07-23  9:30 ` Eric Abrahamsen
@ 2013-10-05 10:43 ` Jani Nikula
  2013-10-05 13:38   ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 12+ messages in thread
From: Jani Nikula @ 2013-10-05 10:43 UTC (permalink / raw)
  To: Franz Fellner, notmuch

On Tue, 23 Jul 2013, Franz Fellner <alpine.art.de@gmail.com> wrote:
> Hi,
>
> I have a problem with notmuch-vim (now: git master from 10 min. ago) (also with alot and ner, not with 'notmuch show' or notmuch-emacs). UTF-8-encoded From: (at least) does not show Umlauts but a weird encoded-string.
> Example:
> "Thomas Lübking" as one of the KWin devs comes in replies as
> =?UTF-8?Q?Thomas=20L=C3=BCbking=20?=<thomas.luebking@gmail.com>
> In private conversations with Thomas everything is fine, so it seems to depend on the way bugzilla encodes the "From:".
> Subject is absolutely fine, btw.

Please try the current master and see if it helps. Thanks.

BR,
Jani.


* Re: UTF-8 in mail headers (namely FROM) sent by bugzilla
  2013-10-05 10:43 ` Jani Nikula
@ 2013-10-05 13:38   ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Kahn Gillmor @ 2013-10-05 13:38 UTC (permalink / raw)
  To: Jani Nikula; +Cc: notmuch

On Tue, 23 Jul 2013, Franz Fellner <alpine.art.de@gmail.com> wrote:
> "Thomas Lübking" as one of the KWin devs comes in replies as
> =?UTF-8?Q?Thomas=20L=C3=BCbking=20?=<thomas.luebking@gmail.com>

if the above string is what bugzilla is emitting exactly (look at the
source of the message to be sure) then please also consider submitting a
bug against bugzilla -- they are misinterpreting RFC 2047 -- encoded
strings are expected to be separated from non-encoded strings by linear
whitespace.  the above string has no space between "=20?=" and
"<thomas", even though it should.
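
For anyone who wants to check the raw message source for this, here is a
small Python sketch that lists encoded-words touching adjacent text. The
function name `missing_whitespace` and the regular expression are
assumptions for this example only; the check is a simplification of the
RFC 2047 placement rules and ignores cases such as encoded-words inside
parenthesized comments.

```python
import re

# Rough shape of an RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[QqBb]\?[^?\s]*\?=")


def missing_whitespace(header_value):
    """Return the encoded-words that are not set off by whitespace."""
    problems = []
    for match in ENCODED_WORD.finditer(header_value):
        start, end = match.span()
        # Treat the start and end of the value as if they were whitespace.
        before = header_value[start - 1] if start > 0 else " "
        after = header_value[end] if end < len(header_value) else " "
        if not before.isspace() or not after.isspace():
            problems.append(match.group(0))
    return problems


if __name__ == "__main__":
    value = "=?UTF-8?Q?Thomas=20L=C3=BCbking=20?=<thomas.luebking@gmail.com>"
    for word in missing_whitespace(value):
        print("no whitespace around:", word)
```

The same value with a space inserted before "<thomas" produces an empty
list.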

Regards,

	--dkg

