From: Sebastian Poeplau <sebastian.poeplau@eurecom.fr>
To: Jeffrey Stedfast <jestedfa@microsoft.com>,
	"notmuch\@notmuchmail.org" <notmuch@notmuchmail.org>
Subject: Re: Handling mislabeled emails encoded with Windows-1252
Date: Mon, 30 Jul 2018 09:47:55 +0200
Message-ID: <87tvohxiz8.fsf@eurecom.fr>
In-Reply-To: <87wotdxjuu.fsf@eurecom.fr>

Hi,

>> As an added optimization, you could try limiting that block of code to
>> just when the charset is one of the iso-8859-* charsets.
>>
>> The following code snippet should help with that:
>>
>> charset = charset ? g_mime_charset_canon_name (charset) : NULL;
>> if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
>>     ...
>>
>> The reason you need to use g_mime_charset_canon_name (if you decide to
>> add the optimization) is that mail software does not always use the
>> canonical form of the various charset names that they use. Often you
>> will get stuff like "latin1" or "iso_8859-1".
>
> Nice, I'll add it.
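
To illustrate why the canonical name matters before the prefix test, here
is a minimal, self-contained sketch in plain C. It does not use GMime:
has_prefix_nocase () and canon_iso_name () are hypothetical helpers written
for this example, and canon_iso_name () is only a tiny stand-in for
g_mime_charset_canon_name () that knows a handful of spellings. Note also
that strncasecmp-style functions return 0 on a match, which is why the
GLib call in the condition has to be negated; the helper below returns 1
on a match instead.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of GLib or GMime): returns 1 if `s`
 * starts with `prefix`, ignoring ASCII case, and 0 otherwise. */
static int has_prefix_nocase (const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
            return 0;
    }
    return 1;
}

/* Hypothetical, deliberately small stand-in for
 * g_mime_charset_canon_name (): rewrites a few common spellings of the
 * ISO-8859 family as "iso-8859-N" into `out`. Returns `out` on success,
 * or NULL if the name is not recognized or does not fit. */
static const char *canon_iso_name (const char *name, char *out, size_t out_len)
{
    const char *digits;
    size_t i;
    int n;

    if (has_prefix_nocase (name, "iso-8859-") ||
        has_prefix_nocase (name, "iso_8859-"))
        digits = name + 9;
    else if (has_prefix_nocase (name, "iso8859-"))
        digits = name + 8;
    else if (has_prefix_nocase (name, "latin1") && name[6] == '\0')
        digits = "1";
    else
        return NULL;

    /* The part number must be a non-empty run of digits. */
    if (digits[0] == '\0')
        return NULL;
    for (i = 0; digits[i] != '\0'; i++) {
        if (!isdigit ((unsigned char) digits[i]))
            return NULL;
    }

    n = snprintf (out, out_len, "iso-8859-%s", digits);
    if (n < 0 || (size_t) n >= out_len)
        return NULL;
    return out;
}
```

Without the canonicalization step, a label such as "latin1" would fail
the "iso-8859-" prefix test and the mislabeling check would be skipped.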

Updated patch attached.

Cheers,
Sebastian

[-- Attachment #2: fix_windows_charsets.patch --]
[-- Type: text/x-patch, Size: 2426 bytes --]

diff -ura notmuch-0.27/notmuch-show.c notmuch-0.27-patched/notmuch-show.c
--- notmuch-0.27/notmuch-show.c	2018-06-13 03:42:34.000000000 +0200
+++ notmuch-0.27-patched/notmuch-show.c	2018-07-30 09:41:05.491636418 +0200
@@ -272,6 +272,7 @@
     GMimeContentType *content_type = g_mime_object_get_content_type (GMIME_OBJECT (part));
     GMimeStream *stream_filter = NULL;
     GMimeFilter *crlf_filter = NULL;
+    GMimeFilter *windows_filter = NULL;
     GMimeDataWrapper *wrapper;
     const char *charset;
 
@@ -282,13 +283,37 @@
     if (stream_out == NULL)
 	return;
 
+    charset = g_mime_object_get_content_type_parameter (part, "charset");
+    charset = charset ? g_mime_charset_canon_name (charset) : NULL;
+    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
+    if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
+	GMimeStream *null_stream = NULL;
+	GMimeStream *null_stream_filter = NULL;
+
+	/* Check for mislabeled Windows encoding */
+	null_stream = g_mime_stream_null_new ();
+	null_stream_filter = g_mime_stream_filter_new (null_stream);
+	windows_filter = g_mime_filter_windows_new (charset);
+	g_mime_stream_filter_add(GMIME_STREAM_FILTER (null_stream_filter),
+				 windows_filter);
+	g_mime_data_wrapper_write_to_stream (wrapper, null_stream_filter);
+	charset = g_mime_filter_windows_real_charset(
+	    (GMimeFilterWindows *) windows_filter);
+
+	if (null_stream_filter)
+	    g_object_unref (null_stream_filter);
+	if (null_stream)
+	    g_object_unref (null_stream);
+	/* Keep the reference to windows_filter until the end of this
+	 * function: the charset string returned above may belong to it. */
+    }
+
     stream_filter = g_mime_stream_filter_new (stream_out);
     crlf_filter = g_mime_filter_crlf_new (false, false);
     g_mime_stream_filter_add(GMIME_STREAM_FILTER (stream_filter),
 			     crlf_filter);
     g_object_unref (crlf_filter);
 
-    charset = g_mime_object_get_content_type_parameter (part, "charset");
     if (charset) {
 	GMimeFilter *charset_filter;
 	charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
@@ -313,11 +338,12 @@
 	}
     }
 
-    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
     if (wrapper && stream_filter)
 	g_mime_data_wrapper_write_to_stream (wrapper, stream_filter);
     if (stream_filter)
 	g_object_unref(stream_filter);
+    if (windows_filter)
+	g_object_unref (windows_filter);
 }
 
 static const char*
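
For readers unfamiliar with the check in the patch above, the idea can be
sketched in plain C as follows. This is an illustration under stated
assumptions, not GMime's implementation: has_c1_bytes () and
guess_real_charset () are hypothetical names, and the sketch assumes that
the telltale sign of mislabeled Windows text is a byte in the range
0x80..0x9F, which the ISO-8859 charsets reserve for control characters
but Windows-1252 uses for printable ones such as curly quotes. Only the
iso-8859-1 to windows-1252 pairing is handled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer contains any byte in the
 * 0x80..0x9F range, 0 otherwise. */
static int has_c1_bytes (const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80 && buf[i] <= 0x9F)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: given the (already canonicalized) charset label
 * and the raw content, returns the charset to decode with. Only the
 * iso-8859-1 case is covered in this sketch; any other label is
 * returned unchanged. */
static const char *guess_real_charset (const char *labeled,
                                       const unsigned char *buf, size_t len)
{
    if (strcmp (labeled, "iso-8859-1") == 0 && has_c1_bytes (buf, len))
        return "windows-1252";
    return labeled;
}
```

In the patch, the same decision is delegated to the GMime filter, which
sees the content as it is written to a null stream; the real content is
then written a second time through the charset filter chosen from the
result.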
