From: Sebastian Poeplau <sebastian.poeplau@eurecom.fr>
To: Jeffrey Stedfast <jestedfa@microsoft.com>,
"notmuch@notmuchmail.org" <notmuch@notmuchmail.org>
Subject: Re: Handling mislabeled emails encoded with Windows-1252
Date: Mon, 30 Jul 2018 09:47:55 +0200
Message-ID: <87tvohxiz8.fsf@eurecom.fr>
In-Reply-To: <87wotdxjuu.fsf@eurecom.fr>
[-- Attachment #1: Type: text/plain, Size: 692 bytes --]
Hi,
>> As an added optimization, you could try limiting that block of code to
>> just when the charset is one of the iso-8859-* charsets.
>>
>> The following code snippet should help with that:
>>
>> charset = charset ? g_mime_charset_canon_name (charset) : NULL;
>> if (wrapper && charset && g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
>> ...
>>
>> The reason you need to use g_mime_charset_canon_name (if you decide to
>> add the optimization) is that mail software does not always use the
>> canonical form of the various charset names that they use. Often you
>> will get stuff like "latin1" or "iso_8859-1".
>
> Nice, I'll add it.
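One detail about the quoted condition: g_ascii_strncasecmp, like the POSIX strncasecmp, returns 0 when the compared prefixes are equal, so the result has to be negated (or compared against 0) to mean "this is an iso-8859-* charset". The following is a minimal sketch of that check using only the C library; the helper name is_iso8859_family is hypothetical and is not part of notmuch or GMime. It also shows why canonicalizing the name first matters, since aliases such as "latin1" do not carry the prefix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>	/* POSIX strncasecmp */

/* Hypothetical helper, not part of notmuch or GMime: returns true when
 * `charset` starts with "iso-8859-", ignoring ASCII case. */
bool
is_iso8859_family (const char *charset)
{
    /* strncasecmp returns 0 when the first 9 characters match, so the
     * result is compared against 0 to express "has this prefix". */
    return charset != NULL && strncasecmp (charset, "iso-8859-", 9) == 0;
}
```

With this helper, "ISO-8859-15" and "iso-8859-1" are accepted, while "latin1" and "iso_8859-1" are rejected unless they are first mapped to their canonical names.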
Updated patch attached.
Cheers,
Sebastian
[-- Attachment #2: fix_windows_charsets.patch --]
[-- Type: text/x-patch, Size: 2426 bytes --]
diff -ura notmuch-0.27/notmuch-show.c notmuch-0.27-patched/notmuch-show.c
--- notmuch-0.27/notmuch-show.c 2018-06-13 03:42:34.000000000 +0200
+++ notmuch-0.27-patched/notmuch-show.c 2018-07-30 09:41:05.491636418 +0200
@@ -272,6 +272,7 @@
GMimeContentType *content_type = g_mime_object_get_content_type (GMIME_OBJECT (part));
GMimeStream *stream_filter = NULL;
GMimeFilter *crlf_filter = NULL;
+ GMimeFilter *windows_filter = NULL;
GMimeDataWrapper *wrapper;
const char *charset;
@@ -282,13 +283,37 @@
if (stream_out == NULL)
return;
+ charset = g_mime_object_get_content_type_parameter (part, "charset");
+ charset = charset ? g_mime_charset_canon_name (charset) : NULL;
+ wrapper = g_mime_part_get_content_object (GMIME_PART (part));
+ if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
+ GMimeStream *null_stream = NULL;
+ GMimeStream *null_stream_filter = NULL;
+
+ /* Check for mislabeled Windows encoding */
+ null_stream = g_mime_stream_null_new ();
+ null_stream_filter = g_mime_stream_filter_new (null_stream);
+ windows_filter = g_mime_filter_windows_new (charset);
+ g_mime_stream_filter_add(GMIME_STREAM_FILTER (null_stream_filter),
+ windows_filter);
+ g_mime_data_wrapper_write_to_stream (wrapper, null_stream_filter);
+ charset = g_mime_filter_windows_real_charset(
+ (GMimeFilterWindows *) windows_filter);
+
+ if (null_stream_filter)
+ g_object_unref (null_stream_filter);
+ if (null_stream)
+ g_object_unref (null_stream);
+ /* Keep a reference to windows_filter in order to prevent the
+ * charset string from deallocation. */
+ }
+
stream_filter = g_mime_stream_filter_new (stream_out);
crlf_filter = g_mime_filter_crlf_new (false, false);
g_mime_stream_filter_add(GMIME_STREAM_FILTER (stream_filter),
crlf_filter);
g_object_unref (crlf_filter);
- charset = g_mime_object_get_content_type_parameter (part, "charset");
if (charset) {
GMimeFilter *charset_filter;
charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
@@ -313,11 +338,12 @@
}
}
- wrapper = g_mime_part_get_content_object (GMIME_PART (part));
if (wrapper && stream_filter)
g_mime_data_wrapper_write_to_stream (wrapper, stream_filter);
if (stream_filter)
g_object_unref(stream_filter);
+ if (windows_filter)
+ g_object_unref (windows_filter);
}
static const char*
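The patch runs the part's content through a throwaway null stream first, purely so the Windows filter can inspect the bytes before the real charset filter is configured. As a rough illustration of what such a probe can look for, the sketch below flags bytes in the range 0x80 to 0x9F: in the ISO-8859-* charsets that range holds no printable characters, whereas Windows-1252 uses it for characters such as the euro sign (0x80) and curly double quotes (0x93, 0x94). This is an assumption about the general technique, not a copy of GMime's implementation, and the helper name has_c1_range_bytes is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether `buf` contains any byte in
 * 0x80..0x9F, which hints that text labeled iso-8859-* was really
 * produced in a Windows code page. */
bool
has_c1_range_bytes (const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
	if (buf[i] >= 0x80 && buf[i] <= 0x9F)
	    return true;
    }
    return false;
}
```

A body containing the bytes 0x93 ... 0x94 around a quotation would be flagged, while ordinary Latin-1 text such as "caf\xE9" would not, so correctly labeled mail keeps its declared charset.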