* [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding
From: Sebastian Poeplau @ 2018-08-07 12:48 UTC
To: notmuch

Messages that contain Windows-1252 are frequently mislabeled as ISO
8859-1, which may result in non-printable characters when displaying
the message. The test asserts that such characters (in this case
curved quotes) are displayed correctly.
---
 test/T300-encoding.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/test/T300-encoding.sh b/test/T300-encoding.sh
index 2c656a1e..4a6bfd2f 100755
--- a/test/T300-encoding.sh
+++ b/test/T300-encoding.sh
@@ -44,4 +44,27 @@ add_message '[subject]="=?utf-8?q?encoded?=word without=?utf-8?q?space?=" '
 output=$(notmuch search id:${gen_msg_id} 2>&1 | notmuch_show_sanitize)
 test_expect_equal "$output" "thread:0000000000000005 2001-01-05 [1/1] Notmuch Test Suite; encodedword withoutspace (inbox unread)"
 
+test_begin_subtest "Mislabeled Windows-1252 encoding"
+test_subtest_known_broken
+add_message '[content-type]="text/plain; charset=iso-8859-1"' \
+            "[body]=$'This text contains \x93Windows-1252\x94 character codes.'"
+cat <<EOF > EXPECTED
+\fmessage{ id:XXXXX depth:0 match:1 excluded:0 filename:XXXXX
+\fheader{
+Notmuch Test Suite <test_suite@notmuchmail.org> (2001-01-05) (inbox unread)
+Subject: Mislabeled Windows-1252 encoding
+From: Notmuch Test Suite <test_suite@notmuchmail.org>
+To: Notmuch Test Suite <test_suite@notmuchmail.org>
+Date: GENERATED_DATE
+\fheader}
+\fbody{
+\fpart{ ID: 1, Content-type: text/plain
+This text contains “Windows-1252” character codes.
+\fpart}
+\fbody}
+\fmessage}
+EOF
+notmuch show id:${gen_msg_id} 2>&1 | notmuch_show_sanitize_all > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.18.0
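Why the new subtest starts out as known-broken: in ISO 8859-1 every byte decodes to the Unicode code point with the same value, so the body's bytes 0x93 and 0x94 become U+0093 and U+0094. Both sit in the C1 control range U+0080 to U+009F, which has no glyphs, hence the "non-printable characters" rather than curved quotes. The sketch below is standalone and does not use GMime or notmuch; the helper names (`latin1_decode`, `is_c1_control`, `latin1_to_utf8`) are hypothetical, chosen only to illustrate what a faithful ISO 8859-1 to UTF-8 conversion does with this test body.

```c
#include <assert.h>
#include <stddef.h>

/* ISO 8859-1 maps every byte b to the code point U+00XX with the same
 * value, so decoding is the identity on the byte. */
static unsigned int latin1_decode (unsigned char b)
{
    return b;
}

/* U+0080..U+009F are the C1 control characters: they have no glyph. */
static int is_c1_control (unsigned int cp)
{
    return cp >= 0x80 && cp <= 0x9F;
}

/* Transcode an ISO 8859-1 buffer to UTF-8 and count the C1 controls it
 * yields.  All code points are below U+0100, so each input byte becomes
 * at most two output bytes.  Returns the number of bytes written. */
static size_t latin1_to_utf8 (const unsigned char *in, size_t len,
                              unsigned char *out, size_t out_cap,
                              size_t *n_controls)
{
    size_t o = 0;

    assert (out_cap >= 2 * len);
    *n_controls = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned int cp = latin1_decode (in[i]);

        if (is_c1_control (cp))
            (*n_controls)++;
        if (cp < 0x80) {
            out[o++] = (unsigned char) cp;
        } else {
            /* Two-byte UTF-8 form: 110xxxxx 10xxxxxx */
            out[o++] = (unsigned char) (0xC0 | (cp >> 6));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Fed the test body, this reports two C1 controls and emits them as the UTF-8 byte pairs `C2 93` and `C2 94`: valid UTF-8, but invisible control characters where the sender meant quotation marks.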
* [PATCH 2/2] lib: detect mislabeled Windows-1252 parts
From: Sebastian Poeplau @ 2018-08-07 12:48 UTC
To: notmuch

Use GMime functionality to detect mislabeled messages and apply the
correct (Windows) encoding instead.
---
 notmuch-show.c        | 30 ++++++++++++++++++++++++++++--
 test/T300-encoding.sh |  1 -
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/notmuch-show.c b/notmuch-show.c
index 1072ea55..c3a3783a 100644
--- a/notmuch-show.c
+++ b/notmuch-show.c
@@ -272,6 +272,7 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
     GMimeContentType *content_type = g_mime_object_get_content_type (GMIME_OBJECT (part));
     GMimeStream *stream_filter = NULL;
     GMimeFilter *crlf_filter = NULL;
+    GMimeFilter *windows_filter = NULL;
     GMimeDataWrapper *wrapper;
     const char *charset;
 
@@ -282,13 +283,37 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
     if (stream_out == NULL)
         return;
 
+    charset = g_mime_object_get_content_type_parameter (part, "charset");
+    charset = charset ? g_mime_charset_canon_name (charset) : NULL;
+    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
+    if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
+        GMimeStream *null_stream = NULL;
+        GMimeStream *null_stream_filter = NULL;
+
+        /* Check for mislabeled Windows encoding */
+        null_stream = g_mime_stream_null_new ();
+        null_stream_filter = g_mime_stream_filter_new (null_stream);
+        windows_filter = g_mime_filter_windows_new (charset);
+        g_mime_stream_filter_add(GMIME_STREAM_FILTER (null_stream_filter),
+                                 windows_filter);
+        g_mime_data_wrapper_write_to_stream (wrapper, null_stream_filter);
+        charset = g_mime_filter_windows_real_charset(
+            (GMimeFilterWindows *) windows_filter);
+
+        if (null_stream_filter)
+            g_object_unref (null_stream_filter);
+        if (null_stream)
+            g_object_unref (null_stream);
+        /* Keep a reference to windows_filter in order to prevent the
+         * charset string from deallocation. */
+    }
+
     stream_filter = g_mime_stream_filter_new (stream_out);
     crlf_filter = g_mime_filter_crlf_new (false, false);
     g_mime_stream_filter_add(GMIME_STREAM_FILTER (stream_filter),
                              crlf_filter);
     g_object_unref (crlf_filter);
 
-    charset = g_mime_object_get_content_type_parameter (part, "charset");
     if (charset) {
         GMimeFilter *charset_filter;
         charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
@@ -313,11 +338,12 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
         }
     }
 
-    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
     if (wrapper && stream_filter)
         g_mime_data_wrapper_write_to_stream (wrapper, stream_filter);
     if (stream_filter)
         g_object_unref(stream_filter);
+    if (windows_filter)
+        g_object_unref (windows_filter);
 }
 
 static const char*
diff --git a/test/T300-encoding.sh b/test/T300-encoding.sh
index 4a6bfd2f..1e9d2a3d 100755
--- a/test/T300-encoding.sh
+++ b/test/T300-encoding.sh
@@ -45,7 +45,6 @@ output=$(notmuch search id:${gen_msg_id} 2>&1 | notmuch_show_sanitize)
 test_expect_equal "$output" "thread:0000000000000005 2001-01-05 [1/1] Notmuch Test Suite; encodedword withoutspace (inbox unread)"
 
 test_begin_subtest "Mislabeled Windows-1252 encoding"
-test_subtest_known_broken
 add_message '[content-type]="text/plain; charset=iso-8859-1"' \
             "[body]=$'This text contains \x93Windows-1252\x94 character codes.'"
 cat <<EOF > EXPECTED
-- 
2.18.0
* Re: [PATCH 2/2] lib: detect mislabeled Windows-1252 parts
From: Sebastian Poeplau @ 2018-08-07 12:52 UTC
To: notmuch

Sorry, I failed to paste the message ID for the In-Reply-To header
correctly :( This patch series is in reply to id:87k1pbg4eg.fsf@tethera.net.

Cheers,
Sebastian
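Patch 2/2 works in two passes: it first writes the part through a GMime windows filter into a null stream purely to sniff it, asks `g_mime_filter_windows_real_charset()` which charset the part really uses, and only then builds the usual charset-to-UTF-8 filter chain. The sketch below reproduces that two-pass idea without GMime. It assumes, as our understanding of GMime's windows filter rather than a documented guarantee, that any byte in 0x80 to 0x9F gives a part away: those positions are C1 controls in ISO 8859-x but printable characters in the Windows code pages. For brevity it only handles iso-8859-1 to windows-1252 (the patch covers every `iso-8859-*` label), and the function names `guess_real_charset`, `put_utf8` and `to_utf8` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Windows-1252 assignments for bytes 0x80..0x9F; 0 marks the five
 * positions the code page leaves undefined.  All other bytes decode
 * exactly as in ISO 8859-1. */
static const unsigned short cp1252_c1[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Pass 1 (sniffing): a part labeled iso-8859-1 that contains any byte in
 * 0x80..0x9F is assumed to be windows-1252.  Assumes the label is already
 * canonical lowercase, as the patch ensures via g_mime_charset_canon_name. */
static const char *guess_real_charset (const char *claimed,
                                       const unsigned char *in, size_t len)
{
    if (strcmp (claimed, "iso-8859-1") != 0)
        return claimed;
    for (size_t i = 0; i < len; i++)
        if (in[i] >= 0x80 && in[i] <= 0x9F)
            return "windows-1252";
    return claimed;
}

/* Write code point cp (below U+10000) as UTF-8; returns bytes written. */
static size_t put_utf8 (unsigned int cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}

/* Pass 2 (conversion): decode with the chosen charset and emit UTF-8.
 * Undefined windows-1252 bytes become U+FFFD.  Returns bytes written. */
static size_t to_utf8 (const char *charset, const unsigned char *in,
                       size_t len, unsigned char *out, size_t out_cap)
{
    int windows = strcmp (charset, "windows-1252") == 0;
    size_t o = 0;

    assert (out_cap >= 3 * len);
    for (size_t i = 0; i < len; i++) {
        unsigned int cp = in[i];

        if (windows && cp >= 0x80 && cp <= 0x9F)
            cp = cp1252_c1[cp - 0x80] ? cp1252_c1[cp - 0x80] : 0xFFFD;
        o += put_utf8 (cp, out + o);
    }
    return o;
}
```

Run on the body from patch 1/2, the sniffing pass reports windows-1252 and the conversion pass emits the curved quotes as `E2 80 9C` and `E2 80 9D`, matching the line in the test's EXPECTED file; a part whose high bytes are ordinary Latin-1 letters such as 0xE9 keeps its iso-8859-1 label. As in the patch, the price is reading the part twice so the decoder can be chosen once, up front.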
* Re: [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding
From: David Bremner @ 2018-08-29 9:42 UTC
To: Sebastian Poeplau, notmuch

Sebastian Poeplau <sebastian.poeplau@eurecom.fr> writes:

> Messages that contain Windows-1252 are frequently mislabeled as ISO
> 8859-1, which may result in non-printable characters when displaying
> the message. The test asserts that such characters (in this case
> curved quotes) are displayed correctly.

Series pushed to master

d