unofficial mirror of notmuch@notmuchmail.org
* [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding
       [not found] <id:87k1pbg4eg.fsf@tethera.net>
@ 2018-08-07 12:48 ` Sebastian Poeplau
  2018-08-07 12:48   ` [PATCH 2/2] lib: detect mislabeled Windows-1252 parts Sebastian Poeplau
  2018-08-29  9:42   ` [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding David Bremner
  0 siblings, 2 replies; 4+ messages in thread
From: Sebastian Poeplau @ 2018-08-07 12:48 UTC (permalink / raw)
  To: notmuch

Messages that contain Windows-1252 are frequently mislabeled as ISO
8859-1, which may result in non-printable characters when displaying
the message. The test asserts that such characters (in this case
curved quotes) are displayed correctly.
---
 test/T300-encoding.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/test/T300-encoding.sh b/test/T300-encoding.sh
index 2c656a1e..4a6bfd2f 100755
--- a/test/T300-encoding.sh
+++ b/test/T300-encoding.sh
@@ -44,4 +44,27 @@ add_message '[subject]="=?utf-8?q?encoded?=word without=?utf-8?q?space?=" '
 output=$(notmuch search id:${gen_msg_id} 2>&1 | notmuch_show_sanitize)
 test_expect_equal "$output" "thread:0000000000000005   2001-01-05 [1/1] Notmuch Test Suite; encodedword withoutspace (inbox unread)"
 
+test_begin_subtest "Mislabeled Windows-1252 encoding"
+test_subtest_known_broken
+add_message '[content-type]="text/plain; charset=iso-8859-1"'                           \
+            "[body]=$'This text contains \x93Windows-1252\x94 character codes.'"
+cat <<EOF > EXPECTED
+\fmessage{ id:XXXXX depth:0 match:1 excluded:0 filename:XXXXX
+\fheader{
+Notmuch Test Suite <test_suite@notmuchmail.org> (2001-01-05) (inbox unread)
+Subject: Mislabeled Windows-1252 encoding
+From: Notmuch Test Suite <test_suite@notmuchmail.org>
+To: Notmuch Test Suite <test_suite@notmuchmail.org>
+Date: GENERATED_DATE
+\fheader}
+\fbody{
+\fpart{ ID: 1, Content-type: text/plain
+This text contains “Windows-1252” character codes.
+\fpart}
+\fbody}
+\fmessage}
+EOF
+notmuch show id:${gen_msg_id} 2>&1 | notmuch_show_sanitize_all > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.18.0
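
As an illustration of why the mislabeling matters, here is a minimal standalone sketch (not part of the patch, and not notmuch or GMime code) of how the two bytes used in the test body decode under each charset. The helper names `decode_latin1`, `decode_cp1252` and `encode_utf8` are hypothetical. Mapping the bytes that Windows-1252 leaves undefined to U+FFFD is an assumption made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 code points for bytes 0x80..0x9F. Bytes that Windows-1252
 * leaves undefined are mapped to U+FFFD here (an illustrative choice). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* ISO 8859-1: every byte maps to the code point with the same value,
 * so 0x80..0x9F become non-printable C1 control characters. */
static uint32_t
decode_latin1 (unsigned char byte)
{
    return byte;
}

/* Windows-1252: identical to ISO 8859-1 except for 0x80..0x9F. */
static uint32_t
decode_cp1252 (unsigned char byte)
{
    if (byte >= 0x80 && byte <= 0x9F)
	return cp1252_high[byte - 0x80];
    return byte;
}

/* Encode a code point below U+10000 as NUL-terminated UTF-8.
 * Returns the number of bytes written, excluding the terminator. */
static size_t
encode_utf8 (uint32_t cp, char out[4])
{
    assert (out != NULL && cp < 0x10000);
    if (cp < 0x80) {
	out[0] = (char) cp;
	out[1] = '\0';
	return 1;
    }
    if (cp < 0x800) {
	out[0] = (char) (0xC0 | (cp >> 6));
	out[1] = (char) (0x80 | (cp & 0x3F));
	out[2] = '\0';
	return 2;
    }
    out[0] = (char) (0xE0 | (cp >> 12));
    out[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char) (0x80 | (cp & 0x3F));
    out[3] = '\0';
    return 3;
}
```

Decoding 0x93 with `decode_latin1` yields the control character U+0093, while `decode_cp1252` yields U+201C, the left curved quote that the EXPECTED output in the test contains.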

* [PATCH 2/2] lib: detect mislabeled Windows-1252 parts
  2018-08-07 12:48 ` [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding Sebastian Poeplau
@ 2018-08-07 12:48   ` Sebastian Poeplau
  2018-08-07 12:52     ` Sebastian Poeplau
  2018-08-29  9:42   ` [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding David Bremner
  1 sibling, 1 reply; 4+ messages in thread
From: Sebastian Poeplau @ 2018-08-07 12:48 UTC (permalink / raw)
  To: notmuch

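Use GMime's Windows charset filter (GMimeFilterWindows) to detect text
parts that are labeled ISO 8859-x but actually contain Windows
encoded bytes, and decode them with the charset the filter reports
instead.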
---
 notmuch-show.c        | 30 ++++++++++++++++++++++++++++--
 test/T300-encoding.sh |  1 -
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/notmuch-show.c b/notmuch-show.c
index 1072ea55..c3a3783a 100644
--- a/notmuch-show.c
+++ b/notmuch-show.c
@@ -272,6 +272,7 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
     GMimeContentType *content_type = g_mime_object_get_content_type (GMIME_OBJECT (part));
     GMimeStream *stream_filter = NULL;
     GMimeFilter *crlf_filter = NULL;
+    GMimeFilter *windows_filter = NULL;
     GMimeDataWrapper *wrapper;
     const char *charset;
 
@@ -282,13 +283,37 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
     if (stream_out == NULL)
 	return;
 
+    charset = g_mime_object_get_content_type_parameter (part, "charset");
+    charset = charset ? g_mime_charset_canon_name (charset) : NULL;
+    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
+    if (wrapper && charset && !g_ascii_strncasecmp (charset, "iso-8859-", 9)) {
+	GMimeStream *null_stream = NULL;
+	GMimeStream *null_stream_filter = NULL;
+
+	/* Check for mislabeled Windows encoding */
+	null_stream = g_mime_stream_null_new ();
+	null_stream_filter = g_mime_stream_filter_new (null_stream);
+	windows_filter = g_mime_filter_windows_new (charset);
+	g_mime_stream_filter_add(GMIME_STREAM_FILTER (null_stream_filter),
+				 windows_filter);
+	g_mime_data_wrapper_write_to_stream (wrapper, null_stream_filter);
+	charset = g_mime_filter_windows_real_charset(
+	    (GMimeFilterWindows *) windows_filter);
+
+	if (null_stream_filter)
+	    g_object_unref (null_stream_filter);
+	if (null_stream)
+	    g_object_unref (null_stream);
+	/* Keep a reference to windows_filter in order to prevent the
+	 * charset string from deallocation. */
+    }
+
     stream_filter = g_mime_stream_filter_new (stream_out);
     crlf_filter = g_mime_filter_crlf_new (false, false);
     g_mime_stream_filter_add(GMIME_STREAM_FILTER (stream_filter),
 			     crlf_filter);
     g_object_unref (crlf_filter);
 
-    charset = g_mime_object_get_content_type_parameter (part, "charset");
     if (charset) {
 	GMimeFilter *charset_filter;
 	charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
@@ -313,11 +338,12 @@ show_text_part_content (GMimeObject *part, GMimeStream *stream_out,
 	}
     }
 
-    wrapper = g_mime_part_get_content_object (GMIME_PART (part));
     if (wrapper && stream_filter)
 	g_mime_data_wrapper_write_to_stream (wrapper, stream_filter);
     if (stream_filter)
 	g_object_unref(stream_filter);
+    if (windows_filter)
+	g_object_unref (windows_filter);
 }
 
 static const char*
diff --git a/test/T300-encoding.sh b/test/T300-encoding.sh
index 4a6bfd2f..1e9d2a3d 100755
--- a/test/T300-encoding.sh
+++ b/test/T300-encoding.sh
@@ -45,7 +45,6 @@ output=$(notmuch search id:${gen_msg_id} 2>&1 | notmuch_show_sanitize)
 test_expect_equal "$output" "thread:0000000000000005   2001-01-05 [1/1] Notmuch Test Suite; encodedword withoutspace (inbox unread)"
 
 test_begin_subtest "Mislabeled Windows-1252 encoding"
-test_subtest_known_broken
 add_message '[content-type]="text/plain; charset=iso-8859-1"'                           \
             "[body]=$'This text contains \x93Windows-1252\x94 character codes.'"
 cat <<EOF > EXPECTED
-- 
2.18.0
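
For readers unfamiliar with this kind of detection, the following is a rough standalone sketch of one plausible heuristic, under the assumption that bytes in the range 0x80..0x9F (control characters in ISO 8859-1, printable in Windows-1252) indicate a mislabeled part. It is not GMime's implementation and does not use the GMime API; `has_c1_bytes`, `equals_ignore_case` and `guess_real_charset` are hypothetical names, and only the ISO 8859-1 to Windows-1252 case is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if any byte falls in 0x80..0x9F, the range that holds
 * control characters in ISO 8859-1 but printable ones in Windows-1252. */
static bool
has_c1_bytes (const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
	if (data[i] >= 0x80 && data[i] <= 0x9F)
	    return true;
    }
    return false;
}

/* ASCII case-insensitive string equality, since charset names are
 * not case sensitive. */
static bool
equals_ignore_case (const char *a, const char *b)
{
    while (*a && *b) {
	if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
	    return false;
	a++;
	b++;
    }
    return *a == *b;
}

/* Return the charset that should be used for decoding: "windows-1252"
 * when the part claims ISO 8859-1 but contains C1-range bytes,
 * otherwise the claimed charset unchanged. */
static const char *
guess_real_charset (const char *claimed, const unsigned char *data,
		    size_t len)
{
    assert (claimed != NULL && (data != NULL || len == 0));
    if (equals_ignore_case (claimed, "iso-8859-1") &&
	has_c1_bytes (data, len))
	return "windows-1252";
    return claimed;
}
```

In the patch itself the scan is done by writing the part's content through the filter into a null stream, so the whole body is examined before the real charset conversion filter is set up.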

* Re: [PATCH 2/2] lib: detect mislabeled Windows-1252 parts
  2018-08-07 12:48   ` [PATCH 2/2] lib: detect mislabeled Windows-1252 parts Sebastian Poeplau
@ 2018-08-07 12:52     ` Sebastian Poeplau
  0 siblings, 0 replies; 4+ messages in thread
From: Sebastian Poeplau @ 2018-08-07 12:52 UTC (permalink / raw)
  To: notmuch

Sorry, I failed to paste the message ID for the In-Reply-To header
correctly :( This patch series is in reply to
id:87k1pbg4eg.fsf@tethera.net.

Cheers,
Sebastian

* Re: [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding
  2018-08-07 12:48 ` [PATCH 1/2] test: add known broken test for mislabeled Windows-1252 encoding Sebastian Poeplau
  2018-08-07 12:48   ` [PATCH 2/2] lib: detect mislabeled Windows-1252 parts Sebastian Poeplau
@ 2018-08-29  9:42   ` David Bremner
  1 sibling, 0 replies; 4+ messages in thread
From: David Bremner @ 2018-08-29  9:42 UTC (permalink / raw)
  To: Sebastian Poeplau, notmuch

Sebastian Poeplau <sebastian.poeplau@eurecom.fr> writes:

> Messages that contain Windows-1252 are frequently mislabeled as ISO
> 8859-1, which may result in non-printable characters when displaying
> the message. The test asserts that such characters (in this case
> curved quotes) are displayed correctly.

Series pushed to master.

d

