* [PATCH 0/3] attempt to display text/plain with bogus charsets
@ 2016-08-18 1:39 Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
Thomas Ferris Nicolaisen reported a problem with Dscho's
Git for Windows 2.9.3 announcement not rendering text inline:
https://public-inbox.org/git/alpine.DEB.2.20.1608131214070.4924@virtualbox/
This was caused by the bogus charset=X-UNKNOWN set by Alpine.
Attempt to handle it as UTF-8, but fall back to blindly showing
it to the user. In either case, warn users about the mangled
text.
Eric Wong (3):
view: attach_link uses string concatentation
view: try to display bogus charsets for text/plain
view: try assuming UTF-8 for bogus charsets
lib/PublicInbox/View.pm | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)
--
EW
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/3] view: attach_link uses string concatentation
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
@ 2016-08-18 1:39 ` Eric Wong
2016-08-18 1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
2016-08-18 1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
There is no point in using an array to join on an
empty string (my original intention was probably to
join on "\n").
This is only preparation for the next change to show
a warning to in the attachment link.
---
lib/PublicInbox/View.pm | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 6f79f60..3057221 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -427,10 +427,10 @@ sub attach_link ($$$$) {
} else {
$sfn = 'a.bin';
}
- my @ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
+ my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
my $ts = "Type: $ct, Size: $size bytes";
- push(@ret, ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]");
- join('', @ret, "</a>\n");
+ $ret .= ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]";
+ $ret .= "</a>\n";
}
sub add_text_body {
--
EW
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/3] view: try to display bogus charsets for text/plain
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
@ 2016-08-18 1:39 ` Eric Wong
2016-08-18 1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text,
which causes Email::MIME::body_str to fail as X-UNKNOWN
is not a valid encoding. So, blindly display the body
as plain-text but warn users about possibly mangled text.
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
lib/PublicInbox/View.pm | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3057221..3f0e122 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -407,14 +407,17 @@ sub flush_quote {
$$s .= qq(<span\nclass="q">) . $rv . '</span>'
}
-sub attach_link ($$$$) {
- my ($upfx, $ct, $p, $fn) = @_;
+sub attach_link ($$$$;$) {
+ my ($upfx, $ct, $p, $fn, $err) = @_;
my ($part, $depth, @idx) = @$p;
my $nl = $idx[-1] > 1 ? "\n" : '';
my $idx = join('.', @idx);
my $size = bytes::length($part->body);
$ct ||= 'text/plain';
- $ct =~ s/;.*//; # no attributes
+
+ # hide attributes normally, unless we want to aid users in
+ # spotting MUA problems:
+ $ct =~ s/;.*// unless $err;
$ct = ascii_html($ct);
my $desc = $part->header('Content-Description');
$desc = $fn unless defined $desc;
@@ -427,7 +430,12 @@ sub attach_link ($$$$) {
} else {
$sfn = 'a.bin';
}
- my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
+ my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">);
+ if ($err) {
+ $ret .=
+"[-- Warning: decoded text below may be mangled --]\n";
+ }
+ $ret .= "[-- Attachment #$idx: ";
my $ts = "Type: $ct, Size: $size bytes";
$ret .= ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]";
$ret .= "</a>\n";
@@ -446,12 +454,20 @@ sub add_text_body {
my $s = eval { $part->body_str };
# badly-encoded message? tell the world about it!
- return attach_link($upfx, $ct, $p, $fn) if $@;
+ my $err = $@;
+ if ($err) {
+ if ($ct =~ m!\btext/plain\b!i) {
+ # attach_link will warn further down...
+ $s = $part->body;
+ } else {
+ return attach_link($upfx, $ct, $p, $fn);
+ }
+ }
my @lines = split(/^/m, $s);
$s = '';
- if (defined($fn) || $depth > 0) {
- $s .= attach_link($upfx, $ct, $p, $fn);
+ if (defined($fn) || $depth > 0 || $err) {
+ $s .= attach_link($upfx, $ct, $p, $fn, $err);
$s .= "\n";
}
my @quot;
--
EW
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 3/3] view: try assuming UTF-8 for bogus charsets
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
2016-08-18 1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
@ 2016-08-18 1:39 ` Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.
At least this change renders
<alpine.DEB.2.20.1608131214070.4924@virtualbox>
properly by showing "•" (•) instead of
"⠢" (•)
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
lib/PublicInbox/View.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3f0e122..6997c1c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -457,8 +457,14 @@ sub add_text_body {
my $err = $@;
if ($err) {
if ($ct =~ m!\btext/plain\b!i) {
+ # Try to assume UTF-8 because Alpine seems to
+ # do wacky things and set charset=X-UNKNOWN
+ $part->charset_set('UTF-8');
+ $s = eval { $part->body_str };
+
+ # If forcing charset=UTF-8 failed,
# attach_link will warn further down...
- $s = $part->body;
+ $s = $part->body if $@;
} else {
return attach_link($upfx, $ct, $p, $fn);
}
--
EW
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2016-08-18 1:39 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
2016-08-18 1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
2016-08-18 1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).