* [PATCH 2/3] view: try to display bogus charsets for text/plain
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
@ 2016-08-18 1:39 ` Eric Wong
2016-08-18 1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text,
which causes Email::MIME::body_str to fail as X-UNKNOWN
is not a valid encoding. So, blindly display the body
as plain-text but warn users about possibly mangled text.
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
lib/PublicInbox/View.pm | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3057221..3f0e122 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -407,14 +407,17 @@ sub flush_quote {
$$s .= qq(<span\nclass="q">) . $rv . '</span>'
}
-sub attach_link ($$$$) {
- my ($upfx, $ct, $p, $fn) = @_;
+sub attach_link ($$$$;$) {
+ my ($upfx, $ct, $p, $fn, $err) = @_;
my ($part, $depth, @idx) = @$p;
my $nl = $idx[-1] > 1 ? "\n" : '';
my $idx = join('.', @idx);
my $size = bytes::length($part->body);
$ct ||= 'text/plain';
- $ct =~ s/;.*//; # no attributes
+
+ # hide attributes normally, unless we want to aid users in
+ # spotting MUA problems:
+ $ct =~ s/;.*// unless $err;
$ct = ascii_html($ct);
my $desc = $part->header('Content-Description');
$desc = $fn unless defined $desc;
@@ -427,7 +430,12 @@ sub attach_link ($$$$) {
} else {
$sfn = 'a.bin';
}
- my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
+ my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">);
+ if ($err) {
+ $ret .=
+"[-- Warning: decoded text below may be mangled --]\n";
+ }
+ $ret .= "[-- Attachment #$idx: ";
my $ts = "Type: $ct, Size: $size bytes";
$ret .= ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]";
$ret .= "</a>\n";
@@ -446,12 +454,20 @@ sub add_text_body {
my $s = eval { $part->body_str };
# badly-encoded message? tell the world about it!
- return attach_link($upfx, $ct, $p, $fn) if $@;
+ my $err = $@;
+ if ($err) {
+ if ($ct =~ m!\btext/plain\b!i) {
+ # attach_link will warn further down...
+ $s = $part->body;
+ } else {
+ return attach_link($upfx, $ct, $p, $fn);
+ }
+ }
my @lines = split(/^/m, $s);
$s = '';
- if (defined($fn) || $depth > 0) {
- $s .= attach_link($upfx, $ct, $p, $fn);
+ if (defined($fn) || $depth > 0 || $err) {
+ $s .= attach_link($upfx, $ct, $p, $fn, $err);
$s .= "\n";
}
my @quot;
--
EW
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 3/3] view: try assuming UTF-8 for bogus charsets
2016-08-18 1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18 1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
2016-08-18 1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
@ 2016-08-18 1:39 ` Eric Wong
2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18 1:39 UTC (permalink / raw)
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.
At least this change renders
<alpine.DEB.2.20.1608131214070.4924@virtualbox>
properly by showing "•" (•) instead of
"⠢" (•)
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
lib/PublicInbox/View.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3f0e122..6997c1c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -457,8 +457,14 @@ sub add_text_body {
my $err = $@;
if ($err) {
if ($ct =~ m!\btext/plain\b!i) {
+ # Try to assume UTF-8 because Alpine seems to
+ # do wacky things and set charset=X-UNKNOWN
+ $part->charset_set('UTF-8');
+ $s = eval { $part->body_str };
+
+ # If forcing charset=UTF-8 failed,
# attach_link will warn further down...
- $s = $part->body;
+ $s = $part->body if $@;
} else {
return attach_link($upfx, $ct, $p, $fn);
}
--
EW
^ permalink raw reply related [flat|nested] 4+ messages in thread