From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <e@80x24.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN:  
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id A8AD91F4B5;
	Tue, 12 Nov 2019 23:10:36 +0000 (UTC)
Date: Tue, 12 Nov 2019 23:10:36 +0000
From: Eric Wong <e@80x24.org>
To: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Cc: meta@public-inbox.org, Florian Weimer <fw@deneb.enyo.de>
Subject: Re: Archiving HTML mail
Message-ID: <20191112231036.GB15037@dcvr>
References: <87r22ddxly.fsf@mid.deneb.enyo.de>
 <20191112210923.GA9729@dcvr>
 <874kz8eqwf.fsf@mid.deneb.enyo.de>
 <20191112215307.GA20307@dcvr>
 <871rucda03.fsf@mid.deneb.enyo.de>
 <20191112222932.GA9643@dcvr>
 <20191112224421.wnxdxz72xjxtvsjm@chatter.i7.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20191112224421.wnxdxz72xjxtvsjm@chatter.i7.local>
List-Id: <meta.public-inbox.org>

Konstantin Ryabitsev <konstantin@linuxfoundation.org> wrote:
> On Tue, Nov 12, 2019 at 10:29:32PM +0000, Eric Wong wrote:
> > > You have to rewrite the HTML parts anyway, to resolve RFC 2392 cid:
> > > links, prior to handing them to web browsers.  I don't think web
> > > browsers support them.  Neither over HTTP, nor browsing locally.
> > 
> > Yeah.  I guess it could be done on-the-fly at the WWW layer.
> > Parsing HTML is crazy expensive, though :<
> 
> Someone I spoke with in recent past lamented that there is no mechanism 
> to properly render markdown-formatted emails. I wonder if that's 
> something that can be snuck in on the public-inbox level. :) Most email 
> is already properly formatted markdown (paragraphs and blockquotes), so 
> it's not *that* crazy of an idea.
> 
> Just an off-the-cuff remark.

I don't want public-inbox to be leading the charge on that
(especially given all the flavors of Markdown to choose from).
More MUAs (and "git <log|show>") would have to start supporting
it first.

And I do value syntax highlighting, so I have nothing against
adding syntax highlighting support for Markdown, HTML, Perl,
Make or any attached source files the same way(*) it's currently
done for git blobs.

Perhaps the biggest problem with phishing in HTML (and AFAIK
Markdown) is being able to obscure the URL from users who don't
check URLs before following them.  e.g.:

  <a href="https://scam.example.com/">https://legit.example.com/</a>

Not being able to obscure URLs is a big reason I favor plain-text
and MUA-level linkification.
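
For illustration only, here is a rough sketch of how that kind of
mismatch could be spotted in an HTML part.  It is not anything
public-inbox does; the names (LinkMismatchFinder,
find_obscured_links) are made up for this example, and it only
compares hostnames, so it would miss plenty of other tricks.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def _host(text):
    """Return the hostname if `text` parses as a URL with one, else None."""
    try:
        return urlsplit(text).hostname
    except ValueError:
        return None


class LinkMismatchFinder(HTMLParser):
    """Hypothetical helper: collect (href, visible_text) pairs where the
    visible text looks like a URL for a different host than the href."""

    def __init__(self):
        super().__init__()
        self.mismatches = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        shown = "".join(self._text).strip()
        shown_host = _host(shown)
        # Only flag links whose visible text is itself a URL with a host
        if shown_host and shown_host != _host(self._href):
            self.mismatches.append((self._href, shown))
        self._href = None


def find_obscured_links(html):
    """Return the list of (href, visible_text) mismatches found in `html`."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return finder.mismatches
```

Feeding it the example above should report the scam/legit pair,
while a link whose text is just "click here" is left alone since
there is no displayed URL to compare against.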

> > Fwiw, the admins of that server do get the original HTML messages
> > in ~/.public-inbox/emergency/ (or whatever PI_EMERGENCY is).
> > 
> > emergency/ could be considered a "moderation queue" so the
> > admins could send personalized replies to legitimate senders who
> > got rejected.  Such a message could be easier-to-digest than
> > whatever postfix sends, even with the PublicInbox::Filter::Base
> > rejection message.
> 
> Now that public-inbox-mda supports list-id (THANK YOU!), my life 
> moderating PI_EMERGENCY is much easier. For lore.kernel.org, emergency 
> collects about a thousand messages a week. My Friday afternoon routine 
> is usually to fire mutt, delete spam, and re-feed the remainder to 
> public-inbox-mda with --no-precheck.

Good to know :>

Btw, "public-inbox-learn ham" could be better for your case than
"public-inbox-mda --no-precheck" in that it also trains
SpamAssassin so future messages are less likely to end up in
emergency.

(*) and supporting pygments via subprocess and/or GNU
    source-highlight in addition to the not-in-CentOS
    highlight.pm