Date: Tue, 12 Nov 2019 23:10:36 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org, Florian Weimer
Subject: Re: Archiving HTML mail
Message-ID: <20191112231036.GB15037@dcvr>
References: <87r22ddxly.fsf@mid.deneb.enyo.de> <20191112210923.GA9729@dcvr>
 <874kz8eqwf.fsf@mid.deneb.enyo.de> <20191112215307.GA20307@dcvr>
 <871rucda03.fsf@mid.deneb.enyo.de> <20191112222932.GA9643@dcvr>
 <20191112224421.wnxdxz72xjxtvsjm@chatter.i7.local>
In-Reply-To: <20191112224421.wnxdxz72xjxtvsjm@chatter.i7.local>

Konstantin Ryabitsev wrote:
> On Tue, Nov 12, 2019 at 10:29:32PM +0000, Eric Wong wrote:
> > > You have to rewrite the HTML parts anyway, to resolve RFC 2392 cid:
> > > links, prior to handing them to web browsers. I don't think web
> > > browsers support them. Neither over HTTP, nor browsing locally.
> >
> > Yeah. I guess it could be done on-the-fly at the WWW layer.
> > Parsing HTML is crazy expensive, though :<
>
> Someone I spoke with in recent past lamented that there is no mechanism
> to properly render markdown-formatted emails. I wonder if that's
> something that can be snuck in on the public-inbox level. :) Most email
> is already properly formatted markdown (paragraphs and blockquotes), so
> it's not *that* crazy of an idea.
>
> Just an off-the-cuff remark.

I don't want public-inbox to be leading the charge on that
(especially given all the flavors of Markdown to choose from).
More MUAs (and "git " would have to start supporting it, first).

And I do value syntax highlighting, so I have nothing against
adding syntax highlighting support for Markdown, HTML, Perl, Make
or any attached source files the same way(*) it's currently done
for git blobs.

Perhaps the biggest problem with phishing in HTML (and AFAIK
Markdown) is being able to obscure the URL from users who don't
check URLs before following them. e.g.:

	<a href="https://scam.example.com/">https://legit.example.com/</a>

Not being able to obscure URLs is a big reason I favor plain-text
and MUA-level linkification.

> > Fwiw, the admins of that server do get the original HTML messages
> > in ~/.public-inbox/emergency/ (or whatever PI_EMERGENCY is).
> >
> > emergency/ could be considered a "moderation queue" so the
> > admins could send personalized replies to legitimate senders who
> > got rejected. Such a message could be easier-to-digest than
> > whatever postfix sends, even with the PublicInbox::Filter::Base
> > rejection message.
>
> Now that public-inbox-mda supports list-id (THANK YOU!), my life
> moderating PI_EMERGENCY is much easier. For lore.kernel.org, emergency
> collects about a thousand messages a week. My Friday afternoon routine
> is usually to fire mutt, delete spam, and re-feed the remainder to
> public-inbox-mda with --no-precheck.

Good to know :>

Btw, "public-inbox-learn ham" could be better for your case than
"public-inbox-mda --no-precheck" in that it also trains
SpamAssassin, so future messages are less likely to end up in
emergency.

(*) and supporting pygments via subprocess and/or GNU
    source-highlight in addition to the not-in-CentOS highlight.pm
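For illustration only: the href/text trick in the phishing example above can be flagged mechanically in its crudest form. The sketch below is Python, is not part of public-inbox, and assumes the HTML part is already decoded into a string. It uses only the standard library's html.parser and urllib.parse. LinkMismatchFinder and find_mismatched_links are hypothetical names. It only compares hosts when the visible link text itself looks like an http(s) URL, so it does nothing about lookalike domains or Markdown links.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkMismatchFinder(HTMLParser):
    """Hypothetical helper: collect <a> elements whose visible text looks
    like a URL on a different host than the href actually points to."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> we are currently inside
        self._text = []        # visible text chunks seen inside that <a>
        self.suspicious = []   # (href, visible_text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._href is None:
            return
        text = "".join(self._text).strip()
        shown = urlsplit(text)
        actual = urlsplit(self._href)
        # Only compare when the visible text itself looks like an http(s) URL;
        # .hostname is lowercased, so case differences are not flagged.
        if shown.scheme in ("http", "https") and shown.hostname:
            if shown.hostname != actual.hostname:
                self.suspicious.append((self._href, text))
        self._href = None
        self._text = []


def find_mismatched_links(html):
    """Return (href, visible_text) pairs whose hosts disagree."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


sample = '<a href="https://scam.example.com/">https://legit.example.com/</a>'
print(find_mismatched_links(sample))
```

Running it on the example link reports the scam/legit pair. Link text such as "click here" is never compared, which is part of why plain-text mail with MUA-level linkification sidesteps the problem more thoroughly than any filter like this.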