From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 1DC241F4B5;
	Tue, 12 Nov 2019 21:09:24 +0000 (UTC)
Date: Tue, 12 Nov 2019 21:09:23 +0000
From: Eric Wong
To: Florian Weimer
Cc: meta@public-inbox.org
Subject: Re: Archiving HTML mail
Message-ID: <20191112210923.GA9729@dcvr>
References: <87r22ddxly.fsf@mid.deneb.enyo.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <87r22ddxly.fsf@mid.deneb.enyo.de>
List-Id:

Florian Weimer wrote:
> New contributors tend to send text/html.  We are currently rejecting
> such email, which is proving more and more problematic.  I think a
> change would be easier to justify if I can show that this will not
> break our mailing list archives (in the sense that they become
> incomplete).  We currently use mhonarc, and I don't think it copes
> well with such mail.  It certainly doesn't do the subdomain split.
>
> Is it possible to archive such mail as well, possibly under separate
> subdomains to avoid XSS issues?

You can use "publicinbox.$NAME.filter = PublicInbox::Filter::Mirror"
in the config to blindly mirror everything, which I use for
public-inbox-watch.

I also added "--no-precheck" to public-inbox-mda recently
which disables the last of the mda-specific checks:
https://public-inbox.org/meta/20191016003956.13269-1-e@80x24.org/

text/html is currently shown inline as raw HTML since
https://public-inbox.org/meta/20191031031220.21048-2-e@80x24.org/
But maybe the HTML part shouldn't be shown inline at all in multiparts
parents.
Optionally piping HTML to lynx(1) or similar could be considered,
too (but definitely an option which is off by default).

FWIW, I suggest keeping your lists text-only so contributors can
flow between different projects more easily and not get blocked
by spam filters.  It's significantly more expensive to do spam
processing on HTML mail and less accurate IME.

Better to teach contributors to optimize for low-end computers
and limited bandwidth situations :)

Also, public-inbox-watch is designed to work in parallel with
existing mailing lists.  I archive several lists (including
libc-alpha@sourceware and git@vger) this way with no special
permissions or access aside from being a regular subscriber.
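
For illustration, the Filter::Mirror setting mentioned above might
sit in a config stanza roughly like the one below.  This is only a
sketch: the inbox name, address, paths and List-Id value are
placeholders and do not come from this message.  The key names are
the ones documented in public-inbox-config(5), and "inboxdir" assumes
a recent release.

```
[publicinbox "example"]
	; all values here are hypothetical placeholders
	address = example@example.com
	inboxdir = /path/to/example.git
	; Maildir fed by an ordinary list subscription, for public-inbox-watch
	watch = maildir:/path/to/maildir/
	watchheader = List-Id:<example.example.com>
	; blindly mirror everything, as described above
	filter = PublicInbox::Filter::Mirror
```

With a stanza along these lines, public-inbox-watch would pick up
whatever lands in the Maildir, HTML parts included, alongside the
existing list.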