On Tue, Dec 22, 2020 at 08:37:04AM +0100, Uwe Kleine-König wrote:
> I found that Konstantin Ryabitsev's tool to prepare an initial archive
> from an already existing mailing list[1] filters some of these out, but
> the instance on kernel.org has some of these details, too. (See for
> example
> https://lore.kernel.org/lkml/20201013082132.661993-1-u.kleine-koenig@pengutronix.de/raw;
> there are Return-Path: and also some Received: headers that I consider
> not-so-nice as they were added after the mail was processed by the
> mailing list tool on vger.kernel.org.)
>
> Is it considered bad to filter these out? Or is it just that nobody
> wanted this kind of cleanliness before in such a setup?

The reason we don't do any filtering after receiving the mail on the
archiver system is two-fold:

1. we don't know if any of the Received: lines are part of any DKIM/ARC
   signatures (they shouldn't be -- it's wrong to include them, but I've
   seen this happen).
2. the goal of lore.kernel.org is maximum transparency, so we include
   everything that our own systems add to the headers in an attempt to
   show that "there's nothing up our sleeves."

> I could handcraft a preprocessor[2] but I assume that a solution in
> public-inbox itself would find some users?!

I don't know if this should be part of public-inbox -- a simple procmail
script would work. I know procmail isn't very actively developed these
days, but it's also extremely robust and handles almost anything you can
throw at it, which is an important advantage when it comes to a format
like email.

-K
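
For anyone who wants such a preprocessor outside of public-inbox, here is a
rough sketch of the idea in Python rather than procmail. It is not what
lore.kernel.org or public-inbox does. It assumes that every hop prepends its
headers at the top of the message, so anything above the first Received:
line stamped "by vger.kernel.org" was added after the list server handled
the mail. The function names, the default host and the choice to drop only
Return-Path: and Received: are illustrative assumptions. It works on raw
bytes so that all other headers stay byte-identical, but if a signer did
include Received: in a DKIM/ARC signature, removing those lines would still
invalidate it.

```python
import re


def split_headers(raw: bytes):
    """Split a raw message into header fields (folded lines joined) and the rest."""
    lines = raw.splitlines(keepends=True)
    fields = []
    for i, line in enumerate(lines):
        if line in (b"\n", b"\r\n"):  # blank line ends the header block
            break
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += line  # continuation of a folded header
        else:
            fields.append(line)
    else:
        i = len(lines)  # no blank line found: everything is header
    return fields, b"".join(lines[i:])


def strip_post_list_headers(raw: bytes, list_host: str = "vger.kernel.org") -> bytes:
    """Drop Return-Path:/Received: fields that sit above the list server's own stamp.

    Hypothetical helper; list_host and the dropped header names are assumptions.
    """
    fields, rest = split_headers(raw)
    marker = re.compile(
        rb"^Received:.*\bby\s+" + re.escape(list_host.encode()) + rb"\b",
        re.IGNORECASE | re.DOTALL,
    )
    cut = next((n for n, f in enumerate(fields) if marker.match(f)), None)
    if cut is None:
        return raw  # no stamp from the list host: leave the message untouched
    drop = (b"return-path:", b"received:")
    kept = [
        f for n, f in enumerate(fields)
        if n >= cut or not f.lower().startswith(drop)
    ]
    return b"".join(kept) + rest
```

Something like this could run as a filter in front of the archiver, reading
the message from stdin and writing the result to stdout. Messages without a
recognizable stamp from the list host pass through unchanged.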