From: Eric Wong <e@80x24.org>
To: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: meta@public-inbox.org
Subject: Re: About header filtering
Date: Tue, 22 Dec 2020 23:11:26 +0000
Message-ID: <20201222231126.GA14850@dcvr>
In-Reply-To: <20201222222118.i4bioeo7l6iuf3pk@pengutronix.de>

Uwe Kleine-König <u.kleine-koenig@pengutronix.de> wrote:
> Hello Konstantin,
> 
> On Tue, Dec 22, 2020 at 11:28:28AM -0500, Konstantin Ryabitsev wrote:
> > On Tue, Dec 22, 2020 at 08:37:04AM +0100, Uwe Kleine-König wrote:
> > > I found that Konstantin Ryabitsev's tool to prepare an initial archive
> > > from an already existing mailing list[1] filters some of these out, but
> > > the instance on kernel.org has some of these details, too. (See for
> > > example
> > > https://lore.kernel.org/lkml/20201013082132.661993-1-u.kleine-koenig@pengutronix.de/raw;
> > > there are Return-Path: and also some Received: headers that I consider
> > > not-so-nice as they were added after the mail was processed by the
> > > mailing list tool on vger.kernel.org.)
> > > 
> > > Is it considered bad to filter these out? Or is it just that nobody
> > > wanted this kind of cleanliness before in such a setup?
> > 
> > The reason we don't do any filtering after receiving the mail on the archiver
> > system is two-fold:
> > 
> > 1. we don't know if any of the Received: lines are part of any DKIM/ARC
> >    signatures (they shouldn't be -- it's wrong to include them, but I've seen
> >    this happen).
> 
> Note I don't intend to throw away all Received lines, only the ones
> concerning the hops after the mailing list server. These cannot be
> signed using DKIM unless the mailing list subscription goes to an
> address that is forwarded and the forwarding server signs the Received
> lines.
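
To check whether a given DKIM signature lists Received at all, one
can look at the h= tag of the DKIM-Signature value. A minimal sketch
in plain Perl follows; dkim_signs_header and the sample signature are
made up for illustration and are not part of public-inbox:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: count how often $name appears in the h= tag
# (the colon-separated list of signed header fields) of one
# DKIM-Signature value.  Returns 0 if it is not listed.
sub dkim_signs_header {
	my ($dkim_value, $name) = @_;
	(my $v = $dkim_value) =~ s/\s+//g; # drop folding whitespace
	for my $tag (split /;/, $v) {
		my ($k, $val) = split /=/, $tag, 2;
		next unless defined $val && $k eq 'h';
		return scalar(grep { lc($_) eq lc($name) } split /:/, $val);
	}
	return 0;
}

# Made-up signature value; bh= and b= are placeholders.
my $sig = 'v=1; a=rsa-sha256; d=example.org; s=sel; ' .
	'h=From:To:Subject:Date:Message-ID; bh=PLACEHOLDER; b=PLACEHOLDER';
print dkim_signs_header($sig, 'Received') ? "signed\n" : "not signed\n";
```

A result of 0 only says that this one signature does not cover any
Received header; a message can carry several DKIM-Signature headers
(header_raw('DKIM-Signature') in array context), and ARC headers
would need a similar check.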

FWIW, you should be able to use either Email::MIME or
PublicInbox::Eml to shift off the latest (topmost) Received
header:

----8<----
#!/usr/bin/perl -w
use strict;
use PublicInbox::Eml;
my $eml = PublicInbox::Eml->new(do { local $/; <STDIN> });
my @rcvd = $eml->header_raw('Received'); # array context for all instances
shift @rcvd; # remove topmost
$eml->header_set('Received', @rcvd); # set to keep remaining
print $eml->as_string;
----8<----

s/PublicInbox::Eml/Email::MIME/ works, too, but PublicInbox::Eml
won't endlessly recurse multipart mails like Email::MIME does.
Otherwise the header_raw, header_set, as_string APIs should
behave the same.
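
If more than one hop was added after the list server, the same idea
extends to dropping every Received header above the one written by
the list host. A rough sketch, assuming the list server identifies
itself in the "by" clause; drop_hops_after_list and the sample header
values are made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: keep the Received values from the topmost one
# written "by $list_host" down to the oldest, dropping everything
# above it.  If no value matches, the list is returned unchanged.
sub drop_hops_after_list {
	my ($list_host, @rcvd) = @_;
	for my $i (0 .. $#rcvd) {
		return @rcvd[$i .. $#rcvd]
			if $rcvd[$i] =~ /\bby\s+\Q$list_host\E\b/i;
	}
	return @rcvd;
}

# Made-up values, newest first, the order header_raw returns them in.
my @rcvd = (
	'from mx.example.org by mail.example.com; Tue, 13 Oct 2020 10:22:05 +0200',
	'from vger.kernel.org by mx.example.org; Tue, 13 Oct 2020 10:22:03 +0200',
	'from localhost by vger.kernel.org; Tue, 13 Oct 2020 04:21:50 -0400',
	'from submitter.example.net by vger.kernel.org; Tue, 13 Oct 2020 04:21:45 -0400',
);
my @kept = drop_hops_after_list('vger.kernel.org', @rcvd);
print scalar(@kept), " of ", scalar(@rcvd), " Received headers kept\n";
# prints "2 of 4 Received headers kept"
```

In the script above, that would replace the shift with something like
$eml->header_set('Received', drop_hops_after_list('vger.kernel.org',
$eml->header_raw('Received'))). Matching on the host name alone is a
simplification; a real setup may need a stricter pattern.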

Thread overview: 5+ messages
2020-12-22  7:37 About header filtering Uwe Kleine-König
2020-12-22 16:28 ` Konstantin Ryabitsev
2020-12-22 22:21   ` Uwe Kleine-König
2020-12-22 23:11     ` Eric Wong [this message]
2020-12-23 17:57     ` Konstantin Ryabitsev