From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 5DF731F9FC
	for ; Fri, 12 Feb 2021 07:05:52 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 1/3] filter/vger: kill trailing newlines aggressively
Date: Fri, 12 Feb 2021 00:05:50 -0700
Message-Id: <20210212070552.13901-2-e@80x24.org>
In-Reply-To: <20210212070552.13901-1-e@80x24.org>
References: <20210212070552.13901-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last
trailing newline, not every single trailing newline like
InboxWritable->import_mbox does.

Testing PublicInbox::MboxReader->mboxrd (next commit) with
scripts/import_vger_from_mbox on the LKML archive I got in 2018
for v2 development; this difference was responsible for a single
spam message(*) out of 2722831 not being filtered correctly and
returning a different result.

(*) dated 2014-08-25
---
 lib/PublicInbox/Filter/Vger.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Filter/Vger.pm b/lib/PublicInbox/Filter/Vger.pm
index 0b1f5dd3..5b3c0277 100644
--- a/lib/PublicInbox/Filter/Vger.pm
+++ b/lib/PublicInbox/Filter/Vger.pm
@@ -24,7 +24,7 @@ sub scrub {
 	# the vger appender seems to only work on the raw string,
 	# so in multipart (e.g. GPG-signed) messages, the list trailer
 	# becomes invisible to MIME-aware email clients.
-	if ($s =~ s/$l0\n$l1\n$l2\n$l3\n($l4\n)?\z//os) {
+	if ($s =~ s/$l0\n$l1\n$l2\n$l3\n(?:$l4\n)?\n*\z//os) {
 		$mime = PublicInbox::Eml->new(\$s);
 	}
 	$self->ACCEPT($mime);
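
For illustration only, here is a minimal standalone sketch of the
behavior change in the hunk above. The $l0..$l4 values below are
hypothetical placeholders, not the real trailer patterns from Vger.pm,
and strip_old/strip_new are names invented for this example. It only
aims to show why a message with extra trailing newlines slipped past
the old substitution.

```perl
use strict;
use warnings;

# Hypothetical placeholders: the real $l0..$l4 in Vger.pm match the
# lines of the vger list trailer; these stand in for them.
my ($l0, $l1, $l2, $l3, $l4) = map { qr/$_/ }
	('-+', 'line one', 'line two', 'line three', 'line four');

# Old substitution: the trailer must be the very last thing in the string.
sub strip_old {
	my ($s) = @_;
	$s =~ s/$l0\n$l1\n$l2\n$l3\n($l4\n)?\z//s;
	return $s;
}

# New substitution: any number of newlines may follow the trailer.
sub strip_new {
	my ($s) = @_;
	$s =~ s/$l0\n$l1\n$l2\n$l3\n(?:$l4\n)?\n*\z//s;
	return $s;
}

my $trailer = "--\nline one\nline two\nline three\n";
my $exact = "body\n" . $trailer;          # trailer ends the message
my $extra = "body\n" . $trailer . "\n\n"; # extra trailing newlines

for my $case (['exact', $exact], ['extra', $extra]) {
	my ($name, $msg) = @$case;
	printf "%s: old=%s new=%s\n", $name,
		(strip_old($msg) eq "body\n" ? 'stripped' : 'kept'),
		(strip_new($msg) eq "body\n" ? 'stripped' : 'kept');
}
```

With the placeholder trailer, both substitutions strip the "exact"
message, while only the new one strips the message with extra
trailing newlines; the old one leaves it untouched because \z cannot
match before the remaining newlines.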