From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 2ECC51F934
	for ; Wed, 6 Oct 2021 11:19:36 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] msg_iter: split_quotes adds trailing "\n"
Date: Wed, 6 Oct 2021 11:19:36 +0000
Message-Id: <20211006111936.11670-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

The regexp in split_quotes relies on the presence of a final "\n",
so add it in split_quotes itself instead of making it the
responsibility of every caller.

This probably doesn't matter in practice since every email seems
to have a "\n" as the final byte (due to the way SMTP works), but
there may be some odd ones that get imported via lei.
---
 lib/PublicInbox/LeiViewText.pm | 1 -
 lib/PublicInbox/MsgIter.pm     | 6 +++++-
 lib/PublicInbox/View.pm        | 3 ---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/LeiViewText.pm b/lib/PublicInbox/LeiViewText.pm
index 1f002ccd..c469d1ea 100644
--- a/lib/PublicInbox/LeiViewText.pm
+++ b/lib/PublicInbox/LeiViewText.pm
@@ -245,7 +245,6 @@ sub add_text_buf { # callback for Eml->each_part
 	hdr_buf($self, $part) if $part->{is_submsg};
 	$s =~ s/\r\n/\n/sg;
 	_xs($s);
-	$s .= "\n" unless substr($s, -1, 1) eq "\n";
 	my $diff = ($s =~ /^--- [^\n]+\n\+{3} [^\n]+\n@@ /ms);
 	my @sections = PublicInbox::MsgIter::split_quotes($s);
 	undef $s; # free memory
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index 9c6581cc..dd28417b 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -98,12 +98,16 @@ sub msg_part_text ($$) {
 
 # returns an array of quoted or unquoted sections
 sub split_quotes {
+	# some editors don't put trailing newlines at the end,
+	# make sure split_quotes can work:
+	$_[0] .= "\n" if substr($_[0], -1) ne "\n";
+
 	# Quiet "Complex regular subexpression recursion limit" warning
 	# in case an inconsiderate sender quotes 32K of text at once.
 	# The warning from Perl is harmless for us since our callers can
 	# tolerate less-than-ideal matches which work within Perl limits.
 	no warnings 'regexp';
-	split(/((?:^>[^\n]*\n)+)/sm, shift);
+	split(/((?:^>[^\n]*\n)+)/sm, $_[0]);
 }
 
 1;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 069b9680..64e73234 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -624,9 +624,6 @@ sub add_text_body { # callback for each_part
 		$ctx->{-spfx} = $spfx;
 	};
 
-	# some editors don't put trailing newlines at the end:
-	$s .= "\n" unless $s =~ /\n\z/s;
-
 	# split off quoted and unquoted blocks:
 	my @sections = PublicInbox::MsgIter::split_quotes($s);
 	undef $s; # free memory
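
For illustration only, here is a rough standalone sketch of why the
regexp needs the final "\n". It is not part of the patch:
split_quotes_old and split_quotes_new are hypothetical names, and both
work on a copy of the argument, whereas the patched split_quotes
appends to the caller's scalar through $_[0].

```perl
use strict;
use warnings;

# Hypothetical stand-in for the pre-patch behavior: split the text as-is.
sub split_quotes_old {
	my ($s) = @_;
	no warnings 'regexp';
	split(/((?:^>[^\n]*\n)+)/sm, $s);
}

# Hypothetical stand-in for the patched behavior. Unlike the patch,
# this appends to a copy rather than to the caller's $_[0].
sub split_quotes_new {
	my ($s) = @_;
	$s .= "\n" if substr($s, -1) ne "\n";
	no warnings 'regexp';
	split(/((?:^>[^\n]*\n)+)/sm, $s);
}

my $body = "hi\n> last"; # quoted final line without a trailing "\n"
my @old = split_quotes_old($body);
my @new = split_quotes_new($body);
printf "old: %d section(s)\n", scalar(@old); # 1: the quote is not split off
printf "new: %d section(s)\n", scalar(@new); # 2: "hi\n" and "> last\n"
```

Without the final "\n", the last quoted line never satisfies
/^>[^\n]*\n/, so it stays glued to the preceding unquoted text. With
the patch applied, the callers' $s also gains the "\n" as a side
effect, since $_[0] aliases the argument.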