From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 51D6A1F852;
	Fri, 11 Feb 2022 20:22:17 +0000 (UTC)
Date: Fri, 11 Feb 2022 20:22:17 +0000
From: Eric Wong
To: Thomas =?utf-8?Q?Wei=C3=9Fschuh?=
Cc: meta@public-inbox.org
Subject: [PATCH] view: remove all CR before LF
Message-ID: <20220211202217.GA19151@dcvr>
References: <8d13668f-cac7-4984-bb4e-ad90502dc46d@t-8ch.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <8d13668f-cac7-4984-bb4e-ad90502dc46d@t-8ch.de>
List-Id:

Thomas Weißschuh wrote:
> Hi,
>
> it seems the rendering of \r\n (Windows-style) linebreaks, is a bit suboptimal
> on the website.
>
> The \r are rendered literally. Mutt for example does not.
>
> Example: https://lore.kernel.org/lkml/20210914093515.260031-1-maxime@cerno.tech/

Thanks for the example.

> Raw message:
> ...
> Content-Type: text/plain; charset="utf-8"
> Content-Transfer-Encoding: quoted-printable
> ...
>
>
> Hi,=0D
> =0D
> ....
>
> Rendered:
>
> ....
> Hi,\r
> \r
> ...
>
>
> The fix is probably obvious for you, if not I can try to come up with one.

Yes, except I remember adding support for CR-LF long ago...
The problem here is some messages are CR-CR-LF for some odd
reason.  Oh well, it's a 1 character fix on our end for the HTML.

Not sure if ContentHash (deduplication) and SolverGit
(blob regeneration) ought to strip redundant CR, yet...

-------8<-------
Subject: [PATCH] view: remove all CR before LF

While we've rendered CR-LF as LF-only in HTML for many years,
some messages end up as CR-CR-LF.
So strip all CR bytes preceding LF bytes, while preserving
odd CR in the middle of lines.

Reported-by: Thomas Weißschuh
Link: https://public-inbox.org/meta/8d13668f-cac7-4984-bb4e-ad90502dc46d@t-8ch.de/
---
 lib/PublicInbox/View.pm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 2e9cf705..ca02ae05 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -586,7 +586,7 @@ sub add_text_body { # callback for each_part
 
 	# makes no difference to browsers, and don't screw up filename
 	# link generation in diffs with the extra '%0D'
-	$s =~ s/\r\n/\n/sg;
+	$s =~ s/\r+\n/\n/sg;
 
 	# will be escaped to `•' in HTML
 	obfuscate_addrs($ibx, $s, "\x{2022}") if $ibx->{obfuscate};
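
For illustration only, and not part of the patch: a minimal Python sketch of
what the one-character change does to a CR-CR-LF message body. It assumes
Python's re.sub treats these two patterns the same way the Perl substitutions
in View.pm do. The function names and the sample string are made up for this
example and do not exist in public-inbox.

```python
import re


def strip_single_cr_before_lf(s: str) -> str:
    # Hypothetical stand-in for the old code: $s =~ s/\r\n/\n/sg;
    # Only the CR immediately before each LF is removed.
    return re.sub(r"\r\n", "\n", s)


def strip_all_cr_before_lf(s: str) -> str:
    # Hypothetical stand-in for the patched code: $s =~ s/\r+\n/\n/sg;
    # The whole run of CRs before each LF is removed.
    return re.sub(r"\r+\n", "\n", s)


if __name__ == "__main__":
    # Two CR-CR-LF line endings, plus a lone CR in the middle of a line.
    sample = "Hi,\r\r\n\r\r\nmid\rline\r\n"
    print(repr(strip_single_cr_before_lf(sample)))  # one CR survives per CR-CR-LF
    print(repr(strip_all_cr_before_lf(sample)))     # only the mid-line CR survives
```

With the old pattern, each CR-CR-LF still leaves one CR behind, which is the
stray \r seen in the rendered HTML. With the new pattern the line endings come
out as plain LF, and the CR inside "mid\rline" is left alone in both cases.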