Date: Fri, 6 Sep 2024 23:29:03 +0000
From: Eric Wong
To: Filip Hejsek
Cc: Konstantin Ryabitsev, meta@public-inbox.org
Subject: Re: Occasional web view corruption (extra html escapes)
Message-ID: <20240906232903.M567140@dcvr>
References: <20240903-brainy-lionfish-of-saturation-71ae1a@lemur>
 <20240906224206.M417183@dcvr>
In-Reply-To: <20240906224206.M417183@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

Eric Wong wrote:
> Coderepos were hiding the bug from me.

coderepos + cindex, that is.

> So I think adding a guard to prevent using regexp from empty -by_addr
> is a first step; but additional checks + preload fixes should
> probably be needed...

This should fix the immediate issue on its own:

-----8<-----
Subject: [PATCH] view: fix addr2url mapping corruption

We must avoid generating a qr/\b()\b/ regexp which matches
every word boundary.

This is caused by a particular set of circumstances for WWW
instances:

1. extindex must be in use

2. cindex must NOT be in use OR WWW->preload wasn't used
   (custom .psgi or non-p-i-{httpd,netd} users)

3. first HTTP request hits /$EXTINDEX/$MSGID/
   (where $EXTINDEX is typically `all')

On extindex-using instances without a cindex configured, the
first HTTP request hitting the extindex encounters an empty
{-by_addr} hash table.  This empty {-by_addr} hash table causes
View->addr2urlmap() to return an all-matching regexp which
corrupts HTML when attempting address substitutions.

cindex-using instances avoid the problem by triggering
_fill_all() during PublicInbox::WWW->preload and ensuring
{-by_addr} of the PublicInbox::Config object is populated.

Thanks to Konstantin for the initial report and Filip
for the immensely helpful explanation of the problem.

Helped-by: Filip Hejsek
Reported-by: Konstantin Ryabitsev
Link: https://public-inbox.org/meta/20240903-brainy-lionfish-of-saturation-71ae1a@lemur/
Fixes: 48cbe0c3c8dc4d26 (www: linkify inbox addresses in To/Cc headers, 2024-01-09)
---
 lib/PublicInbox/View.pm | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index bc093a20..154e7537 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -203,8 +203,12 @@ sub addr2urlmap ($) {
 		my $tmp = $ctx->{www}->{pi_cfg}->{-addr2urlmap};
 		my @k = keys %$tmp; # random order
 		delete @$tmp{@k[0..3]} if scalar(@k) > 7;
-		my $re = join('|', map { quotemeta } keys %addr2url);
-		$tmp->{$key} = [ qr/\b($re)\b/i, \%addr2url ];
+		if (scalar keys %addr2url) {
+			my $re = join('|', map { quotemeta } keys %addr2url);
+			$tmp->{$key} = [ qr/\b($re)\b/i, \%addr2url ];
+		} else { # nothing? NUL should never match:
+			[ qr/(\0)/, { "\0" => './' } ];
+		}
 	};
 	@$ent;
 }
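
For anybody wanting to see the failure mode in isolation, here is a
rough standalone sketch (not part of the patch).  The helper names
build_re_unguarded, build_re_guarded and count_matches, the sample
header and the example.com URL are all made up for illustration;
only the join/quotemeta/qr construction and the NUL fallback are
taken from the diff above.

```perl
use strict;
use warnings;

# Hypothetical helper: build the matcher the way the pre-patch code
# did, with no guard for an empty address map.
sub build_re_unguarded {
	my ($addr2url) = @_;
	my $re = join('|', map { quotemeta } keys %$addr2url);
	qr/\b($re)\b/i; # becomes qr/\b()\b/i when the map is empty
}

# Hypothetical helper mirroring the patched logic: with nothing to
# match, fall back to a pattern which requires a literal NUL byte.
sub build_re_guarded {
	my ($addr2url) = @_;
	return qr/(\0)/ unless scalar keys %$addr2url;
	my $re = join('|', map { quotemeta } keys %$addr2url);
	qr/\b($re)\b/i;
}

# Count how many times $re matches in $str.
sub count_matches {
	my ($re, $str) = @_;
	my $n = () = $str =~ /$re/g;
	$n;
}

my $hdr = 'To: a@example.com'; # made-up header line
my %empty;
my %one = ('a@example.com' => 'https://example.com/a/');

printf "unguarded, empty map: %d match(es)\n",
	count_matches(build_re_unguarded(\%empty), $hdr);
printf "guarded, empty map:   %d match(es)\n",
	count_matches(build_re_guarded(\%empty), $hdr);
printf "guarded, one address: %d match(es)\n",
	count_matches(build_re_guarded(\%one), $hdr);
```

With an empty map, the unguarded regexp matches (with an empty
capture) at every word boundary in the header, so any substitution
driven by it fires all over the line.  The guarded version matches
nothing until an address is actually present.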