Date: Wed, 4 Sep 2024 10:42:59 -0400
From: Konstantin Ryabitsev
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: Occasional web view corruption (extra html escapes)
Message-ID: <20240904-fearless-solid-toucanet-a70afa@meerkat>
References: <20240903-brainy-lionfish-of-saturation-71ae1a@lemur> <20240903191151.M126396@dcvr>
 <20240903-woodoo-airborne-harrier-6733c5@meerkat> <20240904131443.M476652@dcvr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20240904131443.M476652@dcvr>
List-Id:

On Wed, Sep 04, 2024 at 01:14:43PM GMT, Eric Wong wrote:
> OK, and nothing malformed or commented out in the address=
> fields? And no stray semi-colons or hash marks in any address
> fields which comment out only the value?

Nope.

> Because having an empty address= and empty url= field in my
> publicinbox.lkml section gives me something close (but not
> exactly) to what you have:
>
> href="/?t=20240829184845"> & href="/?t=20240829184845">lt href="/?t=20240829184845">;
> Note the href has a leading slash in my near-reproduction
> but yours did not.
>
> It's not a double-escaping problem, but the substitution
> is breaking already -escaped `<' and `>' apart with
> tags in between characters. But I'm not sure how this is
> happening to you if all your address fields look OK.
>
> I'll filter out blank addresses in our config reader, but
> I also wonder if there's anything else going on...

I think so, because the timestamp I have on the config file is Aug 20,
and the corruption manifested itself on Sep 2 or 3. Previously this
happened on Aug 26, which is the last time I would have restarted the
public-inbox-httpd process. The only notable thing I currently see about
these events is that they were on a Monday 1 week apart, but it could
have been an accident.

> And all your writes to the config are via git-config?

Yes, they are written as part of public-inbox-init. This is the chunk of
code where that happens:
https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_indexer.py#n164

> Fwiw, the buggy code would be in addr2urlmap called by
> _msg_page_prepare in PublicInbox::View.
> addr2urlmap() will escape any regexp metacharacters present in
> addresses via quotemeta, so there's no chance of regexp
> injection from a config.
>
> (hopefully coherent, running on fumes due to real life things)

One thing I can do next time this happens is take that node out of
rotation and keep it in that state for additional debugging.

Thanks for your help!

-K
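
For anyone following the thread, here is a rough Python sketch of the failure mode Eric describes with a blank address. It is not the Perl code in PublicInbox::View. The function names (`linkify`, `drop_blank`), the sample address, and the `/x` URL are made up for illustration, and `re.escape` only stands in for quotemeta. The assumption is that the addresses are joined into one alternation, so an empty entry adds an alternative that matches the empty string at every position and puts tags between the characters of an already-escaped `&lt;`.

```python
import html
import re


def drop_blank(addresses):
    """Discard empty or whitespace-only addresses (the proposed config fix)."""
    return [a for a in addresses if a.strip()]


def linkify(escaped_text, addresses, url="/x"):
    """Wrap every occurrence of a known address in a link.

    escaped_text is assumed to be HTML-escaped already. The url default
    is a placeholder, not a real public-inbox URL.
    """
    if not addresses:
        return escaped_text
    # re.escape plays the role of quotemeta: metacharacters in an address
    # cannot inject regexp syntax. It does nothing about empty strings.
    pattern = re.compile("|".join(re.escape(a) for a in addresses))
    return pattern.sub(
        lambda m: '<a href="%s">%s</a>' % (url, m.group(0)), escaped_text
    )


if __name__ == "__main__":
    text = html.escape("<list@example.org>")
    # Clean config: the entities around the address stay intact.
    print(linkify(text, ["list@example.org"]))
    # One blank address: tags appear between every character.
    print(linkify(text, ["list@example.org", ""]))
    # Filtering blanks first restores the clean output.
    print(linkify(text, drop_blank(["list@example.org", ""])))
```

Escaping metacharacters and rejecting blank entries are separate checks. The first prevents regexp injection. Only the second stops the empty alternative from matching everywhere.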