unofficial mirror of meta@public-inbox.org
From: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: Occasional web view corruption (extra html escapes)
Date: Wed, 4 Sep 2024 10:42:59 -0400
Message-ID: <20240904-fearless-solid-toucanet-a70afa@meerkat>
In-Reply-To: <20240904131443.M476652@dcvr>

On Wed, Sep 04, 2024 at 01:14:43PM GMT, Eric Wong wrote:
> OK, and nothing malformed or commented out in the address=
> fields?  And no stray semi-colons or hash marks in any address
> fields which comment out only the value?

Nope.

> Because having an empty address= and empty url= field in my
> publicinbox.lkml section gives me something close (but not
> exactly) to what you have:
> 
> <a
> href="/?t=20240829184845"></a> &<a
> href="/?t=20240829184845"></a>lt<a
> href="/?t=20240829184845"></a>;<a
> 
> Note the href has a leading slash in my near-reproduction
> but yours did not.
> 
> It's not a double-escaping problem, but the substitution is
> breaking already-escaped `&lt;' and `&gt;' apart with <a>
> tags in between characters.  But I'm not sure how this is
> happening to you if all your address fields look OK.
> 
> I'll filter out blank addresses in our config reader, but
> I also wonder if there's anything else going on...

I think so, because the timestamp I have on the config file is Aug 20, and the
corruption manifested itself on Sep 2 or 3. Previously this happened on Aug
26, which is the last time I would have restarted the public-inbox-httpd
process. The only notable thing I currently see about these events is that
both fell on a Monday, one week apart, but that could be a coincidence.

> And all your writes to the config are via git-config?

Yes, they are written as part of public-inbox-init. This is the chunk of code
where that happens:
https://git.kernel.org/pub/scm/utils/grokmirror/grokmirror.git/tree/grokmirror/pi_indexer.py#n164

> Fwiw, the buggy code would be in addr2urlmap called by
> _msg_page_prepare in PublicInbox::View.
> addr2urlmap() will escape any regexp metacharacters present in
> addresses via quotemeta, so there's no chance of regexp
> injection from a config.
> 
> (hopefully coherent, running on fumes due to real life things)
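
For illustration, here is a minimal Python sketch of the failure mode described
above. It is not the Perl code in PublicInbox::View; the function names
(build_addr_pattern, linkify), the skip_blank flag and the URLs are all
hypothetical. It assumes the substitution runs over already-escaped text using
one alternation built from the configured addresses, with re.escape standing
in for quotemeta. Under that assumption, a blank address yields an empty
alternative that matches at every position, so link tags land between the
characters of `&lt;' and `&gt;'. The exact output differs from the
near-reproduction quoted earlier.

```python
import re
from html import escape


def build_addr_pattern(addrs, skip_blank):
    """Join regexp-escaped addresses into one alternation, longest first."""
    addrs = list(addrs)
    if skip_blank:
        # Hypothetical fix: drop empty or whitespace-only addresses.
        addrs = [a for a in addrs if a.strip()]
    if not addrs:
        return None
    alts = sorted((re.escape(a) for a in addrs), key=len, reverse=True)
    return re.compile("|".join(alts))


def linkify(text, addr2url, skip_blank=True):
    """HTML-escape text, then wrap each configured address in an <a> tag."""
    out = escape(text, quote=False)
    pat = build_addr_pattern(addr2url.keys(), skip_blank)
    if pat is None:
        return out

    def repl(m):
        addr = m.group(0)
        return '<a href="%s">%s</a>' % (addr2url[addr], addr)

    return pat.sub(repl, out)


if __name__ == "__main__":
    header = "Eric Wong <e@80x24.org>"
    # A blank address kept in the map: tags appear between every character.
    print(linkify(header, {"": "/?t=1"}, skip_blank=False))
    # Blank addresses filtered out: the escaped text is left intact.
    print(linkify(header, {"": "/?t=1"}, skip_blank=True))
```

Escaping metacharacters does not help here, because an empty string has
nothing to escape; the blank entry has to be filtered out before the
pattern is built.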

One thing I can do next time this happens is take that node out of rotation
and keep it in that state for additional debugging.

Thanks for your help!

-K

Thread overview: 11+ messages
2024-09-03 17:52 Occasional web view corruption (extra html escapes) Konstantin Ryabitsev
2024-09-03 18:42 ` Eric Wong
2024-09-03 19:18   ` Konstantin Ryabitsev
2024-09-03 19:11 ` Eric Wong
2024-09-03 19:22   ` Konstantin Ryabitsev
2024-09-04 13:14     ` Eric Wong
2024-09-04 14:42       ` Konstantin Ryabitsev [this message]
2024-09-06 22:20 ` Filip Hejsek
2024-09-06 22:42   ` Eric Wong
2024-09-06 23:29     ` Eric Wong
2024-09-07  1:24       ` Konstantin Ryabitsev
