unofficial mirror of meta@public-inbox.org
From: Eric Wong <e@80x24.org>
To: Filip Hejsek <filip.hejsek@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
	meta@public-inbox.org
Subject: Re: Occasional web view corruption (extra html escapes)
Date: Fri, 6 Sep 2024 22:42:06 +0000
Message-ID: <20240906224206.M417183@dcvr>
In-Reply-To: <cd1a31587f513475eb2e6a8bb4da597bd936d6ce.camel@gmail.com>

Filip Hejsek <filip.hejsek@gmail.com> wrote:
> Hello,
> 
> I have figured out why this happens.
> 
> I have reproduced the bug by doing the following:
> 1. setup an instance of public-inbox and import some message into it
> 2. create extindex named all and add it to config
> 3. start public-inbox-httpd
> 4. directly open http://<server-address>/all/<msg-id>/

OK, but I wasn't able to reproduce it on my setup since I
had coderepos configured :x

Coderepos were hiding the bug from me.

> The server will enter the broken state if this is the first page loaded
> from the server.
> 
> The issue occurs because of the following sequence of events:
> 1. WWW->call is called
> 2. after matching the URL, msg_page is called
> 3. msg_page validates the inbox name by calling invalid_inbox_mid
> 4. invalid_inbox_mid calls invalid_inbox
> 5. the name is looked up with lookup_name
> 6. because there is no inbox with that name, undef is returned
> 7. the name is looked up with lookup_ei
> 8. the lookup succeeds
> 9. get_mid_html is called to generate the HTML page
> 10. addr2urlmap is used to construct a regex of known addresses
> 11. because no inbox has been instantiated, $cfg->{-by_addr} is empty
> 12. because of that, $re will be an empty string
> 13. so the final regex is /\b()\b/, which matches every word boundary

Thanks for the diagnosis and excellent explanation!
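
Steps 10-13 above can be illustrated in isolation. The following is a
minimal Python sketch, not the Perl code in public-inbox: the function
name build_addr_regex and the sample strings are hypothetical, and it
only assumes the regex is an alternation of known addresses wrapped in
\b(...)\b, as the steps describe.

```python
import re

def build_addr_regex(addrs):
    # Hypothetical stand-in for the construction described in steps 10-13:
    # join the known addresses into one alternation wrapped in \b(...)\b.
    alternation = "|".join(re.escape(a) for a in addrs)
    return re.compile(r"\b(" + alternation + r")\b")

# With no known addresses, the alternation is the empty string, so the
# pattern degenerates to \b()\b and matches at every word boundary.
empty_re = build_addr_regex([])
marked = empty_re.sub("|", "a b")

# With at least one known address, only that address matches.
populated_re = build_addr_regex(["meta@public-inbox.org"])
hits = populated_re.findall("To: meta@public-inbox.org, someone@example.com")

print(empty_re.pattern)
print(marked)
print(hits)
```

Here every word boundary in "a b" gets a marker, which is the kind of
spurious match that would trigger a substitution at every word of a
rendered message.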

I didn't hit this problem on my instance because configuring
coderepos causes CodeSearch->load_coderepos to call
lookup_eidx_key, which calls _lookup_fill and populates {-by_addr}.

So I think adding a guard to prevent using a regexp built from an
empty -by_addr is a first step, but additional checks + preload fixes
will probably be needed...
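
As a rough idea of what such a guard could look like, here is a Python
sketch under stated assumptions. It is not the actual fix: the names
build_guarded_addr_regex and mark_known_addrs are hypothetical, and the
bracket markup merely stands in for whatever the real code does with a
matched address.

```python
import re

def build_guarded_addr_regex(addrs):
    # Guard: with no known addresses there is nothing to match, so
    # return None instead of compiling the degenerate \b()\b pattern.
    if not addrs:
        return None
    alternation = "|".join(re.escape(a) for a in addrs)
    return re.compile(r"\b(" + alternation + r")\b")

def mark_known_addrs(text, addrs):
    # Hypothetical caller: skip the substitution entirely when the
    # guard reports that no regex could be built.
    addr_re = build_guarded_addr_regex(addrs)
    if addr_re is None:
        return text
    return addr_re.sub(lambda m: "[" + m.group(1) + "]", text)

untouched = mark_known_addrs("a b", [])
marked = mark_known_addrs("cc meta@public-inbox.org",
                          ["meta@public-inbox.org"])
print(untouched)
print(marked)
```

The guard only avoids the bad regex. It does not address why the
address map is empty on the first request, which is where the preload
fixes mentioned above would come in.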

Fixes coming...
