From: Eric Wong <e@80x24.org>
To: Filip Hejsek <filip.hejsek@gmail.com>
Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org>,
meta@public-inbox.org
Subject: Re: Occasional web view corruption (extra html escapes)
Date: Fri, 6 Sep 2024 22:42:06 +0000
Message-ID: <20240906224206.M417183@dcvr>
In-Reply-To: <cd1a31587f513475eb2e6a8bb4da597bd936d6ce.camel@gmail.com>
Filip Hejsek <filip.hejsek@gmail.com> wrote:
> Hello,
>
> I have figured out why this happens.
>
> I have reproduced the bug by doing the following:
> 1. setup an instance of public-inbox and import some message into it
> 2. create extindex named all and add it to config
> 3. start public-inbox-httpd
> 4. directly open http://<server-address>/all/<msg-id>/
OK, but I wasn't able to reproduce it on my setup since I
had coderepos configured :x
Coderepos were hiding the bug from me.
> The server will enter the broken state if this is the first page loaded
> from the server.
>
> The issue occurs because of the following sequence of events:
> 1. WWW->call is called
> 2. after matching the URL, msg_page is called
> 3. msg_page validates the inbox name by calling invalid_inbox_mid
> 4. invalid_inbox_mid calls invalid_inbox
> 5. the name is looked up with lookup_name
> 6. because there is no inbox with that name, undef is returned
> 7. the name is looked up with lookup_ei
> 8. the lookup succeeds
> 9. get_mid_html is called to generate the HTML page
> 10. addr2urlmap is used to construct a regex of known addresses
> 11. because no inbox has been instantiated, $cfg->{-by_addr} is empty
> 12. because of that, $re will be an empty string
> 13. so the final regex is /\b()\b/, which matches every word boundary
Thanks for the diagnosis and excellent explanation!
I didn't hit this problem on my instance because
configuring coderepos causes CodeSearch->load_coderepos to call
lookup_eidx_key, which calls _lookup_fill and populates {-by_addr}.
So I think adding a guard to prevent building the regexp from an
empty -by_addr is a first step; but additional checks + preload
fixes will probably be needed...
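To illustrate the failure mode and the kind of guard meant here, a
minimal sketch in Python rather than the actual Perl. The names
build_addr_re and linkify and the example.org URL are hypothetical
and are not public-inbox functions; the sketch only shows that an
alternation built from an empty address list matches at every word
boundary, and that skipping the substitution avoids it:

```python
import re

def build_addr_re(addrs):
    """Compile \\b(addr1|addr2|...)\\b, or return None when there are no addresses."""
    addrs = list(addrs)
    if not addrs:
        # Guard: "|".join([]) is "", and \b()\b matches every word boundary.
        return None
    alternation = "|".join(re.escape(a) for a in addrs)
    return re.compile(r"\b(" + alternation + r")\b")

def linkify(text, addr2url):
    """Wrap known addresses in links; leave text untouched if none are known."""
    addr_re = build_addr_re(addr2url)
    if addr_re is None:
        return text
    return addr_re.sub(
        lambda m: '<a href="%s">%s</a>' % (addr2url[m.group(1)], m.group(1)),
        text,
    )

# What the unguarded construction produces when the address map is empty:
unguarded = re.compile(r"\b(" + "|".join([]) + r")\b")
empty_hits = [m.start() for m in unguarded.finditer("hi there")]
```

In the sketch, empty_hits records an empty match at each word
boundary of the input, which is the behavior that causes every word
to be rewritten; with the guard, linkify("hi there", {}) returns its
input unchanged.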
Fixes coming...
Thread overview: 11+ messages
2024-09-03 17:52 Occasional web view corruption (extra html escapes) Konstantin Ryabitsev
2024-09-03 18:42 ` Eric Wong
2024-09-03 19:18 ` Konstantin Ryabitsev
2024-09-03 19:11 ` Eric Wong
2024-09-03 19:22 ` Konstantin Ryabitsev
2024-09-04 13:14 ` Eric Wong
2024-09-04 14:42 ` Konstantin Ryabitsev
2024-09-06 22:20 ` Filip Hejsek
2024-09-06 22:42 ` Eric Wong [this message]
2024-09-06 23:29 ` Eric Wong
2024-09-07 1:24 ` Konstantin Ryabitsev