Date: Fri, 17 Feb 2023 09:52:55 +0100
From: Uwe Kleine-König
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: Bug related to (maybe?) / in Message-Id
Message-ID: <20230217085255.xcsaoozloz2yuxil@pengutronix.de>
References: <20230216210546.eo73kyzvtzaxwxko@pengutronix.de> <20230216213628.M187845@dcvr>
In-Reply-To: <20230216213628.M187845@dcvr>

Hello Eric,

first of all: Thanks for your quick answer.

On Thu, Feb 16, 2023 at 09:36:28PM +0000, Eric Wong wrote:
> Uwe Kleine-König wrote:
> > Hello,
> >
> > The mail by Alexander Dahl that is (currently) the first hit on
> > https://lore.ptxdist.org/ptxdist/?q=ptxd_make_world_compile_commands_filter
> > results in a 404 when I follow the link.
> >
> > The original mail has
> >
> > Message-ID:
> >
> > and the corresponding link is:
> >
> > https://lore.ptxdist.org/ptxdist/Y+07h0l%2FzJJAgs9s@falbala.internal.home.lespocky.de/
> >
> > I noticed this on public-inbox 1.8.0-1~bpo11+1 from Debian, upgrading to
> > 1.9.0-1~bpo11+1 didn't help.
> >
> > Other mails with / in Message-Id are not accessible either, I tested
> > with:
> >
> > YyHu/412LT8uQTy1@lenoch
> > Y0/5xdFZO3u0952+@lenoch
>
> The TODO file has this:
>
> * use REQUEST_URI properly for CGI / mod_perl2 compatibility
>   with Message-IDs which include '%' (done?)
>
> So I guess it's not done... To deal with '/' in the Message-ID,
> $env->{REQUEST_URI} really needs to be the raw, undecoded URI
> specified in the PSGI specs[1].
>
> I'm not sure how to go about it Apache+CGI or mod_perl2..
>
> Fwiw, the recommended configuration is:
> (nginx|haproxy) -> varnish -> public-inbox-{httpd,netd}
>
> Maybe Apache2 mpm_event reverse proxy can work in lieu of
> (nginx|haproxy), but /T/, /t/, /t.mbox.gz requests are a bit
> faster on -httpd/-netd since 1.6+ on SMP machines.
>
> > I also wonder why these mails yield the webserver's 404 page and not the
> > one provided by the public-inbox cgi?!
>
> This may be the small size public-inbox's 404 page. I don't
> know Apache configs well, but I know nginx did something
> similar.
> > Is this a problem in public-inbox, or is the apache configuration
> > somehow borked? Any hints welcome.
>
> Do you have access to that server and can show us the configs?
> REQUEST_URI really needs to be raw in accordance to PSGI specs.
>
> This can dump the request $env to stderr and show us
> REQUEST_URI, PATH_INFO, SCRIPT_NAME, and anything else
> which may enlighten us:
>
> diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
> index 9ffcb879..f67fe8e6 100644
> --- a/lib/PublicInbox/WWW.pm
> +++ b/lib/PublicInbox/WWW.pm
> @@ -52,7 +52,8 @@ sub call {
>  		# none of the keys we care about will need escaping
>  		($k // '', uri_unescape($v // ''))
>  	} split(/[&;]+/, $env->{QUERY_STRING});
> -
> +	use Data::Dumper; $Data::Dumper::Useqq = 1;
> +	warn Dumper($env);
>  	my $path_info = path_info_raw($env);
>  	my $method = $env->{REQUEST_METHOD};

I added that patch, and for the reported request it didn't trigger,
which I take to mean that public-inbox isn't called at all.

Playing around with slashes got my admin and me on the right trail:

	https://httpd.apache.org/docs/current/mod/core.html#allowencodedslashes

We set that to "On" and now it (mostly) works. Maybe it's worth adding
this hint to the documentation even though Apache isn't the recommended
setup? Maybe other servers have a similar security setting?
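The difference Eric describes between the raw REQUEST_URI and a decoded PATH_INFO can be illustrated with a small Perl sketch. This is not public-inbox code. The helper names (pct_decode, split_decoded, split_raw) are hypothetical, and pct_decode is a minimal core-Perl stand-in for URI::Escape's uri_unescape. If the path is decoded before it is split, %2F becomes a path separator. If the raw URI is split first, the Message-ID stays in a single segment.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes using core Perl only
# (a stand-in for URI::Escape::uri_unescape).
sub pct_decode {
	my ($s) = @_;
	$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
	return $s;
}

# Decode first, then split: roughly what a server exposes as PATH_INFO,
# so an encoded slash (%2F) ends up as an ordinary path separator.
sub split_decoded {
	my ($path) = @_;
	return grep { length } split(m{/}, pct_decode($path));
}

# Split the raw REQUEST_URI first, then decode each segment: %2F stays
# inside its segment, so a Message-ID containing '/' survives intact.
sub split_raw {
	my ($uri) = @_;
	return map { pct_decode($_) } grep { length } split(m{/}, $uri);
}

my $uri = '/ptxdist/Y+07h0l%2FzJJAgs9s@falbala.internal.home.lespocky.de/';
my @decoded = split_decoded($uri); # 3 segments: the Message-ID is cut in two
my @raw = split_raw($uri);         # 2 segments: inbox name and Message-ID
print join(' | ', @decoded), "\n";
print join(' | ', @raw), "\n";
```

Apache's AllowEncodedSlashes directive also accepts NoDecode, which leaves %2F encoded in the path instead of decoding it. Whether that works better with public-inbox.cgi is untested here.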
I wrote "mostly" because

	https://lore.ptxdist.org/ptxdist/Y+07h0l%2FzJJAgs9s@falbala.internal.home.lespocky.de/
	https://lore.ptxdist.org/ptxdist/Y+07h0l%2FzJJAgs9s@falbala.internal.home.lespocky.de
	https://lore.ptxdist.org/ptxdist/Y+07h0l/zJJAgs9s@falbala.internal.home.lespocky.de/

work as expected;

	https://lore.ptxdist.org/ptxdist/Y+07h0l/zJJAgs9s@falbala.internal.home.lespocky.de

however does not; it yields a short "Not Found".

With the patch applied, the logged output for these URLs is mostly
identical. REMOTE_PORT differs, which is expected. Otherwise only
PATH_INFO, PATH_TRANSLATED and REQUEST_URI differ. They are, respectively:

	"PATH_INFO" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de/",
	"PATH_TRANSLATED" => "/usr/lib/cgi-bin/public-inbox.cgi/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de/",
	"REQUEST_URI" => "/ptxdist/Y+07h0l%2FzJJAgs9s\@falbala.internal.home.lespocky.de/",

	"PATH_INFO" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de",
	"PATH_TRANSLATED" => "/usr/lib/cgi-bin/public-inbox.cgi/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de",
	"REQUEST_URI" => "/ptxdist/Y+07h0l%2FzJJAgs9s\@falbala.internal.home.lespocky.de",

	"PATH_INFO" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de/",
	"PATH_TRANSLATED" => "/usr/lib/cgi-bin/public-inbox.cgi/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de/",
	"REQUEST_URI" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de/",

	"PATH_INFO" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de",
	"PATH_TRANSLATED" => "/usr/lib/cgi-bin/public-inbox.cgi/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de",
	"REQUEST_URI" => "/ptxdist/Y+07h0l/zJJAgs9s\@falbala.internal.home.lespocky.de",

which I think is all as expected. In all cases we have:

	"SCRIPT_NAME" => "",

I'm not sure whether making the last URL work is easily possible (or worth
the effort).
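Links avoid the unencoded-slash form entirely if whatever generates them always percent-encodes '/' inside a Message-ID. Below is a minimal Perl sketch. mid_to_segment is a hypothetical helper. The characters it leaves unescaped are the ones RFC 3986 allows in a path segment, which is an assumption and not necessarily public-inbox's own escaping rule. The input is assumed to be an ASCII/byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every byte outside RFC 3986 "pchar"
# (unreserved, sub-delims, ':' and '@') as %XX. '/' is deliberately not
# allowed, so it becomes %2F and the Message-ID stays one path segment;
# a literal '%' is escaped too, becoming %25.
sub mid_to_segment {
	my ($mid) = @_;
	$mid =~ s{([^A-Za-z0-9\-._~!\$&'()*+,;=:\@])}{sprintf('%%%02X', ord($1))}ge;
	return $mid;
}

my $mid = 'Y+07h0l/zJJAgs9s@falbala.internal.home.lespocky.de';
my $url = 'https://lore.ptxdist.org/ptxdist/' . mid_to_segment($mid) . '/';
print "$url\n";
```

With this encoding, the generated link is the first of the four URLs listed, one of the forms that already works.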
If a Message-Id ends in "/T" or similar, won't the result always be
ambiguous?

One thing I just noticed is:

	$ curl https://lore.ptxdist.org/ptxdist/Y+07h0l/zJJAgs9s@falbala.internal.home.lespocky.de/T
	Redirecting to https://lore.ptxdist.org/ptxdist/Y+07h0l/zJJAgs9s@falbala.internal.home.lespocky.de/T

which makes Firefox say: "The page isn’t redirecting properly". It
works fine with the / replaced by %2F:

	$ curl https://lore.ptxdist.org/ptxdist/Y+07h0l%2fzJJAgs9s@falbala.internal.home.lespocky.de/T
	Redirecting to https://lore.ptxdist.org/ptxdist/Y+07h0l%2fzJJAgs9s@falbala.internal.home.lespocky.de/T/#u

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |