* [BUG] Unescaped '&' ampersands in atom header links
@ 2023-11-27 8:17 Ricardo Cañuelo
2023-11-27 10:23 ` Eric Wong
0 siblings, 1 reply; 3+ messages in thread
From: Ricardo Cañuelo @ 2023-11-27 8:17 UTC (permalink / raw)
To: meta
Hi,
When parsing outputs from lore.kernel.org with Python3 xml.dom.minidom I
noticed that, for queries that contain '&' characters, they aren't
escaped in the href attributes of the title tags in atom feed headers.
So, for example, for this request:
https://lore.kernel.org/all/?x=A&q=driver+core%3A+Fix+wait_for_device_probe%28%29+%26+deferred_probe_timeout+interaction
The atom header in the output contains:
<title
type="html">driver core: Fix wait_for_device_probe() & deferred_probe_timeout interaction - search results</title><link
rel="alternate"
type="text/html"
href="http://lore.kernel.org/all/?q=driver+core:+Fix+wait_for_device_probe()+&+deferred_probe_timeout+interaction"/><link
rel="self"
href="http://lore.kernel.org/all/?q=driver+core:+Fix+wait_for_device_probe()+&+deferred_probe_timeout+interaction&x=A"/>
where the '&' character is escaped in the text of the <title> tag but
not in the href attributes. Shouldn't these be escaped as well? If so,
the fix should be most likely located in WwwAtomStream.pm:atom_header().
Cheers,
Ricardo
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] Unescaped '&' ampersands in atom header links
2023-11-27 8:17 [BUG] Unescaped '&' ampersands in atom header links Ricardo Cañuelo
@ 2023-11-27 10:23 ` Eric Wong
2023-11-27 21:55 ` Eric Wong
0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2023-11-27 10:23 UTC (permalink / raw)
To: Ricardo Cañuelo; +Cc: meta
Ricardo Cañuelo <ricardo.canuelo@collabora.com> wrote:
<snip>
> where the '&' character is escaped in the text of the <title> tag but
> not in the href attributes. Shouldn't these be escaped as well? If so,
> the fix should be most likely located in WwwAtomStream.pm:atom_header().
Thanks for the bug report. Yes, '&' should be escaped, that's
the job of the ->qs_html call from atom_header().
The below fixes it and also brings a minor improvement in
allowing '/' to be unescaped for dfn: queries. More could be
allowed, actually, but '/' is actually common.
But I'm running on fumes right now, so an extra set of eyes
would be helpful.
deployed on https://yhbt.net/lore/
-----8<-----
Subject: [PATCH] www: qs_html: fix escaping of `q' param
Our use of MID_ESC characters was only intended for the pathname
component of URIs and not appropriate for the query string
component. So use a different $unsafe parameter list for
uri_escape to make the result appropriate for query strings by
disallowing [\&\'\+=] characters. Most notably, this change
also allows us to accept `/' (slash) unescaped to make dfn: queries
nicer to look at.
Finally, we'll also add a ascii_html call on the URI-escaped
result as an extra safety measure even though it's not really
needed.
As far as I can tell, the code without this fix didn't result in
in an HTML injection since all our uses of uri_escape did escape
angle brackets.
Reported-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com>
Link: https://public-inbox.org/meta/87o7ff4nlk.fsf@collabora.com/
---
lib/PublicInbox/MID.pm | 2 +-
lib/PublicInbox/SearchQuery.pm | 10 +++++++---
t/psgi_search.t | 27 ++++++++++++++++++++++++++-
3 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/MID.pm b/lib/PublicInbox/MID.pm
index b1ae9939..97cf3a54 100644
--- a/lib/PublicInbox/MID.pm
+++ b/lib/PublicInbox/MID.pm
@@ -125,7 +125,7 @@ sub uniq_mids ($;$) {
\@ret;
}
-# RFC3986, section 3.3:
+# RFC3986, section 3.3 (pathnames only):
sub MID_ESC () { '^A-Za-z0-9\-\._~!\$\&\'\(\)\*\+,;=:@' }
sub mid_escape ($) { uri_escape_utf8($_[0], MID_ESC) }
diff --git a/lib/PublicInbox/SearchQuery.pm b/lib/PublicInbox/SearchQuery.pm
index 96246c53..747e3249 100644
--- a/lib/PublicInbox/SearchQuery.pm
+++ b/lib/PublicInbox/SearchQuery.pm
@@ -6,7 +6,7 @@ package PublicInbox::SearchQuery;
use strict;
use v5.10.1;
use URI::Escape qw(uri_escape);
-use PublicInbox::MID qw(MID_ESC);
+use PublicInbox::Hval qw(ascii_html);
our $LIM = 200;
sub new {
@@ -35,9 +35,13 @@ sub qs_html {
}
my $qs = '';
if (defined(my $q = $self->{'q'})) {
- $q = uri_escape($q, MID_ESC);
+ # not using MID_ESC since that's for the path component and
+ # this is for the query component. Unlike MID_ESC,
+ # this disallows [\&\'\+=] and allows slash [/] for
+ # nicer looking dfn: queries
+ $q = uri_escape($q, '^A-Za-z0-9\-\._~!\$\(\)\*,;:@/');
$q =~ s/%20/+/g; # improve URL readability
- $qs .= "q=$q";
+ $qs .= 'q='.ascii_html($q);
}
if (my $o = $self->{o}) { # ignore o == 0
$qs .= "&o=$o";
diff --git a/t/psgi_search.t b/t/psgi_search.t
index 289c34e7..8c981c6c 100644
--- a/t/psgi_search.t
+++ b/t/psgi_search.t
@@ -18,7 +18,8 @@ local $ENV{TZ} = 'UTC';
my $digits = '10010260936330';
my $ua = 'Pine.LNX.4.10';
my $mid = "$ua.$digits.2460-100000\@penguin.transmeta.com";
-my $ibx = create_inbox 'git', indexlevel => 'full', tmpdir => "$tmpdir/1", sub {
+my $ibx = create_inbox '26-git', indexlevel => 'full', tmpdir => "$tmpdir/1",
+sub {
my ($im) = @_;
# n.b. these headers are not properly RFC2047-encoded
$im->add(PublicInbox::Eml->new(<<EOF)) or BAIL_OUT;
@@ -48,6 +49,17 @@ Message-ID: <no-subject-at-all@example.com>
From: no subject at all <no-subject-at-all@example.com>
To: git@vger.kernel.org
+EOF
+ $im->add(PublicInbox::Eml->new(<<'EOF')) or BAIL_OUT;
+Message-ID: <ampersand@example.com>
+From: <e@example.com>
+To: git@vger.kernel.org
+Subject: git & ampersand
+
+hi +++ b/foo
+x=y
+s'more
+
EOF
};
@@ -155,6 +167,19 @@ test_psgi(sub { $www->call(@_) }, sub {
is($res->code, 200, 'successful mbox download w/ threads');
gunzip(\($res->content) => \(my $after));
isnt($before, $after);
+
+ $res = $cb->(GET('/test/?q=git+%26+ampersand&x=A'));
+ is $res->code, 200, 'Atom hit with ampersand';
+ unlike $res->content, qr/git\+&\+ampersand/, '& is HTML-escaped';
+
+ $res = $cb->(GET('/test/?q=%22hi+%2b%2b%2b+b/foo%22&x=A'));
+ is $res->code, 200, 'slashes and plusses search hit';
+ like $res->content, qr!q=%22hi\+(?:%2[bB]){3}\+b/foo%22!,
+ '+ and " escaped, but slash not escaped in query';
+
+ $res = $cb->(GET(q{/test/?q=%22s'more%22&x=A}));
+ is $res->code, 200, 'single quote inside phrase';
+ # TODO: more tests and odd cases
});
done_testing();
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [BUG] Unescaped '&' ampersands in atom header links
2023-11-27 10:23 ` Eric Wong
@ 2023-11-27 21:55 ` Eric Wong
0 siblings, 0 replies; 3+ messages in thread
From: Eric Wong @ 2023-11-27 21:55 UTC (permalink / raw)
To: Ricardo Cañuelo; +Cc: meta
Thanks, pushed as commit 577e421a0815e66f965bd4317adad5aeea3cc52a
with your Tested-By (sent privately in <87leaj4ea1.fsf@collabora.com>)
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-11-27 21:58 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-27 8:17 [BUG] Unescaped '&' ampersands in atom header links Ricardo Cañuelo
2023-11-27 10:23 ` Eric Wong
2023-11-27 21:55 ` Eric Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).