* [PATCH] Import: Be more careful with names in email
@ 2020-07-03 21:53 Eric W. Biederman
2020-07-03 23:30 ` Eric Wong
0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2020-07-03 21:53 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
I recent encountered a nasty (in several sense of the word) from line:
From: =?UTF-8?B?DQolLg0KSVwnbSBhIGhvcm55IGJyYXppbGlhbiBhbmQgSVwnbSBuZXcgaGVyZS4gSVwnbSBzZWFyY2hpbmcgZm9yIGEgZ3V5IHRvIGZ1Y2sgbWUgZnJvbSBiZWhpbmQgdGhpcyBldmVuaW5nLiBUZXh0IG1lIHNvb24gaHR0cDovL2NlbnRwZWFkb3RvdGFiLmNmL3JBbmdpZQ0KDQogJHB4Mm9ybTVuam1lDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0=?= <wordpress@practicalhomeschooler.com>
After extract_cmt_info was extracted and decoded the ``name'',
passing that decoded ``name'' to git fast-import caused it
to choke. Something about a missing '>'.
To prevent this happening in the future transform a few more characters
into spaces, and don't use string interpolation, use comma separated
variables instead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
I honestly don't know if I have closed all of the holes
when implementing the code this way. But this changes works for me(tm).
Feel free to treat this as a bug report instead of a patch to apply.
lib/PublicInbox/Import.pm | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 66b359023bd2..df6c78368845 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -310,7 +310,7 @@ sub extract_cmt_info ($;$) {
# ref:
# <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
if (defined $name) {
- $name =~ tr/<>//d;
+ $name =~ tr/\n\r<>$/ /d;
utf8::encode($name);
} else {
$name = '';
@@ -425,9 +425,10 @@ sub add {
print $w "reset $ref\n" or wfail;
}
- print $w "commit $ref\nmark :$commit\n",
- "author $name <$email> $at\n",
- "committer $self->{ident} $ct\n" or wfail;
+ # Be very careful with the strings from the email
+ print $w "commit ", $ref, "\nmark :", $commit, "\n",
+ "author ", $name, " <", $email, "> ", $at, "\n",
+ "committer ", $self->{ident}, " ", $ct, "\n" or wfail;
print $w "data ", (length($subject) + 1), "\n",
$subject, "\n\n" or wfail;
if ($tip ne '') {
--
2.20.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] Import: Be more careful with names in email
2020-07-03 21:53 [PATCH] Import: Be more careful with names in email Eric W. Biederman
@ 2020-07-03 23:30 ` Eric Wong
2020-07-04 20:25 ` [PATCH] t/import: test for nasty characters Eric Wong
0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-07-03 23:30 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: meta
"Eric W. Biederman" <ebiederm@xmission.com> wrote:
>
> I recent encountered a nasty (in several sense of the word) from line:
>
> From: =?UTF-8?B?DQolLg0KSVwnbSBhIGhvcm55IGJyYXppbGlhbiBhbmQgSVwnbSBuZXcgaGVyZS4gSVwnbSBzZWFyY2hpbmcgZm9yIGEgZ3V5IHRvIGZ1Y2sgbWUgZnJvbSBiZWhpbmQgdGhpcyBldmVuaW5nLiBUZXh0IG1lIHNvb24gaHR0cDovL2NlbnRwZWFkb3RvdGFiLmNmL3JBbmdpZQ0KDQogJHB4Mm9ybTVuam1lDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0=?= <wordpress@practicalhomeschooler.com>
>
> After extract_cmt_info was extracted and decoded the ``name'',
> passing that decoded ``name'' to git fast-import caused it
> to choke. Something about a missing '>'.
>
> To prevent this happening in the future transform a few more characters
> into spaces, and don't use string interpolation, use comma separated
> variables instead.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>
> I honestly don't know if I have closed all of the holes
> when implementing the code this way. But this changes works for me(tm).
> Feel free to treat this as a bug report instead of a patch to apply.
>
> lib/PublicInbox/Import.pm | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
> index 66b359023bd2..df6c78368845 100644
> --- a/lib/PublicInbox/Import.pm
> +++ b/lib/PublicInbox/Import.pm
> @@ -310,7 +310,7 @@ sub extract_cmt_info ($;$) {
> # ref:
> # <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
> if (defined $name) {
> - $name =~ tr/<>//d;
> + $name =~ tr/\n\r<>$/ /d;
Is getting rid of '$' an effort to avoid double interpolation by Perl?
Perl won't recursively expand variables AFAIK.
I agree with dropping \r and \n, though.
> utf8::encode($name);
> } else {
> $name = '';
> @@ -425,9 +425,10 @@ sub add {
> print $w "reset $ref\n" or wfail;
> }
>
> - print $w "commit $ref\nmark :$commit\n",
> - "author $name <$email> $at\n",
> - "committer $self->{ident} $ct\n" or wfail;
> + # Be very careful with the strings from the email
> + print $w "commit ", $ref, "\nmark :", $commit, "\n",
> + "author ", $name, " <", $email, "> ", $at, "\n",
> + "committer ", $self->{ident}, " ", $ct, "\n" or wfail;
I haven't tested, but I'm not seeing this hunk as necessary
once \r\n<> are removed. Thanks.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] t/import: test for nasty characters
2020-07-03 23:30 ` Eric Wong
@ 2020-07-04 20:25 ` Eric Wong
2020-07-04 20:28 ` Eric W. Biederman
0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-07-04 20:25 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: meta
Eric Wong <e@yhbt.net> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > - $name =~ tr/<>//d;
> > + $name =~ tr/\n\r<>$/ /d;
>
> Is getting rid of '$' an effort to avoid double interpolation by Perl?
> Perl won't recursively expand variables AFAIK.
I'm not seeing the purpose in $ being grouped with the
characters (test below confirms it, I think).
> I agree with dropping \r and \n, though.
Actually, it's already dropped in git since June.
> > - print $w "commit $ref\nmark :$commit\n",
> > - "author $name <$email> $at\n",
> > - "committer $self->{ident} $ct\n" or wfail;
> > + # Be very careful with the strings from the email
> > + print $w "commit ", $ref, "\nmark :", $commit, "\n",
> > + "author ", $name, " <", $email, "> ", $at, "\n",
> > + "committer ", $self->{ident}, " ", $ct, "\n" or wfail;
>
> I haven't tested, but I'm not seeing this hunk as necessary
> once \r\n<> are removed. Thanks.
I've tested now, and those changes to print seem unnecessary.
So I think just testing for these cases is enough.
Thanks again.
---------8<----------
Subject: [PATCH] t/import: test for nasty characters
Spammers may send emails with nasty characters which can throw
off git-fast-import. Users with non-existent or weaker spam
filters may be susceptible to corruption in the fast-import
stream as a result.
This was actually quietly fixed in git on 2020-06-01 by
commit 9ab886546cc89f37819e1ef09cb49fd9325b3a41
("smsg: introduce ->populate method"), but no test case
was created.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Link: https://public-inbox.org/meta/87imf4qn87.fsf@x220.int.ebiederm.org/
Link: https://public-inbox.org/meta/20200601100657.14700-6-e@yhbt.net/
---
t/import.t | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/t/import.t b/t/import.t
index abbc8229d..9491f3374 100644
--- a/t/import.t
+++ b/t/import.t
@@ -11,6 +11,7 @@ use PublicInbox::Spawn qw(spawn);
use Fcntl qw(:DEFAULT SEEK_SET);
use File::Temp qw/tempfile/;
use PublicInbox::TestCommon;
+use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2
my ($dir, $for_destroy) = tmpdir();
my $git = PublicInbox::Git->new($dir);
@@ -103,4 +104,27 @@ eval {
};
ok($@, 'Import->add fails on non-existent dir');
+my @cls = qw(PublicInbox::Eml);
+SKIP: {
+ require_mods('PublicInbox::MIME', 1);
+ push @cls, 'PublicInbox::MIME';
+};
+
+$main::badchars = "\n\0\r";
+my $from = '=?UTF-8?B?'. encode_base64("B\ra\nd\0\$main::badchars", ''). '?=';
+for my $cls (@cls) {
+ my $eml = $cls->new(<<EOF);
+From: $from <spammer\@example.com>
+Message-ID: <$cls\@example.com>
+
+EOF
+ ok($im->add($eml), "added $cls message with nasty char in From");
+}
+$im->done;
+my $bref = $git->cat_file('HEAD');
+like($$bref, qr/^author Ba d \$main::badchars <spammer\@example\.com> /sm,
+ 'latest commit accepted by spammer');
+$git->qx(qw(fsck --no-progress --strict));
+is($?, 0, 'fsck reported no errors');
+
done_testing();
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] t/import: test for nasty characters
2020-07-04 20:25 ` [PATCH] t/import: test for nasty characters Eric Wong
@ 2020-07-04 20:28 ` Eric W. Biederman
2020-07-04 21:24 ` Eric Wong
2020-07-05 14:55 ` Leah Neukirchen
0 siblings, 2 replies; 6+ messages in thread
From: Eric W. Biederman @ 2020-07-04 20:28 UTC (permalink / raw)
To: Eric Wong; +Cc: meta
Eric Wong <e@yhbt.net> writes:
> Eric Wong <e@yhbt.net> wrote:
>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> > - $name =~ tr/<>//d;
>> > + $name =~ tr/\n\r<>$/ /d;
>>
>> Is getting rid of '$' an effort to avoid double interpolation by Perl?
>> Perl won't recursively expand variables AFAIK.
>
> I'm not seeing the purpose in $ being grouped with the
> characters (test below confirms it, I think).
What I think we should be doing is any characters that are not a valid
part of a name (as defined by the appropriate email RFCs) should be
dealt with.
I am pretty certain $ isn't of those characters that is valid in a name.
Otherwise I suspect this will be a game of whack-a-mole as new weird
and strange cases crop up. Maybe I am just paranoid, but right now
the code seems a bit too liberal in what it accepts.
Eric
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] t/import: test for nasty characters
2020-07-04 20:28 ` Eric W. Biederman
@ 2020-07-04 21:24 ` Eric Wong
2020-07-05 14:55 ` Leah Neukirchen
1 sibling, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-07-04 21:24 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: meta
"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@yhbt.net> writes:
>
> > Eric Wong <e@yhbt.net> wrote:
> >> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> > - $name =~ tr/<>//d;
> >> > + $name =~ tr/\n\r<>$/ /d;
> >>
> >> Is getting rid of '$' an effort to avoid double interpolation by Perl?
> >> Perl won't recursively expand variables AFAIK.
> >
> > I'm not seeing the purpose in $ being grouped with the
> > characters (test below confirms it, I think).
>
> What I think we should be doing is any characters that are not a valid
> part of a name (as defined by the appropriate email RFCs) should be
> dealt with.
>
> I am pretty certain $ isn't of those characters that is valid in a name.
*shrug* We'd have to dig through RFCs for that, but practically
anything can be valid once base64-encoded in headers.
This part of the code is only about keeping git-fast-import
happy. '$' is a visible character, and can be used creatively
for "fun" usernames, so I prefer we keep it.
Fwiw, I was able to send an email to a big mail provider w/o
getting flagged as spam using "Ca$h Wong" as my From: name.
mutt didn't even escape or quote it, either.
> Otherwise I suspect this will be a game of whack-a-mole as new weird
> and strange cases crop up. Maybe I am just paranoid, but right now
> the code seems a bit too liberal in what it accepts.
I prefer we keep Import.pm liberal. MDA and spam filters can be
stricter, of course.
I can't see '$' possibly doing any damage unless somebody runs
`eval' on From: headers; but if we're defending against that,
we'd have to disallow words like "rm" and "unlink", too :)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] t/import: test for nasty characters
2020-07-04 20:28 ` Eric W. Biederman
2020-07-04 21:24 ` Eric Wong
@ 2020-07-05 14:55 ` Leah Neukirchen
1 sibling, 0 replies; 6+ messages in thread
From: Leah Neukirchen @ 2020-07-05 14:55 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Eric Wong, meta
ebiederm@xmission.com (Eric W. Biederman) writes:
> Eric Wong <e@yhbt.net> writes:
>
>> Eric Wong <e@yhbt.net> wrote:
>>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>>> > - $name =~ tr/<>//d;
>>> > + $name =~ tr/\n\r<>$/ /d;
>>>
>>> Is getting rid of '$' an effort to avoid double interpolation by Perl?
>>> Perl won't recursively expand variables AFAIK.
>>
>> I'm not seeing the purpose in $ being grouped with the
>> characters (test below confirms it, I think).
>
> What I think we should be doing is any characters that are not a valid
> part of a name (as defined by the appropriate email RFCs) should be
> dealt with.
>
> I am pretty certain $ isn't of those characters that is valid in a name.
These characters are allowed as names without quoting, RFC 2822 3.2.4:
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
In particular, "." is not included.
hth,
--
Leah Neukirchen <leah@vuxu.org> https://leahneukirchen.org/
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-07-05 14:55 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-03 21:53 [PATCH] Import: Be more careful with names in email Eric W. Biederman
2020-07-03 23:30 ` Eric Wong
2020-07-04 20:25 ` [PATCH] t/import: test for nasty characters Eric Wong
2020-07-04 20:28 ` Eric W. Biederman
2020-07-04 21:24 ` Eric Wong
2020-07-05 14:55 ` Leah Neukirchen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).