[PATCH] Import: Be more careful with names in email

unofficial mirror of meta@public-inbox.org
 help / color / mirror / Atom feed

* [PATCH] Import: Be more careful with names in email
@ 2020-07-03 21:53 Eric W. Biederman
  2020-07-03 23:30 ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2020-07-03 21:53 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta


I recent encountered a nasty (in several sense of the word) from line:

From: =?UTF-8?B?DQolLg0KSVwnbSBhIGhvcm55IGJyYXppbGlhbiBhbmQgSVwnbSBuZXcgaGVyZS4gSVwnbSBzZWFyY2hpbmcgZm9yIGEgZ3V5IHRvIGZ1Y2sgbWUgZnJvbSBiZWhpbmQgdGhpcyBldmVuaW5nLiBUZXh0IG1lIHNvb24gaHR0cDovL2NlbnRwZWFkb3RvdGFiLmNmL3JBbmdpZQ0KDQogJHB4Mm9ybTVuam1lDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0=?= <wordpress@practicalhomeschooler.com>

After extract_cmt_info was extracted and decoded the ``name'',
passing that decoded ``name'' to git fast-import caused it
to choke.  Something about a missing '>'.

To prevent this happening in the future transform a few more characters
into spaces, and don't use string interpolation, use comma separated
variables instead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---

I honestly don't know if I have closed all of the holes
when implementing the code this way.  But this changes works for me(tm).
Feel free to treat this as a bug report instead of a patch to apply.

 lib/PublicInbox/Import.pm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
index 66b359023bd2..df6c78368845 100644
--- a/lib/PublicInbox/Import.pm
+++ b/lib/PublicInbox/Import.pm
@@ -310,7 +310,7 @@ sub extract_cmt_info ($;$) {
 	# ref:
 	# <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
 	if (defined $name) {
-		$name =~ tr/<>//d;
+		$name =~ tr/\n\r<>$/     /d;
 		utf8::encode($name);
 	} else {
 		$name = '';
@@ -425,9 +425,10 @@ sub add {
 		print $w "reset $ref\n" or wfail;
 	}
 
-	print $w "commit $ref\nmark :$commit\n",
-		"author $name <$email> $at\n",
-		"committer $self->{ident} $ct\n" or wfail;
+	# Be very careful with the strings from the email
+	print $w "commit ", $ref, "\nmark :", $commit, "\n",
+		"author ", $name, " <", $email, "> ", $at, "\n",
+		"committer ", $self->{ident}, " ", $ct, "\n" or wfail;
 	print $w "data ", (length($subject) + 1), "\n",
 		$subject, "\n\n" or wfail;
 	if ($tip ne '') {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Import: Be more careful with names in email
  2020-07-03 21:53 [PATCH] Import: Be more careful with names in email Eric W. Biederman
@ 2020-07-03 23:30 ` Eric Wong
  2020-07-04 20:25   ` [PATCH] t/import: test for nasty characters Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-07-03 23:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> 
> I recent encountered a nasty (in several sense of the word) from line:
> 
> From: =?UTF-8?B?DQolLg0KSVwnbSBhIGhvcm55IGJyYXppbGlhbiBhbmQgSVwnbSBuZXcgaGVyZS4gSVwnbSBzZWFyY2hpbmcgZm9yIGEgZ3V5IHRvIGZ1Y2sgbWUgZnJvbSBiZWhpbmQgdGhpcyBldmVuaW5nLiBUZXh0IG1lIHNvb24gaHR0cDovL2NlbnRwZWFkb3RvdGFiLmNmL3JBbmdpZQ0KDQogJHB4Mm9ybTVuam1lDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0=?= <wordpress@practicalhomeschooler.com>
> 
> After extract_cmt_info was extracted and decoded the ``name'',
> passing that decoded ``name'' to git fast-import caused it
> to choke.  Something about a missing '>'.
> 
> To prevent this happening in the future transform a few more characters
> into spaces, and don't use string interpolation, use comma separated
> variables instead.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> 
> I honestly don't know if I have closed all of the holes
> when implementing the code this way.  But this changes works for me(tm).
> Feel free to treat this as a bug report instead of a patch to apply.
> 
>  lib/PublicInbox/Import.pm | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/PublicInbox/Import.pm b/lib/PublicInbox/Import.pm
> index 66b359023bd2..df6c78368845 100644
> --- a/lib/PublicInbox/Import.pm
> +++ b/lib/PublicInbox/Import.pm
> @@ -310,7 +310,7 @@ sub extract_cmt_info ($;$) {
>  	# ref:
>  	# <CAD0k6qSUYANxbjjbE4jTW4EeVwOYgBD=bXkSu=akiYC_CB7Ffw@mail.gmail.com>
>  	if (defined $name) {
> -		$name =~ tr/<>//d;
> +		$name =~ tr/\n\r<>$/     /d;

Is getting rid of '$' an effort to avoid double interpolation by Perl?
Perl won't recursively expand variables AFAIK.

I agree with dropping \r and \n, though.

>  		utf8::encode($name);
>  	} else {
>  		$name = '';
> @@ -425,9 +425,10 @@ sub add {
>  		print $w "reset $ref\n" or wfail;
>  	}
>  
> -	print $w "commit $ref\nmark :$commit\n",
> -		"author $name <$email> $at\n",
> -		"committer $self->{ident} $ct\n" or wfail;
> +	# Be very careful with the strings from the email
> +	print $w "commit ", $ref, "\nmark :", $commit, "\n",
> +		"author ", $name, " <", $email, "> ", $at, "\n",
> +		"committer ", $self->{ident}, " ", $ct, "\n" or wfail;

I haven't tested, but I'm not seeing this hunk as necessary
once \r\n<> are removed.  Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] t/import: test for nasty characters
  2020-07-03 23:30 ` Eric Wong
@ 2020-07-04 20:25   ` Eric Wong
  2020-07-04 20:28     ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-07-04 20:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

Eric Wong <e@yhbt.net> wrote:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> > -		$name =~ tr/<>//d;
> > +		$name =~ tr/\n\r<>$/     /d;
> 
> Is getting rid of '$' an effort to avoid double interpolation by Perl?
> Perl won't recursively expand variables AFAIK.

I'm not seeing the purpose in $ being grouped with the
characters (test below confirms it, I think).

> I agree with dropping \r and \n, though.

Actually, it's already dropped in git since June.

> > -	print $w "commit $ref\nmark :$commit\n",
> > -		"author $name <$email> $at\n",
> > -		"committer $self->{ident} $ct\n" or wfail;
> > +	# Be very careful with the strings from the email
> > +	print $w "commit ", $ref, "\nmark :", $commit, "\n",
> > +		"author ", $name, " <", $email, "> ", $at, "\n",
> > +		"committer ", $self->{ident}, " ", $ct, "\n" or wfail;
> 
> I haven't tested, but I'm not seeing this hunk as necessary
> once \r\n<> are removed.  Thanks.

I've tested now, and those changes to print seem unnecessary.
So I think just testing for these cases is enough.
Thanks again.

---------8<----------
Subject: [PATCH] t/import: test for nasty characters

Spammers may send emails with nasty characters which can throw
off git-fast-import.  Users with non-existent or weaker spam
filters may be susceptible to corruption in the fast-import
stream as a result.

This was actually quietly fixed in git on 2020-06-01 by
commit 9ab886546cc89f37819e1ef09cb49fd9325b3a41
("smsg: introduce ->populate method"), but no test case
was created.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Link: https://public-inbox.org/meta/87imf4qn87.fsf@x220.int.ebiederm.org/
Link: https://public-inbox.org/meta/20200601100657.14700-6-e@yhbt.net/
---
 t/import.t | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/t/import.t b/t/import.t
index abbc8229d..9491f3374 100644
--- a/t/import.t
+++ b/t/import.t
@@ -11,6 +11,7 @@ use PublicInbox::Spawn qw(spawn);
 use Fcntl qw(:DEFAULT SEEK_SET);
 use File::Temp qw/tempfile/;
 use PublicInbox::TestCommon;
+use MIME::Base64 3.05; # Perl 5.10.0 / 5.9.2
 my ($dir, $for_destroy) = tmpdir();
 
 my $git = PublicInbox::Git->new($dir);
@@ -103,4 +104,27 @@ eval {
 };
 ok($@, 'Import->add fails on non-existent dir');
 
+my @cls = qw(PublicInbox::Eml);
+SKIP: {
+	require_mods('PublicInbox::MIME', 1);
+	push @cls, 'PublicInbox::MIME';
+};
+
+$main::badchars = "\n\0\r";
+my $from = '=?UTF-8?B?'. encode_base64("B\ra\nd\0\$main::badchars", ''). '?=';
+for my $cls (@cls) {
+	my $eml = $cls->new(<<EOF);
+From: $from <spammer\@example.com>
+Message-ID: <$cls\@example.com>
+
+EOF
+	ok($im->add($eml), "added $cls message with nasty char in From");
+}
+$im->done;
+my $bref = $git->cat_file('HEAD');
+like($$bref, qr/^author Ba d \$main::badchars <spammer\@example\.com> /sm,
+	 'latest commit accepted by spammer');
+$git->qx(qw(fsck --no-progress --strict));
+is($?, 0, 'fsck reported no errors');
+
 done_testing();

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/import: test for nasty characters
  2020-07-04 20:25   ` [PATCH] t/import: test for nasty characters Eric Wong
@ 2020-07-04 20:28     ` Eric W. Biederman
  2020-07-04 21:24       ` Eric Wong
  2020-07-05 14:55       ` Leah Neukirchen
  0 siblings, 2 replies; 6+ messages in thread
From: Eric W. Biederman @ 2020-07-04 20:28 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@yhbt.net> writes:

> Eric Wong <e@yhbt.net> wrote:
>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> > -		$name =~ tr/<>//d;
>> > +		$name =~ tr/\n\r<>$/     /d;
>> 
>> Is getting rid of '$' an effort to avoid double interpolation by Perl?
>> Perl won't recursively expand variables AFAIK.
>
> I'm not seeing the purpose in $ being grouped with the
> characters (test below confirms it, I think).

What I think we should be doing is any characters that are not a valid
part of a name (as defined by the appropriate email RFCs) should be
dealt with.

I am pretty certain $ isn't of those characters that is valid in a name.

Otherwise I suspect this will be a game of whack-a-mole as new weird
and strange cases crop up.  Maybe I am just paranoid, but right now
the code seems a bit too liberal in what it accepts.

Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/import: test for nasty characters
  2020-07-04 20:28     ` Eric W. Biederman
@ 2020-07-04 21:24       ` Eric Wong
  2020-07-05 14:55       ` Leah Neukirchen
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-07-04 21:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@yhbt.net> writes:
> 
> > Eric Wong <e@yhbt.net> wrote:
> >> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
> >> > -		$name =~ tr/<>//d;
> >> > +		$name =~ tr/\n\r<>$/     /d;
> >> 
> >> Is getting rid of '$' an effort to avoid double interpolation by Perl?
> >> Perl won't recursively expand variables AFAIK.
> >
> > I'm not seeing the purpose in $ being grouped with the
> > characters (test below confirms it, I think).
> 
> What I think we should be doing is any characters that are not a valid
> part of a name (as defined by the appropriate email RFCs) should be
> dealt with.
> 
> I am pretty certain $ isn't of those characters that is valid in a name.

*shrug*  We'd have to dig through RFCs for that, but practically
anything can be valid once base64-encoded in headers.

This part of the code is only about keeping git-fast-import
happy.  '$' is a visible character, and can be used creatively
for "fun" usernames, so I prefer we keep it.

Fwiw, I was able to send an email to a big mail provider w/o
getting flagged as spam using "Ca$h Wong" as my From: name.
mutt didn't even escape or quote it, either.

> Otherwise I suspect this will be a game of whack-a-mole as new weird
> and strange cases crop up.  Maybe I am just paranoid, but right now
> the code seems a bit too liberal in what it accepts.

I prefer we keep Import.pm liberal.  MDA and spam filters can be
stricter, of course.

I can't see '$' possibly doing any damage unless somebody runs
`eval' on From: headers; but if we're defending against that,
we'd have to disallow words like "rm" and "unlink", too :)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] t/import: test for nasty characters
  2020-07-04 20:28     ` Eric W. Biederman
  2020-07-04 21:24       ` Eric Wong
@ 2020-07-05 14:55       ` Leah Neukirchen
  1 sibling, 0 replies; 6+ messages in thread
From: Leah Neukirchen @ 2020-07-05 14:55 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Eric Wong, meta

ebiederm@xmission.com (Eric W. Biederman) writes:

> Eric Wong <e@yhbt.net> writes:
>
>> Eric Wong <e@yhbt.net> wrote:
>>> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>>> > -		$name =~ tr/<>//d;
>>> > +		$name =~ tr/\n\r<>$/     /d;
>>> 
>>> Is getting rid of '$' an effort to avoid double interpolation by Perl?
>>> Perl won't recursively expand variables AFAIK.
>>
>> I'm not seeing the purpose in $ being grouped with the
>> characters (test below confirms it, I think).
>
> What I think we should be doing is any characters that are not a valid
> part of a name (as defined by the appropriate email RFCs) should be
> dealt with.
>
> I am pretty certain $ isn't of those characters that is valid in a name.

These characters are allowed as names without quoting, RFC 2822 3.2.4:

atext           =       ALPHA / DIGIT / ; Any character except controls,
                        "!" / "#" /     ;  SP, and specials.
                        "$" / "%" /     ;  Used for atoms
                        "&" / "'" /
                        "*" / "+" /
                        "-" / "/" /
                        "=" / "?" /
                        "^" / "_" /
                        "`" / "{" /
                        "|" / "}" /
                        "~"

In particular, "." is not included.

hth,
-- 
Leah Neukirchen  <leah@vuxu.org>  https://leahneukirchen.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-07-05 14:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-07-03 21:53 [PATCH] Import: Be more careful with names in email Eric W. Biederman
2020-07-03 23:30 ` Eric Wong
2020-07-04 20:25   ` [PATCH] t/import: test for nasty characters Eric Wong
2020-07-04 20:28     ` Eric W. Biederman
2020-07-04 21:24       ` Eric Wong
2020-07-05 14:55       ` Leah Neukirchen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).