From: Kyle Meyer
To: Eric Wong
Cc: meta@public-inbox.org
Subject: Re: archive links broken with obfuscate=true
In-Reply-To: <20210409102129.GA16787@dcvr>
References: <87a6q8p5qa.fsf@kyleam.com> <20210409102129.GA16787@dcvr>
Date: Fri, 09 Apr 2021 18:45:53 -0400
Message-ID: <87zgy7rs9q.fsf@kyleam.com>

Eric Wong writes:

> Oops, I think the following fixes it, but not sure if there's a
> better way to accomplish the same thing....

Thanks.
Jumping around a bit with that installed, I haven't spotted any
remaining issues.

> I worry the regexp change is susceptible to performance problems
> from malicious inputs.  I can't remember if something like this
> triggers a pathological case or not, or if I'm confusing this
> with another quirk that does (or quirks of another RE engine)

Hmm...

> diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
> index d20f70ae..6f1a046c 100644
> --- a/lib/PublicInbox/Hval.pm
> +++ b/lib/PublicInbox/Hval.pm
> @@ -82,15 +82,17 @@ sub obfuscate_addrs ($$;$) {
>      my $repl = $_[2] // '•';
>      my $re = $ibx->{-no_obfuscate_re}; # regex of domains
>      my $addrs = $ibx->{-no_obfuscate}; # { $address => 1 }
> -    $_[1] =~ s/(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))/
> -        my ($addr, $user, $domain) = ($1, $2, $3);
> -        if ($addrs->{$addr} || ((defined $re && $domain =~ $re))) {
> +    $_[1] =~ s#(\S*?)(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))#
> +        my ($beg, $addr, $user, $domain) = ($1, $2, $3, $4);

... what about allowing the first match to be {0,N}, where N is some not
so huge value?  It'd risk incorrectly obfuscating some really long
links, but given that it's just the HTML presentation, that seems
acceptable.
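To make the {0,N} idea concrete, here is a rough sketch.  It is written
in Python rather than Perl purely so it is easy to run standalone; the
pattern is the one from the quoted patch with \S*? swapped for a bounded
\S{0,N}?.  Everything else is an assumption for illustration, not what
Hval.pm does: the cap of 128, the "://" test used to decide that the
prefix looks like a link, and the rule of replacing the last dot of the
domain.  The no_obfuscate lookups are left out entirely.

```python
import re

# Hypothetical cap on how much non-whitespace text may precede an address.
MAX_PREFIX = 128


def obfuscate_addrs(text, repl="•", max_prefix=MAX_PREFIX):
    """Obfuscate addresses in text, skipping ones embedded in links.

    Python approximation of the quoted Perl substitution, with the
    leading group bounded to at most max_prefix characters.
    """
    pattern = re.compile(
        r"(\S{0," + str(max_prefix) + r"}?)"          # bounded, lazy prefix
        r"(([\w.+=\-]+)@([\w\-]+\.[\w.\-]+))"        # address, user, domain
    )

    def replace(match):
        beg, addr, user, domain = match.groups()
        # Assumption: a prefix containing "://" means we are inside a URL,
        # so the text is returned untouched.
        if "://" in beg:
            return beg + addr
        # Assumption: obfuscate by replacing the last dot of the domain.
        head, _, tail = domain.rpartition(".")
        return beg + user + "@" + head + repl + tail

    return pattern.sub(replace, text)
```

The trade-off mentioned above shows up when the link prefix is longer
than the cap: the "://" part falls outside the bounded group, so the
address-like part of the link gets obfuscated anyway.  This sketch only
shows the matching behavior; it says nothing about how Perl's engine
performs on hostile input, which would need to be measured separately.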