From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,BAYES_00,
	URIBL_BLOCKED shortcircuit=no autolearn=unavailable version=3.3.2
X-Original-To: meta@public-inbox.org
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id A8CAF633821;
	Sun,  1 May 2016 02:12:31 +0000 (UTC)
Date: Sun, 1 May 2016 02:12:31 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] linkify: match more URL characters [:,\$] and schemes
Message-ID: <20160501021231.GA18753@dcvr.yhbt.net>
References: <20160501014913.12358-1-e@80x24.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160501014913.12358-1-e@80x24.org>
List-Id:

Eric Wong wrote:
> There's probably some other acceptable characters I'm missing.

Yup, just reading the rfc2396 parser in Ruby...

---------8<-----------
Subject: [PATCH] linkify: match more URL characters [:,\$] and schemes

Adding ':' (colon), ',' (comma), '$' (dollar sign) and supporting
TLS-enabled schemes: ftps, nntps variants as well as gopher :D
---
 lib/PublicInbox/Linkify.pm | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/Linkify.pm b/lib/PublicInbox/Linkify.pm
index 49ab311..25f0b48 100644
--- a/lib/PublicInbox/Linkify.pm
+++ b/lib/PublicInbox/Linkify.pm
@@ -15,9 +15,9 @@ use warnings;
 use Digest::SHA qw/sha1_hex/;
 
 my $SALT = rand;
-my $LINK_RE = qr!\b((?:ftp|https?|nntp)://
+my $LINK_RE = qr!\b((?:ftps?|https?|nntps?|gopher)://
 		[\@:\w\.-]+/
-		?[~\@\w\+\&\?\.\%\;/#=-]*)!x;
+		?[,:~\$\@\w\+\&\?\.\%\;/#=-]*)!x;
 
 sub new { bless {}, shift }
 
@@ -28,8 +28,10 @@ sub linkify_1 {
 		my $end = '';
 
 		# it's fairly common to end URLs in messages with
-		# '.' or ';' to denote the end of a statement.
-		if ($url =~ s/(\.)\z// || $url =~ s/(;)\z//) {
+		# '.', ',' or ';' to denote the end of a statement;
+		# assume the intent was to end the statement/sentence
+		# in English
+		if ($url =~ s/([\.,;])\z//) {
 			$end = $1;
 		}
 
-- 
EW
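
For anyone wanting to poke at the new pattern outside of
PublicInbox::Linkify, here is a rough standalone sketch. The regex and
the trailing-punctuation substitution are copied from the patch above;
split_url() and the example URLs are made up for illustration and are
not part of Linkify.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern as it reads after the patch: more schemes, plus ',', ':'
# and '$' accepted in the path/query part.
my $LINK_RE = qr!\b((?:ftps?|https?|nntps?|gopher)://
        [\@:\w\.-]+/
        ?[,:~\$\@\w\+\&\?\.\%\;/#=-]*)!x;

# Hypothetical helper (not in Linkify.pm): returns a list of
# [ url, trailing_punctuation ] pairs found in $text.
sub split_url {
    my ($text) = @_;
    my @out;
    while ($text =~ /$LINK_RE/g) {
        my $url = $1;
        my $end = '';
        # A trailing '.', ',' or ';' is assumed to end the
        # sentence rather than belong to the URL.
        if ($url =~ s/([\.,;])\z//) {
            $end = $1;
        }
        push @out, [ $url, $end ];
    }
    return @out;
}

# Usage: print each URL followed by the punctuation that was split off.
printf "%s [%s]\n", @$_
    for split_url('see http://example.com/a,b, and gopher://host/x$y.');
```

Note that a comma in the middle of a URL is kept; only one trailing
'.', ',' or ';' is split off and carried in the second element.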