From: Kyle Meyer <kyle@kyleam.com>
To: Daniele Nicolodi <daniele@grinta.net>
Cc: emacs-orgmode@gnu.org
Subject: Re: Bug in identification of links?
Date: Fri, 12 Jun 2020 01:19:01 +0000
Message-ID: <874krhqdx6.fsf@kyleam.com>
In-Reply-To: <5faf0bd7-b114-9723-773e-7f3da16604a0@grinta.net>
Daniele Nicolodi writes:
> org-mode fails to recognize https://doi.org/10.1016/0370-1573(89)90087-2
> as a valid URL, it breaks it after the closing parenthesis ). I don't
> understand why this is the case as I would imagine that if the )
> character is not allowed in URLs the link would be broken before it and
> not after. I haven't tried to find the code responsible for this, thus I
> don't know what exactly is going on. Does anyone have an idea?
The link is matched by org-link-plain-re, which is created by
org-link-make-regexps. The relevant part looks like this:
  \\([^][ \t\n()<>]+\\(?:([[:word:]0-9_]+)\\|\\([^[:punct:] \t\n]\\|/\\)\\)\\)
                         -----------------
The underlined bit is what is matching "(89)". This subpattern
appeared, without the underscore, in facedba05 (Use John Gruber's
regular expression for URL's, 2009-12-09). The commit message links to
an article [0] that has this to say about the parentheses matching:
  It attempts to be particularly clever with regard to parentheses,
  which, in my experience, only ever seem to occur in the wild in
  Wikipedia URLs, and which many URL matching patterns seem to
  botch. The pattern looks for a single pair of balanced parentheses
  within the URL, which is how it correctly omits the trailing
  parenthesis in the following line:

      (Something like http://foo.com/blah_blah)
That article also has an update recommending an improved variant. I
haven't tested it, but it looks like it would handle your case.
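To make the behavior concrete, here is a rough Python sketch of the fragment above. It is a translation under assumptions, not Org's code: \w stands in for [[:word:]0-9_], string.punctuation is an ASCII-only stand-in for [:punct:], and the scheme is hard-coded to http/https instead of being built from the configured link types. VARIANT is a hypothetical tweak of my own that lets text continue after a parenthesized group; it is not the pattern from the article's update.

```python
import re
import string

# Link body characters, translated from the Emacs class [^][ \t\n()<>].
CHARS = r"[^\]\[ \t\n()<>]"
# One parenthesized group; \w approximates [[:word:]0-9_].
PAREN = r"\(\w+\)"
# ASCII-only approximation of [^[:punct:] \t\n].
NON_PUNCT = "[^" + re.escape(string.punctuation) + r" \t\n]"

# Translation of the quoted fragment: the link must END with a
# parenthesized group, a non-punctuation character, or a slash.
CURRENT = re.compile(
    r"\bhttps?:" + CHARS + "+(?:" + PAREN + "|" + NON_PUNCT + "|/)"
)

# Hypothetical variant (an assumption, not Gruber's updated pattern):
# allow any number of parenthesized groups, each optionally followed by
# more link characters, and refuse to end on punctuation other than
# "/" or ")".
TRAILING = string.punctuation.replace("/", "").replace(")", "")
VARIANT = re.compile(
    r"\bhttps?:" + CHARS + "+(?:" + PAREN + CHARS + "*)*"
    + "(?<![" + re.escape(TRAILING) + "])"
)


def first_link(pattern, text):
    """Return the first link matched by PATTERN in TEXT, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


doi = "https://doi.org/10.1016/0370-1573(89)90087-2"
print(first_link(CURRENT, doi))   # stops right after "(89)"
print(first_link(VARIANT, doi))   # covers the whole URL
```

With the translated pattern the match ends right after "(89)", because a parenthesized group is one of the three things allowed to terminate a link and nothing may follow it. Both patterns still leave out the closing parenthesis in "(Something like http://foo.com/blah_blah)".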
This issue has been around a long time and is minor in that there will
always be cases that fool the regexp and these can be handled by
enclosing the text with <...> or [[...]]. Still, in my view it'd be
worth taking a look at tweaking the regexp after the release of v9.4.
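As a small illustration of why the <...> and [[...]] workarounds are robust: once the link has explicit delimiters, the matcher no longer has to guess where the URL ends. The two patterns below are simplified stand-ins written for this example, not the regexps Org actually uses for angle and bracket links.

```python
import re

# Simplified stand-ins (assumptions), not Org's own regexps.
ANGLE = re.compile(r"<(https?:[^>\n]+)>")
BRACKET = re.compile(r"\[\[([^\]\[\n]+)\]\]")


def delimited_links(text):
    """Return link targets found inside <...> or [[...]] in TEXT."""
    return ANGLE.findall(text) + BRACKET.findall(text)


url = "https://doi.org/10.1016/0370-1573(89)90087-2"
print(delimited_links("See <" + url + "> or [[" + url + "]]."))
```

Both forms return the URL intact, parentheses included.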
[0] https://daringfireball.net/2009/11/liberal_regex_for_matching_urls
Related thread on mailing list:
https://orgmode.org/list/loom.20091130T200527-783@post.gmane.org/