From: Ihor Radchenko <yantar92@posteo.net>
To: Max Nikulin <manikulin@gmail.com>
Cc: emacs-orgmode@gnu.org
Subject: Re: [BUG] Trailing dash is not included in link [9.7.3 (9.7.3-2f1844 @ /home/mwillcock/.emacs.d/elpa/org-9.7.3/)]
Date: Sun, 16 Jun 2024 15:59:34 +0000
Message-ID: <875xu86fq1.fsf@localhost>
In-Reply-To: <v4n17e$gnr$1@ciao.gmane.io>
Max Nikulin <manikulin@gmail.com> writes:
>> +*** Trailing =-= is now allowed in plain links
>
> After a look into
>
> 7dcb1afb6 2021-03-24 21:27:24 +0800 Ihor Radchenko: Improve
> org-link-plain-re
>
> I suspect it worked prior to v9.5. Without a unit test it may be
> accidentally broken again.
No, it did not work.
If you can, please do not make such assertions without testing.
>> +: https://domain/test-
>
> example.org, example.net, example.com are domains reserved for usage in
> examples:
> <https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml>
And so?
>> (or (regexp "[^[:punct:] \t\n]")
>
> I have realized that some Org regexps use the [:punct:] *regexp class*
> and others the *syntax class*, see the latex math regexp. I doubt that
> the discrepancy is intentional.
It is not intentional, but using syntax classes can sometimes be
fragile.
> I have noticed that the following change
>
> 09ced6d2c 2024-02-03 15:15:46 +0100 Ihor Radchenko: org-link-plain-re:
> Improve regexp heuristics
>
> causes the input
>
> (link http://example.org/a<b)
>
> to be exported as
>
> <p>
> (link <a
> href="http://example.org/a%3Cb)">http://example.org/a%3Cb)</a></p>
>
> I expect that ")" should not be parsed as a part of the link. Balanced
> brackets are tricky with regexps (and it is not possible to match
> arbitrary nested ones).
It is a heuristic. We cannot be 100% right, so it is what it is.
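
To illustrate why nested brackets resist a single regexp, here is a minimal Python sketch. It is not Org's implementation: `balanced_parens_pattern` and `TWO_LEVELS` are hypothetical names introduced for illustration. A regexp can be built for any fixed nesting depth, but no single pattern covers arbitrary depth, so some depth limit has to be chosen.

```python
import re


def balanced_parens_pattern(depth):
    """Return regex source for text with balanced parentheses nested at
    most `depth` levels deep (hypothetical helper, illustration only)."""
    # Depth 0: a run of characters with no parentheses or whitespace.
    pattern = r"[^()\s]*"
    for _ in range(depth):
        # One more level: plain characters or a parenthesized group whose
        # content matches the pattern built for the previous level.
        pattern = r"(?:[^()\s]|\(" + pattern + r"\))*"
    return pattern


# Accepts up to two levels of nesting; deeper nesting is rejected.
TWO_LEVELS = re.compile(balanced_parens_pattern(2))
```

With this sketch, `TWO_LEVELS.fullmatch("a(b(c)d)e")` succeeds, while a string nested three levels deep, or one with an unmatched parenthesis, does not match.
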
> Perhaps "[^[:punct:] \t\n]" is too strict with respect to spaces. It does
> not allow the recommended workaround with zero width space:
You don't need zero width space for links.
Just use <bracket link>.
> As to the original bug report, while reading it, I noticed that
> thunderbird includes dash into the recognized link for
>
> "https://domain/test-"
>
> I decided to look into its implementation and to my surprise I found:
> ``punctation chars and "-" at the end are stipped off.'' I realized that
> double quotes along with angle brackets are treated as a recommended way
> to mark URLs in plain text. Thunderbird does not consider dash as a part
> of links for e.g. http://example.org/t- It might be an attempt to
> reserve possibility to assemble URLs wrapped into several lines with
> added hyphenation marks, but it has not been implemented (RFC2396
> appendix E warns about accidentally added hyphens).
>
> https://www.bucksch.org/1/projects/mozilla/16507/
> https://searchfox.org/mozilla-central/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#line-243
> mozTXTToHTMLConv::FindURLEnd
>
> The implementation is tricky, and I have not noticed anything that may be
> reused to improve heuristics for Org. Nowadays it is likely better to
> inspect autolinking code for GitHub/GitLab or widely used python packages.
If you have concrete proposals, please share them.
> I would consider [:space:] or \s-.
Do you mean "[^[:punct:][:space:]\t\n]"?
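
For readers outside Emacs, the ending rule under discussion can be sketched in Python. This is only an approximation under stated assumptions, not Org's regexp: `PLAIN_LINK` and `find_plain_link` are hypothetical names, `string.punctuation` stands in for the ASCII part of `[:punct:]`, the link body is simplified to any run of non-whitespace characters, and the other alternatives of the real regexp are omitted. The last character must either be outside the punctuation and space/tab/newline set, or be a literal `-`.

```python
import re
import string

# ASCII-only stand-in for the [:punct:] regexp class (assumption: the
# Emacs class also covers non-ASCII characters, which this ignores).
PUNCT = re.escape(string.punctuation)

# Simplified plain link: scheme, any non-whitespace body, then a final
# character that is not punctuation, space, tab or newline, or is "-".
PLAIN_LINK = re.compile(
    r"https?://\S*(?:[^" + PUNCT + r" \t\n]|-)"
)


def find_plain_link(text):
    """Return the first plain link found in text, or None."""
    match = PLAIN_LINK.search(text)
    return match.group(0) if match else None


print(find_plain_link("see https://domain/test- for details"))
print(find_plain_link("Visit https://example.org/page."))
```

In this sketch the trailing dash is kept as part of the first link, while the sentence-ending period after the second link is left out.
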
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>