From: Nicolas Goaziou <mail@nicolasgoaziou.fr>
To: Org Mode List <emacs-orgmode@gnu.org>
Subject: [RFC] Fixing link encoding once and for all
Date: Sun, 24 Feb 2019 02:16:52 +0100
Message-ID: <87tvguyohn.fsf@nicolasgoaziou.fr>
Hello,
Recently[1], issues about link escaping have resurfaced. I'd like to
solve this once and for all.
As a reminder, the initial issue is that bracket links, i.e., "[[path]]"
or "[[path][description]]", cannot contain square brackets, for obvious
reasons. Therefore, they need to be escaped somehow. For some historical
reason, the "somehow" settled, for the path part[2], on URL encoding.
Therefore [ and ] in a link must appear as, respectively, "%5B" and
"%5D". Of course, the initial link could already contain any of these
strings, so percent signs also need to be escaped, as "%25". Finally,
consecutive spaces are not handled very gracefully by the
`fill-paragraph' function, so it is also useful, but not mandatory, to
be able to escape white spaces, with "%20". Sadly, it can be confusing
when Org encoding is applied on top of an already encoded URI.
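
The escaping rules just described can be sketched as follows. This is
a minimal Python illustration, not Org's Emacs Lisp implementation: the
names `ORG_ESCAPES' and `org_escape_sketch' are hypothetical, and only
the plain space character is treated as white space.

```python
# Hypothetical sketch of the default escaping rules described above;
# this is not Org's Emacs Lisp code.
ORG_ESCAPES = {"%": "%25", "[": "%5B", "]": "%5D", " ": "%20"}


def org_escape_sketch(path):
    """Percent-encode only square brackets, percent signs and spaces."""
    # Working character by character means a "%" produced by the
    # encoding itself is never encoded a second time.
    return "".join(ORG_ESCAPES.get(char, char) for char in path)


print(org_escape_sketch("file:notes [draft].org"))
print(org_escape_sketch("http://example.com/a%5Bb"))
```

The second call shows the confusing case: a URI that already contains
"%5B" gets its percent sign encoded again and ends up with "%255B".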
To sum it up, `org-link-escape', by default, URL encodes only square
brackets, percent signs and white spaces. Note, however, that
`org-link-unescape' is not its reciprocal function, despite its
docstring: it URL decodes every percent-encoded combination.
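
To illustrate why the two functions are not reciprocal, here is
a small Python sketch. The helpers `escape_sketch' and `unescape_sketch'
are hypothetical stand-ins for the behavior described above, and
`urllib.parse.unquote' is used only as an example of a decoder that
handles every percent-encoded combination.

```python
from urllib.parse import unquote

# Hypothetical stand-ins, not Org's Emacs Lisp functions.
_TABLE = {"%": "%25", "[": "%5B", "]": "%5D", " ": "%20"}


def escape_sketch(path):
    # Encodes only the four characters listed above.
    return "".join(_TABLE.get(char, char) for char in path)


def unescape_sketch(path):
    # Decodes every percent-encoded combination, whatever produced it.
    return unquote(path)


typed = "http://example.com/a%2Fb"  # typed by hand, never Org-encoded
decoded = unescape_sketch(typed)    # "%2F" is decoded although escaping never emits it
print(decoded)
print(escape_sketch(decoded) == typed)
```

Decoding after encoding does give back the original string, but
encoding after decoding does not: the "%2F" is lost for good.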
Anyway, square brackets in a bracket link almost look like a solved
problem. Alas, while some links are inserted by helper functions, such as
`org-insert-link', others could have been typed right into the buffer.
Therefore, there is usually no way to know if a link is already
Org-encoded or not. Consequently, there is usually no way to know when
a link needs to be Org-decoded. This is the root of all evil, or at
least, all bugs encountered so far. Some links end up being encoded or
decoded one time too many.
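
The ambiguity can be made concrete with a short Python sketch. The
helper `escape_sketch' and the example URI are hypothetical; the point
is only that the same buffer text has two plausible origins, and that
guessing wrong in either direction corrupts the link.

```python
from urllib.parse import unquote

_TABLE = {"%": "%25", "[": "%5B", "]": "%5D", " ": "%20"}


def escape_sketch(path):
    # Hypothetical stand-in for the default Org encoding.
    return "".join(_TABLE.get(char, char) for char in path)


in_buffer = "http://example.com/q=%5Bx%5D"

# Origin 1: a helper encoded a path that contained real square brackets.
from_helper = escape_sketch("http://example.com/q=[x]")
# Origin 2: the user typed a URI that literally contains "%5B" and "%5D".
typed_raw = "http://example.com/q=%5Bx%5D"
print(from_helper == in_buffer, typed_raw == in_buffer)

# Wrong guess "not encoded yet": encoded one time too many.
print(escape_sketch(in_buffer))
# Wrong guess "already encoded": decoded one time too many.
print(unquote(in_buffer))
```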
To solve this, we must assume that every bracket link is properly
Org-encoded in a buffer. In other words, when typing, or yanking,
a bracket link right into a buffer, users are required to use %5B, %5D,
and %25 in the path part of the link, if necessary. I understand it will
bite some users, but using `org-insert-link' would mitigate the pain. The
requirement also only affects links containing square brackets, which,
I assume, are not the type of link you usually yank.
With that assumption, the parser can safely Org-decode links
appropriately, and store paths in their decoded form. Consumers, like
export back-ends, need not call `org-link-unescape' anymore. In fact,
the only situation where `org-link-unescape' is still needed is when
extracting the path part of a bracket link from the buffer, e.g.,
through regexp matching.
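
A minimal sketch of the proposed behavior, in Python rather than
Emacs Lisp, might look like the following. Everything here is an
assumption made for illustration: the regexp is far simpler than Org's
real parser, the function names are hypothetical, and only the three
mandatory sequences (%5B, %5D and %25) are decoded, leaving the
treatment of "%20" open.

```python
import re

# Hypothetical regexp for "[[path]]" and "[[path][description]]".
BRACKET_LINK = re.compile(r"\[\[([^\[\]]+)\](?:\[([^\[\]]+)\])?\]")
ORG_ENCODED = re.compile(r"%(25|5B|5D)")


def org_decode_sketch(path):
    # A single pass, so "%255B" becomes "%5B" and is not decoded again.
    return ORG_ENCODED.sub(lambda m: chr(int(m.group(1), 16)), path)


def parse_links_sketch(text):
    """Return (decoded path, description) pairs for bracket links in TEXT."""
    return [(org_decode_sketch(m.group(1)), m.group(2))
            for m in BRACKET_LINK.finditer(text)]


buffer_text = "See [[file:a%5B1%5D.org][notes]] and [[http://example.com/%255B]]."
for path, description in parse_links_sketch(buffer_text):
    print(path, description)
```

Because the decoding happens exactly once, at parse time, a consumer
receiving these pairs would have no reason to decode the path again.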
Of course, the manual should mention this assumption, if we agree on it.
Thoughts?
Regards,
Footnotes:
[1] E.g., <http://lists.gnu.org/r/emacs-orgmode/2019-02/msg00265.html>
or <http://lists.gnu.org/r/emacs-orgmode/2019-02/msg00292.html>.
[2] There is no clear mechanism for the description part.
`org-insert-link' will replace square brackets with curly ones. We could
also use entities, but none of them appears as a square bracket. Anyway,
I'll ignore this issue for the time being.
--
Nicolas Goaziou