From: David Maus <dmaus@ictsoc.de>
To: org-mode <emacs-orgmode@gnu.org>, bastien.guerry@wikimedia.fr
Subject: Improve percent escaping links in Org mode (pull request / OK to push)
Date: Sun, 02 Jan 2011 20:37:24 +0100
Message-ID: <87lj33f66j.wl%dmaus@ictsoc.de>
This is a pull request or push announcement for the first set of
patches to improve Org mode's percent escaping functions. This set of
changes solves the problems with percent escaping non-ASCII
characters.
git@github.com:dmj/dmj-org-mode.git feature/org-percent-escaping
I do have commit access, but because this set of changes might break
things seriously, I'd like to get an "OK to push" or have someone pull
and review the changeset.
The problem:
The current implementation of percent escaping for URIs uses a
whitelist approach, i.e. it only percent escapes characters that are in
`org-link-escape-chars' or in a user-supplied list. This is a problem
because using the function requires knowing every character that could
occur in a URI. Since URIs are limited to plain ASCII, a call to the
function must literally list all possible characters and their
escapings to get a properly percent-escaped string.
The changes:
- `org-link-escape' percent escapes every character that matches one
of the following conditions:
* equal 37 (percent sign)
* equal 127 (DEL, control character)
* below 32 (control character)
* above 127 (non-ASCII character)
* a character in the escaping table (e.g. `org-link-escape-chars')
The character in question is first encoded in UTF-8, then all bytes
of the resulting character are percent escaped. If converting to
UTF-8 fails, Org throws an error indicating this problem.
The function got an optional third argument which can be set to merge
the user-defined table with the default escaping table.
- `org-link-unescape' unescapes every percent-escape sequence. It is
no longer possible to supply a list of characters that should be
unescaped. No function in core used `org-link-unescape' with an
unescaping table.
Internally, the `org-protocol-unhex-*' functions were renamed to
`org-link-unescape-*', moved to org.el and refactored (thanks to
Vincent Belaïche for suggesting some of the changes). The old names are
declared obsolete and aliased as of 2010-11-21.
The unescaping function is backward compatible and unescapes the old
percent escape format for non-ASCII characters (thanks to Sebastian
Rose).
It is possible that the new implementation will break links in at
least this (known) case: If the user stored a link to a file or
directory containing a percent sign. Currently Org mode does not
percent escape the percent sign and subsequently the new variant of
`org-link-unescape' will try to unescape the alleged percent escape
sequence.[1]
- The format of `org-link-escape-chars' changed. It is now just a list
of characters to escape; the percent escape sequence is implied by the
character.
Functions in core that used a custom escaping table are changed
accordingly to use the new table format.
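To make the escaping rule above concrete, here is a minimal sketch in Python. It is not the Emacs Lisp from the patches. The names `link_escape', `link_unescape' and DEFAULT_ESCAPE_CHARS are hypothetical, and the default table contents are an assumption chosen for illustration only. The backward-compatible handling of the old escape format is not modelled.

```python
import string

# Hypothetical stand-in for `org-link-escape-chars' in the new format:
# a plain list of characters, the escape sequence is implied.
DEFAULT_ESCAPE_CHARS = [" ", "[", "]"]


def link_escape(text, table=None, merge=False):
    """Percent escape TEXT following the conditions listed above."""
    if table is None:
        chars = set(DEFAULT_ESCAPE_CHARS)
    elif merge:
        # Third argument: merge the user table with the default table.
        chars = set(DEFAULT_ESCAPE_CHARS) | set(table)
    else:
        chars = set(table)
    out = []
    for ch in text:
        code = ord(ch)
        if code == 37 or code == 127 or code < 32 or code > 127 or ch in chars:
            # Encode the character in UTF-8 first, then escape every byte.
            # encode() raises UnicodeEncodeError if that is not possible.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def link_unescape(text):
    """Unescape every percent-escape sequence and decode the bytes as UTF-8."""
    raw = bytearray()
    i = 0
    while i < len(text):
        pair = text[i + 1:i + 3]
        if text[i] == "%" and len(pair) == 2 and all(c in string.hexdigits for c in pair):
            raw.append(int(pair, 16))
            i += 3
        else:
            raw.extend(text[i].encode("utf-8"))
            i += 1
    return raw.decode("utf-8")
```

Because the percent sign itself is always escaped, escaping followed by unescaping returns the original string, including for non-ASCII characters.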
What is next:
- check if we can fall back to use `url-hexify-string' and
`url-unhex-string' instead of our own functions
- check if the recent problems with percent escaping are solved
Best,
-- David
[1] Not escaping the percent sign is actually a glitch: Try to store
and open a link to a file literally called "foo%20baz.org".
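The glitch can be reproduced outside Emacs. The following sketch uses Python's urllib.parse as a stand-in for the escape and unescape functions (an assumption for illustration, not Org's code). It shows that a file name stored without escaping the percent sign does not survive unescaping, while one stored with the percent sign escaped does.

```python
from urllib.parse import quote, unquote

name = "foo%20baz.org"

# Old behaviour as described: the percent sign is stored unescaped,
# so unescaping treats "%20" as an escape sequence.
stored_old = name
opened_old = unquote(stored_old)

# New behaviour: the percent sign is always escaped before storing.
stored_new = quote(name, safe="")
opened_new = unquote(stored_new)

print(opened_old)  # a different file name than the one stored
print(stored_new)
print(opened_new)  # the original file name
```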
Thread overview (22+ messages):
2011-01-02 19:37 David Maus [this message]
2011-02-12 22:17 ` Improve percent escaping links in Org mode (pull request / OK to push) Bastien
2011-02-13 12:01 ` David Maus
2011-02-13 13:41 ` Bastien
2011-02-14 6:38 ` David Maus
2011-02-14 10:09 ` Bastien
2011-02-13 12:01 ` [PATCH 01/16] Decode single byte sequence if decoding unicode failed David Maus
2011-02-13 12:01 ` [PATCH 02/16] New unicode aware percent encoding algorithm David Maus
2011-02-13 12:01 ` [PATCH 03/16] New format of percent escape table David Maus
2011-02-13 12:01 ` [PATCH 04/16] Fixup doc string David Maus
2011-02-13 12:01 ` [PATCH 05/16] New optional argument: Merge user table with default table David Maus
2011-02-13 12:01 ` [PATCH 06/16] Inline function to properly decode utf8 characters in Emacs 22 David Maus
2011-02-13 12:01 ` [PATCH 07/16] Unescape functions moved and renamed from org-protocol.el David Maus
2011-02-13 12:01 ` [PATCH 08/16] Declare obsolete & alias to respective org-link-unescape-* functions David Maus
2011-02-13 12:01 ` [PATCH 09/16] Remove obsolete argument in call to org-link-unescape David Maus
2011-02-13 12:01 ` [PATCH 10/16] Use new percent escape character table format David Maus
2011-02-13 12:01 ` [PATCH 11/16] Add percent sign to list of escape chars David Maus
2011-02-13 12:01 ` [PATCH 12/16] Rename lambda argument David Maus
2011-02-13 12:01 ` [PATCH 13/16] Refactor unescaping functions David Maus
2011-02-13 12:01 ` [PATCH 14/16] Always percent escape the percent sign David Maus
2011-02-13 12:01 ` [PATCH 15/16] Use `org-link-unescape' instead of obsolete unhex string function David Maus
2011-02-13 12:01 ` [PATCH 16/16] Throw error if encoding character in utf8 fails David Maus