* Re: Improve percent escaping links in Org mode (pull request / OK to push)
2011-02-12 22:17 ` Bastien
@ 2011-02-13 12:01 ` David Maus
2011-02-13 13:41 ` Bastien
2011-02-13 12:01 ` [PATCH 01/16] Decode single byte sequence if decoding unicode failed David Maus
` (15 subsequent siblings)
16 siblings, 1 reply; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry
Hi Bastien,
> Wow... how could I have missed this email? Thanks for the thorough details
> about this change, which is more than welcome (I read Vincent's emails
> about this.)
> I hope you can rebase this on current head without too much headache,
> and provide a set of patches. I'd rather read patches than just test
> from a branch...
Rebased to current head and here we go.
Best,
-- David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Improve percent escaping links in Org mode (pull request / OK to push)
2011-02-13 12:01 ` David Maus
@ 2011-02-13 13:41 ` Bastien
2011-02-14 6:38 ` David Maus
0 siblings, 1 reply; 22+ messages in thread
From: Bastien @ 2011-02-13 13:41 UTC (permalink / raw)
To: David Maus; +Cc: emacs-orgmode
Hi David,
David Maus <dmaus@ictsoc.de> writes:
> Rebased to current head and here we go.
Wow, great work -- thanks for the perfect changelogs!
I've been through the patches, everything looks good, feel
free to push (and to mark patches as "accepted" in patchwork.)
You mentioned some possible backward compatibility issues with
a few existing links before in this thread, any update on this?
Thanks a lot to you, Sebastian -- and Vincent B. for bringing
up this issue!
--
Bastien
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Improve percent escaping links in Org mode (pull request / OK to push)
2011-02-13 13:41 ` Bastien
@ 2011-02-14 6:38 ` David Maus
2011-02-14 10:09 ` Bastien
0 siblings, 1 reply; 22+ messages in thread
From: David Maus @ 2011-02-14 6:38 UTC (permalink / raw)
To: Bastien; +Cc: David Maus, emacs-orgmode
At Sun, 13 Feb 2011 14:41:14 +0100,
Bastien wrote:
>
> Hi David,
>
> David Maus <dmaus@ictsoc.de> writes:
>
> > Rebased to current head and here we go.
>
> Wow, great work -- thanks for the perfect changelogs!
>
> I've been through the patches, everything looks good, feel
> free to push (and to mark patches as "accepted" in patchwork.)
Thanks for the quick review. I won't be available until Wednesday, so I
will most likely push Wednesday or Thursday evening with a short warning
notice.
> You mentioned some possible backward compatibility issues with
> a few existing links before in this thread, any update on this?
Nope, but it just occurred to me that we might provide a small elisp
command that users can run in a buffer to check for possible problems?
The elisp could check each link for a substring that matches the
definition of a percent escaped character (%[0-9a-fA-F]{2}) and is not
in the old `org-link-escape-chars' list. Such links might pose a problem
because the new unescaping function will unescape this sequence.
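A minimal sketch of such a check, for illustration only. The names
`my-org-old-link-escapes' and `my-org-suspicious-escapes' are hypothetical
and not part of Org; the list of old escapes is copied from the old
`org-link-escape-chars' table.

```elisp
;; Hypothetical helper, not part of Org.
(defvar my-org-old-link-escapes
  '("%20" "%5B" "%5D" "%E0" "%E2" "%E7" "%E8" "%E9" "%EA"
    "%EE" "%F4" "%F9" "%FB" "%3B" "%3D" "%2B")
  "Escape sequences the old `org-link-escape-chars' table could produce.")

(defun my-org-suspicious-escapes (link)
  "Return the percent escapes in LINK that the old table never produced.
Such sequences would be decoded by the new unescaping function."
  (let ((pos 0)
        (case-fold-search t)
        found)
    ;; Walk over every %XX group in LINK, one at a time.
    (while (string-match "%[0-9a-f][0-9a-f]" link pos)
      (let ((seq (upcase (match-string 0 link))))
        (unless (member seq my-org-old-link-escapes)
          (push seq found)))
      (setq pos (match-end 0)))
    (nreverse found)))

;; (my-org-suspicious-escapes "file:a%20b%41c%e9")  ; => ("%41")
```

A buffer-level command could call this on each link it finds and report
the links for which the result is non-empty.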
Best,
-- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Improve percent escaping links in Org mode (pull request / OK to push)
2011-02-14 6:38 ` David Maus
@ 2011-02-14 10:09 ` Bastien
0 siblings, 0 replies; 22+ messages in thread
From: Bastien @ 2011-02-14 10:09 UTC (permalink / raw)
To: David Maus; +Cc: emacs-orgmode
Hi David,
David Maus <dmaus@ictsoc.de> writes:
> Thanks for the quick review. I won't be available until Wednesday, so I
> will most likely push Wednesday or Thursday evening with a short warning
> notice.
Looks good, thanks.
>> You mentioned some possible backward compatibility issues with
>> a few existing links before in this thread, any update on this?
>
> Nope, but it just occurred to me that we might provide a small elisp
> command that users can run in a buffer to check for possible problems?
>
> The elisp could check each link for a substring that matches the
> definition of a percent escaped character (%[0-9a-fA-F]{2}) and is not
> in the old `org-link-escape-chars' list. Such links might pose a problem
> because the new unescaping function will unescape this sequence.
Good idea!
--
Bastien
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 01/16] Decode single byte sequence if decoding unicode failed.
2011-02-12 22:17 ` Bastien
2011-02-13 12:01 ` David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 02/16] New unicode aware percent encoding algorithm David Maus
` (14 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry
From: Sebastian Rose <sebastian_rose@gmx.de>
* org-protocol.el (org-protocol-unhex-single-byte-sequence): New
function. Decode hex-encoded single byte sequences.
(org-protocol-unhex-compound): Use new function if decoding sequence
as unicode character failed.
---
lisp/org-protocol.el | 26 +++++++++++++++++++++++---
1 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/lisp/org-protocol.el b/lisp/org-protocol.el
index 1c501f3..33878a8 100644
--- a/lisp/org-protocol.el
+++ b/lisp/org-protocol.el
@@ -305,7 +305,7 @@ part."
(defun org-protocol-unhex-string(str)
"Unhex hexified unicode strings as returned from the JavaScript function
-encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ü'."
+encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
(setq str (or str ""))
(let ((tmp "")
(case-fold-search t))
@@ -321,7 +321,9 @@ encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ü'."
(defun org-protocol-unhex-compound (hex)
- "Unhexify unicode hex-chars. E.g. `%C3%B6' is the German Umlaut `ü'."
+ "Unhexify unicode hex-chars. E.g. `%C3%B6' is the German Umlaut `ö'.
+Note: this function also decodes single byte encodings like
+`%E1' (\"á\") if not followed by another `%[A-F0-9]{2}' group."
(let* ((bytes (remove "" (split-string hex "%")))
(ret "")
(eat 0)
@@ -353,12 +355,30 @@ encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ü'."
(setq val (logxor val xor))
(setq sum (+ (lsh sum shift) val))
(if (> eat 0) (setq eat (- eat 1)))
- (when (= 0 eat)
+ (cond
+ ((= 0 eat) ;multi byte
(setq ret (concat ret (org-protocol-char-to-string sum)))
(setq sum 0))
+ ((not bytes) ; single byte(s)
+ (setq ret (org-protocol-unhex-single-byte-sequence hex))))
)) ;; end (while bytes
ret ))
+(defun org-protocol-unhex-single-byte-sequence(hex)
+ "Unhexify hex-encoded single byte character sequences."
+ (let ((bytes (remove "" (split-string hex "%")))
+ (ret ""))
+ (while bytes
+ (let* ((b (pop bytes))
+ (a (elt b 0))
+ (b (elt b 1))
+ (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
+ (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0))))
+ (setq ret
+ (concat ret (char-to-string
+ (+ (lsh c1 4) c2))))))
+ ret))
+
(defun org-protocol-flatten-greedy (param-list &optional strip-path replacement)
"Greedy handlers might receive a list like this from emacsclient:
'( (\"/dir/org-protocol:/greedy:/~/path1\" (23 . 12)) (\"/dir/param\")
--
1.7.2.3
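For readers following along, the single-byte fallback added by this patch
can be illustrated with a minimal sketch. The name `my-unhex-single-bytes'
is hypothetical and not part of Org, and the sketch uses `string-to-number'
with base 16 instead of the character arithmetic in the patch.

```elisp
;; Hypothetical helper, not part of Org.
(defun my-unhex-single-bytes (hex)
  "Decode HEX such as \"%E1\", treating each escape as one character code."
  (mapconcat (lambda (byte)
               ;; "E1" -> 225 -> the character with that code.
               (char-to-string (string-to-number byte 16)))
             ;; "%41%42" splits into ("" "41" "42"); drop the empty string.
             (delete "" (split-string hex "%"))
             ""))

;; (my-unhex-single-bytes "%41%42")  ; => "AB"
```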
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 02/16] New unicode aware percent encoding algorithm
2011-02-12 22:17 ` Bastien
2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 01/16] Decode single byte sequence if decoding unicode failed David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 03/16] New format of percent escape table David Maus
` (13 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape): New unicode aware percent encoding
algorithm.
---
lisp/org.el | 19 ++++++++-----------
1 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 0c46eec..9aeeeda 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8576,17 +8576,14 @@ This is the list that is used before handing over to the browser.")
(if (and org-url-encoding-use-url-hexify (not table))
(url-hexify-string text)
(setq table (or table org-link-escape-chars))
- (when text
- (let ((re (mapconcat (lambda (x) (regexp-quote
- (char-to-string (car x))))
- table "\\|")))
- (while (string-match re text)
- (setq text
- (replace-match
- (cdr (assoc (string-to-char (match-string 0 text))
- table))
- t t text)))
- text))))
+ (mapconcat
+ (lambda (char)
+ (if (or (assoc char table)
+ (< char 32) (> char 126))
+ (mapconcat (lambda (sequence)
+ (format "%%%.2X" sequence))
+ (encode-coding-char char 'utf-8) "")
+ (char-to-string char))) text "")))
(defun org-link-unescape (text &optional table)
"Reverse the action of `org-link-escape'."
--
1.7.2.3
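The effect of the new algorithm can be seen in a standalone sketch that
mirrors the patch above. The function name `my-percent-escape' and its
CHARS argument are hypothetical, and CHARS is assumed to be a plain list
of characters rather than the association list still used at this point
in the series.

```elisp
;; Hypothetical standalone version, not part of Org.
(defun my-percent-escape (text chars)
  "Return TEXT with CHARS and everything outside printable ASCII escaped.
Each escaped character becomes the %XX groups of its UTF-8 bytes."
  (mapconcat
   (lambda (char)
     (if (or (memq char chars) (< char 32) (> char 126))
         ;; `encode-coding-char' returns the UTF-8 bytes as a unibyte
         ;; string; emit one %XX group per byte.
         (mapconcat (lambda (byte) (format "%%%02X" byte))
                    (encode-coding-char char 'utf-8) "")
       (char-to-string char)))
   text ""))

;; (my-percent-escape "a b" '(?\s))       ; => "a%20b"
;; (my-percent-escape (string #xF6) nil)  ; => "%C3%B6"
```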
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 03/16] New format of percent escape table
2011-02-12 22:17 ` Bastien
` (2 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 02/16] New unicode aware percent encoding algorithm David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 04/16] Fixup doc string David Maus
` (12 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape-chars, org-link-escape-chars-browser): New
format of percent escape table.
(org-link-escape): Use new table format.
Just a plain list with the chars that should be escaped.
---
lisp/org.el | 27 +++++----------------------
1 files changed, 5 insertions(+), 22 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 9aeeeda..7d38907 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8543,32 +8543,15 @@ according to FMT (default from `org-email-link-description-format')."
"]"))
(defconst org-link-escape-chars
- '((?\ . "%20")
- (?\[ . "%5B")
- (?\] . "%5D")
- (?\340 . "%E0") ; `a
- (?\342 . "%E2") ; ^a
- (?\347 . "%E7") ; ,c
- (?\350 . "%E8") ; `e
- (?\351 . "%E9") ; 'e
- (?\352 . "%EA") ; ^e
- (?\356 . "%EE") ; ^i
- (?\364 . "%F4") ; ^o
- (?\371 . "%F9") ; `u
- (?\373 . "%FB") ; ^u
- (?\; . "%3B")
-;; (?? . "%3F")
- (?= . "%3D")
- (?+ . "%2B")
- )
- "Association list of escapes for some characters problematic in links.
+ '(?\ ?\[ ?\] ?\; ?\= ?\+)
+ "List of characters that should be escaped in link.
This is the list that is used for internal purposes.")
(defvar org-url-encoding-use-url-hexify nil)
(defconst org-link-escape-chars-browser
- '((?\ . "%20")) ; 32 for the SPC char
- "Association list of escapes for some characters problematic in links.
+ '(?\ )
+ "List of escapes for characters that are problematic in links.
This is the list that is used before handing over to the browser.")
(defun org-link-escape (text &optional table)
@@ -8578,7 +8561,7 @@ This is the list that is used before handing over to the browser.")
(setq table (or table org-link-escape-chars))
(mapconcat
(lambda (char)
- (if (or (assoc char table)
+ (if (or (member char table)
(< char 32) (> char 126))
(mapconcat (lambda (sequence)
(format "%%%.2X" sequence))
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 04/16] Fixup doc string
2011-02-12 22:17 ` Bastien
` (3 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 03/16] New format of percent escape table David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 05/16] New optional argument: Merge user table with default table David Maus
` (11 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape): Fixup doc string.
---
lisp/org.el | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 7d38907..cafb673 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8555,7 +8555,10 @@ This is the list that is used for internal purposes.")
This is the list that is used before handing over to the browser.")
(defun org-link-escape (text &optional table)
- "Escape characters in TEXT that are problematic for links."
+ "Return percent escaped representation of TEXT.
+TEXT is a string with the text to escape.
+Optional argument TABLE is a list with characters that should be
+escaped. When nil, `org-link-escape-chars' is used."
(if (and org-url-encoding-use-url-hexify (not table))
(url-hexify-string text)
(setq table (or table org-link-escape-chars))
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 05/16] New optional argument: Merge user table with default table
2011-02-12 22:17 ` Bastien
` (4 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 04/16] Fixup doc string David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 06/16] Inline function to properly decode utf8 characters in Emacs 22 David Maus
` (10 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape): New optional argument. Merge user table
with default table.
---
lisp/org.el | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index cafb673..a29d429 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8554,14 +8554,22 @@ This is the list that is used for internal purposes.")
"List of escapes for characters that are problematic in links.
This is the list that is used before handing over to the browser.")
-(defun org-link-escape (text &optional table)
+(defun org-link-escape (text &optional table merge)
"Return percent escaped representation of TEXT.
TEXT is a string with the text to escape.
Optional argument TABLE is a list with characters that should be
-escaped. When nil, `org-link-escape-chars' is used."
+escaped. When nil, `org-link-escape-chars' is used.
+If optional argument MERGE is set, merge TABLE into
+`org-link-escape-chars'."
(if (and org-url-encoding-use-url-hexify (not table))
(url-hexify-string text)
- (setq table (or table org-link-escape-chars))
+ (cond
+ ((and table merge)
+ (mapc (lambda (defchr)
+ (unless (member defchr table)
+ (setq table (cons defchr table)))) org-link-escape-chars))
+ ((null table)
+ (setq table org-link-escape-chars)))
(mapconcat
(lambda (char)
(if (or (member char table)
--
1.7.2.3
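The merge step can be pictured in isolation with the following sketch.
The function `my-merge-escape-tables' is hypothetical and not part of Org;
unlike the patch, it copies TABLE first so the caller's list is left
untouched.

```elisp
;; Hypothetical helper, not part of Org.
(defun my-merge-escape-tables (table defaults)
  "Return TABLE extended by every character of DEFAULTS it lacks."
  (let ((merged (copy-sequence table)))
    (dolist (chr defaults merged)
      (unless (memq chr merged)
        (push chr merged)))))

;; (my-merge-escape-tables '(?: ?/) '(?\s ?\[ ?/))
;; => (?\[ ?\s ?: ?/), that is (91 32 58 47)
```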
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 06/16] Inline function to properly decode utf8 characters in Emacs 22
2011-02-12 22:17 ` Bastien
` (5 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 05/16] New optional argument: Merge user table with default table David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 07/16] Unescape functions moved and renamed from org-protocol.el David Maus
` (9 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org-macs.el (org-char-to-string): Inline function to properly decode
utf8 characters in Emacs 22. Moved and renamed from org-protocol.el.
* org-protocol.el (org-protocol-unhex-compound): Use renamed inline
function.
---
lisp/org-macs.el | 9 ++++++++-
lisp/org-protocol.el | 13 +------------
2 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/lisp/org-macs.el b/lisp/org-macs.el
index 5a56123..4451a54 100644
--- a/lisp/org-macs.el
+++ b/lisp/org-macs.el
@@ -35,7 +35,14 @@
(eval-and-compile
(unless (fboundp 'declare-function)
- (defmacro declare-function (fn file &optional arglist fileonly))))
+ (defmacro declare-function (fn file &optional arglist fileonly)))
+ (if (>= emacs-major-version 23)
+ (defsubst org-char-to-string(c)
+ "Defsubst to decode UTF-8 character values in emacs 23 and beyond."
+ (char-to-string c))
+ (defsubst org-char-to-string (c)
+ "Defsubst to decode UTF-8 character values in emacs 22."
+ (string (decode-char 'ucs c)))))
(declare-function org-add-props "org-compat" (string plist &rest props))
(declare-function org-string-match-p "org-compat" (&rest args))
diff --git a/lisp/org-protocol.el b/lisp/org-protocol.el
index 33878a8..eb77f02 100644
--- a/lisp/org-protocol.el
+++ b/lisp/org-protocol.el
@@ -292,17 +292,6 @@ part."
(mapcar 'org-protocol-unhex-string split-parts))
split-parts)))
-;; This inline function is needed in org-protocol-unhex-compound to do
-;; the right thing to decode UTF-8 char integer values.
-(eval-when-compile
- (if (>= emacs-major-version 23)
- (defsubst org-protocol-char-to-string(c)
- "Defsubst to decode UTF-8 character values in emacs 23 and beyond."
- (char-to-string c))
- (defsubst org-protocol-char-to-string (c)
- "Defsubst to decode UTF-8 character values in emacs 22."
- (string (decode-char 'ucs c)))))
-
(defun org-protocol-unhex-string(str)
"Unhex hexified unicode strings as returned from the JavaScript function
encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
@@ -357,7 +346,7 @@ Note: this function also decodes single byte encodings like
(if (> eat 0) (setq eat (- eat 1)))
(cond
((= 0 eat) ;multi byte
- (setq ret (concat ret (org-protocol-char-to-string sum)))
+ (setq ret (concat ret (org-char-to-string sum)))
(setq sum 0))
((not bytes) ; single byte(s)
(setq ret (org-protocol-unhex-single-byte-sequence hex))))
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 07/16] Unescape functions moved and renamed from org-protocol.el
2011-02-12 22:17 ` Bastien
` (6 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 06/16] Inline function to properly decode utf8 characters in Emacs 22 David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 08/16] Declare obsolete & alias to respective org-link-unescape-* functions David Maus
` (8 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-unescape, org-link-unescape-compound)
(org-link-unescape-single-byte-sequence): Functions moved and renamed
from org-protocol.el.
---
lisp/org.el | 90 ++++++++++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 74 insertions(+), 16 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index a29d429..602462d 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8579,22 +8579,80 @@ If optional argument MERGE is set, merge TABLE into
(encode-coding-char char 'utf-8) "")
(char-to-string char))) text "")))
-(defun org-link-unescape (text &optional table)
- "Reverse the action of `org-link-escape'."
- (if (and org-url-encoding-use-url-hexify (not table))
- (url-unhex-string text)
- (setq table (or table org-link-escape-chars))
- (when text
- (let ((case-fold-search t)
- (re (mapconcat (lambda (x) (regexp-quote (downcase (cdr x))))
- table "\\|")))
- (while (string-match re text)
- (setq text
- (replace-match
- (char-to-string (car (rassoc (upcase (match-string 0 text))
- table)))
- t t text)))
- text))))
+(defun org-link-unescape (str)
+ "Unhex hexified unicode strings as returned from the JavaScript function
+encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
+ (setq str (or str ""))
+ (let ((tmp "")
+ (case-fold-search t))
+ (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str)
+ (let* ((start (match-beginning 0))
+ (end (match-end 0))
+ (hex (match-string 0 str))
+ (replacement (org-link-unescape-compound (upcase hex))))
+ (setq tmp (concat tmp (substring str 0 start) replacement))
+ (setq str (substring str end))))
+ (setq tmp (concat tmp str))
+ tmp))
+
+(defun org-link-unescape-compound (hex)
+ "Unhexify unicode hex-chars. E.g. `%C3%B6' is the German Umlaut `ö'.
+Note: this function also decodes single byte encodings like
+`%E1' (\"á\") if not followed by another `%[A-F0-9]{2}' group."
+ (let* ((bytes (remove "" (split-string hex "%")))
+ (ret "")
+ (eat 0)
+ (sum 0))
+ (while bytes
+ (let* ((b (pop bytes))
+ (a (elt b 0))
+ (b (elt b 1))
+ (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
+ (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0)))
+ (val (+ (lsh c1 4) c2))
+ (shift
+ (if (= 0 eat) ;; new byte
+ (if (>= val 252) 6
+ (if (>= val 248) 5
+ (if (>= val 240) 4
+ (if (>= val 224) 3
+ (if (>= val 192) 2 0)))))
+ 6))
+ (xor
+ (if (= 0 eat) ;; new byte
+ (if (>= val 252) 252
+ (if (>= val 248) 248
+ (if (>= val 240) 240
+ (if (>= val 224) 224
+ (if (>= val 192) 192 0)))))
+ 128)))
+ (if (>= val 192) (setq eat shift))
+ (setq val (logxor val xor))
+ (setq sum (+ (lsh sum shift) val))
+ (if (> eat 0) (setq eat (- eat 1)))
+ (cond
+ ((= 0 eat) ;multi byte
+ (setq ret (concat ret (org-char-to-string sum)))
+ (setq sum 0))
+ ((not bytes) ; single byte(s)
+ (setq ret (org-link-unescape-single-byte-sequence hex))))
+ )) ;; end (while bytes
+ ret ))
+
+(defun org-link-unescape-single-byte-sequence (hex)
+ "Unhexify hex-encoded single byte character sequences."
+ (let ((bytes (remove "" (split-string hex "%")))
+ (ret ""))
+ (while bytes
+ (let* ((b (pop bytes))
+ (a (elt b 0))
+ (b (elt b 1))
+ (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
+ (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0))))
+ (setq ret
+ (concat ret (char-to-string
+ (+ (lsh c1 4) c2))))))
+ ret))
(defun org-xor (a b)
"Exclusive or."
--
1.7.2.3
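As a worked example of what `org-link-unescape-compound' computes for
`%C3%B6': the lead byte #xC3 is >= 192, so its marker bits are removed
with (logxor #xC3 192) = 3; the continuation byte gives
(logxor #xB6 128) = 54; and 3 * 64 + 54 = 246 = #xF6, the code point
of `ö'. The sketch below reaches the same result by a different route,
handing the raw bytes to `decode-coding-string'. The name
`my-unhex-utf-8' is hypothetical and not part of Org.

```elisp
;; Hypothetical helper, not part of Org.  It swaps the manual bit
;; arithmetic of the patch for Emacs' built-in UTF-8 decoder.
(defun my-unhex-utf-8 (hex)
  "Decode HEX such as \"%C3%B6\" as a UTF-8 byte sequence."
  (decode-coding-string
   (apply #'unibyte-string
          (mapcar (lambda (byte) (string-to-number byte 16))
                  (delete "" (split-string hex "%"))))
   'utf-8))

;; The manual arithmetic for the two bytes of "%C3%B6":
;; (+ (ash (logxor #xC3 192) 6) (logxor #xB6 128))  ; => 246
```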
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 08/16] Declare obsolete & alias to respective org-link-unescape-* functions
2011-02-12 22:17 ` Bastien
` (7 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 07/16] Unescape functions moved and renamed from org-protocol.el David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 09/16] Remove obsolete argument in call to org-link-unescape David Maus
` (7 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org-protocol.el (org-protocol-unhex-string)
(org-protocol-unhex-compound)
(org-protocol-unhex-single-byte-sequence): Declare obsolete and
alias to respective org-link-unescape-* functions.
---
lisp/org-protocol.el | 88 +++++++-------------------------------------------
1 files changed, 12 insertions(+), 76 deletions(-)
diff --git a/lisp/org-protocol.el b/lisp/org-protocol.el
index eb77f02..078905a 100644
--- a/lisp/org-protocol.el
+++ b/lisp/org-protocol.el
@@ -130,6 +130,18 @@
(filename &optional up))
(declare-function server-edit "server" (&optional arg))
+(define-obsolete-function-alias
+ 'org-protocol-unhex-compound 'org-link-unescape-compound
+ "2010-11-21")
+
+(define-obsolete-function-alias
+ 'org-protocol-unhex-string 'org-link-unescape
+ "2010-11-21")
+
+(define-obsolete-function-alias
+ 'org-protocol-unhex-single-byte-sequence
+ 'org-link-unescape-single-byte-sequence
+ "2010-11-21")
(defgroup org-protocol nil
"Intercept calls from emacsclient to trigger custom actions.
@@ -292,82 +304,6 @@ part."
(mapcar 'org-protocol-unhex-string split-parts))
split-parts)))
-(defun org-protocol-unhex-string(str)
- "Unhex hexified unicode strings as returned from the JavaScript function
-encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
- (setq str (or str ""))
- (let ((tmp "")
- (case-fold-search t))
- (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str)
- (let* ((start (match-beginning 0))
- (end (match-end 0))
- (hex (match-string 0 str))
- (replacement (org-protocol-unhex-compound (upcase hex))))
- (setq tmp (concat tmp (substring str 0 start) replacement))
- (setq str (substring str end))))
- (setq tmp (concat tmp str))
- tmp))
-
-
-(defun org-protocol-unhex-compound (hex)
- "Unhexify unicode hex-chars. E.g. `%C3%B6' is the German Umlaut `ö'.
-Note: this function also decodes single byte encodings like
-`%E1' (\"á\") if not followed by another `%[A-F0-9]{2}' group."
- (let* ((bytes (remove "" (split-string hex "%")))
- (ret "")
- (eat 0)
- (sum 0))
- (while bytes
- (let* ((b (pop bytes))
- (a (elt b 0))
- (b (elt b 1))
- (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
- (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0)))
- (val (+ (lsh c1 4) c2))
- (shift
- (if (= 0 eat) ;; new byte
- (if (>= val 252) 6
- (if (>= val 248) 5
- (if (>= val 240) 4
- (if (>= val 224) 3
- (if (>= val 192) 2 0)))))
- 6))
- (xor
- (if (= 0 eat) ;; new byte
- (if (>= val 252) 252
- (if (>= val 248) 248
- (if (>= val 240) 240
- (if (>= val 224) 224
- (if (>= val 192) 192 0)))))
- 128)))
- (if (>= val 192) (setq eat shift))
- (setq val (logxor val xor))
- (setq sum (+ (lsh sum shift) val))
- (if (> eat 0) (setq eat (- eat 1)))
- (cond
- ((= 0 eat) ;multi byte
- (setq ret (concat ret (org-char-to-string sum)))
- (setq sum 0))
- ((not bytes) ; single byte(s)
- (setq ret (org-protocol-unhex-single-byte-sequence hex))))
- )) ;; end (while bytes
- ret ))
-
-(defun org-protocol-unhex-single-byte-sequence(hex)
- "Unhexify hex-encoded single byte character sequences."
- (let ((bytes (remove "" (split-string hex "%")))
- (ret ""))
- (while bytes
- (let* ((b (pop bytes))
- (a (elt b 0))
- (b (elt b 1))
- (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
- (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0))))
- (setq ret
- (concat ret (char-to-string
- (+ (lsh c1 4) c2))))))
- ret))
-
(defun org-protocol-flatten-greedy (param-list &optional strip-path replacement)
"Greedy handlers might receive a list like this from emacsclient:
'( (\"/dir/org-protocol:/greedy:/~/path1\" (23 . 12)) (\"/dir/param\")
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 09/16] Remove obsolete argument in call to org-link-unescape
2011-02-12 22:17 ` Bastien
` (8 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 08/16] Declare obsolete & alias to respective org-link-unescape-* functions David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 10/16] Use new percent escape character table format David Maus
` (6 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org-mobile.el (org-mobile-locate-entry): Remove obsolete argument in
call to org-link-unescape.
`org-link-unescape' always unescapes all percent escaped sequences.
---
lisp/org-mobile.el | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/lisp/org-mobile.el b/lisp/org-mobile.el
index a278fb1..6616876 100644
--- a/lisp/org-mobile.el
+++ b/lisp/org-mobile.el
@@ -969,11 +969,10 @@ is currently a noop.")
(if (not (string-match "\\`olp:\\(.*?\\):\\(.*\\)$" link))
nil
(let ((file (match-string 1 link))
- (path (match-string 2 link))
- (table '((?: . "%3a") (?\[ . "%5b") (?\] . "%5d") (?/ . "%2f"))))
- (setq file (org-link-unescape file table))
+ (path (match-string 2 link)))
+ (setq file (org-link-unescape file))
(setq file (expand-file-name file org-directory))
- (setq path (mapcar (lambda (x) (org-link-unescape x table))
+ (setq path (mapcar 'org-link-unescape
(org-split-string path "/")))
(org-find-olp (cons file path))))))
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 10/16] Use new percent escape character table format
2011-02-12 22:17 ` Bastien
` (9 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 09/16] Remove obsolete argument in call to org-link-unescape David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 11/16] Add percent sign to list of escape chars David Maus
` (5 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org-mobile.el (org-mobile-escape-olp): Use new percent escape
character table format.
---
lisp/org-mobile.el | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/lisp/org-mobile.el b/lisp/org-mobile.el
index 6616876..fe0a287 100644
--- a/lisp/org-mobile.el
+++ b/lisp/org-mobile.el
@@ -660,7 +660,7 @@ The table of checksums is written to the file mobile-checksums."
(org-mobile-escape-olp (nth 4 (org-heading-components))))))
(defun org-mobile-escape-olp (s)
- (let ((table '((?: . "%3a") (?\[ . "%5b") (?\] . "%5d") (?/ . "%2f"))))
+ (let ((table '(?: ?/)))
(org-link-escape s table)))
;;;###autoload
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 11/16] Add percent sign to list of escape chars
2011-02-12 22:17 ` Bastien
` (10 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 10/16] Use new percent escape character table format David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 12/16] Rename lambda argument David Maus
` (4 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape-chars-browser, org-link-escape-chars): Add
percent sign to list of escape chars.
---
lisp/org.el | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 602462d..370109b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8543,14 +8543,14 @@ according to FMT (default from `org-email-link-description-format')."
"]"))
(defconst org-link-escape-chars
- '(?\ ?\[ ?\] ?\; ?\= ?\+)
+ '(?\ ?\[ ?\] ?\; ?\= ?\+ ?\%)
"List of characters that should be escaped in link.
This is the list that is used for internal purposes.")
(defvar org-url-encoding-use-url-hexify nil)
(defconst org-link-escape-chars-browser
- '(?\ )
+ '(?\ ?\%)
"List of escapes for characters that are problematic in links.
This is the list that is used before handing over to the browser.")
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 12/16] Rename lambda argument
2011-02-12 22:17 ` Bastien
` (11 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 11/16] Add percent sign to list of escape chars David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 13/16] Refactor unescaping functions David Maus
` (3 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-escape): Rename lambda argument.
---
lisp/org.el | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 1b5c3a8..8d49c05 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8576,8 +8576,8 @@ If optional argument MERGE is set, merge TABLE into
(lambda (char)
(if (or (member char table)
(< char 32) (> char 126))
- (mapconcat (lambda (sequence)
- (format "%%%.2X" sequence))
+ (mapconcat (lambda (sequence-element)
+ (format "%%%.2X" sequence-element))
(encode-coding-char char 'utf-8) "")
(char-to-string char))) text "")))
--
1.7.2.3
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 13/16] Refactor unescaping functions
2011-02-12 22:17 ` Bastien
` (12 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 12/16] Rename lambda argument David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 14/16] Always percent escape the percent sign David Maus
` (2 subsequent siblings)
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* org.el (org-link-unescape): Simpler algorithm for replacing percent
escapes.
(org-link-unescape-compound): Use cond statements instead of nested
if, convert hex string with string-to-number, save match data.
(org-link-unescape-single-byte-sequence): Use mapconcat and
string-to-number for unescaping single byte sequence.
---
lisp/org.el | 102 ++++++++++++++++++++++------------------------------------
1 files changed, 39 insertions(+), 63 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index fcd421f..f35f898 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8584,77 +8584,53 @@ If optional argument MERGE is set, merge TABLE into
(defun org-link-unescape (str)
"Unhex hexified unicode strings as returned from the JavaScript function
encodeURIComponent. E.g. `%C3%B6' is the german Umlaut `ö'."
- (setq str (or str ""))
- (let ((tmp "")
- (case-fold-search t))
- (while (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str)
- (let* ((start (match-beginning 0))
- (end (match-end 0))
- (hex (match-string 0 str))
- (replacement (org-link-unescape-compound (upcase hex))))
- (setq tmp (concat tmp (substring str 0 start) replacement))
- (setq str (substring str end))))
- (setq tmp (concat tmp str))
- tmp))
+ (unless (or (null str) (string= "" str))
+ (let ((pos 0) (case-fold-search t) unhexed)
+ (while (setq pos (string-match "\\(%[0-9a-f][0-9a-f]\\)+" str pos))
+ (setq unhexed (org-link-unescape-compound (match-string 0 str)))
+ (setq str (replace-match unhexed t t str))
+ (setq pos (+ pos (length unhexed))))))
+ str)
(defun org-link-unescape-compound (hex)
"Unhexify unicode hex-chars. E.g. `%C3%B6' is the German Umlaut `ö'.
Note: this function also decodes single byte encodings like
`%E1' (\"á\") if not followed by another `%[A-F0-9]{2}' group."
- (let* ((bytes (remove "" (split-string hex "%")))
- (ret "")
- (eat 0)
- (sum 0))
- (while bytes
- (let* ((b (pop bytes))
- (a (elt b 0))
- (b (elt b 1))
- (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
- (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0)))
- (val (+ (lsh c1 4) c2))
- (shift
- (if (= 0 eat) ;; new byte
- (if (>= val 252) 6
- (if (>= val 248) 5
- (if (>= val 240) 4
- (if (>= val 224) 3
- (if (>= val 192) 2 0)))))
- 6))
- (xor
- (if (= 0 eat) ;; new byte
- (if (>= val 252) 252
- (if (>= val 248) 248
- (if (>= val 240) 240
- (if (>= val 224) 224
- (if (>= val 192) 192 0)))))
- 128)))
- (if (>= val 192) (setq eat shift))
- (setq val (logxor val xor))
- (setq sum (+ (lsh sum shift) val))
- (if (> eat 0) (setq eat (- eat 1)))
- (cond
- ((= 0 eat) ;multi byte
- (setq ret (concat ret (org-char-to-string sum)))
- (setq sum 0))
- ((not bytes) ; single byte(s)
- (setq ret (org-link-unescape-single-byte-sequence hex))))
- )) ;; end (while bytes
- ret ))
+ (save-match-data
+ (let* ((bytes (cdr (split-string hex "%")))
+ (ret "")
+ (eat 0)
+ (sum 0))
+ (while bytes
+ (let* ((val (string-to-number (pop bytes) 16))
+ (shift-xor
+ (if (= 0 eat)
+ (cond
+ ((>= val 252) (cons 6 252))
+ ((>= val 248) (cons 5 248))
+ ((>= val 240) (cons 4 240))
+ ((>= val 224) (cons 3 224))
+ ((>= val 192) (cons 2 192))
+ (t (cons 0 0)))
+ (cons 6 128))))
+ (if (>= val 192) (setq eat (car shift-xor)))
+ (setq val (logxor val (cdr shift-xor)))
+ (setq sum (+ (lsh sum (car shift-xor)) val))
+ (if (> eat 0) (setq eat (- eat 1)))
+ (cond
+ ((= 0 eat) ;multi byte
+ (setq ret (concat ret (org-char-to-string sum)))
+ (setq sum 0))
+ ((not bytes) ; single byte(s)
+ (setq ret (org-link-unescape-single-byte-sequence hex))))
+ )) ;; end (while bytes
+ ret )))
(defun org-link-unescape-single-byte-sequence (hex)
"Unhexify hex-encoded single byte character sequences."
- (let ((bytes (remove "" (split-string hex "%")))
- (ret ""))
- (while bytes
- (let* ((b (pop bytes))
- (a (elt b 0))
- (b (elt b 1))
- (c1 (if (> a ?9) (+ 10 (- a ?A)) (- a ?0)))
- (c2 (if (> b ?9) (+ 10 (- b ?A)) (- b ?0))))
- (setq ret
- (concat ret (char-to-string
- (+ (lsh c1 4) c2))))))
- ret))
+ (mapconcat (lambda (byte)
+ (char-to-string (string-to-number byte 16)))
+ (cdr (split-string hex "%")) ""))
(defun org-xor (a b)
"Exclusive or."
--
1.7.2.3
* [PATCH 14/16] Always percent escape the percent sign
2011-02-12 22:17 ` Bastien
` (13 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 13/16] Refactor unescaping functions David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 15/16] Use `org-link-unescape' instead of obsolete unhex string function David Maus
2011-02-13 12:01 ` [PATCH 16/16] Throw error if encoding character in utf8 fails David Maus
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* lisp/org.el (org-link-escape, org-link-escape-chars-browser)
(org-link-escape-chars): Always percent escape the percent sign.
---
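Not part of the patch: a minimal sketch of why the percent sign has to be
escaped no matter what the table contains. `my/percent-escape' is a
hypothetical stand-in modelled on the lambda in `org-link-escape', with the
table passed in explicitly.

```elisp
;; Hypothetical helper for illustration only; not the code in this patch.

(defun my/percent-escape (text table)
  "Percent escape the characters of TEXT that are listed in TABLE.
Control characters, non-ASCII characters and the percent sign are
escaped regardless of TABLE."
  (mapconcat
   (lambda (char)
     (if (or (memq char table)
             (< char 32) (= char ?%) (> char 126))
         ;; Emit one %XX group per byte of the UTF-8 encoding.
         (mapconcat (lambda (byte) (format "%%%02X" byte))
                    (encode-coding-char char 'utf-8) "")
       (char-to-string char)))
   text ""))
```

Here (my/percent-escape "a%20b" nil) returns "a%2520b", so a literal "%20"
in a link stays distinguishable from an escaped space, which a table of
(?\s) turns into "%20". If the percent sign were left alone, both inputs
would escape to the same string and unescaping could not tell them apart.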
lisp/org.el | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 8fcb9c4..1415eb1 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8565,14 +8565,14 @@ according to FMT (default from `org-email-link-description-format')."
"]"))
(defconst org-link-escape-chars
- '(?\ ?\[ ?\] ?\; ?\= ?\+ ?\%)
+ '(?\ ?\[ ?\] ?\; ?\= ?\+)
"List of characters that should be escaped in link.
This is the list that is used for internal purposes.")
(defvar org-url-encoding-use-url-hexify nil)
(defconst org-link-escape-chars-browser
- '(?\ ?\%)
+ '(?\ )
"List of escapes for characters that are problematic in links.
This is the list that is used before handing over to the browser.")
@@ -8595,7 +8595,7 @@ If optional argument MERGE is set, merge TABLE into
(mapconcat
(lambda (char)
(if (or (member char table)
- (< char 32) (> char 126))
+ (< char 32) (= char 37) (> char 126))
(mapconcat (lambda (sequence-element)
(format "%%%.2X" sequence-element))
(encode-coding-char char 'utf-8) "")
--
1.7.2.3
* [PATCH 15/16] Use `org-link-unescape' instead of obsolete unhex string function
2011-02-12 22:17 ` Bastien
` (14 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 14/16] Always percent escape the percent sign David Maus
@ 2011-02-13 12:01 ` David Maus
2011-02-13 12:01 ` [PATCH 16/16] Throw error if encoding character in utf8 fails David Maus
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* lisp/org-protocol.el (org-protocol-split-data)
(org-protocol-open-source): Use `org-link-unescape' instead of
obsolete unhex string function.
---
lisp/org-protocol.el | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lisp/org-protocol.el b/lisp/org-protocol.el
index 46441db..b1ad0a9 100644
--- a/lisp/org-protocol.el
+++ b/lisp/org-protocol.el
@@ -301,7 +301,7 @@ part."
(if unhexify
(if (fboundp unhexify)
(mapcar unhexify split-parts)
- (mapcar 'org-protocol-unhex-string split-parts))
+ (mapcar 'org-link-unescape split-parts))
split-parts)))
(defun org-protocol-flatten-greedy (param-list &optional strip-path replacement)
@@ -476,7 +476,7 @@ The location for a browser's bookmark should look like this:
;; As we enter this function for a match on our protocol, the return value
;; defaults to nil.
(let ((result nil)
- (f (org-protocol-unhex-string fname)))
+ (f (org-link-unescape fname)))
(catch 'result
(dolist (prolist org-protocol-project-alist)
(let* ((base-url (plist-get (cdr prolist) :base-url))
--
1.7.2.3
* [PATCH 16/16] Throw error if encoding character in utf8 fails
2011-02-12 22:17 ` Bastien
` (15 preceding siblings ...)
2011-02-13 12:01 ` [PATCH 15/16] Use `org-link-unescape' instead of obsolete unhex string function David Maus
@ 2011-02-13 12:01 ` David Maus
16 siblings, 0 replies; 22+ messages in thread
From: David Maus @ 2011-02-13 12:01 UTC (permalink / raw)
To: emacs-orgmode, bastien.guerry; +Cc: David Maus
* lisp/org.el (org-link-escape): Signal an error if encoding a
character in UTF-8 fails.
---
lisp/org.el | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/lisp/org.el b/lisp/org.el
index 1415eb1..0eb3a2b 100644
--- a/lisp/org.el
+++ b/lisp/org.el
@@ -8598,8 +8598,10 @@ If optional argument MERGE is set, merge TABLE into
(< char 32) (= char 37) (> char 126))
(mapconcat (lambda (sequence-element)
(format "%%%.2X" sequence-element))
- (encode-coding-char char 'utf-8) "")
- (char-to-string char))) text "")))
+ (or (encode-coding-char char 'utf-8)
+ (error "Unable to percent escape character: %s"
+ (char-to-string char))) "")
+ (char-to-string char))) text "")))
(defun org-link-unescape (str)
"Unhex hexified unicode strings as returned from the JavaScript function
--
1.7.2.3