From: David Maus
Subject: Re: [bug] Org link dialog escapes URL spaces incorrectly
Date: Sun, 06 Nov 2011 15:48:44 +0100
Message-ID: <87hb2hcnn7.wl%dmaus@ictsoc.de>
In-Reply-To: <23704.1320507536@alphaville.dokosmarshall.org>
List-Id: "General discussions about Org-mode."
To: nicholas.dokos@hp.com
Cc: David Maus, Jeff Horn, Org-mode ml

At Sat, 05 Nov 2011 11:38:56 -0400,
Nick Dokos wrote:
>
> David Maus wrote:
>
> > At Fri, 04 Nov 2011 14:25:42 -0400,
> > Nick Dokos wrote:
> > >
> > > Nick Dokos wrote:
> > >
> > > > It probably does, but that's probably not the best place to do it: it might be
> > > > better to do it in the (setq link on line 9090 or thereabouts. Otherwise, in
> > > > the *other* case (editing the link at point), we'll end up unescaping twice:
> > > > probably not a problem, since unescaping should be idempotent (in contrast to
> > > > escaping ;-) ) but why do it twice?
> > > >
> > >
> > > Brian Wightman pointed out to me that the idempotent part of the
> > > statement above is definitely wrong (d'oh). The original URL that Jeff
> > > Horn posted, when unescaped once, would be completely free of % signs.
> > > But if the second (doubly-escaped) form is pasted into the minibuffer,
> > > then unescaping once would not be enough. So I presume the thing to do
> > > is to take the URL and unescape it repeatedly until it loses all
> > > escapes, and then escape it *once* before inserting it in the org
> > > buffer.
> > >
> > > Sounds icky, kludgy, dirty. The question is: 1) is it a solution?
> > > and 2) is there a better one?
> >
> > No, this wouldn't be a solution. Consider a link with the sequence
> > %2525 -- Unescape until no more escapes (or rather "escapes") will
> > produce a single `%', not %25. Either escape once, or not at all.
> >
> > What roughly happens is this:
> >
> > 1. The user enters a link via `org-insert-link'
> > 2. Org escapes the link and writes it to the buffer
> > 3. The user opens the link with `org-open-at-point'
> > 4. Org reads the link from the buffer and unescapes it
> > 5. The link gets escaped and passed to the consuming application (i.e. browser)
> >
> > For steps 2 and 4 it is guaranteed that
> >
> >   (string= link (org-link-unescape (org-link-escape link)))
> >
> > Thus, the problem is not in 2 or 4, but in 1 or 5.
> >
> > Step 5 assumes that a link entered by the user in step 1 was an
> > unescaped link and thus needs escaping before it is passed to the
> > consuming application. If you enter a link in step 1 that already is
> > escaped, this assumption fails and you'll end up with a double-escaped
> > link that is passed to the consumer.
> >
> > In other words, the question is: How to decide whether an arbitrary
> > URL is percent-escaped or not?
> >
> > Now here's the problem: You can't. Is
> >
> >   "http://example.tld/foo%40bar"
> >
> > already escaped or not? You can't tell for sure. It depends on the
> > application you copied the link from.[1]
> >
> > What we could do in step 5 is... guess. If the (unescaped) link
> > produced by step 4 does contain characters that need escaping, we
> > escape the link. Otherwise we don't.
> >
> > Not quite sure about the impact of such a change.
> >
> > Best,
> >   -- David
> >
> > [1] Even worse: It may even depend on /how/ or /where/ you copied the
> > link. E.g. the link to a wikipedia page about set theory is copied as
> >
> >   http://de.wikipedia.org/wiki/Menge_%28Mathematik%29
> >
> > if C-c'ed from the address bar but copied as
> >
> >   http://de.wikipedia.org/wiki/Menge_(Mathematik)
> >
> > if C-c'ed via "Copy link to clipboard" at another page (Iceweasel
> > 3.6.23).
>
> What a mess - thanks for the clear explanation. Be that as it may, the
> treatment of links in org-insert-link is inconsistent: in one case, the
> link is unescaped once, in the other not at all.

Hm.
I still don't see an inconsistency here: `org-insert-link' reads a
link, `org-open-at-point' calls the target application with exactly
this link but applies percent escaping. Maybe we could drop this last
escaping and push the responsibility of handling non-conformant URLs
to the target application. After all: If the user enters an unescaped
link and the target application can't handle it, it's the user's
fault.

> So I presume that a single unescaping is desirable (it would
> e.g. deal with both wikipedia links correctly). But there will be
> corner cases where this will fail and some manual editing of the
> link will be necessary,

If you mean unescape the user input in step 1 (`org-insert-link'):
This could work, too. Besides the edge cases it requires some care
with stored links (via `org-store-link') -- the percent character is a
valid character for certain link types (e.g. message ids).

Personally, I am starting to like the idea of just dropping the
escaping before calling the target application.

Best,
  -- David
--
OpenPGP... 0x99ADB83B5A4478E6
Jabber.... dmjena@jabber.org
Email..... dmaus@ictsoc.de
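
To make the cases discussed in this thread concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions:
`urllib.parse.quote' and `unquote' stand in for `org-link-escape' and
`org-link-unescape' (Org's real escape table differs in detail), and
the names SAFE, escape, unescape, unescape_until_stable and
escape_for_consumer are hypothetical helpers, not Org functions. It
shows the round trip of steps 2 and 4, the double escaping of an
already-escaped link, why "unescape until stable" loses information on
%2525, and the "guess" proposed for step 5.

```python
from urllib.parse import quote, unquote

# Assumption: characters left alone by the stand-in escaper. "%" is NOT
# in this set, so escaping an already-escaped link escapes it again.
SAFE = ":/?#&="


def escape(link: str) -> str:
    """Stand-in for org-link-escape: percent-escape once."""
    return quote(link, safe=SAFE)


def unescape(link: str) -> str:
    """Stand-in for org-link-unescape: percent-unescape once."""
    return unquote(link)


def unescape_until_stable(link: str) -> str:
    """The rejected idea: unescape repeatedly until nothing changes."""
    previous = None
    while previous != link:
        previous, link = link, unescape(link)
    return link


def escape_for_consumer(link: str) -> str:
    """The 'guess' for step 5: escape only if the link contains
    characters that need escaping. "%" is treated as acceptable here,
    because it may be the start of an existing escape."""
    needs_escaping = quote(link, safe=SAFE + "%") != link
    return escape(link) if needs_escaping else link


plain = "http://example.tld/foo bar"

# Steps 2 and 4: escaping then unescaping returns the original link.
assert unescape(escape(plain)) == plain

# Entering an already-escaped link leads to double escaping.
print(escape(escape(plain)))  # http://example.tld/foo%2520bar

# Repeated unescaping collapses %2525 to a bare "%", not "%25".
print(unescape_until_stable("http://example.tld/foo%2525"))

# The guess treats both copies of the Wikipedia link the same way.
print(escape_for_consumer("http://de.wikipedia.org/wiki/Menge_(Mathematik)"))
print(escape_for_consumer("http://de.wikipedia.org/wiki/Menge_%28Mathematik%29"))
```

Note that the guess is still only a guess: a link that mixes a literal
"%" with characters that need escaping would be escaped as a whole,
which is exactly the ambiguity described above.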