From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Mark E. Shoulson"
Subject: Spaces in bare URLs?
Date: Tue, 17 Mar 2020 22:14:01 -0400
Message-ID: <78f598f3-44e1-63aa-751a-49c2f7208fe7@kli.org>
List-Id: "General discussions about Org-mode."
To: org-mode mailing list

So, in the "new" org-mode, we've done away with standard percent-encoding of URLs, in favor of a more... idiosyncratic method using backslashes. So... what is one supposed to do about spaces in URLs? When they're in [[link format]], with or without a description, it's no problem, but org-mode has a long tradition of support for "bare" URLs too. We're used to being able to type a URL or other link format and have it work, right? And that doesn't seem (to me) to be a thing that we'd want to abandon.


In org-mode 9.1.9, I can type "info:elisp#Syntactic%20Font%20Lock" and it'd work. (Maybe not the greatest example, since %-encoding is seen more with http-based URIs, but still). The percent-encoding is well-established and reliable, and you can *count* on it when nothing else works, because you can always fall back on plain ascii. But that won't work in org-mode 9.3.6. Nor will "info:elisp#Syntactic Font Lock" or "info:elisp#Syntactic\ Font\ Lock" or any other variant I've tried, short of putting it inside [[]]s or <>s (in other words, no longer using a bare URL).


I think dropping percent-escaping of URLs was a bad idea, in terms of breaking past usage and lack of consistency with the standard used for URLs everywhere else. But I don't know what impelled the decision to drop it, so I might well be missing something important. At any rate, it does leave a hole in what org-mode can do, a thing it used to be able to do and can't anymore. Is there a right way to do this? (without using delimiters.)


I haven't yet looked at how this interacts with org-protocol's store-link transaction.


~mark
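
For reference, the percent-encoded form discussed above can be reproduced outside Emacs. This is only a minimal sketch using Python's standard `urllib.parse`; it says nothing about how any Org version parses links, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# The human-readable Info node name from the message above.
fragment = "Syntactic Font Lock"

# Percent-encode it: each space becomes %20, so the link has no whitespace.
encoded = quote(fragment)
bare_link = "info:elisp#" + encoded

# Non-ASCII characters are encoded as their UTF-8 bytes, so the result
# is always plain ASCII.
ascii_only = quote("café")

if __name__ == "__main__":
    print(bare_link)
    print(unquote(encoded))
    print(ascii_only, ascii_only.isascii())
```

Decoding the encoded fragment exactly once gives back the original node name.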

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nicolas Goaziou
Subject: Re: Spaces in bare URLs?
Date: Wed, 18 Mar 2020 10:43:50 +0100
Message-ID: <875zf2gfqh.fsf@nicolasgoaziou.fr>
References: <78f598f3-44e1-63aa-751a-49c2f7208fe7@kli.org>
In-Reply-To: <78f598f3-44e1-63aa-751a-49c2f7208fe7@kli.org> (Mark E. Shoulson's message of "Tue, 17 Mar 2020 22:14:01 -0400")
To: "Mark E. Shoulson"
Cc: org-mode mailing list

Hello,

"Mark E. Shoulson" writes:

> So, in the "new" org-mode, we've done away with standard
> percent-encoding of URLs, in favor of a more... idiosyncratic method
> using backslashes.

...

> So... what is one supposed to do about spaces in URLs?
> When they're in [[link format]], with or without a description, it's
> no problem, but org-mode has a long tradition of support for "bare"
> URLs too. We're used to being able to type a URL or other link format
> and have it work, right? And that doesn't seem (to me) to be a thing
> that we'd want to abandon.
>
> In org-mode 9.1.9, I can type "info:elisp#Syntactic%20Font%20Lock" and
> it'd work.
> (Maybe not the greatest example, since %-encoding is seen more with
> http-based URIs, but still). The
> percent-encoding is well-established and reliable

Unfortunately, that wasn't reliable. As it is not idempotent, you can
never know how many times you need to decode a URL before sending it.

Imagine I have a file called "foo%2000.org". Should I link it
file:foo%252000.org or file:foo%2000.org? You prefer the former. But
what if I forget about the rules? Now, what is Org expected to do with
file:foo%252000.org ?

Decoding it unconditionally led to bug reports scattered throughout the
years. So did ignoring encoding.

The thing is URL encoding is not for human consumption, i.e., we
shouldn't have to deal with it.

> and you can *count* on it when nothing else works, because you can
> always fall back on plain ascii.

Current backslash escaping is also well established, and as much
ASCII-like as anyone would expect.

> But that won't work in org-mode 9.3.6. Nor will
> "info:elisp#Syntactic Font Lock" or "info:elisp#Syntactic\ Font\ Lock"
> or any other variant I've tried, short of putting it inside [[]]s or
> <>s (in other words, no longer using a bare URL).

True, but that's a minor annoyance.

You apparently prefer to encode a URL manually, replacing each space
with %20 (and other characters with more baroque escape sequences),
rather than adding <...> (or [[...]]) around it and be done with it.
Perhaps this one was the bad idea, after all?

> I think dropping percent-escaping of URLs was a bad idea, in terms of
> breaking past usage and lack of consistency with the standard used for
> URLs everywhere else. But I don't know what impelled the
> decision to drop it, so I might well be missing something important.
>
> At any rate, it does leave a hole in what org-mode can do, a thing it
> used to be able to do and can't anymore. Is there a right way to do
> this? (without using delimiters.)

It was not a bad idea.
It is not perfect, but it is still better than what we had, because it
is unambiguous. You can still use <...> delimiters, or, as you noted,
[[...]].

I understand it breaks your workflow, but there is no loss of feature,
and no hole either: you can still link to a URL with spaces in it.

Regards,

--
Nicolas Goaziou

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Mark E. Shoulson"
Subject: Re: Spaces in bare URLs?
Date: Wed, 18 Mar 2020 16:25:15 -0400
References: <78f598f3-44e1-63aa-751a-49c2f7208fe7@kli.org> <875zf2gfqh.fsf@nicolasgoaziou.fr>
In-Reply-To: <875zf2gfqh.fsf@nicolasgoaziou.fr>
To: org-mode mailing list

On 3/18/20 5:43 AM, Nicolas Goaziou wrote:
> Hello,
>
> "Mark E. Shoulson" writes:
>
>> So...
>> what is one supposed to do about spaces in URLs?
>> When they're in [[link format]], with or without a description, it's
>> no problem, but org-mode has a long tradition of support for "bare"
>> URLs too. We're used to being able to type a URL or other link format
>> and have it work, right? And that doesn't seem (to me) to be a thing
>> that we'd want to abandon.
>>
>> In org-mode 9.1.9, I can type "info:elisp#Syntactic%20Font%20Lock" and
>> it'd work. (Maybe not the greatest example, since %-encoding is seen
>> more with http-based URIs, but still). The
>> percent-encoding is well-established and reliable
> Unfortunately, that wasn't reliable. As it is not idempotent, you can
> never know how many times you need to decode a URL before sending it.

Well, any form of escaping is pretty much by definition not idempotent.
That's the whole point of escaping: you have something you can't say, so
you make some magical character that changes the meaning of nearby
characters so you can describe it in characters you can say. And the
price you pay is that now you can no longer say your magical character
plain, you have to use another form of escaping to express it (usually
the same form as the others). It's like how it's impossible to compress
*every* file to make it smaller and some even have to get bigger. The
pigeonhole principle shows _why_ it isn't possible, and escaping shows
(one way) _how_ it isn't: say you use high-ascii bytes to represent
common strings or something. How do you represent them when they're
really in the text? You have to escape them... which makes your file
*larger*.

> The thing is URL encoding is not for human consumption, i.e., we
> shouldn't have to deal with it.
This is a good point. While on one hand it makes sense to be able to
type URLs that have spaces in them without spaces, it is sort of
ridiculous to expect users to feel "natural" about typing "%20" instead.
(I think this is why the specs say that you can also escape a space by
using the "+" character, in order to make it easier for this most-common
of characters... but that weird exception has caused all kinds of
hassles in code from that day to this; I know from my own experience.)

>> and you can *count* on it when nothing else works, because you can
>> always fall back on plain ascii.
> Current backslash escaping is also well established, and as much
> ASCII-like as anyone would expect.

Really? As ASCII-like as I could expect? What if my URL is
https://he.wikipedia.com/שלום_עליכם ? If I am in some backward
environment (still all too common) where all I can rely on is ASCII, I
can percent-encode the UTF-8 representation and it will work. Can we
count on being able to backslash-quote things clear down to ASCII? I
don't see a way in the docs I've seen.

>> But that won't work in org-mode 9.3.6. Nor will
>> "info:elisp#Syntactic Font Lock" or "info:elisp#Syntactic\ Font\ Lock"
>> or any other variant I've tried, short of putting it inside [[]]s or
>> <>s (in other words, no longer using a bare URL).
> True, but that's a minor annoyance.
>
> You apparently prefer to encode a URL manually, replacing each space
> with %20 (and other characters with more baroque escape sequences),
> rather than adding <...> (or [[...]]) around it and be done with it.
> Perhaps this one was the bad idea, after all?
Yes, using <>s works, as does [[]]. And yes, I do have to concede that
claiming it should be "natural" for a user to hand-escape things with
%20s is sort of ridiculous. Having to reprocess all old org-files for
such a common notation still seems like more trouble than it was worth,
but then you didn't ask me (and you were QUITE RIGHT not to do so!) I
guess a converter-script should also enclose bare URLs in <>, at least
if they have spaces or other whitespace.

Still don't know about org-protocol and store-link, because I'm lazy.
Right now, at least some of the emacsen I'm working with still use
org-9.1.9, so I haven't converted anything.

~mark
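
Two points from this thread can be sketched in a few lines of Python: the decoding ambiguity around "foo%2000.org", and the converter step suggested above. This is a rough sketch only. `to_angle_link` is a hypothetical helper, not part of Org or of any existing converter, and it assumes every link in an old file was percent-encoded exactly once.

```python
from urllib.parse import unquote

# Percent-decoding is not idempotent: a second pass can expose a new
# escape sequence, so the number of passes changes the result.
once = unquote("file:foo%252000.org")   # one decoding pass
twice = unquote(once)                   # a second pass changes it again


def to_angle_link(bare_link: str) -> str:
    """Hypothetical converter step for one bare link from an old file.

    Assumes the link was percent-encoded exactly once. If the decoded
    link contains whitespace, return it wrapped in <...>; otherwise
    return the link unchanged.
    """
    decoded = unquote(bare_link)
    if any(ch.isspace() for ch in decoded):
        return "<" + decoded + ">"
    return bare_link


if __name__ == "__main__":
    print(once)
    print(twice)
    print(to_angle_link("info:elisp#Syntactic%20Font%20Lock"))
    print(to_angle_link("https://orgmode.org"))
```

Links without whitespace are returned untouched here, since the script cannot tell whether a remaining escape such as %25 was meant literally.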