* Decoding URLs input
From: Jean Louis @ 2021-07-03  9:40 UTC
To: Help GNU Emacs

Hello,

As I am developing a Double Opt-In CGI script served by Emacs, I am
unsure whether this function is the correct one to use for the encoded
strings that come from URL GET requests, like
http://www.example.com/?message=Hello%20There

  (rfc2231-decode-encoded-string "Hello%20there") ⇒ "Hello there"

If anybody knows or has clues, let me know. In other programming
languages I have not had to think about RFCs, and I don't know which
RFC applies here.

Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/
* Re: Decoding URLs input
From: Jean Louis @ 2021-07-03  9:56 UTC
To: Jean Louis; +Cc: Help GNU Emacs

Is it maybe (url-unhex-string query-string)? I have started using that
function, but I am unsure.

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/
* Re: Decoding URLs input
From: Yuri Khan @ 2021-07-03 11:10 UTC
To: Jean Louis; +Cc: Help GNU Emacs

On Sat, 3 Jul 2021 at 16:41, Jean Louis <bugs@gnu.support> wrote:

> As I am developing a Double Opt-In CGI script served by Emacs, I am
> unsure whether this function is the correct one to use for the encoded
> strings that come from URL GET requests, like
> http://www.example.com/?message=Hello%20There
>
>   (rfc2231-decode-encoded-string "Hello%20there") ⇒ "Hello there"
>
> If anybody knows or has clues, let me know. In other programming
> languages I have not had to think about RFCs, and I don't know which
> RFC applies here.

Why not look at the RFC referenced in the function's name to see
whether it is relevant to your task?

https://datatracker.ietf.org/doc/html/rfc2231

It talks about encoding MIME headers, which is not what you’re dealing
with; and its encoded strings look like
<encoding>'<locale>'<percent-encoded-string>, which is not what you
have.

What you are dealing with is a URL, specifically, its query string
part. These are described in RFC 3986, and its percent-encoding scheme
in sections 2.1 and 2.5.

(url-unhex-string …) will do half the work for you: it will decode
percent-encoded sequences into bytes. By convention, in URLs,
characters are UTF-8-encoded before percent-encoding (see RFC 3986
§ 2.5), so you’ll need to use:

  (decode-coding-string (url-unhex-string s) 'utf-8)

to get a fully decoded text string.
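For illustration, a quick sanity check of those two steps in a scratch
buffer (this assumes the `url-util' library, which defines
`url-unhex-string'; the café string is just a made-up test value):

  (require 'url-util)

  ;; Percent-decoding alone is enough for plain ASCII:
  (url-unhex-string "Hello%20There")
  ;; ⇒ "Hello There"

  ;; Non-ASCII arrives as percent-encoded UTF-8 bytes, so decode the
  ;; bytes explicitly afterwards:
  (decode-coding-string (url-unhex-string "caf%C3%A9") 'utf-8)
  ;; ⇒ "café"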
* Re: Decoding URLs input
From: Jean Louis @ 2021-07-03 12:04 UTC
To: Yuri Khan; +Cc: Help GNU Emacs

I appreciate this tip and will test it.

On July 3, 2021 11:10:47 AM UTC, Yuri Khan <yuri.v.khan@gmail.com> wrote:
> On Sat, 3 Jul 2021 at 16:41, Jean Louis <bugs@gnu.support> wrote:
>
>> As I am developing a Double Opt-In CGI script served by Emacs, I am
>> unsure whether this function is the correct one to use for the encoded
>> strings that come from URL GET requests, like
>> http://www.example.com/?message=Hello%20There
>>
>>   (rfc2231-decode-encoded-string "Hello%20there") ⇒ "Hello there"
>>
>> If anybody knows or has clues, let me know. In other programming
>> languages I have not had to think about RFCs, and I don't know which
>> RFC applies here.
>
> Why not look at the RFC referenced in the function's name to see
> whether it is relevant to your task?
>
> https://datatracker.ietf.org/doc/html/rfc2231
>
> It talks about encoding MIME headers, which is not what you’re dealing
> with; and its encoded strings look like
> <encoding>'<locale>'<percent-encoded-string>, which is not what you
> have.
>
> What you are dealing with is a URL, specifically, its query string
> part. These are described in RFC 3986, and its percent-encoding scheme
> in sections 2.1 and 2.5.
>
> (url-unhex-string …) will do half the work for you: it will decode
> percent-encoded sequences into bytes. By convention, in URLs,
> characters are UTF-8-encoded before percent-encoding (see RFC 3986
> § 2.5), so you’ll need to use:
>
>   (decode-coding-string (url-unhex-string s) 'utf-8)
>
> to get a fully decoded text string.

Jean
* Re: Decoding URLs input
From: Jean Louis @ 2021-07-03 19:17 UTC
To: Yuri Khan; +Cc: Help GNU Emacs

* Yuri Khan <yuri.v.khan@gmail.com> [2021-07-03 14:12]:
> What you are dealing with is a URL, specifically, its query string
> part. These are described in RFC 3986, and its percent-encoding scheme
> in sections 2.1 and 2.5.
>
> (url-unhex-string …) will do half the work for you: it will decode
> percent-encoded sequences into bytes. By convention, in URLs,
> characters are UTF-8-encoded before percent-encoding (see RFC 3986
> § 2.5), so you’ll need to use:
>
>   (decode-coding-string (url-unhex-string s) 'utf-8)
>
> to get a fully decoded text string.

That is correct, and I have implemented it now. Until now it worked
without `decode-coding-string' because I had completely forgotten about
UTF-8. When I noticed that spaces are replaced with a plus `+' I
started digging more. It is not the first time I deal with this, and
each time I stumble over the UTF-8 handling; this time you were one
step ahead of me, as I had not yet hit the problem and could not see
what was missing.

From the docstring of `url-unhex-string' I did not expect it to give
just bytes back; IMHO that should be described there, though maybe it
is assumed that the programmer knows it.

The docstring is poor; it says: "Remove %XX embedded spaces, etc in a
URL." -- with "remove" I don't expect conversion of UTF-8 into bytes.
I guess now it is clear.

I am now solving the issue that spaces are converted to a plus sign,
and that I may have to convert the + signs before decoding:

  (decode-coding-string (url-unhex-string "Hello+There") 'utf-8)

but maybe not before; maybe I leave it and convert later.

A problem I have encountered is that the library subr.el does not
provide the feature 'subr -- I think I filed a report about it, but
without acknowledgment and without seeing it filed under my email, so I
wait. Because of that I cannot use `string-replace' in the CGI script:
it asks for that file, but I cannot `require' it, as it is not
"provided", so I would have to add that line myself, and I would really
rather not fiddle with the main Emacs files on the server.

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/
* Re: Decoding URLs input
From: Yuri Khan @ 2021-07-03 20:16 UTC
To: Yuri Khan, Help GNU Emacs

On Sun, 4 Jul 2021 at 02:20, Jean Louis <bugs@gnu.support> wrote:

> From the docstring of `url-unhex-string' I did not expect it to give
> just bytes back; IMHO that should be described there, though maybe it
> is assumed that the programmer knows it.

I just fed it some percent-encoded sequences that I knew would result
in invalid UTF-8 when decoded. If it were doing a full decode, I
expected it to signal an error. It didn’t.

> The docstring is poor; it says: "Remove %XX embedded spaces, etc in a
> URL." -- with "remove" I don't expect conversion of UTF-8 into bytes.

Yeah, that is bad. If I see “remove %xx” in a docstring, I expect
(string= (f "Hello%20World") "HelloWorld").

> I am now solving the issue that spaces are converted to a plus sign,
> and that I may have to convert the + signs before decoding:
>
>   (decode-coding-string (url-unhex-string "Hello+There") 'utf-8)
>
> but maybe not before; maybe I leave it and convert later.

You have to replace them before percent-decoding. If you try it after
percent-decoding, you will not be able to distinguish a + that encodes
a space from a + that you just decoded from %2B. Luckily, spaces never
occur in a valid encoded query string; if they did and had some
meaning, you’d have to decode + *at the same time* as %xx.

Here, have some test cases:

  "Hello+There%7DWorld"   → "Hello There}World"
  "Hello%2BThere%7DWorld" → "Hello+There}World"

By the way, you’re in for some unspecified amount of pain by trying to
implement a web application without a framework. (And by a framework I
mean a library that would give you well-tested means to encode/decode
URL parts, HTTP headers, gzipped request/response bodies, base64,
quoted-printable, application/x-www-form-urlencoded,
multipart/form-data, json, …) CGI is not nearly as simple as it
initially appears to be when you read a hello-cgi-world tutorial.
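Putting the two steps together in the order described above, a minimal
sketch of a decoder for a single query-string value (the function name
is made up; `subst-char-in-string' comes from subr.el and avoids
depending on the newer `string-replace'):

  (require 'url-util)   ; for `url-unhex-string'

  (defun my-url-decode-query-value (value)
    "Decode one application/x-www-form-urlencoded VALUE.
  Replace + with a space first, so a literal plus sign (sent as %2B)
  still decodes to +; then percent-decode to bytes and decode the
  bytes as UTF-8."
    (decode-coding-string
     (url-unhex-string (subst-char-in-string ?+ ?\s value))
     'utf-8))

  ;; The test cases above:
  (my-url-decode-query-value "Hello+There%7DWorld")    ; ⇒ "Hello There}World"
  (my-url-decode-query-value "Hello%2BThere%7DWorld")  ; ⇒ "Hello+There}World"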
* Re: Decoding URLs input
From: Jean Louis @ 2021-07-03 22:18 UTC
To: Yuri Khan; +Cc: Help GNU Emacs

* Yuri Khan <yuri.v.khan@gmail.com> [2021-07-03 23:17]:
> I just fed it some percent-encoded sequences that I knew would result
> in invalid UTF-8 when decoded. If it were doing a full decode, I
> expected it to signal an error. It didn’t.
>
> > The docstring is poor; it says: "Remove %XX embedded spaces, etc in a
> > URL." -- with "remove" I don't expect conversion of UTF-8 into bytes.
>
> Yeah, that is bad. If I see “remove %xx” in a docstring, I expect
> (string= (f "Hello%20World") "HelloWorld").

Yes. Could you maybe correct that docstring?

> > I am now solving the issue that spaces are converted to a plus sign,
> > and that I may have to convert the + signs before decoding:
> >
> >   (decode-coding-string (url-unhex-string "Hello+There") 'utf-8)
> >
> > but maybe not before; maybe I leave it and convert later.
>
> You have to replace them before percent-decoding. If you try it after
> percent-decoding, you will not be able to distinguish a + that encodes
> a space from a + that you just decoded from %2B. Luckily, spaces never
> occur in a valid encoded query string; if they did and had some
> meaning, you’d have to decode + *at the same time* as %xx.

Exactly. I just had not yet done that analysis, so thanks for your
quick one! Now it is clear that I have to do it.

> By the way, you’re in for some unspecified amount of pain by trying to
> implement a web application without a framework. (And by a framework I
> mean a library that would give you well-tested means to encode/decode
> URL parts, HTTP headers, gzipped request/response bodies, base64,
> quoted-printable, application/x-www-form-urlencoded,
> multipart/form-data, json, …) CGI is not nearly as simple as it
> initially appears to be when you read a hello-cgi-world tutorial.

Definitely not as simple, though for this specific need it can stay
very compact. There also exist simple Emacs CGI libraries, though
nowhere near as comprehensive as what you describe.

The Double Opt-In already works, with cosmetic errors, and will soon be
perfected with this information. Its job is: to receive a subscription
request and redirect to the subscription confirmation page, which in
turn could redirect to a sales page, be the sales page, or be some
other page; to send an email asking the subscriber to confirm; to
receive the confirmation and dispatch the Emacs hash to the
administrator; to receive unsubscribe requests without any hesitation
and dispatch them to the administrator; and to offer the visitor the
chance to subscribe again.

I am designing it to work offline, just as I have been doing for
years. The database is not online; no people's data should ever be
released online. This is for business-secrecy purposes -- one can see
that databases leak all the time on raidforums.com. Practically it
works well and generates relations. I consider it one of the most
important scripts. The old Perl form script I long ago converted to
Common Lisp.

For my specific need:

- encode/decode URL parts: resolved, as I only receive a URL and
  dispatch simpler confirmation URLs;

- HTTP headers: I just use these and nothing more so far:

    (defun rcd-cgi-headers (&optional content-type)
      "Print basic HTTP headers for HTML."
      (let ((content-type (or content-type "text/html")))
        (princ (format "Content-type: %s\n\n" content-type))))

    (defun rcd-cgi-redirect (url)
      "Redirect to URL."
      (princ (concat "Location: " url "\n\n"))
      (unless (eq major-mode 'emacs-lisp-mode)
        (kill-emacs 0)))

- gzipped request/response bodies: hmm, I do not think I will get
  gzipped requests, though I have no idea right now. Responses
  definitely not, as the only response is either an error or a redirect
  to a page. I will keep the redirect pages in the URL itself so that
  the script stands totally on its own, without much hard-coded inside.

- I would like the request to be encrypted as a single line. I have
  used the Tiny Encryption Algorithm in Perl and it worked well. Do you
  know of any single-line encryption for Emacs? Maybe I can use
  OpenSSL; I need a stream cipher.
  https://en.wikipedia.org/wiki/Tiny_Encryption_Algorithm#Versions
  I don't know how to use this one, but it may be what I could use:
  https://github.com/skeeto/emacs-chacha20
  For the subscribe-request URLs it is best to have just one encrypted
  string, like doi.cgi?njadsjnasnfdkjsbfbsfbhj, so that nothing is
  shown to the user. That is how I have been doing it before with
  Perl. I may simply use an external program and encrypt the URL
  requests with something like:

    echo '(mid 1 eid 2 cid 3 tile "Subscribe to business")' \
      | openssl enc -ChaCha20 -e -k some -pbkdf2 -base64 -A
    U2FsdGVkX192ic8hOU15mR6zjoYK/rpRA/NkgHohy6eO2A+W8EHuopAigBcc57wKR/sxMYqPV1ESYEY523DS/h0=

  as the idea is to keep the script free of hard coding. It should only
  authorize email addresses that the server is in charge of. Maybe I
  can do that with the gnutls- functions? I just don't know how. That
  would be better, as it avoids external dependencies.

- base64: functions exist in Emacs already. Any problem? I will not
  need it.

- quoted-printable: these also exist in Emacs.

- application/x-www-form-urlencoded: yes, definitely, as above.

- multipart/form-data, json: json functions are there in Emacs, though
  I think I will not need them.

-- 
Jean

Take action in Free Software Foundation campaigns:
https://www.fsf.org/campaigns

In support of Richard M. Stallman
https://stallmansupport.org/
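For what it is worth, a minimal sketch of driving the openssl pipeline
shown above from Emacs Lisp, assuming openssl is on PATH. The function
name is made up, and passing the passphrase on the command line is for
illustration only (it shows up in the process list):

  (require 'subr-x)   ; for `string-trim'

  (defun my-doi-encrypt-token (plaintext passphrase)
    "Encrypt PLAINTEXT with openssl ChaCha20 and return a base64 token.
  Mirrors: openssl enc -ChaCha20 -e -k PASSPHRASE -pbkdf2 -base64 -A"
    (with-temp-buffer
      (insert plaintext)
      ;; Send the buffer contents to openssl and replace them with its
      ;; output, then return the trimmed base64 token.
      (call-process-region (point-min) (point-max)
                           "openssl" t t nil
                           "enc" "-ChaCha20" "-e"
                           "-k" passphrase "-pbkdf2" "-base64" "-A")
      (string-trim (buffer-string))))

Decryption would be the same call with "-d" in place of "-e".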
* Re: Decoding URLs input
From: Eli Zaretskii @ 2021-07-04 4:22 UTC
To: help-gnu-emacs

> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Sun, 4 Jul 2021 03:16:37 +0700
>
> I just fed it some percent-encoded sequences that I knew would result
> in invalid UTF-8 when decoded. If it were doing a full decode, I
> expected it to signal an error. It didn’t.

That's not a reliable sign that a function returns unibyte strings.
Most Emacs APIs that decode strings don't signal errors if they
encounter invalid sequences; instead, they decode those into raw bytes.
The design principle is to support raw bytes in strings and let the
application deal with them if they are not expected.
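To illustrate the principle with the functions from this thread (my own
quick example; worth re-checking in a live Emacs):

  ;; 0xFF can never occur in valid UTF-8, yet decoding does not signal;
  ;; the byte is kept as a raw byte in the resulting multibyte string.
  (decode-coding-string (url-unhex-string "%FF") 'utf-8)
  ;; ⇒ "\377"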