* function to decode url percent encoding
From: Xah Lee @ 2010-05-18 18:12 UTC
To: help-gnu-emacs

Is there a function that decodes URL percent encoding?

e.g.

http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem

should become

http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem

That's an EN DASH, Unicode 8211 (U+2013).

but that just unhexes, and generates gibberish if the URL contains
Unicode chars.

Thanks.

Xah
∑ http://xahlee.org/

☄
* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-18 18:26 UTC
To: help-gnu-emacs

A section was missing from my previous post...

I know there's

  (require 'gnus-util)
  gnus-url-unhex-string

but that just unhexes, and generates gibberish if the URL contains
Unicode chars.

Some study shows that “%E2%80%93” is the hexadecimal bytes E2 80 93,
which is the byte sequence of the en dash char in the UTF-8 encoding.

So, I guess I could parse the URL, interpret the %xx strings as UTF-8
bytes, then turn them back into Unicode chars. Any idea if there's a
built-in function that helps with this?

Xah

On May 18, 11:12 am, Xah Lee <xah...@gmail.com> wrote:
> is there a function that decode the url percent encoding?
>
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
>
> should become
>
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
>
> that's a EN DASH, unicode 8211.
>
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
>
> thanks.
>
> Xah
> ∑ http://xahlee.org/
>
> ☄
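The approach described above (treat each %xx as one byte, then read the
whole byte sequence as UTF-8) could be sketched roughly as follows. The
name `my-percent-decode-utf8' is hypothetical, not a built-in function,
and the sketch assumes the input is meant to be UTF-8; it makes no
special effort for malformed sequences.

```elisp
;; A minimal sketch, not part of Emacs: `my-percent-decode-utf8' is a
;; hypothetical helper name used only for illustration.
(defun my-percent-decode-utf8 (string)
  "Decode %XX escapes in STRING, reading the resulting bytes as UTF-8."
  (let ((raw (encode-coding-string string 'utf-8)) ; work on raw bytes
        (bytes '())
        (i 0))
    (while (< i (length raw))
      (if (and (eq (aref raw i) ?%)
               (<= (+ i 3) (length raw))
               (string-match-p "\\`[[:xdigit:]]\\{2\\}\\'"
                               (substring raw (+ i 1) (+ i 3))))
          ;; "%E2" -> the single byte #xE2
          (progn
            (push (string-to-number (substring raw (+ i 1) (+ i 3)) 16)
                  bytes)
            (setq i (+ i 3)))
        ;; Any other byte (including a stray "%") is copied unchanged.
        (push (aref raw i) bytes)
        (setq i (1+ i))))
    ;; Reassemble the bytes and interpret them as UTF-8 text.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes))
                          'utf-8)))

(my-percent-decode-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
;; => "http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem"
```

Collecting the bytes first and decoding once at the end matters: the
three bytes E2 80 93 only form the en dash when they are decoded
together, not one at a time.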
[parent not found: <jwvvdakrk9t.fsf-monnier+gnu.emacs.help@gnu.org>]
[parent not found: <878w7gaox3.fsf@fh-trier.de>]
[parent not found: <jwvhbm4oufs.fsf-monnier+gnu.emacs.help@gnu.org>]
* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-24 14:33 UTC
To: help-gnu-emacs

On May 19, 7:55 am, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
> >>> So, i guess i could parse the url then interpret the %x string as
> >>> utf-8 hex bytes then turn them back to unicode chars. Any idea if
> >>> there's built in function that helps this?
> >> There should be one somewhere in the URL package, although I don't know
> >> if it does work right. If you find it and discover it doesn't work right,
> >> please report it as a bug.
> > It's `url-unhex-string', but it does not work, I guess.
> >   (url-unhex-string (url-hexify-string "ä"))
>
> Indeed, we have some problems here:
> 1- a bug in the implementation makes it unwittingly decode the bytes as
>    latin-1.
> 2- the function actually does not decode the result.
>
> Point 1 will be fixed in Emacs-23.3.
> In the meantime you can revert the accidental encoding:
>
>    (encode-coding-string (url-unhex-string (url-hexify-string "ä")) 'latin-1)
>
> As for point 2 you can do that manually after the call:
>
>    (decode-coding-string (url-unhex-string (url-hexify-string "ä")) 'utf-8)
>
> or if you need to work around point 1 as well (but note that if point
> 1 is fixed, then the below won't work right):
>
>    (decode-coding-string (encode-coding-string
>                           (url-unhex-string (url-hexify-string "ä"))
>                           'latin-1)
>                          'utf-8)
>
> As for whether point 2 should be fixed or not, I'm not completely sure
> yet (I'd tend to say yes, tho).
>
>         Stefan

Thanks all for the answers.

José A Romero L has filed a bug report, #6252:

http://groups.google.com/group/gnu.emacs.bug/browse_frm/thread/908f7f39589a9014#
http://groups.google.com/group/gnu.emacs.bug/browse_frm/thread/e997a39309022a3d#

(I can't find the bug on the official site
http://emacsbugs.donarmstrong.com/cgi-bin/pkgreport.cgi?package=emacs )

There are some temporary solutions in José's post and in Stefan
Monnier's post.

Xah
∑ http://xahlee.org/

☄
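Applied to the URL from the start of the thread, the manual decoding
step for point 2 might look like the sketch below. The wrapper name
`my-url-decode-utf8' is hypothetical, and the sketch assumes an Emacs in
which point 1 is already fixed, so that `url-unhex-string' hands back
the raw bytes; on an affected version the latin-1 workaround quoted
above would be needed instead.

```elisp
(require 'url-util)

;; Hypothetical wrapper, for illustration only.  Assumes
;; `url-unhex-string' returns the undecoded bytes (point 1 fixed).
(defun my-url-decode-utf8 (url)
  "Unhex URL, then decode the resulting bytes as UTF-8."
  (decode-coding-string (url-unhex-string url) 'utf-8))

;; Under the assumption above, this should give the en dash form
;; of the Wikipedia URL.
(my-url-decode-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
```

Keeping the decoding step explicit leaves the choice of coding system
with the caller, which is useful when a URL was not encoded as UTF-8.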
* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-24 17:55 UTC
To: help-gnu-emacs

I did some more testing on this.

The issue, first of all, seems to be which characters should be
percent encoded. I did some testing with browsers (IE, Firefox,
Safari, Opera), and they all behave a bit differently in what they
think should be percent encoded.

• URL Percent Encoding and Unicode
  http://xahlee.org/js/url_encoding_unicode.html

Text version excerpt follows:

---------------------
Summary

Here's a summary of the behavior as it appears from the above tests:

* Firefox (v 3.6.3) is the most aggressive in turning characters in
  the URL into the percent encoded form.
* Google Chrome (4.1.249.1064 (45376)) changes Unicode chars into
  percent encoded form, but not parenthesis chars.
* Safari (4.0.5 (531.22.7)) is better, in that it simply shows the
  characters as is, as much as it can.
* Opera (v 10.10, build 1893) is the best; it shows Unicode chars,
  parens, and the en dash as is.
* IE (8.0.6001.18904) seems to take the approach of not doing
  anything to the URL. Whatever you paste in remains unchanged.
--------------

Xah
∑ http://xahlee.org/

☄
* Re: function to decode url percent encoding
From: Jason Rumney @ 2010-05-25 0:26 UTC
To: help-gnu-emacs

On May 24, 10:33 pm, Xah Lee <xah...@gmail.com> wrote:
>
> (i cant find the bug in the official site
> http://emacsbugs.donarmstrong.com/cgi-bin/pkgreport.cgi?package=emacs
> )

Try http://debbugs.gnu.org/emacs