all messages for Emacs-related lists mirrored at yhetil.org
* function to decode url percent encoding
From: Xah Lee @ 2010-05-18 18:12 UTC
  To: help-gnu-emacs

Is there a function that decodes URL percent encoding?

e.g.
http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem

should become

http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem

That's an EN DASH, Unicode code point 8211 (U+2013).

but that just unhexes, and generates gibberish if the URL contains
Unicode chars.

thanks.

  Xah
∑ http://xahlee.org/

* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-18 18:26 UTC
  To: help-gnu-emacs

A section was missing from my previous post...

I know there's

  (require 'gnus-util)
  gnus-url-unhex-string

but that only unhexes the escapes, and generates gibberish if the URL
contains Unicode chars.

Some study shows that the “%E2%80%93” is the hexadecimal bytes E2 80 93,
which is the byte sequence of the en dash char in UTF-8 encoding.

So, I guess I could parse the URL, interpret the %XX strings as
UTF-8 hex bytes, then turn them back into Unicode chars. Any idea if
there's a built-in function that helps with this?
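
The parse-and-decode idea above could look roughly like the following
sketch. The name `my-url-percent-decode' is made up for illustration (it
is not a built-in), and the sketch assumes every %XX escape is a UTF-8
byte, which holds for the Wikipedia URL above but is not guaranteed for
every URL.

```emacs-lisp
;; Hypothetical helper, not part of Emacs: decode %XX escapes as UTF-8.
(defun my-url-percent-decode (url)
  "Return URL with its %XX escapes decoded, assuming they are UTF-8 bytes."
  (let ((bytes '())
        (i 0)
        (len (length url)))
    (while (< i len)
      (let ((ch (aref url i)))
        (if (and (eq ch ?%)
                 (<= (+ i 3) len)
                 (string-match-p "\\`[0-9A-Fa-f][0-9A-Fa-f]\\'"
                                 (substring url (+ i 1) (+ i 3))))
            ;; An escape such as %E2 contributes one raw byte.
            (progn
              (push (string-to-number (substring url (+ i 1) (+ i 3)) 16)
                    bytes)
              (setq i (+ i 3)))
          ;; Any other character contributes its own UTF-8 bytes.
          (dolist (b (append (encode-coding-string (string ch) 'utf-8) nil))
            (push b bytes))
          (setq i (1+ i)))))
    ;; Reassemble the bytes and decode the whole sequence as UTF-8.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) 'utf-8)))
```

Calling it on the URL from the first post, e.g.
(my-url-percent-decode "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem"),
should give back the URL with the en dash in place. A "%" that is not
followed by two hex digits is left alone.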

 Xah






* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-24 14:33 UTC
  To: help-gnu-emacs

On May 19, 7:55 am, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
> >>> So, i guess i could parse the url then interpret the %x string as
> >>> utf-8 hex bytes then turn them back to unicode chars. Any idea if
> >>> there's built in function that helps this?
> >> There should be one somewhere in the URL package, although I don't know
> >> if it does work right.  If you find it and discover it doesn't work right,
> >> please report it as a bug.
> > It's `url-unhex-string', but it does not work, I guess.
> > (url-unhex-string (url-hexify-string "ä"))
>
> Indeed, we have some problems here:
> 1- a bug in the implementation makes it unwittingly decode the bytes as
>    latin-1.
> 2- the function actually does not decode the result.
>
> Point 1 will be fixed in Emacs-23.3.
> In the meantime you can revert the accidental encoding:
>
>   (encode-coding-string (url-unhex-string (url-hexify-string "ä")) 'latin-1)
>
> As for point 2 you can do that manually after the call:
>
>   (decode-coding-string (url-unhex-string (url-hexify-string "ä")) 'utf-8)
>
> or if you need to work around point 1 as well (but note that if point
> 1 is fixed, then the below won't work right):
>
>   (decode-coding-string (encode-coding-string
>                          (url-unhex-string (url-hexify-string "ä"))
>                          'latin-1)
>                         'utf-8)
>
> As for whether point 2 should be fixed or not, I'm not completely sure
> yet (I'd tend to say yes, tho).
>
>         Stefan
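
For reference, the point-2 workaround quoted above could be wrapped in a
small helper. This is only a sketch: `my-url-decode-utf-8' is a made-up
name, and it assumes an Emacs where point 1 is already fixed, so that
`url-unhex-string' hands back the raw bytes. On an affected Emacs the
extra `encode-coding-string' step from the quote would still be needed.

```emacs-lisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical wrapper name; the body is the point-2 workaround quoted above.
(defun my-url-decode-utf-8 (url)
  "Unhex URL with `url-unhex-string', then decode the bytes as UTF-8."
  (decode-coding-string (url-unhex-string url) 'utf-8))
```

With the URL from the start of the thread,
(my-url-decode-utf-8 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
is expected to return the URL with the en dash restored.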

Thanks all for the answers.

José A Romero L has filed a bug report, #6252.

http://groups.google.com/group/gnu.emacs.bug/browse_frm/thread/908f7f39589a9014#
http://groups.google.com/group/gnu.emacs.bug/browse_frm/thread/e997a39309022a3d#

(I can't find the bug on the official site
http://emacsbugs.donarmstrong.com/cgi-bin/pkgreport.cgi?package=emacs
)

There are some temporary solutions in José's post and Stefan Monnier's
post.

  Xah
∑ http://xahlee.org/

* Re: function to decode url percent encoding
From: Xah Lee @ 2010-05-24 17:55 UTC
  To: help-gnu-emacs

Did some more testing on this.

The issue, first of all, seems to be which characters should be
percent encoded.

I did some testing with browsers. IE, Firefox, Safari, and Opera all
behaved a bit differently in what they think should be percent
encoded.

• URL Percent Encoding and Unicode
  http://xahlee.org/js/url_encoding_unicode.html

text version excerpt follows

---------------------
Summary

Here's a summary of the behavior as it appears from the above tests:

    * Firefox (v 3.6.3) is the most aggressive in turning characters
      in the URL into percent-encoded form.
    * Google Chrome (4.1.249.1064 (45376)) will change Unicode chars
      into percent-encoded form, but not parenthesis chars.
    * Safari (4.0.5 (531.22.7)) is better, in that it simply shows the
      characters as is, as much as it can.
    * Opera (v 10.10, build 1893) is the best: it shows Unicode,
      parens, and the en dash as is.
    * IE (8.0.6001.18904) seems to take the approach of not doing
      anything to the URL. Whatever you paste in remains unchanged.

--------------

  Xah
∑ http://xahlee.org/

* Re: function to decode url percent encoding
From: Jason Rumney @ 2010-05-25  0:26 UTC
  To: help-gnu-emacs

On May 24, 10:33 pm, Xah Lee <xah...@gmail.com> wrote:
>
> (I can't find the bug on the official site
> http://emacsbugs.donarmstrong.com/cgi-bin/pkgreport.cgi?package=emacs
> )

Try http://debbugs.gnu.org/emacs


