From: José A. Romero L.
Newsgroups: gmane.emacs.bugs
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Sun, 23 May 2010 01:46:54 +0200
To: 6252@debbugs.gnu.org

On May 18, 20:14, Xah Lee wrote:
> is there emacs lisp function that decode the url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's a EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
> (require 'gnus-util)
> gnus-url-unhex-string
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)

It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard calls for
characters outside the unreserved set to be converted to UTF-8 bytes
before they are percent-encoded, and to be decoded the same way. Even
though the encoding part looks OK (in url-util.el), the decoding does not
go that last mile and interpret the decoded bytes as UTF-8.

Until a proper implementation is done, I guess you could work around the
problem with something like this:

    (decode-coding-string
     (apply 'unibyte-string
            (string-to-list
             (url-unhex-string
              "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
     'utf-8)

(yes, it's ugly as hell but hey, it's free ;])

I've just sent this very message as a bug report to the Emacs team.

Cheers,
-- 
José A. Romero L.
escherdragon@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)
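
For readers who want the workaround in this message as a reusable function, here is a minimal sketch. It does the percent-decoding by hand rather than calling `url-unhex-string`, so it does not depend on what kind of string that function returns in a given Emacs version. The name `my-percent-decode-utf8` is hypothetical and not part of Emacs, and treating a malformed escape such as a stray "%" as literal text is an assumption of this sketch, not something the message specifies.

```emacs-lisp
(defun my-percent-decode-utf8 (str)
  "Decode %XX escapes in STR and interpret the resulting bytes as UTF-8.
The function name is hypothetical; it is not part of Emacs."
  (let ((raw (encode-coding-string str 'utf-8)) ; work on bytes, not characters
        (bytes '())
        (i 0))
    (while (< i (length raw))
      (if (and (eq (aref raw i) ?%)
               (<= (+ i 3) (length raw))
               (string-match-p "\\`[0-9A-Fa-f][0-9A-Fa-f]\\'"
                               (substring raw (1+ i) (+ i 3))))
          ;; A well-formed escape: turn the two hex digits into one byte.
          (progn
            (push (string-to-number (substring raw (1+ i) (+ i 3)) 16) bytes)
            (setq i (+ i 3)))
        ;; Anything else (including a stray "%") is copied through unchanged.
        (push (aref raw i) bytes)
        (setq i (1+ i))))
    ;; Only now, with all bytes collected, decode them as UTF-8.
    (decode-coding-string (apply #'unibyte-string (nreverse bytes)) 'utf-8)))
```

Evaluating `(my-percent-decode-utf8 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")` should give the URL with a real EN DASH between "Sylvester" and "Gallai". Decoding happens once, after all bytes are collected, because a multi-byte UTF-8 sequence is spread across several `%XX` escapes and cannot be decoded one escape at a time.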