From: Lars Magne Ingebrigtsen
Newsgroups: gmane.emacs.bugs
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Wed, 21 Sep 2011 22:17:52 +0200
To: José A. Romero L.
Cc: 6252@debbugs.gnu.org
In-Reply-To: ("José A. Romero L."'s message of "Sun, 23 May 2010 01:46:54 +0200")
User-Agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

José A. Romero L. writes:

> On May 18, 20:14, Xah Lee wrote:
>
>> is there emacs lisp function that decode the url percent encoding?
>> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
>> should become
>> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
>> that's a EN DASH (unicode 8211, #o20023, #x2013).
>> I know there's a
>> (require 'gnus-util)
>> gnus-url-unhex-string
>> but that just unhex, and generate gibberish if the url contain unicode
>> chars.
> (...)
>
> Seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
> that is an important hole you have found there. The standard requires
> that all unreserved characters be encoded/decoded as UTF8 bytes. Even
> though the encoding part looks OK (in url-util.el), the decoding does
> not go that last mile to interpret the decoded bytes as UTF-8.

I'm not quite sure I understand what the problem is.  Do you have a test
case that illustrates what url.el does wrong?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/
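
For readers following the thread, the distinction being described (unhexing the percent escapes versus also interpreting the resulting bytes as UTF-8) can be sketched outside Emacs. The following is a minimal Python illustration, not Emacs code and not a claim about how url.el or gnus-util behaves; the helper names `unhex_only` and `unhex_utf8` are hypothetical and exist only for this example. It uses the Wikipedia path quoted above as a candidate test case.

```python
from urllib.parse import unquote_to_bytes


def unhex_only(s: str) -> str:
    """Percent-decode, then map each byte to one character (Latin-1).

    This models "just unhex": the three bytes E2 80 93 stay three
    separate characters instead of becoming one EN DASH.
    """
    return unquote_to_bytes(s).decode("latin-1")


def unhex_utf8(s: str) -> str:
    """Percent-decode, then interpret the byte sequence as UTF-8."""
    return unquote_to_bytes(s).decode("utf-8")


encoded = "Sylvester%E2%80%93Gallai_theorem"

# repr() keeps the unprintable characters of the first result visible.
print(repr(unhex_only(encoded)))
print(repr(unhex_utf8(encoded)))
```

The first result is three characters longer than the input's visible text suggests, because each escaped byte became its own character; the second contains a single U+2013. A comparable check against the Emacs functions named in the thread would be the kind of test case requested above.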