From: YAMAMOTO Mitsuharu
Newsgroups: gmane.emacs.bugs
Subject: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Mon, 24 May 2010 12:33:46 +0900
Organization: Faculty of Science, Chiba University
To: José A. Romero L.
Cc: 6252@debbugs.gnu.org

>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L. said:

> Seems that RFC 3986 has not been implemented correctly in
> Emacs. IMHO that is an important hole you have found there. The
> standard requires that all unreserved characters be encoded/decoded
> as UTF8 bytes.

If you are referring to the following part of RFC 3986, it doesn't say
anything about existing URI schemes (as opposed to "a new URI
scheme"), those defining a component that does NOT represent textual
data, or even for textual data, those NOT consisting of characters
from the Universal Character Sets.

   When a new URI scheme defines a component that represents textual
   data consisting of characters from the Universal Character Set [UCS],
   the data should first be encoded as octets according to the UTF-8
   character encoding [STD63]; then only those octets that do not
   correspond to characters in the unreserved set should be
   percent-encoded.

(See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)

Though returning a multibyte string decoded as UTF-8 would be useful
for many cases, I think some "unhex"ing function should also provide a
functionality to return a unibyte string.

                                     YAMAMOTO Mitsuharu
                                mituharu@math.s.chiba-u.ac.jp
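
To make the unibyte/multibyte distinction concrete, here is a minimal
sketch in Python rather than Emacs Lisp. It is only an illustration of
the point, not how Emacs does or should implement it. The helper names
encode_text, decode_octets and decode_text are hypothetical; the
urllib.parse calls they wrap are standard library functions.

```python
from urllib.parse import quote, unquote, unquote_to_bytes


def encode_text(text: str) -> str:
    """RFC 3986 section 2.5 procedure for textual data: encode as UTF-8,
    then percent-encode every octet outside the unreserved set."""
    return quote(text, safe="", encoding="utf-8")


def decode_octets(component: str) -> bytes:
    """'Unhex' only: return the raw octets, assuming nothing about charset."""
    return unquote_to_bytes(component)


def decode_text(component: str) -> str:
    """'Unhex' and then interpret the octets as UTF-8 text.
    Octets that are not valid UTF-8 become U+FFFD, so information is lost."""
    return unquote(component, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    utf8_component = encode_text("José")   # text encoded per section 2.5
    latin1_component = "Jos%E9"            # same name, ISO-8859-1 octets

    print(utf8_component)
    print(decode_octets(utf8_component), decode_text(utf8_component))
    print(decode_octets(latin1_component), decode_text(latin1_component))
```

For the UTF-8 component both decoders are usable, but for the
ISO-8859-1 component only the octet-returning one preserves the data,
which is the case a unibyte-returning "unhex" function would cover.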