From: José A. Romero L.
Newsgroups: gmane.emacs.bugs
Subject: bug#6252: Fwd: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Tue, 25 May 2010 10:56:36 +0200
To: 6252@debbugs.gnu.org

(sorry, forgot to fwd this to the bugtrack)

---------- Forwarded message ----------
From: José A. Romero L.
Date: 2010/5/24
Subject: Re: bug#6252: Emacs does not implement URL (aka "percent") decoding correctly.
To: YAMAMOTO Mitsuharu

2010/5/24 YAMAMOTO Mitsuharu:
>>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L. said:

(...)
> If you are referring to the following part of RFC 3986, it doesn't say
> anything about existing URI schemes (as opposed to "a new URI
> scheme"), those defining a component that does NOT represent textual
> data, or even for textual data, those NOT consisting of characters
> from the Universal Character Sets.

You are right. The standard *doesn't say anything* about existing URI
schemes on that matter.
Thus the question would rather be whether to make the language more
or less useful, especially in light of the fragment you've just
quoted:

     >  When a new URI scheme defines a component that represents textual
     >  data consisting of characters from the Universal Character Set
     >  [UCS], the data should first be encoded as octets according to the
     >  UTF-8 character encoding [STD63]; then only those octets that do not
     >  correspond to characters in the unreserved set should be
     >  percent-encoded.

and the example that immediately follows:

   (...) For example, the character A would be represented as "A",
   the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
   as "%C3%80", and the character KATAKANA LETTER A would be represented
   as "%E3%82%A2".

>
> (See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)
>
> Though returning a multibyte string decoded as UTF-8 would be useful
> for many cases, I think some "unhex"ing function should also provide a
> functionality to return a unibyte string.
(...)

That's perfectly valid. OTOH some other "unhex"-ing function (or even
the same) could also provide the functionality to return a multi-byte
string, and even allow choosing the character encoding (UCS or not)
for the resulting string. After all, don't you think there should be a
better way to decode a Katakana A than using a kludge like this?:

 (decode-coding-string
    (apply 'unibyte-string
           (string-to-list
            (url-unhex-string "%E3%82%A2")))
    'utf-8)

Cheers,
--
José A. Romero L.
escherdragon@gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)
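
For illustration, the workaround quoted in the message could be wrapped into a reusable helper along the lines suggested there. This is only a minimal sketch: the function name `my-url-unhex-decoded` and its optional CODING-SYSTEM argument are hypothetical and not part of Emacs. It relies solely on the functions already used in the message (`url-unhex-string`, `string-to-list`, `unibyte-string`, `decode-coding-string`) and assumes that `url-unhex-string` yields one element per decoded octet.

```elisp
(require 'url-util)  ; provides `url-unhex-string'

;; Hypothetical helper, not part of Emacs: it wraps the workaround
;; from the message so that the caller can pick the coding system.
(defun my-url-unhex-decoded (str &optional coding-system)
  "Percent-decode STR and decode the resulting octets.
CODING-SYSTEM names the encoding of the octets and defaults to
`utf-8', the encoding RFC 3986 recommends for new URI schemes."
  (let ((octets
         ;; Force a unibyte string, one byte per decoded octet.
         ;; `unibyte-string' signals an error if an element is
         ;; not in the range 0..255.
         (apply #'unibyte-string
                (string-to-list (url-unhex-string str)))))
    (decode-coding-string octets (or coding-system 'utf-8))))

;; Usage, with the examples quoted from RFC 3986:
;;   (my-url-unhex-decoded "%C3%80")     ; LATIN CAPITAL LETTER A WITH GRAVE
;;   (my-url-unhex-decoded "%E3%82%A2")  ; KATAKANA LETTER A
```

Keeping the coding system as an argument leaves room for the other use case raised in the thread: a caller that needs the raw octets can still call `url-unhex-string` directly and skip the decoding step.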