From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Jason Rumney
Newsgroups: gmane.emacs.devel
Subject: Re: [davidsmith@acm.org: [patch] url-hexify-string does not follow W3C spec]
Date: Mon, 31 Jul 2006 11:46:18 +0100
Message-ID: <44CDDF7A.8060404@gnu.org>
Cc: davidsmith@acm.org, YAMAMOTO Mitsuharu, emacs-devel@gnu.org
Original-To: Thien-Thi Nguyen
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060516 Thunderbird/1.5.0.4 Mnenhy/0.7.4.666
Xref: news.gmane.org gmane.emacs.devel:57842

Thien-Thi Nguyen wrote:
> YAMAMOTO Mitsuharu <mituharu@math.s.chiba-u.ac.jp> writes:
>
>> This change breaks the following case:
>>
>> (concat
>>  "file://localhost"
>>  (mapconcat 'url-hexify-string
>>             (split-string
>>              (encode-coding-string "/SOME/NONASCII/FILE/NAME"
>>                                    (or file-name-coding-system
>>                                        default-file-name-coding-system))
>>              "/")
>>             "/"))
>>
>> Maybe suppress encoding with UTF-8 for unibyte strings?
>
> if the result of this expression is to be used as a URI, then that means
> the change exposes improper use of `url-hexify-string'; according to the
> RFC (as i understand it) URIs require utf-8.

There is a recent RFC that mandates UTF-8 encoding for URIs, but earlier RFCs either said nothing or specified Latin-1, so many implementations do not use UTF-8. We need some way to interoperate with such implementations.

> if we want `url-hexify-string' to handle "URI-like" transformations
> (i.e., not strictly produce URI-conformant results), we can add an
> optional arg MAKE-UNIBYTE that specifies a function to do the conversion
> to unibyte.  in most cases, i guess that would be `string-as-unibyte',
> but i don't know for sure.

Alternatively, we could add an optional arg ENCODING for specifying an encoding other than UTF-8. That might be a cleaner interface than requiring the caller to make the string unibyte before passing it to `url-hexify-string'.
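
A minimal sketch of what such an interface might look like. The name `my-url-hexify-string' is hypothetical and this is not the actual url-util.el code. It assumes the unreserved character set from RFC 3986, and it assumes that a unibyte argument is already encoded, so its bytes are escaped as they are. That second assumption should keep the file-name case quoted above working:

```elisp
;; Hypothetical sketch only; not the real `url-hexify-string'.
(defun my-url-hexify-string (string &optional encoding)
  "Percent-encode STRING for use in a URI.
A multibyte STRING is first encoded with ENCODING (default `utf-8').
A unibyte STRING is assumed to be encoded already and is used as-is."
  (let ((bytes (if (multibyte-string-p string)
                   (encode-coding-string string (or encoding 'utf-8))
                 string)))
    ;; BYTES is unibyte here, so each element is an integer in 0..255.
    (mapconcat (lambda (byte)
                 (if (or (and (>= byte ?a) (<= byte ?z))
                         (and (>= byte ?A) (<= byte ?Z))
                         (and (>= byte ?0) (<= byte ?9))
                         (memq byte '(?- ?_ ?. ?~)))
                     ;; RFC 3986 unreserved characters pass through.
                     (char-to-string byte)
                   ;; Everything else becomes %XX.
                   (format "%%%02X" byte)))
               bytes "")))
```

With this, (my-url-hexify-string "caf\u00e9") gives "caf%C3%A9", while passing `iso-latin-1' as ENCODING gives "caf%E9".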

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel