From: Thien-Thi Nguyen
To: YAMAMOTO Mitsuharu
Cc: David Smith, emacs-devel@gnu.org, Stefan Monnier, Jason Rumney
Newsgroups: gmane.emacs.devel
Subject: Re: [davidsmith@acm.org: [patch] url-hexify-string does not follow W3C spec]
Date: 01 Aug 2006 10:47:07 -0400
References: <44CDDF7A.8060404@gnu.org> <87lkq9ivgf.fsf@acm.org>

YAMAMOTO Mitsuharu writes:

> [review]

thanks, that was very pleasant to read.

> * Rev 1.14
>     The argument is assumed to be either a sequence of characters or a
>     sequence of octets depending on the multibyteness of the string.
>     Incompatibility still remains for a multibyte string containing
>     eight-bit-control or eight-bit-graphic, but usually negligible.
>
> I'm not sure if encoding with UTF-8 is really useful, but I don't
> strongly oppose it if compatibility for the unibyte case is preserved.

conversion to utf-8 is per the RFC, which seems to be the primary
context for this function; avoiding that conversion means
noncompliance w/ the RFC.

i think rev 1.14 is almost ok; anything that deviates from the RFC
should be under user control (via optional arg) and should be
documented.

i assume that (a) conversion of multibyte strings to utf-8 is
unconditionally desirable (a "negligible" problem is no problem), and
(b) there exist non utf-8 unibyte encodings that callers wish to
"hexify as is". please correct me if these assumptions do not hold.

on the other hand, if they do hold, how about:

  (defun ... (string &optional unibyte-as-is-p)
    ...
    (if (or (multibyte-string-p string)
            (not unibyte-as-is-p))
        (encode-coding-string string 'utf-8 t)
      string)
    ...)

?  this way, RFC-compliance is the default, but suppressing the
conversion to utf-8 is still possible for unibyte strings by
specifying UNIBYTE-AS-IS-P.

thi
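
To make the proposal concrete, here is a minimal sketch of how the conditional might sit inside a complete hexifying function. It is not the url-util.el code: the names `my-url-hexify-string` and `my-url-unreserved-chars` are hypothetical, and the unreserved set (letters, digits, `-`, `_`, `.`, `~`) is an assumption chosen for illustration, not something the message specifies.

```elisp
;; Sketch only.  `my-url-hexify-string' and `my-url-unreserved-chars'
;; are hypothetical names; the unreserved set below is an assumption.
(defconst my-url-unreserved-chars
  (append (concat "abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  "0123456789-_.~")
          nil)
  "List of characters that are emitted without percent-escaping.")

(defun my-url-hexify-string (string &optional unibyte-as-is-p)
  "Return STRING with every octet outside the unreserved set as %XX.
Multibyte strings are always encoded as utf-8 first.  Unibyte strings
are also passed through the utf-8 encoder unless UNIBYTE-AS-IS-P is
non-nil, in which case their octets are hexified unchanged."
  (let ((octets (if (or (multibyte-string-p string)
                        (not unibyte-as-is-p))
                    (encode-coding-string string 'utf-8 t)
                  string)))
    ;; OCTETS is unibyte here, so each element is a byte in 0..255.
    (mapconcat (lambda (byte)
                 (if (memq byte my-url-unreserved-chars)
                     (char-to-string byte)
                   (format "%%%02X" byte)))
               octets
               "")))

;; Multibyte input goes through utf-8:
(my-url-hexify-string "caf\u00e9")   ; => "caf%C3%A9"
;; A unibyte octet is hexified unchanged when the flag is set:
(my-url-hexify-string "\351" t)      ; => "%E9"
```

With the flag omitted, the function takes the utf-8 path for every input, which matches the "RFC-compliance is the default" intent; what the encoder does with non-ASCII octets in a unibyte string depends on the Emacs version, so that case is deliberately left out of the examples.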