From: Stefan Monnier
To: Kenichi Handa
Cc: mituharu@math.s.chiba-u.ac.jp, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: [davidsmith@acm.org: [patch] url-hexify-string does not follow W3C spec]
Date: Tue, 01 Aug 2006 10:32:05 -0400
References: <44CDDF7A.8060404@gnu.org> <87lkq9ivgf.fsf@acm.org>
In-Reply-To: (Kenichi Handa's message of "Tue, 01 Aug 2006 16:14:30 +0900")
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:57934

>>>> What incompatibility? If the string only contains ASCII and
>>>> eight-bit-*, then encoding it with utf-8 will return the same string
>>>> of bytes (except in a unibyte string rather than multibyte string).

>>> Here's an example:
>>> (encode-coding-string "\x80" 'utf-8)
>>> => "\302\200"

>> Duh! Looks like a serious bug to me.
>> Handa-san, what's up with that?

> ??? \x80 == U+0080 is a valid Unicode character in "C1
> Controls" block.

Why was it chosen to represent U+0080 with \x80?
The problem with it is that it makes it impossible to reliably carry
byte-streams embedded in multibyte strings.
Oh well, I guess that ebcdic and friends also make it impossible anyway :-(

> However, I agree that the following is very questionable
> behaviour:

>>> (encode-coding-string (string-as-unibyte "\x80") 'utf-8)
>>> => "\302\200"

> But, that is a long standing problem, and should be fixed
> (if necessary) after the release.

It should be fixed by signalling an error: if the string is unibyte it's
already encoded.


        Stefan