From: Dan Jacobson
Newsgroups: gmane.emacs.bugs
Subject: bites assumed set mid UTF-8
Date: Tue, 07 Mar 2006 02:15:10 +0800
Message-ID: <87r75fignl.fsf@jidanni.org>
To: bug-gnu-emacs@gnu.org
Cc: handa@etl.go.jp

Bad bad bad. Emacs 21.4.1 shows the same Chinese character ("nuclear")
for both byte sequences below, even though the second is not valid
UTF-8: its middle byte, 00100000, is a plain space and lacks the
10xxxxxx prefix of a continuation byte. Cc'd Handa.

11100110 10100000 10111000
11100110 00100000 10111000

M-x compile this makefile:

B=perl -wle '$$_=unpack "B*", <>; s/.{8}/$$& /g; print'
Q=qp-decode
a:
	for i in =E6=A0=B8 =E6\ =B8; do \
	echo $$i|$Q; echo -n $$i|$Q|$B; done
#echo =E6=A0=B8|qp-decode #Chinese "nuclear"
#echo =E6 =B8  |qp-decode #Not legal Unicode

uxterm etc. correctly display the character only for the first
sequence. Save what you see in Emacs into a file, and indeed both
come out as the first sequence.

Reality hits when one first saves the output into a file, bypassing
the Emacs display, and then find-files it. Thereupon the second
version is not coerced into being the same as the first, and Emacs
guesses the file is latin-1.
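
For anyone wanting to check the two byte sequences without qp-decode or
Emacs in the loop, here is a minimal sketch in Python. It is not part of
the original report: the helper names bits and is_valid_utf8 are made up
for illustration, and the hex strings are just the decoded forms of
=E6=A0=B8 and "=E6 =B8" above. It only asks a strict UTF-8 decoder what
it thinks of each sequence; it says nothing about what Emacs does
internally.

```python
def bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit groups, like the perl one-liner."""
    return " ".join(f"{b:08b}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if a strict UTF-8 decoder accepts the whole sequence."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Decoded forms of the two quoted-printable strings from the makefile.
good = bytes.fromhex("E6A0B8")  # =E6=A0=B8
bad = bytes.fromhex("E620B8")   # "=E6 =B8": middle byte is a plain space

for raw in (good, bad):
    verdict = "valid UTF-8" if is_valid_utf8(raw) else "NOT valid UTF-8"
    print(bits(raw), verdict)

# Reading the invalid sequence as latin-1 always succeeds, which is
# consistent with a reader falling back to that guess for the file.
print(repr(bad.decode("latin-1")))
```

A strict decoder accepts only the first sequence, which matches what
uxterm shows; the second decodes cleanly only under a single-byte
charset such as latin-1.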