From mboxrd@z Thu Jan 1 00:00:00 1970
From: Werner LEMBERG
Newsgroups: gmane.emacs.bugs
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
Message-ID: <20120828.212226.458921190.wl@gnu.org>
References: <20120828.074720.480105751.wl@gnu.org> <87a9xfdpy4.fsf@gnu.org>
In-Reply-To: <87a9xfdpy4.fsf@gnu.org>
To: handa@gnu.org
Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu

> In both cases, user surely see them.

OK.  BTW, the real use case is a bug in Emacs 23.x which prevented
correct conversion from emacs-mule encoding to utf-8, creating such
funnily encoded utf-8 files (I can't reproduce this problem with my
recently compiled Emacs, so it seems to have been fixed in the
meantime).

>> Instead, such characters must be converted to correct
>> UTF-8.
>
> ??? I don't understand what you means by "correct UTF-8".

Sorry, I meant correct Unicode.  U+1351DE is larger than the largest
valid Unicode value (U+10FFFF).  As my example demonstrates, the
Chinese character in the file is certainly *neither* a private
character nor a character from GB 18030, so it should be converted to
a regular Unicode value.

> I think the correct behaviour on reading such a file by utf-8 is to
> treat each byte as raw-byte.

Maybe.  I'm not sure how Emacs should behave in reading such files.


    Werner
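
To illustrate the range argument above, here is a minimal Python sketch. The actual bytes of the file under discussion are not shown in this message, so the helper `four_byte_pattern` is hypothetical: it simply packs a value into the 4-byte UTF-8 bit layout without any range check, which is one plausible way such a sequence could arise. The check uses Python's strict `utf-8` decoder, not Emacs, so it only demonstrates that U+1351DE lies outside Unicode, not how Emacs itself reads the file.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode


def is_unicode_scalar(cp: int) -> bool:
    """Return True if cp is a valid Unicode scalar value."""
    in_range = 0 <= cp <= MAX_CODE_POINT
    is_surrogate = 0xD800 <= cp <= 0xDFFF
    return in_range and not is_surrogate


def four_byte_pattern(cp: int) -> bytes:
    """Pack cp into the 4-byte UTF-8 bit layout, with no Unicode range check.

    Hypothetical helper for illustration only: the layout has room for
    21 bits, so it can hold values above U+10FFFF that are not Unicode.
    """
    if not 0x10000 <= cp <= 0x1FFFFF:
        raise ValueError("value does not fit the 4-byte layout")
    return bytes([
        0xF0 | (cp >> 18),            # lead byte: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),   # continuation bytes: 6 bits each
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Under these assumptions, `is_unicode_scalar(0x1351DE)` is false, and a strict decoder refuses `four_byte_pattern(0x1351DE)` while accepting `four_byte_pattern(0x10FFFF)`. A reader therefore has to choose between keeping the raw bytes and mapping the value to a regular code point, which is the question discussed in this thread.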