From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kenichi Handa
Newsgroups: gmane.emacs.bugs
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Mon, 03 Sep 2012 09:59:22 +0900
Message-ID: <87392zvs45.fsf@gnu.org>
References: <20120828.074720.480105751.wl@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain
Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu
To: Eli Zaretskii
In-Reply-To: <83bohrqr83.fsf@gnu.org> (message from Eli Zaretskii on Fri, 31 Aug 2012 13:40:44 +0300)

In article <83bohrqr83.fsf@gnu.org>, Eli Zaretskii writes:

> > Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
> > From: Werner LEMBERG
> > Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu
> >
> > > I think the correct behaviour on reading such a file by utf-8 is to
> > > treat each byte as raw-byte.
> >
> > Maybe.  I'm not sure how Emacs should behave in reading such files.
>
> We can either read them as raw bytes, or convert them to u+FFFD.  The
> former sounds like a more useful behavior to me, FWIW.

What should be converted to U+FFFD: each invalid byte, or the whole
invalid byte sequence?

In any case, we can't simply convert them to U+FFFD, because then
merely reading a file and writing it back would change its contents.
We could add post-read-conversion and pre-write-conversion functions
to the utf-8 coding system: the former would perform the conversion
and attach text properties for reverting, and the latter would revert
it using the text properties attached at the time of reading.  But is
it worth doing that?

I think converting each invalid byte to a raw byte is simpler and
equally useful.

---
Kenichi Handa
handa@gnu.org
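
To illustrate the round-trip concern outside Emacs, here is a minimal
sketch in Python.  It is only an analogy, not the Emacs implementation:
the sample bytes are made up, and Python's "surrogateescape" error
handler stands in for Emacs raw bytes.  It compares replacing an
invalid byte with U+FFFD against keeping the byte as-is.

```python
# Hypothetical sample: "a", a stray 0xFF byte (never valid in UTF-8), "b".
data = b"a\xffb"

# Strategy 1: replace the invalid byte with U+FFFD on reading.
replaced = data.decode("utf-8", errors="replace")
# Writing the text back yields the UTF-8 encoding of U+FFFD,
# not the original 0xFF byte.
lossy = replaced.encode("utf-8")

# Strategy 2: keep each invalid byte as a stand-in for the raw byte.
# Python maps byte 0xFF to the lone surrogate U+DCFF here, which is
# an analogy to (not the same thing as) an Emacs raw-byte character.
kept = data.decode("utf-8", errors="surrogateescape")
restored = kept.encode("utf-8", errors="surrogateescape")

print(lossy == data)     # False: contents changed by read + write
print(restored == data)  # True: contents preserved
```

With the replacement strategy the one-byte 0xFF comes back as the three
bytes EF BF BD, so the file changes just by reading and writing it; the
byte-preserving strategy needs no extra bookkeeping to restore it.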