From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Asian characters in email corrupts all successive emails
Date: Mon, 8 Dec 2003 13:41:45 +0900 (JST)
Message-ID: <200312080441.NAA12929@etlken.m17n.org>
Original-To: puneet@computer.org
Cc: emacs-devel@gnu.org
In-reply-to: (message from puneet on Thu, 04 Dec 2003 08:58:18 -0500)

puneet writes:

> I am using a recent cvs build of emacs. I have
> attached a small file with this email, which is part of my
> procmail.log.

> As you can see on the line 14 of the attached file,
> it is the procmail log for some spam with asian
> characters. The problem is that in emacs, I am seeing all
> the lines after line 14 corrupted. It is not so if I use
> any other text editor (I have tried vim/vi).

Actually, the file itself is already corrupted. Because line 14
doesn't end with an escape sequence that designates ASCII, the
bytes on the following lines are still interpreted in the most
recently designated charset (a Japanese charset in this case).
It seems that procmail truncates subject lines by byte count
without considering the encoding.

Perhaps vim/vi assume an ASCII designation sequence at each line
end even when the line doesn't contain one, and perhaps Emacs
should do the same thing. I'll work on it.
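The effect of cutting an ISO-2022 line by byte count can be sketched outside Emacs. The following is a minimal Python illustration, not code from procmail, vim, or Emacs. The sample subject bytes, the 18-byte cut, and the helper names ends_in_multibyte and reset_at_line_end are all assumptions invented for this example. It tracks only ESC $ designations and the ESC ( B return to ASCII, which simplifies ISO 2022 considerably.

```python
import re

# ESC ( B designates ASCII to G0 (the "return to ASCII" sequence).
ESC_ASCII = b"\x1b(B"

# ESC $ [(] F designates a multibyte charset to G0. Other designations
# (for example other single-byte sets) are ignored here, which is a
# simplifying assumption of this sketch.
MULTIBYTE_DESIGNATION = re.compile(rb"\x1b\$\(?[@-~]")


def ends_in_multibyte(line: bytes) -> bool:
    """Return True if a multibyte charset is still designated at the end of line."""
    last = None
    for match in MULTIBYTE_DESIGNATION.finditer(line):
        last = match
    if last is None:
        return False
    # The line is still in multibyte mode unless ESC ( B follows
    # the last multibyte designation.
    return line.find(ESC_ASCII, last.end()) == -1


def reset_at_line_end(line: bytes) -> bytes:
    """Append ESC ( B when the line would otherwise leave a multibyte charset active."""
    return line + ESC_ASCII if ends_in_multibyte(line) else line


# Hypothetical log line: "Subject: " followed by three two-byte characters
# wrapped in ESC $ B ... ESC ( B.
SUBJECT = b"Subject: \x1b$BF|K\\8l\x1b(B"

if __name__ == "__main__":
    truncated = SUBJECT[:18]  # a byte-wise cut that drops the final ESC ( B
    print(ends_in_multibyte(SUBJECT))
    print(ends_in_multibyte(truncated))
    print(reset_at_line_end(truncated))
```

A reader that treats every line end as an implicit return to ASCII behaves as if reset_at_line_end had been applied to each line before decoding.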
> When I open the file in emacs and save it in another
> file, the saved file differs from the opened file.

Emacs can't preserve the exact byte sequence of the original
file on saving when the file is decoded by an iso-2022 based
coding system, especially when the file contains incorrect
escape sequences. Have you tried saving the file with vim/vi?
Do they preserve the original byte sequence?

> When such an email is coming to my mail
> folder/spool, emacs fails to recognize all the emails
> after the particular mail in the mail folder. Actually I
> am using gnus/nnml and emacs is not recognizing/getting
> any email after the particular email from the spool.

I think it's a bug in gnus/nnml. From your description, I
suspect that it decodes all mails at once instead of decoding
each message one by one. But I'm not a Gnus user. Could someone
confirm it?

Anyway, please try this command to fix this kind of incorrect
ESC sequence. It reads the file FILE and writes the fixed
contents into FILE.new.

(defun fix-iso2022 (file)
  "Write a copy of FILE to FILE.new with missing ASCII designations added."
  (interactive "fFile to recover: ")
  (with-temp-file (concat file ".new")
    ;; Work on raw bytes so that no decoding happens.
    (set-buffer-multibyte nil)
    (let ((coding-system-for-read 'raw-text))
      (insert-file-contents file)
      ;; Find each designation of a multibyte charset (ESC $ ...).
      (while (re-search-forward "\e$(?[@-~]" nil t)
        ;; If no ESC ( B follows on the same line, point moves to the
        ;; line end and ESC ( B is inserted there.
        (or (search-forward "\e(B" (line-end-position) 'move)
            (insert "\e(B"))))))

---
Ken'ichi HANDA
handa@m17n.org