From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Email text that confuses charset recognition in emacs
Date: Tue, 16 Apr 2013 21:37:08 -0700
Organization: UCLA Computer Science Department
Message-ID: <516E26F4.1020303@cs.ucla.edu>
References: <20130416162747.GA11871@saturn>
In-Reply-To: <20130416162747.GA11871@saturn>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
To: Giorgos Keramidas
Cc: emacs-devel@gnu.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5
Xref: news.gmane.org gmane.emacs.devel:158969

On 04/16/2013 09:27 AM, Giorgos Keramidas wrote:
> the attached email message confuses the charset
> detection machinery of Emacs, and it starts interpreting all text as
> Japanese text -- even though most of the contents of the file are plain
> us-ascii text.

Although the text is US-ASCII it contains a valid ISO-2022-7bit coding sequence (the two things are not incompatible) which Emacs is properly detecting and converting. The problem is that the text later contains the invalid escape sequence

   ESC LF > > SP ( B

This text was intended to switch out of a Japanese charset (the immediately preceding text is valid ISO-2022-7bit Japanese), but a mailer that *thought* that the text was ASCII inserted LF > > SP after the ESC and before the ( B, causing the ESC ( B to be corrupted, so Emacs remains in Japanese mode until the end of the input.
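
To make the corruption concrete, here is a small Python sketch (not Emacs code) that reproduces it on a made-up sample string. The sample text, the `escape_sequences` helper, and the exact way the quoting is spliced in are assumptions for illustration; the only library call is Python's standard `iso2022_jp` codec.

```python
ESC = b"\x1b"

# Encode a sample string. Python's iso2022_jp codec emits ESC $ B before the
# Japanese text and ESC ( B after it to return to ASCII.
original = "Hello 日本語 world".encode("iso2022_jp")

# Assumed corruption: a mailer that treats the bytes as plain ASCII breaks the
# line right after ESC and adds "> > " quoting, splitting ESC from "(B".
corrupted = original.replace(ESC + b"(B", ESC + b"\n> > (B")


def escape_sequences(data: bytes) -> list:
    """Return the two bytes that follow each ESC byte in data."""
    return [data[i + 1:i + 3] for i, byte in enumerate(data) if byte == 0x1B]


print(escape_sequences(original))   # the switch back to ASCII is intact
print(escape_sequences(corrupted))  # the second ESC is now followed by LF '>'
```

In the corrupted copy the second ESC is followed by LF and `>` rather than `( B`, so a decoder never sees a valid request to return to ASCII.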
Perhaps when Emacs is decoding ISO-2022-7bit and sees an invalid escape sequence, it should switch back to ASCII. That would have fixed your problem, and wouldn't break the decoding of any valid ISO-2022-7bit sequence.
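
As a rough illustration of that idea, the following Python sketch tracks only which charset is in effect at the end of the input, under both policies. It is a toy state tracker, not Emacs's decoder: the function name `charset_at_end`, the `reset_on_invalid` flag, and the sample bytes are all hypothetical, and only a subset of the ISO-2022-JP designations is listed.

```python
ESC = 0x1B

# Subset of ISO-2022-JP designations, keyed by the two bytes after ESC.
DESIGNATIONS = {
    b"(B": "ascii",
    b"(J": "jis-x0201-roman",
    b"$@": "jis-x0208",
    b"$B": "jis-x0208",
}


def charset_at_end(data: bytes, reset_on_invalid: bool) -> str:
    """Return the charset in effect after scanning all of data."""
    state = "ascii"
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i + 1:i + 3]
            if seq in DESIGNATIONS:
                state = DESIGNATIONS[seq]
                i += 3
                continue
            if reset_on_invalid:
                # Proposed policy: an invalid escape sequence falls back to ASCII.
                state = "ascii"
        i += 1
    return state


# Sample inputs: a valid message, and one where LF "> > " split the final ESC ( B.
valid = b"Hello \x1b$BF|K\\8l\x1b(B world"
corrupted = b"Hello \x1b$BF|K\\8l\x1b\n> > (B world"

print(charset_at_end(corrupted, reset_on_invalid=False))  # stays in Japanese mode
print(charset_at_end(corrupted, reset_on_invalid=True))   # falls back to ASCII
print(charset_at_end(valid, reset_on_invalid=True))       # valid input is unaffected
```

A real decoder would also have to decide what to do with the bytes of the invalid sequence itself; this sketch leaves that question open.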