From: Richard Stallman
Newsgroups: gmane.emacs.devel
Subject: Re: Displaying bytes (was: Inadequate documentation of silly characters on screen.)
Date: Sun, 29 Nov 2009 11:01:21 -0500
Reply-To: rms@gnu.org
To: Kenichi Handa
Cc: per.starback@gmail.com, dak@gnu.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-reply-to: (message from Kenichi Handa on Wed, 25 Nov 2009 10:33:54 +0900)

We don't want to raise the priority of windows-1252 because it would cause many other encodings not to be recognized.

If it turns out that windows-1252 files are the main cause of 8-bit-control characters in the buffer, here's another idea. If visiting a file gives you some 8-bit-control characters, ask the user "Is this file encoded in Windows encoding (windows-1252)?" and do so if she says yes.

Here's another idea. We could employ some heuristics to see if the distribution of those characters seems typical for the way those characters are used. For instance, some of the punctuation characters (the ones that represent quotation marks) should always have whitespace or punctuation on at least one side. Also, there should be no ASCII control characters other than whitespace. Maybe more specific heuristics can be developed.

These could be used as conditions for recognizing the file as windows-1252. If these heuristics are strong enough, they could reject nearly all false matches, provided the file is long enough. (A minimum length could be part of the conditions.)
Then we could increase the priority of windows-1252 without the bad side effect of using it when it is not intended.

This is ad-hoc, and not elegant. But the problem is important enough in practice that an ad-hoc solution is justified if it works well.
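
To make the proposed conditions concrete, here is a rough sketch in Python (not Emacs Lisp, and not anything that exists in Emacs). It checks only the three conditions named above: a minimum length, no ASCII control characters other than whitespace, and quotation-mark bytes that touch whitespace or punctuation on at least one side. The function name, the 64-byte threshold, and the choice of which bytes count as quotation marks are all assumptions made for illustration. In particular, 0x92 is left out because in windows-1252 it also serves as an apostrophe inside words, where it would have letters on both sides.

```python
import string

# All names and values below are illustrative assumptions, not part of Emacs.
MIN_LENGTH = 64  # hypothetical minimum file length, in bytes
WHITESPACE = frozenset(b" \t\n\r\f")
PUNCTUATION = frozenset(string.punctuation.encode("ascii"))
# windows-1252: 0x91 left single, 0x93 left double, 0x94 right double quote.
# 0x92 is omitted on purpose because it doubles as an apostrophe.
QUOTE_BYTES = frozenset(b"\x91\x93\x94")


def _is_boundary(data: bytes, index: int) -> bool:
    """True if index lies outside data, or holds whitespace or ASCII punctuation."""
    if index < 0 or index >= len(data):
        return True
    return data[index] in WHITESPACE or data[index] in PUNCTUATION


def plausibly_windows_1252(data: bytes, min_length: int = MIN_LENGTH) -> bool:
    """Apply the three conditions; any violation rejects the file."""
    if len(data) < min_length:
        return False
    for i, byte in enumerate(data):
        # Condition: no ASCII control characters other than whitespace.
        if (byte < 0x20 or byte == 0x7F) and byte not in WHITESPACE:
            return False
        # Condition: a quotation mark needs whitespace or punctuation
        # (or the start/end of the data) on at least one side.
        if byte in QUOTE_BYTES:
            if not (_is_boundary(data, i - 1) or _is_boundary(data, i + 1)):
                return False
    return True
```

A check like this would only gate the decision to try windows-1252; how it would be wired into coding-system detection is left open here.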