From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Displaying bytes (was: Inadequate documentation of silly characters on screen.)
Date: Wed, 25 Nov 2009 10:33:54 +0900
References: <20091119155848.GB1314@muc.de> <87aayiihe9.fsf@lola.goethe.zz> <20091119180848.GE1314@muc.de> <47325.130.55.118.19.1258658705.squirrel@webmail.lanl.gov> <20091119212550.GG1314@muc.de> <87my2ign8u.fsf@lola.goethe.zz> <912155b0911231334s2b52e8eq864251c9aed386b3@mail.gmail.com>
Cc: per.starback@gmail.com, dak@gnu.org, monnier@iro.umontreal.ca, emacs-devel@gnu.org
To: rms@gnu.org
In-Reply-To: (message from Richard Stallman on Tue, 24 Nov 2009 17:47:13 -0500)

In article , Richard Stallman writes:

> $ od -c euro.txt
> 0000000   T   h   a   t       c   o   s   t   s     200   1   7   .  \n
> 0000020
> $ emacs euro.txt
>
> This is really a windows-1252 file and the strange character is
> supposed to be a Euro sign.
>
> For me, with no particular setup to make Emacs expect windows-1252
> files that shows in emacs as
> "That costs \20017." with raw-text-unix.
>
> Why doesn't Emacs guess right, in this case?

Because another coding system in the same coding category as
windows-1252 (coding-category-charset) has higher priority, and that
coding system has no mapping for the code \200.

> Could we make it guess right by changing the coding system
> priorities?

Yes.

> If so, should we change the default priorities?

I'm not sure.
Since windows-1252 seems to be a superset of iso-8859-1 (the two
agree on every byte except \200-\237, where windows-1252 assigns
printable characters such as the Euro sign and iso-8859-1 has only
control codes), it may be OK to give windows-1252 the higher
priority. What do iso-8859-1 users think?

A better solution would be to allow registering multiple coding
systems in one coding category, but I'm not sure I have time to work
on it.

> It may be that a different set of priorities would cause similar
> problems in some other cases and the current defaults are the best.
> But if we have not looked at the question in several years, it would
> be worth studying it now.
>
> In that case revert-buffer-with-coding-system. Ideally I'd like Emacs
> to ask directly when opening the file
> in such a case, if it can't determine anything better than raw-bytes.
>
> Maybe so.

For that, adding the facility to after-insert-file-set-coding seems
a good approach. Here's a sample patch. The actual change should
give the user more information.

--- mule.el.~1.294.~	2009-11-17 11:42:45.000000000 +0900
+++ mule.el	2009-11-25 10:17:49.000000000 +0900
@@ -1893,7 +1893,18 @@
            coding-system-for-read
            (not (eq coding-system-for-read 'auto-save-coding)))
       (setq buffer-file-coding-system-explicit
-            (cons coding-system-for-read nil)))
+            (cons coding-system-for-read nil))
+    (when (and last-coding-system-used
+               (eq (coding-system-base last-coding-system-used) 'raw-text))
+      ;; Give the user a chance to decode the text with another coding system.
+      (let ((coding-system (read-coding-system "Actual coding system: ")))
+        (if coding-system
+            (save-restriction
+              (narrow-to-region (point) (+ (point) inserted))
+              (let ((modified (buffer-modified-p)))
+                (decode-coding-region (point-min) (point-max) coding-system)
+                (setq inserted (- (point-max) (point-min)))
+                (set-buffer-modified-p modified)))))))
   (if last-coding-system-used
       (let ((coding-system
              (find-new-buffer-file-coding-system last-coding-system-used)))

---
Kenichi Handa
handa@m17n.org
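Independent of the patch, the priority change discussed above can be tried per user from an init file. The sketch below assumes Emacs 23 or later, where `set-coding-system-priority` moves the named coding systems to the front of the detection order and, per its documentation, ignores all but the first coding system given for any one coding category (the one-per-category limitation mentioned above). Listing utf-8 first is an assumption about what most users want, not something proposed in this thread.

```elisp
;; Minimal init-file sketch, assuming Emacs 23 or later.
;; utf-8 is listed first (an assumption) so that valid UTF-8 files,
;; whose bytes windows-1252 would usually also accept, are still
;; guessed as UTF-8.  windows-1252 then represents coding-category-charset,
;; so a file holding byte \200 (the Euro sign in windows-1252), which
;; is invalid as UTF-8, should be decoded as windows-1252 instead of
;; falling back to raw-text.
(set-coding-system-priority 'utf-8 'windows-1252)

;; Show the resulting order: the coding systems currently assigned to
;; the detection categories, highest priority first.
(message "Coding system priority: %S" (coding-system-priority-list))

;; A file already visited with the wrong guess can still be fixed by
;; hand with revert-buffer-with-coding-system: C-x RET r windows-1252 RET
```

Because this raises a whole category above the others, it can change the guess for other files too, so it suits a per-user setting better than a new default.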