From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: auto-detection
Date: Fri, 21 Nov 2003 11:06:34 +0900 (JST)
Message-ID: <200311210206.LAA18425@etlken.m17n.org>
References: <877k20a0q5.fsf@ID-87814.user.dfncis.de> <200311171103.UAA12032@etlken.m17n.org> <200311180127.KAA13174@etlken.m17n.org>
In-reply-to: (message from Stefan Monnier on 18 Nov 2003 12:31:58 -0500)
Original-To: monnier@IRO.UMontreal.CA
Cc: epameinondas@gmx.de, emacs-unicode@gnu.org, emacs-devel@gnu.org
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)
Xref: main.gmane.org gmane.emacs.devel:17999

In article , Stefan Monnier writes:

>>> I think it would be good when saving a file to automatically verify that
>>> the coding-system chosen will be correctly auto-detected if read by
>>> a similarly-configured Emacs. This is already done w.r.t the
>>> coding-cookie but not with the auto-detection.

>> The easy but slow way to implement it is to insert the file
>> again in a temporary buffer with (let
>> ((coding-system-for-read 'undecided)) ..), and check which
>> coding system is detected. And I think any other methods
>> are quite difficult to implement.

> That's indeed the problem: there doesn't seem to be any easy way to make
> the test robust and lightweight.

A function like the following is mostly accurate and lightweight. It
would be better if it also accepted a FILE argument, so that it could
check `auto-coding-alist' and `file-coding-system-alist'.
But, for the moment, I don't have time to work on it further.

(defun coding-system-round-trip-safe-p (coding-system from to &optional string)
  "Check if CODING-SYSTEM is round-trip safe for the text between FROM and TO.
The value is non-nil if and only if we can recover the same text by
encoding the text between FROM and TO with CODING-SYSTEM and decoding
the result back with auto-detection.  If the value is nil, you can
check which coding system was actually detected by looking at the
value of `last-coding-system-used'.

If the optional 4th argument STRING is a string, FROM and TO are
indices into STRING, defaulting to 0 and the length of STRING
respectively.

The check is done only for the first 10 non-ASCII characters."
  (let ((str "")        ; sample of non-ASCII characters collected so far
        (count 10))     ; how many more characters to collect
    (if (stringp string)
        ;; Collect non-ASCII characters from STRING between FROM and TO.
        (progn
          (or from (setq from 0))
          (or to (setq to (length string)))
          (while (and (> count 0)
                      (setq from (string-match "[^\000-\177]" string from))
                      (< from to))
            (setq str (concat str (string (aref string from)))
                  from (1+ from)
                  count (1- count))))
      ;; Collect non-ASCII characters from the region of the current buffer.
      (save-excursion
        (goto-char from)
        (while (and (> count 0)
                    (re-search-forward "[^\000-\177]" to t))
          (setq str (concat str (string (preceding-char)))
                count (1- count)))))
    ;; Pure ASCII is trivially safe; otherwise encode the sample and
    ;; check that auto-detection decodes it back to the same text.
    (or (= (length str) 0)
        (string= (decode-coding-string
                  (encode-coding-string str coding-system)
                  'undecided)
                 str))))

---
Ken'ichi HANDA
handa@m17n.org
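For comparison, the "easy but slow way" quoted above (saving the text, then re-reading it with `coding-system-for-read' bound to `undecided') might look roughly like this. This is a minimal sketch for a current Emacs, not code from this thread: the name `my-coding-round-trip-via-file' is hypothetical, and it works on a string rather than a buffer region. Because `coding-system-for-read' is bound, `auto-coding-alist' and `file-coding-system-alist' are not consulted, so it shares the FILE limitation mentioned above.

```elisp
;; Hypothetical helper, not part of Emacs or of this thread: the
;; full (slow) check, done by really writing and re-reading a file.
(defun my-coding-round-trip-via-file (text coding-system)
  "Save TEXT with CODING-SYSTEM, then read it back with auto-detection.
Return (SAFE . DETECTED): SAFE is non-nil if the text read back
equals TEXT, and DETECTED is the coding system that was chosen."
  (let ((tmp (make-temp-file "round-trip")))
    (unwind-protect
        (progn
          ;; Encode TEXT with exactly CODING-SYSTEM.  A VISIT argument
          ;; that is neither nil nor t only suppresses the message.
          (let ((coding-system-for-write coding-system))
            (write-region text nil tmp nil 'silent))
          ;; Re-read with detection forced, as suggested in the thread.
          (with-temp-buffer
            (let ((coding-system-for-read 'undecided))
              (insert-file-contents tmp))
            ;; `insert-file-contents' records the decoder it used in
            ;; `last-coding-system-used'; `string=' ignores properties.
            (cons (string= (buffer-string) text)
                  last-coding-system-used)))
      ;; Always remove the temporary file, even on error.
      (delete-file tmp))))
```

Unlike `coding-system-round-trip-safe-p', this examines every character and goes through the real file I/O path, which is why it is slower; the sampled version, in exchange, cannot notice a problem that only appears after the first 10 non-ASCII characters.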