unofficial mirror of emacs-devel@gnu.org 
* Re: auto-detection
From: Kenichi Handa @ 2003-11-21  2:06 UTC
  Cc: epameinondas, emacs-unicode, emacs-devel

In article <jwvhe11d1s7.fsf-monnier+emacs/devel@vor.iro.umontreal.ca>, Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

>>>  I think it would be good when saving a file to automatically verify that
>>>  the coding-system chosen will be correctly auto-detected if read by
>>>  a similarly-configured Emacs.  This is already done w.r.t the
>>>  coding-cookie but not with the auto-detection.
>>  The easy but slow way to implement it is to insert the file
>>  again in a temporary buffer with (let
>>  ((coding-system-for-read 'undecided)) ..), and check which
>>  coding system is detected.  And I think any other methods
>>  are quite difficult to implement.
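
A minimal sketch of the slow approach described in the quoted paragraph above. The names `my-detected-coding-system' and `my-file-auto-detects-as-p' are hypothetical and not part of Emacs. The sketch assumes that `insert-file-contents' records the detected coding system in `last-coding-system-used', and it compares base coding systems so that end-of-line variants do not matter.

```elisp
;; Hypothetical helpers, not part of Emacs.

(defun my-detected-coding-system (file)
  "Return the coding system Emacs auto-detects when reading FILE.
FILE is inserted into a temporary buffer with detection forced by
binding `coding-system-for-read' to `undecided'."
  (with-temp-buffer
    (let ((coding-system-for-read 'undecided))
      (insert-file-contents file))
    ;; Assumption: the read above stored the detected coding system here.
    last-coding-system-used))

(defun my-file-auto-detects-as-p (file coding-system)
  "Return non-nil if FILE is auto-detected as CODING-SYSTEM.
Only the base coding systems are compared, so `utf-8' and
`utf-8-unix' count as the same."
  (eq (coding-system-base (my-detected-coding-system file))
      (coding-system-base coding-system)))
```

This reads the whole file a second time, which is why it is slow. The result depends on the language environment and on the coding-system priority list, and pure-ASCII files may be reported under a generic coding system rather than the one used for saving.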

> That's indeed the problem: there doesn't seem to be any easy way to make
> the test robust and lightweight.

Something like the following function is mostly accurate and
lightweight.  It would be better if it also accepted a FILE
argument so that it could check auto-coding-alist and
file-coding-system-alist.  But, for the moment, I don't have
time to work on it further.

(defun coding-system-round-trip-safe-p (coding-system from to &optional string)
  "Check if CODING-SYSTEM is round-trip safe for the region FROM and TO.

The value is non-nil if and only if we can recover the same text
by encoding the text in the region between FROM and TO with
CODING-SYSTEM and decoding the result back with auto-detection.

If the value is nil, you can check how the text was actually
detected by the value of `last-coding-system-used'.

If the optional 4th argument STRING is a string, FROM and TO are
indices into STRING, defaulting to 0 and the length of STRING
respectively.

The check is done only for the first 10 non-ASCII characters."
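  (let ((str "")     ; the non-ASCII characters collected so far
        (count 10))  ; how many more characters to collect
    (if (stringp string)
        ;; STRING case: scan STRING between the indices FROM and TO.
        (progn
          (or from (setq from 0))
          (or to (setq to (length string)))
          (while (and (> count 0)
                      (setq from (string-match "[^\000-\177]" string from))
                      (< from to))
            (setq str (concat str (string (aref string from)))
                  from (1+ from)
                  count (1- count))))
      ;; Buffer case: scan the region between FROM and TO.
      (save-excursion
        (goto-char from)
        (while (and (> count 0)
                    (re-search-forward "[^\000-\177]" to t))
          (setq str (concat str (string (preceding-char)))
                count (1- count)))))
    ;; Pure-ASCII text is trivially safe; otherwise encode the sample
    ;; and decode it back with auto-detection.
    (or (= (length str) 0)
        (string= (decode-coding-string
                  (encode-coding-string str coding-system) 'undecided)
                 str))))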
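
One possible way to use the function above at save time, as a sketch only. The name `my-warn-if-detection-differs' is hypothetical, and the sketch assumes an Emacs that provides `before-save-hook' and that `coding-system-round-trip-safe-p' has already been evaluated. The `fboundp' guard makes it a no-op otherwise.

```elisp
;; Hypothetical hook function, not part of Emacs.
(defun my-warn-if-detection-differs ()
  "Warn if the buffer text may be auto-detected differently when re-read.
Return non-nil if a warning was shown."
  (when (and (fboundp 'coding-system-round-trip-safe-p)
             buffer-file-coding-system
             (not (coding-system-round-trip-safe-p
                   buffer-file-coding-system (point-min) (point-max))))
    ;; After a failed check, `last-coding-system-used' holds the coding
    ;; system that auto-detection actually picked.
    (message "Warning: text saved as %s may be auto-detected as %s"
             buffer-file-coding-system last-coding-system-used)
    t))
```

To try it, add it with (add-hook 'before-save-hook #'my-warn-if-detection-differs).  Since the check samples only the first 10 non-ASCII characters, it stays cheap even for large buffers, at the cost of missing problems that appear later in the text.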

---
Ken'ichi HANDA
handa@m17n.org
