From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Kenichi Handa <handa@m17n.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Coding system robustness?
Date: Sat, 19 Mar 2005 09:52:26 +0900 (JST)
Message-ID: <200503190052.JAA22388@etlken.m17n.org>
References: <x54qf8df09.fsf@lola.goethe.zz>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya")
Content-Type: text/plain; charset=US-ASCII
X-Trace: sea.gmane.org 1111194037 22390 80.91.229.2 (19 Mar 2005 01:00:37 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 19 Mar 2005 01:00:37 +0000 (UTC)
Cc: emacs-devel@gnu.org
Original-To: David Kastrup <dak@gnu.org>
In-reply-to: <x54qf8df09.fsf@lola.goethe.zz> (message from David Kastrup on
	Fri, 18 Mar 2005 18:45:42 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2
	Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

In article <x54qf8df09.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
> I'd like to know whether coding systems in general are supposed to be
> robust, meaning that decoding some random byte string into the coding
> system and reencoding it is guaranteed to deliver the same byte string
> again?

In general, no.

> Background for that question: I do error association in preview-latex
> (via AUCTeX) with the original source text, and generally unrobust
> transformations of the input may happen, such as splitting a
> multibyte-char in the middle, or transliterating some of those chars,
> but not others.  I currently work this by having the process use a
> raw-text encoding, replace potentially questionable stuff and reencode
> when it turns out that the contexts do not match the source file.
> This has the disadvantage that

> a) I need to go through the works even in case TeX is set up nicely
> enough to deliver mostly working characters, since the raw encoding
> will match much less often than a properly decoded stream.

> b) The displayed output looks like junk unnecessarily.  If we are
> talking about multi-file documents written in different encodings,
> this problem is not possible to avoid with tolerable effort, but in
> the case where the encodings in one document match, it would be nicer
> to have AUCTeX have a nicer output buffer.

> So what encodings are expected to be "transparent" for what versions
> of Emacs (we are only interested in 21.3 and newer)?

The attached code, run on the latest Emacs code, automatically
detects these as transparent:

chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2
iso-latin-3 iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9
iso-safe japanese-iso-8bit japanese-shift-jis
korean-iso-8bit raw-text

I expect that more CCL-based coding systems (lots of CPXXX)
are also transparent (at least the utf-XX ones are), but they
can't be detected automatically.
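
For coding systems that can't be classified automatically, one option is to probe them empirically. The following is only a minimal sketch: `my-round-trip-p` is a hypothetical helper name, not an Emacs function, and `unibyte-string` assumes Emacs 23 or later rather than the 21.3 discussed here. A passing probe shows only that the sampled bytes survive; it does not guarantee transparency for every input.

```elisp
;; Hypothetical helper (not part of Emacs): empirical round-trip probe.
(defun my-round-trip-p (coding-system bytes)
  "Return non-nil if BYTES survive decoding and re-encoding.
BYTES should be a unibyte string; CODING-SYSTEM is a coding system symbol."
  (let* ((decoded (decode-coding-string bytes coding-system))
         (encoded (encode-coding-string decoded coding-system)))
    ;; Both strings are unibyte here, so this compares raw bytes.
    (string= encoded bytes)))

;; Usage: probe iso-latin-1 with three 8-bit bytes.
;; `unibyte-string' is assumed to be available (Emacs 23 or later).
(my-round-trip-p 'iso-latin-1 (unibyte-string 228 246 252))
```

Sample input should avoid CR and LF unless an explicit end-of-line variant (such as `iso-latin-1-unix`) is used, since end-of-line detection could otherwise change the bytes.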

---
Ken'ichi HANDA
handa@m17n.org

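;; Collect the coding systems that can be recognized as round-trip safe.
(let ((round-trip-safe nil))
  (dolist (elt (coding-system-list t))
    ;; Skip coding systems with Lisp-level conversion hooks, since
    ;; their effect can't be predicted.
    (and (not (coding-system-pre-write-conversion elt))
         (not (coding-system-post-read-conversion elt))
         (let ((type (coding-system-type elt)))
           ;; Types 0, 1, 3, 5: emacs-mule, Shift-JIS, Big5, raw-text
           ;; (numbering as in the Emacs 21 `make-coding-system' docstring).
           (if (memq type '(0 1 3 5))
               (push elt round-trip-safe)
             ;; Type 2 is ISO-2022.  Accept it only if none of the
             ;; G0..G3 charset slots (flags 0-3) is a list and
             ;; flag 8 (locking shift) is unset.
             (if (eq type 2)
                 (let ((flags (coding-system-flags elt)))
                   (if (and (not (consp (aref flags 0)))
                            (not (consp (aref flags 1)))
                            (not (consp (aref flags 2)))
                            (not (consp (aref flags 3)))
                            (not (aref flags 8)))
                       (push elt round-trip-safe))))))))
  ;; Pretty-print the result; return nil to avoid echoing it again.
  (pp round-trip-safe)
  nil)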