From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Coding system robustness?
Date: Sat, 19 Mar 2005 09:52:26 +0900 (JST)
Message-ID: <200503190052.JAA22388@etlken.m17n.org>
To: David Kastrup
Cc: emacs-devel@gnu.org
In-reply-to: (message from David Kastrup on Fri, 18 Mar 2005 18:45:42 +0100)
User-Agent: SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.2 Emacs/21.3.50 (sparc-sun-solaris2.6) MULE/5.0 (SAKAKI)

David Kastrup writes:

> I'd like to know whether coding systems in general are supposed to be
> robust, meaning that decoding some random byte string into the coding
> system and reencoding it is guaranteed to deliver the same byte string
> again?

In general, no.

> Background for that question: I do error association in preview-latex
> (via AUCTeX) with the original source text, and generally unrobust
> transformations of the input may happen, such as splitting a
> multibyte-char in the middle, or transliterating some of those chars,
> but not others.  I currently work this by having the process use a
> raw-text encoding, replace potentially questionable stuff and reencode
> when it turns out that the contexts do not match the source file.

> This has the disadvantage that
> a) I need to go through the works even in case TeX is set up nicely
> enough to deliver mostly working characters, since the raw encoding
> will match much less often than a properly decoded stream.
> b) The displayed output looks like junk unnecessarily.  If we are
> talking about multi-file documents written in different encodings,
> this problem is not possible to avoid with tolerable effort, but in
> the case where the encodings in one document match, it would be nicer
> to have AUCTeX have a nicer output buffer.

> So what encodings are expected to be "transparent" for what versions
> of Emacs (we are only interested in 21.3 and newer)?

The code attached below, run against the latest Emacs code,
automatically detects the following coding systems as transparent:

  chinese-big5
  chinese-iso-8bit
  cyrillic-iso-8bit
  emacs-mule
  greek-iso-8bit
  hebrew-iso-8bit
  iso-latin-1
  iso-latin-2
  iso-latin-3
  iso-latin-4
  iso-latin-5
  iso-latin-8
  iso-latin-9
  iso-safe
  japanese-iso-8bit
  japanese-shift-jis
  korean-iso-8bit
  raw-text

I expect that more CCL-based coding systems (many of the CPXXX ones) are
also transparent (at least the utf-XX ones are), but they can't be
detected automatically.

---
Ken'ichi HANDA
handa@m17n.org

;; Collect the coding systems that can be detected as round-trip safe.
(let ((round-trip-safe nil))
  (dolist (elt (coding-system-list t))
    ;; Skip coding systems that run extra conversion functions before
    ;; writing or after reading.
    (and (not (coding-system-pre-write-conversion elt))
         (not (coding-system-post-read-conversion elt))
         (let ((type (coding-system-type elt)))
           ;; Types 0, 1, 3 and 5 (emacs-mule, Shift_JIS, Big5 and
           ;; raw-text in the Emacs 21 numbering) are accepted as is.
           (if (memq type '(0 1 3 5))
               (push elt round-trip-safe)
             ;; Type 2 is ISO-2022: accept it only when flags 0..3 (the
             ;; G0..G3 designations) are not lists and flag 8 (locking
             ;; shift) is nil.
             (if (eq type 2)
                 (let ((flags (coding-system-flags elt)))
                   (if (and (not (consp (aref flags 0)))
                            (not (consp (aref flags 1)))
                            (not (consp (aref flags 2)))
                            (not (consp (aref flags 3)))
                            (not (aref flags 8)))
                       (push elt round-trip-safe))))))))
  (pp round-trip-safe)
  nil)
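
The property asked about in this thread (decode a byte string, re-encode it, and get the same bytes back) can also be spot-checked empirically for a given coding system. The following is only a minimal sketch and not part of the original message: the function names `my-round-trip-p` and `my-single-byte-failures` are hypothetical, and it assumes a recent Emacs in which `unibyte-string` is available (it may not exist in Emacs 21.3). Passing such a check for single bytes does not prove that a coding system is transparent for longer byte sequences.

```elisp
;; Hypothetical helpers for illustration; the names are not from the
;; original message.

(defun my-round-trip-p (bytes coding-system)
  "Return non-nil if unibyte BYTES survive a decode/encode round trip.
BYTES is decoded with CODING-SYSTEM, the result is encoded again with
the same coding system, and the two byte strings are compared."
  (let* ((decoded (decode-coding-string bytes coding-system))
         (encoded (encode-coding-string decoded coding-system)))
    (string= bytes encoded)))

(defun my-single-byte-failures (coding-system)
  "Return the byte values in 0..255 that CODING-SYSTEM fails to round-trip.
Each byte is tested on its own as a one-byte string."
  (let (failures)
    (dotimes (b 256)
      (unless (my-round-trip-p (unibyte-string b) coding-system)
        (push b failures)))
    (nreverse failures)))
```

A call such as `(my-single-byte-failures 'iso-latin-1-unix)` returns the list of failing byte values, so an empty list means no single byte broke the round trip. The `-unix` variant is used here on the assumption that automatic end-of-line detection could otherwise rewrite a lone carriage return and mask the behavior of the coding system itself.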