From: Kenichi Handa
Newsgroups: gmane.mail.mew.general,gmane.emacs.devel
Subject: [mew-int 01596] Re: windows 1252
Date: Fri, 7 Nov 2003 16:13:45 +0900 (JST)
Message-ID: <200311070713.QAA24793@etlken.m17n.org>
In-reply-to: <20031104.111334.60445673.kazu@iijlab.net>
To: kazu@iijlab.net
Cc: emacs-devel@gnu.org, mew-int@mew.org

I'm sorry for the late response on this thread.

First, I want to clarify these things:

(1) windows-1252

This is actually not a charset but a coding system in Emacs.
When Emacs reads a file with this coding system, it decodes each
byte into a character of one of these character sets:
  ascii, latin-iso8859-1, mule-unicode-0100-24ff

(2) ctext (alias of compound-text)

On conversion, it is not fully compatible with the X Compound Text
specification, because it encodes any Emacs character, using a
designation sequence for private character sets where needed
(please note that all Emacs charsets have an iso-final-char).
So, Big5 characters are preceded by ESC $ ( 0 or ESC $ ( 1, and
mule-unicode-0100-24ff characters are preceded by ESC - 1.

(3) ctext-with-extensions (alias of compound-text-with-extensions)

It can handle several kinds of "extended segment".

On decoding, it handles ESC % / N M L ... ^b for the encodings
listed in ctext-non-standard-encoding-alist, and
ESC % G ... ESC % @ for UTF-8.

On encoding, it does two-pass encoding: first it encodes with
`compound-text', then it re-encodes, using the "extended segment",
whatever was encoded with a designation sequence listed in
ctext-non-standard-designations-alist. Currently only ESC $ ( 0
and ESC $ ( 1 are listed, so only Big5 characters are encoded
using the "extended segment".

As to the Mew case, I think the following is good.

When it runs under the current Emacs, keep using ctext but add a
coding tag to the file. Emacs should be able to encode/decode
all Emacs characters.

When it runs under the emacs-unicode version, on writing the file,
if all the characters can be encoded by ctext, keep using it.
If not (because, in emacs-unicode, some characters don't belong
to any charset that has an iso-final-char), use utf-8. In both
cases, add a coding tag.

On reading, check the coding tag first. If there is no coding
tag, read with ctext; otherwise, read with the coding system
specified in the tag.

By the way,

> The one-and-only coding-system which, I found, meets the requirements
> above is 'ctext.

I think iso-latin-1-with-esc also meets your requirements.

---
Ken'ichi HANDA
handa@m17n.org
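
The write/read policy proposed above for Mew can be sketched roughly as follows. This is only an illustration, not Mew or Emacs code: Python has no compound-text codec, so "cp1252" stands in for the preferred legacy coding system (an assumption), and the function names encode_with_tag and decode_with_tag are hypothetical. The tag imitates the Emacs "-*- coding: NAME -*-" convention, but it carries Python codec names rather than Emacs coding-system names.

```python
import re

# Assumption: "cp1252" is a stand-in for ctext, which Python cannot encode.
PREFERRED = "cp1252"
FALLBACK = "utf-8"

# Matches a tag such as b"-*- coding: utf-8 -*-" on the first line.
TAG_RE = re.compile(rb"-\*- coding: ([A-Za-z0-9_.-]+) -\*-")


def encode_with_tag(text, preferred=PREFERRED, fallback=FALLBACK):
    """Use the preferred codec if it covers every character, otherwise
    fall back; in both cases prepend a coding tag line."""
    try:
        body = text.encode(preferred)
        codec = preferred
    except UnicodeEncodeError:
        # Some character is not representable in the preferred codec.
        body = text.encode(fallback)
        codec = fallback
    tag = f"-*- coding: {codec} -*-\n".encode("ascii")
    return tag + body


def decode_with_tag(data, default=PREFERRED):
    """Check the coding tag first; with no tag, read with the default."""
    first_line, _sep, rest = data.partition(b"\n")
    match = TAG_RE.search(first_line)
    if match is None:
        # Untagged (older) file: the whole content uses the default codec.
        return data.decode(default)
    return rest.decode(match.group(1).decode("ascii"))
```

With this sketch, text that fits the preferred codec keeps it, text that does not is written as UTF-8, and an untagged file written before tags were introduced is still read with the old default.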