From: Kenichi Handa <handa@m17n.org>
Cc: emacs-devel@gnu.org, mew-int@mew.org
Subject: [mew-int 01596] Re: windows 1252
Date: Fri, 7 Nov 2003 16:13:45 +0900 (JST)
Message-ID: <200311070713.QAA24793@etlken.m17n.org>
In-Reply-To: <20031104.111334.60445673.kazu@iijlab.net>

I'm sorry for the late response on this thread.

First, I want to clarify these things:

(1) windows-1252

This is actually not a charset but a coding system in
Emacs.  When Emacs reads a file with this coding system, it
decodes each byte into a character of one of these charsets:
	ascii, latin-iso8859-1, mule-unicode-0100-24ff
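
To make the byte-to-charset split concrete, here is a
minimal sketch in Python (not Emacs code).  It uses Python's
own cp1252 codec and sorts each decoded character by code
point.  The function name is hypothetical, and the mapping
from code-point ranges to the three Emacs charset names is
an assumption inferred from those names, not taken from the
Emacs sources.

```python
def emacs_charset_for_cp1252_byte(byte: int) -> str:
    """Guess which Emacs charset a windows-1252 byte would land in."""
    try:
        char = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's codec leaves a few bytes (e.g. 0x81) unassigned.
        return "undefined"
    code = ord(char)
    if code < 0x80:
        return "ascii"
    if 0xA0 <= code <= 0xFF:
        # Right half of ISO 8859-1.
        return "latin-iso8859-1"
    if 0x100 <= code <= 0x24FF:
        # Bytes 0x80..0x9F decode to characters such as U+20AC.
        return "mule-unicode-0100-24ff"
    return "other"


for b in (0x41, 0xE9, 0x80):
    print(hex(b), emacs_charset_for_cp1252_byte(b))
```

The bytes that differ from ISO 8859-1 are the ones in
0x80..0x9F; in this sketch they all fall in the
mule-unicode-0100-24ff range.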

(2) ctext (alias of compound-text)

On conversion, it is not fully compatible with the
X Compound Text specification, because it encodes any
Emacs character, using a designation sequence for private
character sets where needed (please note that all Emacs
charsets have an iso-final-char).  So Big5 characters are
preceded by ESC $ ( 0 or ESC $ ( 1, and
mule-unicode-0100-24ff characters are preceded by ESC - 1.

(3) ctext-with-extensions (alias of compound-text-with-extensions)

It can handle several kinds of "extended segment".  On
decoding, it handles ESC % / N M L ... ^b for the encodings
listed in ctext-non-standard-encoding-alist, and
ESC % G ... ESC % @ for UTF-8.  On encoding, it does
two-pass encoding: first it encodes by `compound-text', then
it re-encodes, as an "extended segment", whatever was
encoded with a designation sequence listed in
ctext-non-standard-designations-alist.  Currently only
ESC $ ( 0 and ESC $ ( 1 are listed.  Thus only Big5
characters are encoded using the "extended segment".
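
For readers unfamiliar with the extended-segment layout,
here is a rough Python sketch of how one segment can be
built and parsed.  It follows my reading of the X Compound
Text specification: ESC % / F M L, where F ('0'..'4') gives
the octets per character, and (M-128)*128 + (L-128) is the
number of bytes following L, namely the encoding name, an
STX byte (the ^b above), and the data.  The function names
are hypothetical, and the encoding name "big5-0" is used
only as an illustrative value.

```python
ESC, STX = 0x1B, 0x02


def build_extended_segment(name: bytes, data: bytes,
                           octets_per_char: int = 2) -> bytes:
    """Wrap DATA in an extended segment labelled with NAME."""
    payload = name + bytes([STX]) + data
    if len(payload) >= 128 * 128:
        raise ValueError("payload too long for one extended segment")
    m, l = divmod(len(payload), 128)
    # ESC % / F M L, with M and L offset by 128.
    header = bytes([ESC, 0x25, 0x2F, 0x30 + octets_per_char,
                    0x80 + m, 0x80 + l])
    return header + payload


def parse_extended_segment(segment: bytes):
    """Return (octets_per_char, name, data) for one extended segment."""
    if segment[:3] != bytes([ESC, 0x25, 0x2F]):
        raise ValueError("not an extended segment")
    octets_per_char = segment[3] - 0x30
    length = (segment[4] - 0x80) * 128 + (segment[5] - 0x80)
    payload = segment[6:6 + length]
    name, _, data = payload.partition(bytes([STX]))
    return octets_per_char, name, data


seg = build_extended_segment(b"big5-0", b"\xa4\x40")
print(parse_extended_segment(seg))
```

The sketch handles one segment only; a real decoder would
also have to track the surrounding designation state.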


As for the Mew case, I think the following approach is good.

When it runs under the current Emacs, keep using ctext but
add a coding tag to the file.  Emacs should be able to
encode/decode all Emacs characters.

When it runs under the emacs-unicode version, on writing the
file, if all the characters can be encoded by ctext, keep
using it.  If not (because, in emacs-unicode, some characters
don't belong to any charset that has an iso-final-char), use
utf-8.  In both cases, add a coding tag.  On reading,
check the coding tag first.  If there is no coding tag, read
by ctext; otherwise, read by the coding system specified in
the tag.
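
The write/read policy above can be sketched as follows.
This is an illustration in Python, not Mew or Emacs code.
Python ships no Compound Text codec, so latin-1 stands in
for ctext as "a coding system that cannot encode every
character".  The function names, the first-line placement of
the tag, and the use of Python codec names inside the tag
are all assumptions made for the sketch.

```python
import re

PRIMARY = "latin-1"   # stand-in for ctext (assumption)
FALLBACK = "utf-8"
TAG_RE = re.compile(rb"-\*-\s*coding:\s*([A-Za-z0-9_.-]+)\s*-\*-")


def write_with_tag(text: str) -> bytes:
    """Encode TEXT, preferring PRIMARY, and prepend a coding tag."""
    try:
        body = text.encode(PRIMARY)
        codec = PRIMARY
    except UnicodeEncodeError:
        # Some character is outside the primary repertoire.
        body = text.encode(FALLBACK)
        codec = FALLBACK
    header = f"-*- coding: {codec} -*-\n".encode("ascii")
    return header + body


def read_with_tag(data: bytes) -> str:
    """Decode DATA by its coding tag, or by PRIMARY if untagged."""
    first_line, _, rest = data.partition(b"\n")
    match = TAG_RE.search(first_line)
    if match is None:
        # Untagged (older) file: read the whole thing as PRIMARY.
        return data.decode(PRIMARY)
    return rest.decode(match.group(1).decode("ascii"))


print(write_with_tag("caf\u00e9").split(b"\n")[0])
print(write_with_tag("\u65e5\u672c\u8a9e").split(b"\n")[0])
```

Untagged files keep working because the reader falls back
to the primary coding system, which is what lets old and new
files coexist.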

By the way,

> The one-and-only coding-system which, I found, meets the requirements
> above is 'ctext.

I think iso-latin-1-with-esc also meets your requirements.

---
Ken'ichi HANDA
handa@m17n.org

