* adding a new encoding
@ 2004-06-19 15:03 Baurjan Ismagulov
2004-06-20 16:35 ` Baurjan Ismagulov
0 siblings, 1 reply; 5+ messages in thread
From: Baurjan Ismagulov @ 2004-06-19 15:03 UTC (permalink / raw)
Hello,
I want to use Emacs to edit text in PT154 encoding. I've tried to type
the characters pretending they were in KOI8-R, but Emacs (I'm using
21.2) hasn't swallowed it :) (it displayed some, beeped at some, and did
nothing for some).
I suppose I need to tell Emacs about PT154. I've started to dig through
lisp/language and lisp/international, but it's a bit overwhelming for a
novice. Could anyone give me any tips on where to start?
Thanks in advance,
Baurjan.
* Re: adding a new encoding
2004-06-19 15:03 adding a new encoding Baurjan Ismagulov
@ 2004-06-20 16:35 ` Baurjan Ismagulov
2004-06-20 23:18 ` Kenichi Handa
0 siblings, 1 reply; 5+ messages in thread
From: Baurjan Ismagulov @ 2004-06-20 16:35 UTC (permalink / raw)
Hello,
On Sat, Jun 19, 2004 at 05:03:01PM +0200, Baurjan Ismagulov wrote:
> I want to use Emacs to edit text in PT154 encoding.
Here's what I've understood (please correct me if anything is wrong):
* Emacs uses ISO 2022 internally.
* Supported Cyrillic encodings map to ISO 8859-5.
* PT154 (http://www.iana.org/assignments/charset-reg/PTCP154) cannot be
mapped to ISO 8859-5 due to at least 2x18 characters not present in
the latter.
* One possible solution would be to define-charset cyrillic-asian.
However, the number of characters in a new charset is limited to 94 or
96, and there are 108 non-us-ascii letters in PT154.
So, the questions:
* Can I define cyrillic-asian with 36 characters and use it together
with us-ascii and cyrillic-iso8859-5, or should I define
cyrillic-asian-1 and cyrillic-asian-2, duplicating 2x(33+3) chars
already present in cyrillic-iso8859-5?
* What should I do about the final char? These charsets are not
registered with ECMA.
Thanks in advance,
Baurjan.
* Re: adding a new encoding
2004-06-20 16:35 ` Baurjan Ismagulov
@ 2004-06-20 23:18 ` Kenichi Handa
2004-06-21 20:06 ` Baurjan Ismagulov
0 siblings, 1 reply; 5+ messages in thread
From: Kenichi Handa @ 2004-06-20 23:18 UTC (permalink / raw)
Cc: emacs-devel
In article <20040620163514.GA3576@ata.cs.hun.edu.tr>, Baurjan Ismagulov <ibr@ata.cs.hun.edu.tr> writes:
> On Sat, Jun 19, 2004 at 05:03:01PM +0200, Baurjan Ismagulov wrote:
> > I want to use Emacs to edit text in PT154 encoding.
The latest (CVS) Emacs already supports that encoding.
Please try (require 'code-pages) in your .emacs.
> Here's what I've understood (please correct me if anything is wrong):
> * Emacs uses ISO 2022 internally.
Not correct. Emacs uses the character codes of each
ISO-2022-conforming charset.
> * Supported Cyrillic encodings map to ISO 8859-5.
Not always. For instance, windows-1251 maps to
mule-unicode-0100-24ff.
> * PT154 (http://www.iana.org/assignments/charset-reg/PTCP154) cannot be
> mapped to ISO 8859-5 due to at least 2x18 characters not present in
> the latter.
Yes, thus the latest Emacs supports it by mapping to
mule-unicode-0100-24ff.
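The point about ISO 8859-5 can be checked with a small sketch. This is
illustrative Python, not Emacs code: it assumes Python's standard
"ptcp154" and "iso8859_5" codecs, and the sample letter U+049A (Cyrillic
capital Ka with descender) is my own choice, not one named in this thread.

```python
# Sample letter chosen for illustration (assumption, not from the thread):
# U+049A CYRILLIC CAPITAL LETTER KA WITH DESCENDER.
ch = "\u049A"

# PT154 (Python codec name "ptcp154") encodes it as a single byte.
pt154_bytes = ch.encode("ptcp154")

# ISO 8859-5 has no slot for it, so encoding fails.
try:
    ch.encode("iso8859_5")
    in_iso8859_5 = True
except UnicodeEncodeError:
    in_iso8859_5 = False

# The code point lies inside the 0x0100..0x24FF range that the charset
# name mule-unicode-0100-24ff refers to.
in_mule_unicode_range = 0x0100 <= ord(ch) <= 0x24FF

print(len(pt154_bytes), in_iso8859_5, in_mule_unicode_range)
```

A character like this one has no place in ISO 8859-5 but does fall in
the Unicode range 0x0100..0x24FF.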
---
Ken'ichi HANDA
handa@m17n.org
* Re: adding a new encoding
2004-06-20 23:18 ` Kenichi Handa
@ 2004-06-21 20:06 ` Baurjan Ismagulov
2004-06-21 23:44 ` Kenichi Handa
0 siblings, 1 reply; 5+ messages in thread
From: Baurjan Ismagulov @ 2004-06-21 20:06 UTC (permalink / raw)
Hello, Kenichi!
Thanks for your reply!
On Mon, Jun 21, 2004 at 08:18:17AM +0900, Kenichi Handa wrote:
> The latest (CVS) Emacs already supports that encoding.
> Please try (require 'code-pages) in your .emacs.
Thanks for the tip, I'll try the CVS version (I was looking at the
Debian unstable source).
> > * Emacs uses ISO 2022 internally.
> Not correct. Emacs uses the character codes of each
> ISO-2022-conforming charset.
And each character is prepended with a charset code? What do the raw
bytes look like in Emacs memory for, say, \U+0410\U+00DF\U+0534? Is
there an easy way to see that (princ somewhere in
lisp/international/mule-cmds.el)?
With kind regards,
Baurjan.
* Re: adding a new encoding
2004-06-21 20:06 ` Baurjan Ismagulov
@ 2004-06-21 23:44 ` Kenichi Handa
0 siblings, 0 replies; 5+ messages in thread
From: Kenichi Handa @ 2004-06-21 23:44 UTC (permalink / raw)
Cc: emacs-devel
In article <20040621200633.GF1361@ata.cs.hun.edu.tr>, Baurjan Ismagulov <ibr@ata.cs.hun.edu.tr> writes:
> > > * Emacs uses ISO 2022 internally.
> > Not correct. Emacs uses the character codes of each
> > ISO-2022-conforming charset.
> And each character is prepended with a charset code?
Yes, each character is prepended with a charset code in the range
0x80..0x9D plus an optional extended charset code in the range
0xA0..0xFF, and the code points (0x20..0x7F) are `logior'ed with
0x80. So the multibyte representation of a character falls into
one of these forms:
0x00..0x7F
0x80..0x9D 0xA0..0xFF
0x80..0x9D 0xA0..0xFF 0xA0..0xFF
0x80..0x9D 0xA0..0xFF 0xA0..0xFF 0xA0..0xFF
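The four forms above can be sketched as a small splitter. This is an
illustrative Python sketch that assumes only the byte ranges listed
here; the function name split_emacs21_multibyte and the sample bytes are
hypothetical and are not taken from the Emacs sources or from real
Emacs output.

```python
def split_emacs21_multibyte(data: bytes) -> list:
    """Split a byte string into per-character chunks using the four
    forms listed above (ASCII, or lead byte plus 1 to 3 trailing bytes)."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Form 1: a single byte in 0x00..0x7F.
            chars.append(data[i:i + 1])
            i += 1
        elif 0x80 <= lead <= 0x9D:
            # Forms 2 to 4: lead byte followed by 1..3 bytes in 0xA0..0xFF.
            j = i + 1
            while j < len(data) and j - i <= 3 and 0xA0 <= data[j] <= 0xFF:
                j += 1
            if j == i + 1:
                raise ValueError(f"lead byte at offset {i} has no trailing bytes")
            chars.append(data[i:j])
            i = j
        else:
            # Bytes outside the listed forms are not handled by this sketch.
            raise ValueError(f"unexpected byte 0x{lead:02X} at offset {i}")
    return chars


# Made-up sample: one ASCII byte, one 2-byte form, one 4-byte form.
sample = b"A" + bytes([0x8C, 0xB0]) + bytes([0x9C, 0xF4, 0xA1, 0xA2])
for chunk in split_emacs21_multibyte(sample):
    print(chunk.hex(" "))
```

Because the lead-byte range and the trailing-byte range do not overlap,
a greedy scan like this finds the character boundaries unambiguously.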
> What do the raw bytes
> look like in Emacs memory for, say, \U+0410\U+00DF\U+0534? Is there an
> easy way to see that (princ somewhere in
> lisp/international/mule-cmds.el)?
Try this:
(string-as-unibyte (string (decode-char 'ucs #x0410)
(decode-char 'ucs #x00DF)
(decode-char 'ucs #x0534)))
M-x list-character-sets also gives some information.
But you'd better not write code that depends on this representation.
The Unicode-based Emacs (which will come after the release of the
current CVS HEAD) uses UTF-8 as its multibyte representation.
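For comparison, here is what the three characters from the question
look like in UTF-8. This is a Python sketch rather than Emacs code, and
it assumes that the internal form coincides with standard UTF-8 for
these three code points.

```python
# The three code points asked about: U+0410, U+00DF, U+0534.
text = "\u0410\u00DF\u0534"

# Standard UTF-8 encoding of the whole string.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # → d0 90 c3 9f d4 b4

# Per-character breakdown: each of these code points takes two bytes.
for ch in text:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```

There is no separate charset code here: the leading byte of each
sequence only encodes the sequence length and the high bits of the
code point.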
---
Ken'ichi HANDA
handa@m17n.org