unofficial mirror of emacs-devel@gnu.org 
* adding a new encoding
@ 2004-06-19 15:03 Baurjan Ismagulov
  2004-06-20 16:35 ` Baurjan Ismagulov
From: Baurjan Ismagulov @ 2004-06-19 15:03 UTC


Hello,

I want to use Emacs to edit text in PT154 encoding. I've tried to type
the characters pretending they were in KOI8-R, but Emacs (I'm using
21.2) didn't swallow it :) (it displayed some, beeped at some, and did
nothing for some).

I suppose I need to tell Emacs about PT154. I've started to dig through
lisp/language and lisp/international, but it's a bit overwhelming for a
novice. Could anyone give me some tips on where to start?

Thanks in advance,
Baurjan.


* Re: adding a new encoding
  2004-06-19 15:03 adding a new encoding Baurjan Ismagulov
@ 2004-06-20 16:35 ` Baurjan Ismagulov
  2004-06-20 23:18   ` Kenichi Handa
From: Baurjan Ismagulov @ 2004-06-20 16:35 UTC


Hello,

On Sat, Jun 19, 2004 at 05:03:01PM +0200, Baurjan Ismagulov wrote:
> I want to use Emacs to edit text in PT154 encoding.

Here's what I've understood (please correct me if anything is wrong):

* Emacs uses ISO 2022 internally.

* Supported Cyrillic encodings map to ISO 8859-5.

* PT154 (http://www.iana.org/assignments/charset-reg/PTCP154) cannot be
  mapped to ISO 8859-5 due to at least 2x18 characters not present in
  the latter.

* One possible solution would be to define-charset cyrillic-asian.
  However, the number of characters in a new charset is limited to 94 or
  96, and there are 108 non-us-ascii letters in PT154.
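
The claim about PT154 characters that ISO 8859-5 lacks can be checked roughly with a short Python sketch. It assumes that Python's built-in `ptcp154` and `iso8859_5` codecs faithfully reflect the IANA PTCP154 registration and ISO 8859-5; it simply tries to re-encode every PT154 upper-half character in ISO 8859-5 and collects the failures:

```python
# Sketch: list PT154 (PTCP154) upper-half characters that ISO 8859-5 cannot
# represent.  Relies on Python's built-in "ptcp154" and "iso8859_5" codecs.

def missing_from_iso8859_5() -> list[str]:
    """Return the PT154 characters at 0x80..0xFF with no ISO 8859-5 encoding."""
    missing = []
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode("ptcp154")
        except UnicodeDecodeError:
            continue  # skip positions the codec leaves undefined, if any
        try:
            ch.encode("iso8859_5")
        except UnicodeEncodeError:
            missing.append(ch)  # no ISO 8859-5 code for this character
    return missing


if __name__ == "__main__":
    chars = missing_from_iso8859_5()
    letters = [c for c in chars if c.isalpha()]
    print(len(chars), "characters missing,", len(letters), "of them letters")
    print(" ".join(f"U+{ord(c):04X}" for c in letters))
```

Punctuation that ISO 8859-5 lacks also ends up in the result, which is why the script counts letters separately.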


So, the questions:

* Can I define cyrillic-asian with 36 characters and use it together
  with us-ascii and cyrillic-iso8859-5, or should I define
  cyrillic-asian-1 and cyrillic-asian-2, duplicating 2x(33+3) chars
  already present in cyrillic-iso8859-5?

* What should I use for the final char? These charsets are not registered
  with ECMA.


Thanks in advance,
Baurjan.


* Re: adding a new encoding
  2004-06-20 16:35 ` Baurjan Ismagulov
@ 2004-06-20 23:18   ` Kenichi Handa
  2004-06-21 20:06     ` Baurjan Ismagulov
From: Kenichi Handa @ 2004-06-20 23:18 UTC
  Cc: emacs-devel

In article <20040620163514.GA3576@ata.cs.hun.edu.tr>, Baurjan Ismagulov <ibr@ata.cs.hun.edu.tr> writes:

> On Sat, Jun 19, 2004 at 05:03:01PM +0200, Baurjan Ismagulov wrote:
> > I want to use Emacs to edit text in PT154 encoding.

The latest (CVS) Emacs already supports that encoding.
Please try (require 'code-pages) in your .emacs.

> Here's what I've understood (please correct me if anything is wrong):

> * Emacs uses ISO 2022 internally.

Not correct.  Emacs uses the character codes of the individual
ISO-2022-conforming charsets.

> * Supported Cyrillic encodings map to ISO 8859-5.

Not always.  For instance, windows-1251 maps to
mule-unicode-0100-24ff.

> * PT154 (http://www.iana.org/assignments/charset-reg/PTCP154) cannot be
>   mapped to ISO 8859-5 due to at least 2x18 characters not present in
>   the latter.

Yes, thus the latest Emacs supports it by mapping to
mule-unicode-0100-24ff.
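
The charset name already hints at why this mapping is possible: U+0100..U+24FF spans 0x2400 = 96 × 96 code points, exactly the size of a two-dimensional charset with 96 positions (0x20..0x7F) per byte. The Python sketch below shows that arithmetic; the row-by-row layout starting at U+0100 is an assumption for illustration, not a copy of the tables Emacs itself uses:

```python
# Sketch of the arithmetic behind a 96x96 two-dimensional charset covering
# U+0100..U+24FF.  ASSUMPTION: code points are laid out row by row starting
# at U+0100, with each position byte in the range 0x20..0x7F.

FIRST, LAST = 0x0100, 0x24FF
SIZE = 96  # positions per dimension (0x20..0x7F)

# 0x2400 == 96 * 96: the Unicode range fills the table exactly.
assert LAST - FIRST + 1 == SIZE * SIZE


def to_position(cp: int) -> tuple[int, int]:
    """Map a code point in U+0100..U+24FF to (row, column), each in 0x20..0x7F."""
    if not FIRST <= cp <= LAST:
        raise ValueError(f"U+{cp:04X} is outside U+0100..U+24FF")
    row, col = divmod(cp - FIRST, SIZE)
    return 0x20 + row, 0x20 + col


def from_position(row: int, col: int) -> int:
    """Inverse of to_position."""
    return FIRST + (row - 0x20) * SIZE + (col - 0x20)
```

For example, `to_position(0x0410)` gives `(0x28, 0x30)`, and `to_position(0x24FF)` gives `(0x7F, 0x7F)`, the last cell of the table.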

---
Ken'ichi HANDA
handa@m17n.org


* Re: adding a new encoding
  2004-06-20 23:18   ` Kenichi Handa
@ 2004-06-21 20:06     ` Baurjan Ismagulov
  2004-06-21 23:44       ` Kenichi Handa
From: Baurjan Ismagulov @ 2004-06-21 20:06 UTC


Hello, Kenichi!

Thanks for your reply!

On Mon, Jun 21, 2004 at 08:18:17AM +0900, Kenichi Handa wrote:
> The latest (CVS) Emacs already supports that encoding.
> Please try (require 'code-pages) in your .emacs.

Thanks for the tip; I'll try the CVS version (I was looking at the Debian
unstable source).


> > * Emacs uses ISO 2022 internally.
> Not correct.  Emacs uses the character codes of the individual
> ISO-2022-conforming charsets.

And each character is prepended with a charset code? What do the raw bytes
look like in Emacs memory for, say, U+0410 U+00DF U+0534? Is there an
easy way to see that (a princ somewhere in
lisp/international/mule-cmds.el)?


With kind regards,
Baurjan.


* Re: adding a new encoding
  2004-06-21 20:06     ` Baurjan Ismagulov
@ 2004-06-21 23:44       ` Kenichi Handa
From: Kenichi Handa @ 2004-06-21 23:44 UTC
  Cc: emacs-devel

In article <20040621200633.GF1361@ata.cs.hun.edu.tr>, Baurjan Ismagulov <ibr@ata.cs.hun.edu.tr> writes:
> > > * Emacs uses ISO 2022 internally.
> > Not correct.  Emacs uses the character codes of the individual
> > ISO-2022-conforming charsets.

> And each character is prepended with a charset code?

Yes, prepended with a charset code in the range 0x80..0x9D plus
an optional extended charset code in the range 0xA0..0xFF, and the
code points (0x20..0x7F) are `logior'ed with 0x80.  So the
multibyte representation of a character takes one of
these forms:
	0x00..0x7F
	0x80..0x9D 0xA0..0xFF
	0x80..0x9D 0xA0..0xFF 0xA0..0xFF
	0x80..0x9D 0xA0..0xFF 0xA0..0xFF 0xA0..0xFF
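
Because every byte after the charset code is at least 0xA0, and every byte that starts a character is below 0xA0, character boundaries can be found without knowing which charset is involved. A minimal Python sketch of that idea, based only on the byte ranges listed above (the leading code 0x81 used in the examples is an arbitrary value, not a claim about any particular charset):

```python
# Sketch: split a byte string in the pre-Unicode Emacs multibyte format
# into per-character chunks, using only these byte ranges:
#   ASCII:           0x00..0x7F (one byte)
#   leading code:    0x80..0x9D
#   following bytes: 0xA0..0xFF (one to three of them)

def split_multibyte(data: bytes) -> list[bytes]:
    chars: list[bytes] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:  # ASCII stands alone
            chars.append(data[i:i + 1])
            i += 1
            continue
        if not 0x80 <= b <= 0x9D:
            raise ValueError(f"byte 0x{b:02X} at {i} cannot start a character")
        j = i + 1
        # Following bytes are all >= 0xA0, so the next byte below 0xA0
        # (ASCII or another leading code) begins a new character.
        while j < len(data) and data[j] >= 0xA0:
            j += 1
        if not 2 <= j - i <= 4:
            raise ValueError(f"malformed sequence at {i}: {data[i:j].hex()}")
        chars.append(data[i:j])
        i = j
    return chars


def dim1_bytes(leading_code: int, code_point: int) -> bytes:
    """Encode a one-byte code point (0x20..0x7F) after a leading code,
    OR-ing the code point with 0x80 as described above."""
    if not (0x80 <= leading_code <= 0x9D and 0x20 <= code_point <= 0x7F):
        raise ValueError("leading code or code point out of range")
    return bytes([leading_code, code_point | 0x80])
```

For instance, `split_multibyte(b"A\x81\xe9B")` returns `[b"A", b"\x81\xe9", b"B"]`, and `dim1_bytes(0x81, 0x30)` returns `b"\x81\xb0"`.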

> What do the raw bytes
> look like in Emacs memory for, say, U+0410 U+00DF U+0534? Is there an
> easy way to see that (a princ somewhere in
> lisp/international/mule-cmds.el)?

Try this:

(string-as-unibyte (string (decode-char 'ucs #x0410)
			   (decode-char 'ucs #x00DF)
			   (decode-char 'ucs #x0534)))

M-x list-character-sets also gives some information.

But you'd better not write code that depends on it.
The Unicode-based Emacs (which will come after the release of the
current CVS HEAD) uses UTF-8 as its multibyte representation.
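
For comparison, under a UTF-8 multibyte representation the three characters asked about above would simply be stored as their UTF-8 encodings. A small Python sketch, assuming plain UTF-8 with no Emacs-specific extensions for these characters:

```python
# Sketch: UTF-8 bytes of U+0410, U+00DF and U+0534, one character at a time.
SAMPLE = "\u0410\u00df\u0534"


def utf8_dump(text: str) -> list[str]:
    """Return one 'U+XXXX -> hex bytes' line per character."""
    return [f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex(' ')}" for ch in text]


for line in utf8_dump(SAMPLE):
    print(line)
# U+0410 -> d0 90
# U+00DF -> c3 9f
# U+0534 -> d4 b4
```

Unlike the charset-prefixed form, no charset code appears: the leading byte alone tells how many bytes the character occupies.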

---
Ken'ichi HANDA
handa@m17n.org


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git