From: "Stefan Monnier"
Newsgroups: gmane.emacs.devel
Subject: Re: Editing the 0x80..0x90 characters
Date: Wed, 08 May 2002 20:15:04 -0400
Message-ID: <200205090015.g490F5h13042@rum.cs.yale.edu>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

> > I'm more interested in the fundamental idea of using
> > the mule-unicode charset instead of the eight-bit-(graphic|control)
> > charset to encode the non-iso-8859-5 characters.
>
> IMHO, there's nothing wrong with that idea.  Of course, users who use such
> code will have to make sure their preferences are set up correctly because
> saving the resulting buffer with anything but the same cpNNN encoding will
> be, well, tricky (due to mixed character sets).

I don't understand what you mean.  Currently those coding systems
decode into (or encode from) ascii + cyrillic-iso8859-5 +
eight-bit-control + eight-bit-graphic.  The idea is to change them to
decode into ascii + cyrillic-iso8859-5 + mule-unicode instead.  Either
way the buffer ends up holding several character sets, so I don't see
how the problem of mixed character sets is changed.

I don't understand anything about cpXXX charsets, so I'm not sure how
to fix the problems you mentioned earlier.

What do you think of the patch below?

I'm not sure what the koi8-u stuff is about.  I suspect it's also meant
for Ukrainian, so the new language environment should maybe be called
"Ukrainian" rather than "Cyrillic-KOI8-U", since that's what
mule-cmds.el seems to expect for the "uk" locale.


        Stefan


Index: cyrillic.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/language/cyrillic.el,v
retrieving revision 1.30
diff -u -u -b -r1.30 cyrillic.el
--- cyrillic.el	18 Dec 2001 17:50:12 -0000	1.30
+++ cyrillic.el	9 May 2002 00:13:53 -0000
@@ -25,8 +25,10 @@
 ;;; Commentary:
 
 ;; The character set ISO8859-5 is supported.  See
-;; http://www.ecma.ch/ecma1/STAND/ECMA-113.HTM.  KOI-8 and
+;; .  KOI-8 and
 ;; ALTERNATIVNYJ are converted to ISO8859-5 internally.
+;; For more info on Cyrillic charsets, see
+;; .
 
 ;;; Code:
 
@@ -56,8 +58,11 @@
 	      (documentation . "Support for Cyrillic ISO-8859-5."))
  '("Cyrillic"))
 
-;; KOI-8 staff
+;; KOI-8 stuff
 
+;; The mule-unicode portion of this is from
+;; http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT,
+;; which references RFC 1489.
 (defvar cyrillic-koi8-r-decode-table
   [
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
@@ -68,10 +73,10 @@
    80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
    96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
    112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
-   128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
-   144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
-   160 161 162 ?164 165 166 167 168 169 170 171 172 173 174 175
-   176 177 178 ?180 181 182 183 184 185 186 187 188 189 190 191
+   ? ?=A0=A2 ?=A0=AC ?=A0=B0 ?=A0=B4 ?=A0=B8 ?=A0=BC ?=A0=C4 ?=A0=CC ?=A0=D4 ?=A0=DC ?=A1=C0 ?=A1=C4 ?=A1=C8 ?=A1=CC ?=A1=D0
+   ? ?=A1=D2 ?=A1=D3 ? ? ? ?=F8=BA ?=F8=E8 ?=F9=A4 ?=F9=A5 ? ?=B2 ?=B7 ?=F7
+   ? ?=A0=F1 ?=A0=F2 ? ?=A0=F4 ?=A0=F5 ?=A0=F6 ?=A0=F7 ?=A0=F8 ?=A0=F9 ?=A0=FA ?=A0=FB ?=A0=FC ?=A0=FD ?=A0=FE
+   ? ?=A1=A0 ?=A1=A1 ? ?=A1=A3 ?=A1=A4 ?=A1=A5 ?=A1=A6 ?=A1=A7 ?=A1=A8 ?=A1=A9 ?=A1=AA ?=A1=AB ?=A1=AC ?
    ??=D0 ?=D1 ?=E6 ?=D4 ?=D5 ?=E4 ?=D3 ?=E5 ?=D8 ?=D9 ?=DA ?=DB ?=DC ?=DD ?=DE
    ??=EF ?=E0 ?=E1 ?=E2 ?=E3 ?=D6 ?=D2 ?=EC ?=EB ?=D7 ?=E8 ?=ED ?=E9 ?=E7 ?=EA
    ??=B0 ?=B1 ?=C6 ?=B4 ?=B5 ?=C4 ?=B3 ?=C5 ?=B8 ?=B9 ?=BA ?=BB ?=BC ?=BD ?=BE
@@ -94,16 +99,15 @@
 	((translate-character cyrillic-koi8-r-nonascii-translation-table r0 r1)
 	 (write-multibyte-character r0 r1)
 	 (repeat))))))
-  "CCL program to decode KOI8.")
+  "CCL program to decode KOI8-R.")
 
 (define-ccl-program ccl-encode-koi8
   `(1
     ((loop
       (read-multibyte-character r0 r1)
-      (if (r0 == ,(charset-id 'cyrillic-iso8859-5))
-	  (translate-character cyrillic-koi8-r-encode-table r0 r1))
+      (translate-character cyrillic-koi8-r-encode-table r0 r1)
       (write-repeat r1))))
-  "CCL program to encode KOI8.")
+  "CCL program to encode KOI8-R.")
 
 (make-coding-system
  'cyrillic-koi8 4
@@ -127,6 +131,7 @@
 
 (define-coding-system-alias 'koi8-r 'cyrillic-koi8)
 (define-coding-system-alias 'koi8 'cyrillic-koi8)
+;; (define-coding-system-alias 'cp878 'cyrillic-koi8)
 
 (define-ccl-program ccl-encode-koi8-font
   `(0
@@ -150,6 +155,90 @@
 	      (documentation . "Support for Cyrillic KOI8-R."))
  '("Cyrillic"))
 
+
+(defvar cyrillic-koi8-u-decode-table
+  [
+   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
+   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
+   48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
+   64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
+   80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
+   96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
+   112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
+   ? ?=A0=A2 ?=A0=AC ?=A0=B0 ?=A0=B4 ?=A0=B8 ?=A0=BC ?=A0=C4 ?=A0=CC ?=A0=D4 ?=A0=DC ?=A1=C0 ?=A1=C4 ?=A1=C8 ?=A1=CC ?=A1=D0
+   ? ?=A1=D2 ?=A1=D3 ? ? ? ?=F8=BA ?=F8=E8 ?=F9=A4 ?=F9=A5 ? ?=B2 ?=B7 ?=F7
+   ? ?=A0=F1 ?=A0=F2 ?=F4 ? ?=F7 ? ?=A0=F8 ?=A0=F9 ?=A0=FA ?=A0=FB ? ? ?=A0=FE
+   ? ?=A1=A0 ?=A1=A1 ?=A4 ? ?=A7 ? ?=A1=A7 ?=A1=A8 ?=A1=A9 ?=A1=AA ? ? ?
+   ??=D0 ?=D1 ?=E6 ?=D4 ?=D5 ?=E4 ?=D3 ?=E5 ?=D8 ?=D9 ?=DA ?=DB ?=DC ?=DD ?=DE
+   ??=EF ?=E0 ?=E1 ?=E2 ?=E3 ?=D6 ?=D2 ?=EC ?=EB ?=D7 ?=E8 ?=ED ?=E9 ?=E7 ?=EA
+   ??=B0 ?=B1 ?=C6 ?=B4 ?=B5 ?=C4 ?=B3 ?=C5 ?=B8 ?=B9 ?=BA ?=BB ?=BC ?=BD ?=BE
+   ??=CF ?=C0 ?=C1 ?=C2 ?=C3 ?=B6 ?=B2 ?=CC ?=CB ?=B7 ?=C8 ?=CD ?=C9 ?=C7 ?=CA ]
+  "Cyrillic KOI8-U decoding table.")
+
+(let ((table (make-translation-table-from-vector
+	      cyrillic-koi8-u-decode-table)))
+  (define-translation-table 'cyrillic-koi8-u-nonascii-translation-table table)
+  (define-translation-table 'cyrillic-koi8-u-encode-table
+    (char-table-extra-slot table 0)))
+
+(define-ccl-program ccl-decode-koi8-u
+  `(3
+    ((loop
+      (r0 = 0)
+      (read r1)
+      (if (r1 < 128)
+	  (write-repeat r1)
+	((translate-character cyrillic-koi8-u-nonascii-translation-table r0 r1)
+	 (write-multibyte-character r0 r1)
+	 (repeat))))))
+  "CCL program to decode KOI8-U.")
+
+(define-ccl-program ccl-encode-koi8-u
+  `(1
+    ((loop
+      (read-multibyte-character r0 r1)
+      (translate-character cyrillic-koi8-u-encode-table r0 r1)
+      (write-repeat r1))))
+  "CCL program to encode KOI8-U.")
+
+(make-coding-system
+ 'koi8-u 4
+ ?U "KOI8 8-bit encoding for Cyrillic (MIME: KOI8-U)"
+ '(ccl-decode-koi8-u . ccl-encode-koi8-u)
+ `((safe-chars . ,(let ((table (make-char-table 'safe-chars))
+			(i 0))
+		    (while (< i 256)
+		      (aset table (aref cyrillic-koi8-u-decode-table i) t)
+		      (setq i (1+ i)))
+		    table))
+   (mime-charset . koi8-u)
+   (valid-codes (0 . 127) 163 179 (192 . 255))
+   (charset-origin-alist (cyrillic-iso8859-5 "KOI8-U"
+					      cyrillic-encode-koi8-u-char))))
+
+(define-ccl-program ccl-encode-koi8-u-font
+  `(0
+    ((translate-character cyrillic-koi8-u-encode-table r0 r1)))
+  "CCL program to encode Cyrillic chars to KOI8-U font.")
+
+(setq font-ccl-encoder-alist
+      (cons '("koi8-u" . ccl-encode-koi8-u-font) font-ccl-encoder-alist))
+
+(set-language-info-alist
+ "Cyrillic-KOI8-U" `((charset cyrillic-iso8859-5)
+		     (nonascii-translation
+		      . ,(get 'cyrillic-koi8-u-nonascii-translation-table
+			      'translation-table))
+		     (coding-system cyrillic-koi8-u)
+		     (coding-priority cyrillic-koi8-u)
+		     (input-method . "cyrillic-jcuken")
+		     (features cyril-util)
+		     (unibyte-display . cyrillic-koi8-u)
+		     (sample-text . "Russian (=E1=DA=D8=D9)L=B7=D4=E0=D0=D2=E1=E2=D2=E3=D9=E2=D5!")
+		     (documentation . "Support for Cyrillic KOI8-U."))
+ '("Cyrillic"))
+
 ;;; ALTERNATIVNYJ staff
 
 (defvar cyrillic-alternativnyj-decode-table
@@ -165,11 +254,11 @@
    ??=B1 ?=B2 ?=B3 ?=B4 ?=B5 ?=B6 ?=B7 ?=B8 ?=B9 ?=BA ?=BB ?=BC ?=BD ?=BE ?=BF
    ??=C1 ?=C2 ?=C3 ?=C4 ?=C5 ?=C6 ?=C7 ?=C8 ?=C9 ?=CA ?=CB ?=CC ?=CD ?=CE ?=CF
    ??=D1 ?=D2 ?=D3 ?=D4 ?=D5 ?=D6 ?=D7 ?=D8 ?=D9 ?=DA ?=DB ?=DC ?=DD ?=DE ?=DF
-   176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
-   192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207
-   208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
+   ? ?=A1=D2 ?=A1=D3 ?=A0=A2 ?=A0=C4 ?=A1=A1 ?=A1=A2 ?=A0=F6 ?=A0=F5 ?=A1=A3 ?=A0=F1 ?=A0=F7 ?=A0=FD ?=A0=FC ?=A0=FB ?=A0=B0
+   ? ?=A0=D4 ?=A0=CC ?=A0=BC ?=A0=A0 ?=A0=DC ?=A0=FE ?=A0=FF ?=A0=FA ?=A0=F4 ?=A1=A9 ?=A1=A6 ?=A1=A0 ?=A0=F0 ?=A1=AC ?=A1=A7
+   ? ?=A1=A4 ?=A1=A5 ?=A0=F9 ?=A0=F8 ?=A0=F2 ?=A0=F3 ?=A1=AB ?=A1=AA ?=A0=B8 ?=A0=AC ?=A1=C8 ?=A1=C4 ?=A1=CC ?=A1=D0 ?=A1=C0
    ??=E1 ?=E2 ?=E3 ?=E4 ?=E5 ?=E6 ?=E7 ?=E8 ?=E9 ?=EA ?=EB ?=EC ?=ED ?=EE ?=EF
-   ??=F1 242 243 244 245 246 247 248 249 250 251 252 253 254 ?=F0]
+   ??=F1 ? ?=A8=F4 ?=A8=A7 ?=A8=F7 ?=A8=AE ?=A8=FE ?? ?? ??? ?A]
   "Cyrillic ALTERNATIVNYJ decoding table.")
 
 (let ((table (make-translation-table-from-vector
@@ -213,11 +302,13 @@
 		    (setq i (1+ i)))
 		  table))
    (valid-codes (0 . 175) (224 . 241) 255)
+   ;; (mime-charset . cp866)
    (charset-origin-alist (cyrillic-iso8859-5 "ALTERNATIVNYJ"
 					      cyrillic-encode-koi8-r-char))))
 
 
 (define-coding-system-alias 'alternativnyj 'cyrillic-alternativnyj)
+;; (define-coding-system-alias 'cp866 'cyrillic-alternativnyj)
 
 (define-ccl-program ccl-encode-alternativnyj-font
   '(0