From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: TUTORIAL.bg and windows-1251
Date: Tue, 25 Nov 2003 08:55:52 +0900 (JST)
Message-ID: <200311242355.IAA24563@etlken.m17n.org>
References: <3FB52552.6090302@fmi.uni-sofia.bg> <200311170721.QAA11735@etlken.m17n.org> <3FBA3F81.4010602@fmi.uni-sofia.bg>
In-reply-to: <3FBA3F81.4010602@fmi.uni-sofia.bg> (message from Ognyan Kulev on Tue, 18 Nov 2003 17:49:21 +0200)
Original-To: ogi@fmi.uni-sofia.bg
Cc: emacs-devel@gnu.org

Sorry for the late responses on this thread.  I'm now involved in
more threads than my capacity allows.

In article <3FBA3F81.4010602@fmi.uni-sofia.bg>, Ognyan Kulev writes:
> Kenichi Handa wrote:
>> I think the default handling of cyrillic characters must be
>> most convenient for native users.  But, there are many
>> languages that use cyrillic and their requests may conflict.
>> So I think we must start from adjusting each language
>> environment.  Once we found most language environments
>> require the same setting, we can make it the default.

> Can X encoding be adjusted?  Isn't there only two choices for cyrillic:
> iso10646-1 and iso8859-5?

It seems that the bg_BG locale of glibc, gtk, or XFree86 (I don't
know which is responsible) encodes cyrillic characters using an
extended segment with charset name "microsoft-cp1251" in
selection.

Please try the attached file.  It overrides the ctext
encoder/decoder so that microsoft-cp1251 is used on decoding
in the Bulgarian language environment.

[...]
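For reference, here is a minimal sketch of how such an extended segment
is laid out, assuming the usual Compound Text convention: ESC % / F M L,
then the encoding name, then STX (octet 2), then the encoded bytes.  M
and L hold the number of octets that follow L, split into two 7-bit
halves with the high bit set.  The helper names
my-ctext-extended-segment and my-ctext-segment-length are hypothetical
and are not part of the attached file.

```elisp
;; Hypothetical helpers for illustration only; not part of ctext.el.

(defun my-ctext-extended-segment (name noctets data)
  "Wrap unibyte string DATA in a Compound Text extended segment.
NAME is the ASCII encoding name.  NOCTETS (0..4) is the number of
octets per character, 0 meaning variable."
  (let ((len (+ (length name) 1 (length data))))
    ;; M and L carry LEN in two 7-bit halves, each with the high bit set.
    (concat (unibyte-string ?\e ?% ?/ (+ ?0 noctets)
                            (+ (/ len 128) 128)
                            (+ (% len 128) 128))
            name
            (unibyte-string 2)          ; STX terminates the encoding name
            data)))

(defun my-ctext-segment-length (segment)
  "Return the octet count stored in the M and L bytes of SEGMENT."
  (+ (* (- (aref segment 4) 128) 128)
     (- (aref segment 5) 128)))

;; Three windows-1251 bytes under the name "microsoft-cp1251":
;; 16 name octets + 1 STX + 3 data octets = 20, so M = 128 and L = 148.
(my-ctext-segment-length
 (my-ctext-extended-segment "microsoft-cp1251" 1
                            (unibyte-string #xC6 #xF3 #xEA)))
```

The decoder has to subtract the name length and the STX octet from this
count to find how many data bytes to hand to the coding system.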
> The negative site of Debian packages is that each encoding of the four
> above mentioned has its own package.  So people sometimes install only
> microsoft-cp1251 and iso10646-1 fonts, without koi8-r and iso8859-5 ones.

> Another problem with cronyx-courier is that it doesn't work when it's
> set in Default in Basic Faces customize group.  I've just posted
> question to comp.emacs.

> What about the following: when mule-unicode-0100-24ff is used and the
> used iso10646-1 font doesn't contain wanted character (e.g. cyrillic
> one), then another font is searched that contains such character.  I
> think this will often end up in cronyx-courier.  Is this hard to be
> implemented?

I've implemented it in the emacs-unicode version.  But that change
requires various infrastructure of emacs-unicode, so it's
very difficult to backport it to HEAD.

Anyway, the attached ctext.el also contains a short piece of code
that enables Emacs to display characters in windows-1251 with a
microsoft-cp1251 font.  Please try calling
(use-microsoft-cp1251-font).

---
Ken'ichi HANDA
handa@m17n.org

--- ctext.el ---
(defvar ctext-non-standard-encodings-database
  '(("big5-0" big5 2 (chinese-big5-1 chinese-big5-2)))
  "Alist of non-standard character set encodings for CTEXT's extended segments.
Each element has the form (ENCODING-NAME CODING-SYSTEM N-OCTET CHARSET)
and provides information about how to use \"extended segments\"
with the encoding name ENCODING-NAME.

CODING-SYSTEM is the coding-system to encode the characters into
an extended segment.

N-OCTET is the number of octets (bytes) that encodes a character
in the segment.  It can be 0 (meaning the number of octets per
character is variable), 1, 2, 3, or 4.

CHARSET is a character set containing characters that are encoded
as ENCODING-NAME.  It may be a list of character sets.  It may
also be a char-table, in which case characters that have non-nil
value in the char-table are the target.

On decoding CTEXT, all encoding names listed here are recognized.
On encoding CTEXT, encoding names in the variable
`ctext-non-standard-encodings-list' and in
`ctext-non-standard-encodings' property of the current language
environment are used.")

(defun ctext-post-read-conversion (len)
  "Decode LEN characters encoded as Compound Text with Extended Segments."
  (save-match-data
    (save-restriction
      (let ((case-fold-search nil)
            (in-workbuf (string= (buffer-name) " *code-converting-work*"))
            last-coding-system-used
            pos bytes)
        (or in-workbuf
            (narrow-to-region (point) (+ (point) len)))
        (decode-coding-region (point-min) (point-max) 'ctext)
        (if in-workbuf
            (set-buffer-multibyte t))
        (while (re-search-forward ctext-non-standard-encodings-regexp
                                  nil 'move)
          (setq pos (match-beginning 0))
          (if (match-beginning 1)
              ;; ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
              (let* ((M (char-after (+ pos 4)))
                     (L (char-after (+ pos 5)))
                     (encoding (match-string 2))
                     (encoding-info (assoc-ignore-case
                                     encoding
                                     ctext-non-standard-encodings-database))
                     (coding (if encoding-info
                                 (nth 1 encoding-info)
                               (setq encoding (intern (downcase encoding)))
                               (and (coding-system-p encoding)
                                    encoding))))
                (setq bytes (- (+ (* (- M 128) 128) (- L 128))
                               (- (point) (+ pos 6))))
                (when coding
                  (delete-region pos (point))
                  (forward-char bytes)
                  (decode-coding-region (- (point) bytes) (point) coding)))
            ;; ESC % G --UTF-8-BYTES-- ESC % @
            (setq bytes (- (point) pos))
            (decode-coding-region (- (point) bytes) (point) 'utf-8))))
      (goto-char (point-min))
      (- (point-max) (point)))))

(defvar ctext-non-standard-encodings-list '("big5-0")
  "List of non-standard character set encoding names used in CTEXT.")

(defun ctext-non-standard-encodings-table ()
  (let ((table (make-char-table 'translation-table)))
    (dolist (encoding (reverse
                       (append
                        (get-language-info current-language-environment
                                           'ctext-non-standard-encodings)
                        ctext-non-standard-encodings-list)))
      (let* ((slot (assoc encoding ctext-non-standard-encodings-database))
             (charset (nth 3 slot)))
        (if charset
            (cond ((charsetp charset)
                   (aset table (make-char charset) slot))
                  ((listp charset)
                   (dolist (elt charset)
                     (aset table (make-char elt) slot)))
                  ((char-table-p charset)
                   (map-char-table #'(lambda (k v)
                                       (if (and v (> k 128))
                                           (aset table k slot)))
                                   charset))))))
    table))

(defun ctext-pre-write-conversion (from to)
  "Encode characters between FROM and TO as Compound Text w/Extended Segments.

If FROM is a string, or if the current buffer is not the one set up for us
by encode-coding-string, generate a new temp buffer, insert the
text, and convert it in the temporary buffer.  Otherwise, convert in-place."
  (save-match-data
    ;; Setup a working buffer if necessary.
    (cond ((stringp from)
           (let ((buf (current-buffer)))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte (multibyte-string-p from))
             (insert from)))
          ((not (string= (buffer-name) " *code-converting-work*"))
           (let ((buf (current-buffer))
                 (multibyte enable-multibyte-characters))
             (set-buffer (generate-new-buffer " *temp"))
             (set-buffer-multibyte multibyte)
             (insert-buffer-substring buf from to))))

    ;; Now we can encode the whole buffer.
    (let ((encoding-table (ctext-non-standard-encodings-table))
          last-coding-system-used
          last-pos last-encoding-info
          pos encoding-info end-pos)
      (goto-char (setq last-pos (point-min)))
      (setq end-pos (point-marker))
      (while (re-search-forward "[^\000-\177]+" nil t)
        (setq last-pos (match-beginning 0)
              last-encoding-info (aref encoding-table (char-after last-pos)))
        (set-marker end-pos (match-end 0))
        (goto-char (1+ last-pos))
        (catch 'tag
          (while t
            (setq encoding-info
                  (if (< (point) end-pos)
                      (aref encoding-table (following-char))))
            (unless (eq last-encoding-info encoding-info)
              (if last-encoding-info
                  (let ((encoding-name (car last-encoding-info))
                        (coding-system (nth 1 last-encoding-info))
                        (noctets (nth 2 last-encoding-info))
                        len)
                    (encode-coding-region last-pos (point) coding-system)
                    (setq len (+ (length encoding-name) 1
                                 (- (point) last-pos)))
                    (save-excursion
                      (goto-char last-pos)
                      (insert (string-to-multibyte
                               (format "\e%%/%d%c%c%s"
                                       noctets
                                       (+ (/ len 128) 128)
                                       (+ (% len 128) 128)
                                       encoding-name)))))
                (encode-coding-region last-pos (point) 'ctext-no-compositions))
              (setq last-pos (point)
                    last-encoding-info encoding-info))
            (if (< (point) end-pos)
                (forward-char 1)
              (throw 'tag nil))))
        (if (< last-pos (point))
            (encode-coding-region last-pos (point) 'ctext-no-compositions)))
      (set-marker end-pos nil)
      (goto-char (point-min))))
  ;; Must return nil, as build_annotations_2 expects that.
  nil)

;; The following forms override the current settings.
(set-language-info "Bulgarian" 'ctext-non-standard-encodings
                   '("microsoft-cp1251"))

(let ((elt `("microsoft-cp1251" windows-1251 1
             ,(get 'encode-windows-1251 'translation-table)))
      (slot (assoc "microsoft-cp1251" ctext-non-standard-encodings-database)))
  (if slot
      (setcdr slot (cdr elt))
    (push elt ctext-non-standard-encodings-database)))

(define-ccl-program ccl-encode-windows-1251-font
  '(0
    ((r1 <<= 7)
     (r1 += r2)
     (translate-character encode-windows-1251 r0 r1)
     )))

(let ((slot (assoc "microsoft-cp1251" font-ccl-encoder-alist)))
  (if slot
      (setcdr slot ccl-encode-windows-1251-font)
    (push '("microsoft-cp1251" . ccl-encode-windows-1251-font)
          font-ccl-encoder-alist)))

(defun use-microsoft-cp1251-font ()
  (let ((fontspec '(nil . "microsoft-cp1251")))
    (map-char-table #'(lambda (k v)
                        (if (and v (> k 128))
                            (set-fontset-font "fontset-default" k fontspec)))
                    (get 'encode-windows-1251 'translation-table))))