From: Kenichi Handa <handa@m17n.org>
Cc: emacs-devel@gnu.org
Subject: Re: TUTORIAL.bg and windows-1251
Date: Tue, 25 Nov 2003 08:55:52 +0900 (JST)
Message-ID: <200311242355.IAA24563@etlken.m17n.org>
In-Reply-To: <3FBA3F81.4010602@fmi.uni-sofia.bg> (message from Ognyan Kulev on Tue, 18 Nov 2003 17:49:21 +0200)

Sorry for the late responses on this thread.  I'm currently
involved in more threads than I have capacity for.

In article <3FBA3F81.4010602@fmi.uni-sofia.bg>, Ognyan Kulev <ogi@fmi.uni-sofia.bg> writes:

> Kenichi Handa wrote:
>>  I think the default handling of cyrillic characters must be
>>  most convenient for native users.  But, there are many
>>  languages that use cyrillic and their requests may conflict.
>>  So I think we must start from adjusting each language
>>  environment.  Once we found most language environments
>>  require the same setting, we can make it the default.

> Can X encoding be adjusted?  Aren't there only two choices for cyrillic: 
> iso10646-1 and iso8859-5?

It seems that the bg_BG locale of glibc, gtk, or XFree86 (I
don't know which one is responsible) encodes cyrillic
characters in selections using an extended segment with the
charset name "microsoft-cp1251".

Please try the attached file.  It overrides the ctext
encoder/decoder so that microsoft-cp1251 is used on decoding
in the Bulgarian language environment.
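
For reference, here is a minimal sketch of the length arithmetic
that the attached code applies to an extended segment header
(ESC % / N M L ENCODING-NAME STX BYTES).  The two helper names
below are hypothetical and are not part of ctext.el; they only
restate what ctext-pre-write-conversion and
ctext-post-read-conversion compute.

```elisp
;; Hypothetical helpers for illustration only; not part of ctext.el.

(defun my-ctext-length-octets (encoding-name nbytes)
  "Return the list (M L) of length octets for an extended segment.
The encoded length covers ENCODING-NAME, the STX separator, and
NBYTES octets of encoded text, as in `ctext-pre-write-conversion'."
  (let ((len (+ (length encoding-name) 1 nbytes)))
    (list (+ (/ len 128) 128)
          (+ (% len 128) 128))))

(defun my-ctext-decode-length (m l)
  "Return the segment length encoded by the octets M and L.
This mirrors the first half of the `bytes' computation in
`ctext-post-read-conversion'."
  (+ (* (- m 128) 128) (- l 128)))

;; "microsoft-cp1251" is 16 characters long, so 5 octets of text
;; give a length of 16 + 1 + 5 = 22.
(my-ctext-length-octets "microsoft-cp1251" 5)   ; => (128 150)
(my-ctext-decode-length 128 150)                ; => 22
```

The decoder then subtracts the part of the segment that the
regexp has already matched (the encoding name and STX) to get
the number of text octets that remain to be decoded.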

[...]
> The negative side of Debian packages is that each encoding of the four 
> above mentioned has its own package.  So people sometimes install only 
> microsoft-cp1251 and iso10646-1 fonts, without koi8-r and iso8859-5 ones.

> Another problem with cronyx-courier is that it doesn't work when it's 
> set in Default in Basic Faces customize group.  I've just posted a 
> question to comp.emacs.

> What about the following: when mule-unicode-0100-24ff is used and the 
> iso10646-1 font in use doesn't contain the wanted character (e.g. a 
> cyrillic one), then another font that contains the character is 
> searched for.  I think this will often end up in cronyx-courier.  Is 
> this hard to implement?

I've implemented it in the emacs-unicode version.  But that
change requires various infrastructure of emacs-unicode, so
it's very difficult to backport it to HEAD.

Anyway, the attached ctext.el also contains a short piece of
code that enables Emacs to display windows-1251 characters
with a microsoft-cp1251 font.  Please try calling
(use-microsoft-cp1251-font).
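
As a rough sketch of how to try it, assuming the attachment
has been saved as ~/ctext.el (a hypothetical location) and that
your Emacs defines the "Bulgarian" language environment,
something like the following can be evaluated, e.g. in the
*scratch* buffer:

```elisp
;; Hypothetical path: adjust to wherever the attachment was saved.
(load "~/ctext.el")

;; The ctext override is keyed on the current language environment.
(set-language-environment "Bulgarian")

;; Make fontset-default use a microsoft-cp1251 font for the
;; characters covered by windows-1251.
(use-microsoft-cp1251-font)
```

The function is not an interactive command, so it has to be
called from Lisp as above rather than with M-x.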

---
Ken'ichi HANDA
handa@m17n.org

--- ctext.el ---
(defvar ctext-non-standard-encodings-database
  '(("big5-0" big5 2 (chinese-big5-1 chinese-big5-2)))
  "Alist of non-standard character set encodings for CTEXT's extended segments.
Each element has the form (ENCODING-NAME CODING-SYSTEM N-OCTET CHARSET)
and provides information about how to use \"extended segments\"
with the encoding name ENCODING-NAME.

CODING-SYSTEM is the coding-system to encode the characters into
an extended segment.

N-OCTET is the number of octets (bytes) that encodes a character
in the segment.  It can be 0 (meaning the number of octets per
character is variable), 1, 2, 3, or 4.

CHARSET is a character set containing characters that are encoded
as ENCODING-NAME.  It may be a list of character sets.  It may
also be a char-table, in which case characters that have non-nil
value in the char-table are the target.

On decoding CTEXT, all encoding names listed here are recognized.

On encoding CTEXT, encoding names in the variable
`ctext-non-standard-encodings-list' and in
`ctext-non-standard-encodings' property of the current language
environment are used.")

(defun ctext-post-read-conversion (len)
  "Decode LEN characters encoded as Compound Text with Extended Segments."
  (save-match-data
    (save-restriction
      (let ((case-fold-search nil)
	    (in-workbuf (string= (buffer-name) " *code-converting-work*"))
	    last-coding-system-used
	    pos bytes)
	(or in-workbuf
	    (narrow-to-region (point) (+ (point) len)))
	(decode-coding-region (point-min) (point-max) 'ctext)
	(if in-workbuf
	    (set-buffer-multibyte t))
	(while (re-search-forward ctext-non-standard-encodings-regexp
				  nil 'move)
	  (setq pos (match-beginning 0))
	  (if (match-beginning 1)
	      ;; ESC % / [0-4] M L --ENCODING-NAME-- \002 --BYTES--
	      (let* ((M (char-after (+ pos 4)))
		     (L (char-after (+ pos 5)))
		     (encoding (match-string 2))
		     (encoding-info (assoc-ignore-case 
				     encoding
				     ctext-non-standard-encodings-database))
		     (coding (if encoding-info
				 (nth 1 encoding-info)
			       (setq encoding (intern (downcase encoding)))
			       (and (coding-system-p encoding)
				    encoding))))
		(setq bytes (- (+ (* (- M 128) 128) (- L 128))
			       (- (point) (+ pos 6))))
		(when coding
		  (delete-region pos (point))
		  (forward-char bytes)
		  (decode-coding-region (- (point) bytes) (point) coding)))
	    ;; ESC % G --UTF-8-BYTES-- ESC % @
	    (setq bytes (- (point) pos))
	    (decode-coding-region (- (point) bytes) (point) 'utf-8))))
      (goto-char (point-min))
      (- (point-max) (point)))))

(defvar ctext-non-standard-encodings-list
  '("big5-0")
  "List of non-standard character set encoding names used in CTEXT.")

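(defun ctext-non-standard-encodings-table ()
  "Return a char-table mapping characters to non-standard encoding info.
Each non-nil value is the matching element of
`ctext-non-standard-encodings-database'."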
  (let ((table (make-char-table 'translation-table)))
    (dolist (encoding (reverse
		       (append
			(get-language-info current-language-environment
					   'ctext-non-standard-encodings)
			ctext-non-standard-encodings-list)))
      (let* ((slot (assoc encoding ctext-non-standard-encodings-database))
	     (charset (nth 3 slot)))
	(if charset
	    (cond ((charsetp charset)
		   (aset table (make-char charset) slot))
		  ((listp charset)
		   (dolist (elt charset)
		     (aset table (make-char elt) slot)))
		  ((char-table-p charset)
		   (map-char-table #'(lambda (k v) 
				   (if (and v (> k 128)) (aset table k slot)))
				   charset))))))
    table))

(defun ctext-pre-write-conversion (from to)
  "Encode characters between FROM and TO as Compound Text w/Extended Segments.

If FROM is a string, or if the current buffer is not the one set up for us
by encode-coding-string, generate a new temp buffer, insert the
text, and convert it in the temporary buffer.  Otherwise, convert in-place."
  (save-match-data
    ;; Setup a working buffer if necessary.
    (cond ((stringp from)
	   (let ((buf (current-buffer)))
	     (set-buffer (generate-new-buffer " *temp"))
	     (set-buffer-multibyte (multibyte-string-p from))
	     (insert from)))
	  ((not (string= (buffer-name) " *code-converting-work*"))
	   (let ((buf (current-buffer))
		 (multibyte enable-multibyte-characters))
	     (set-buffer (generate-new-buffer " *temp"))
	     (set-buffer-multibyte multibyte)
	     (insert-buffer-substring buf from to))))

    ;; Now we can encode the whole buffer.
    (let ((encoding-table (ctext-non-standard-encodings-table))
	  last-coding-system-used
	  last-pos last-encoding-info
	  pos encoding-info end-pos)
      (goto-char (setq last-pos (point-min)))
      (setq end-pos (point-marker))
      (while (re-search-forward "[^\000-\177]+" nil t)
	(setq last-pos (match-beginning 0)
	      last-encoding-info (aref encoding-table (char-after last-pos)))
	(set-marker end-pos (match-end 0))
	(goto-char (1+ last-pos))
	(catch 'tag
	  (while t
	    (setq encoding-info
		  (if (< (point) end-pos)
		      (aref encoding-table (following-char))))
	    (unless (eq last-encoding-info encoding-info)
	      (if last-encoding-info
		  (let ((encoding-name (car last-encoding-info))
			(coding-system (nth 1 last-encoding-info))
			(noctets (nth 2 last-encoding-info))
			len)
		    (encode-coding-region last-pos (point) coding-system)
		    (setq len (+ (length encoding-name) 1
				 (- (point) last-pos)))
		    (save-excursion
		      (goto-char last-pos)
		      (insert (string-to-multibyte 
			       (format "\e%%/%d%c%c%s\x02"
				       noctets
				       (+ (/ len 128) 128)
				       (+ (% len 128) 128)
				       encoding-name)))))
		(encode-coding-region last-pos (point) 'ctext-no-compositions))
	      (setq last-pos (point)
		    last-encoding-info encoding-info))
	    (if (< (point) end-pos)
		(forward-char 1)
	      (throw 'tag nil))))
	(if (< last-pos (point))
	    (encode-coding-region last-pos (point) 'ctext-no-compositions)))
      (set-marker end-pos nil)
      (goto-char (point-min))))
  ;; Must return nil, as build_annotations_2 expects that.
  nil)

;; The following code overrides the current settings.

(set-language-info "Bulgarian" 'ctext-non-standard-encodings
		   '("microsoft-cp1251"))

(let ((elt `("microsoft-cp1251" windows-1251 1
	     ,(get 'encode-windows-1251 'translation-table)))
      (slot (assoc "microsoft-cp1251" ctext-non-standard-encodings-database)))
  (if slot
      (setcdr slot (cdr elt))
    (push elt ctext-non-standard-encodings-database)))

(define-ccl-program ccl-encode-windows-1251-font
  '(0
    ((r1 <<= 7)
     (r1 += r2)
     (translate-character encode-windows-1251 r0 r1)
     )))

(let ((slot (assoc "microsoft-cp1251" font-ccl-encoder-alist)))
  (if slot
      (setcdr slot ccl-encode-windows-1251-font)
    (push '("microsoft-cp1251" . ccl-encode-windows-1251-font)
	  font-ccl-encoder-alist)))

(defun use-microsoft-cp1251-font ()
  "Use a microsoft-cp1251 font for windows-1251 characters.
This modifies \"fontset-default\"."
  (let ((fontspec '(nil . "microsoft-cp1251")))
    (map-char-table
     #'(lambda (k v) 
	 (if (and v (> k 128))
	     (set-fontset-font "fontset-default" k fontspec)))
     (get 'encode-windows-1251 'translation-table))))
