From: "Ehud Karni"
Newsgroups: gmane.emacs.devel
Subject: Re: Usage of standard-display-table in MSDOS
Date: Wed, 8 Sep 2010 00:11:40 +0300
Organization: Mivtach-Simon Insurance agencies
Message-ID: <201009072111.o87LBeU2009811@beta.mvs.co.il>
References: <83aao8mjzx.fsf@gnu.org> <837hjcm9cw.fsf@gnu.org>
 <201008291016.o7TAG22t007365@beta.mvs.co.il>
 <201008291149.o7TBn3bO010199@beta.mvs.co.il>
Reply-To: ehud@unix.mvs.co.il
To: eliz@gnu.org
Cc: emacs-devel@gnu.org, handa@m17n.org
In-reply-to: (message from Eli Zaretskii on Sun, 29 Aug 2010 10:04:16 -0400)

[ long post ]

On Sun, 29 Aug 2010 10:04:16 Eli Zaretskii wrote:
>
> Note that Handa-san recommended to set more than just one slot in
> standard-display-table in Emacs 23 to solve similar problems:

I have not fully solved it yet, but I think only minor details remain
now that I have defined a new coding system (see below).

On Mon, 06 Sep 2010 14:14:01 Kenichi Handa wrote:
>
> Does it mean that you want bidi-reordering for the bytes
> #xE0..#xFA (code-points of iso-8859-8) but bidi-reordering
> is not necessary for the bytes #x80..#x8A (code-points of
> cp862)?

No, I want bidi ordering (or not) for both iso-8859-8 and CP862 at the
SAME time.

> But, your file "lit1" contains #xE0..#xFA (code-points of
> iso-8859-8) at the second to 4th lines in visual order.  If
> bidi-reordering is applied on them, you'll get the different
> view than lit1-tty.png and lit1-x.png.  Is that ok?

This is just an example. Files used directly with cat (like /etc/motd)
must be in visual order.
Other files, used with a GUI, must be in logical order. The data files
we edit each day are of both types.

EK> May be I can define a new coding system that will have bytes #x80-#xFF
EK> as legal characters and be recognized as Hebrew variant.
>
> This code will that.  I think it's not difficult to
> understand what the code is doing.

[snip]

> But, if you do that, you must consider the problem Eli wrote:
>
EZ> But if you want all the Hebrew characters to be treated by Emacs as
EZ> such (e.g., for bidi display), no matter what's their encoding in the
EZ> file, you will have to define a coding-system that will decode them
EZ> all into Unicode codepoints of Hebrew characters.  There's a problem
EZ> you will need to solve for defining such a coding system: it has 2
EZ> different encodings for the same character, one from hebrew-iso-8bit,
EZ> the other from cp862.  So you will need to decide how will Hebrew
EZ> characters be encoded when the file is saved.
>
> In the above definition of mix-hebrew, as iso-8859-8-sub is
> listed before cp862-sub, all Hebrew characters are encoded
> into bytes #xE0..#xFA even if they were originally decoded
> from bytes #x80..#x9A.
>
> If you don't like it, you must give up decoding bytes
> #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> and setup a display table to display them as Hebrew chars.
> It can be done by this code:

I think I solved this by using text properties. It is still unfinished,
but it works, and I would appreciate any comments. There are some
problems; see the end of this message.

Here is what I did (based on your advice).

(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  :map (let ((map (make-vector 54 0))
             (ix 27))
         (while (> ix 0)
           (setq ix (1- ix))
           (aset map (+ ix ix) (+ #x80 ix))
           (aset map (+ ix ix 1) (+ #x80 ix)))
         map)
  :supplementary-p t)

(define-charset 'graphic-MSDOS-subset
  "Graphic subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00)
  :supplementary-p t)

(define-charset 'hebrew-iso-8859-8-subset
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00)
  :supplementary-p t)

(define-coding-system 'hebrew-iso-with-8bit-bytes
  "The iso-8859-8 charset + bytes #x80-#xDF from CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii hebrew-iso-8859-8-subset
                  hebrew-MSDOS-binary graphic-MSDOS-subset)
  :post-read-conversion 'hebrew-iso-with-8bit-post-read
  :pre-write-conversion 'hebrew-iso-with-8bit-pre-write
  :ascii-compatible-p t)

(defun hebrew-iso-with-8bit-post-read (length)
  (let ((src (concat "^" '[ #x80 ] "-" '[ #x9A ]))  ;; seems "^\200-\232" does not work
        (sv-pos (point))
        (max-pos (+ (point) length))
        chr)
    (while (and (skip-chars-forward src max-pos)
                (setq chr (char-after)))
      ;; (message "At %d after char %d" (point) (char-after))
      (delete-char 1)
      (insert-char (+ chr #x550) 1)     ;; #x05D0 - #x80
      (add-text-properties (1- (point)) (point)
                           `(Hebrew DOS face menu))))
  0)

(defun hebrew-iso-with-8bit-pre-write (start end)
  (let* ((text (if (numberp start)
                   (buffer-substring start end)
                 start))
         (beg 0)
         (end (length text))
         va)
    (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
      (setq va (aref text beg))
      (and (>= va #x05D0)       ;; א
           (<= va #x05EA)       ;; ת
           (aset text beg (- va #x550)))
      (setq beg (1+ beg)))
    (set-buffer (get-buffer-create " *heb-wrt*"))
    (delete-region (point-min) (point-max))
    (insert text)
    nil))

There are some problems:

1. (describe-character-set 'hebrew-MSDOS-binary) exits with an error:
     Wrong type argument: char-or-string-p,
       [128 128 129 129 130 130 131 131 132 132 ...]
   The vector is the :map value.

2. The `:post-read-conversion' function must return a number; otherwise
   there is an error. There is nothing about this in the
   `define-coding-system' documentation.

3. The documentation for `write-region-annotate-functions' has:
     "The function should return a list of pairs of the form
      (POSITION . STRING), consisting of strings to be effectively
      inserted at the specified positions of the file being written
      (1 means to insert before the first byte written).  The POSITIONs
      must be sorted into increasing order."
   This did not work at all. I had to use the alternate pathway:
      An annotation function can return with a different buffer current.
      Doing so removes the annotations returned by previous functions, and
      resets START and END to `point-min' and `point-max' of the new buffer.

Thank you both. I will post again when I have finished all the details.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D      Better Safe Than Sorry