[ long post ]

On Sun, 29 Aug 2010 10:04:16 Eli Zaretskii wrote:
>
> Note that Handa-san recommended to set more than just one slot in
> standard-display-table in Emacs 23 to solve similar problems:

I have not fully solved it yet, but I think only minor details remain
now that I have defined a new coding system (see below).


On Mon, 06 Sep 2010 14:14:01 Kenichi Handa wrote:
>
> Does it mean that you want bidi-reordering for the bytes
> #xE0..#xFA (code-points of iso-8859-8) but bidi-reordering
> is not necessary for the bytes #x80..#x8A (code-points of
> cp862)?

No, I want bidi ordering (or not) for both iso-8859-8 and CP862 at
the SAME time.

> But, your file "lit1" contains #xE0..#xFA (code-points of
> iso-8859-8) at the second to 4th lines in visual order.  If
> bidi-reordering is applied on them, you'll get the different
> view than lit1-tty.png and lit1-x.png.  Is that ok?

This is just an example. Files used directly with cat (like /etc/motd)
must be in visual order. Other files used with GUI must have logical
order. The data files we edit each day are of both types.

EK> Maybe I can define a new coding system that will have bytes #x80-#xFF
EK> as legal characters and be recognized as Hebrew variant.
>
> This code will do that.  I think it's not difficult to
> understand what the code is doing.

[snip]

> But, if you do that, you must consider the problem Eli wrote:
>
EZ> But if you want all the Hebrew characters to be treated by Emacs as
EZ> such (e.g., for bidi display), no matter what's their encoding in the
EZ> file, you will have to define a coding-system that will decode them
EZ> all into Unicode codepoints of Hebrew characters.  There's a problem
EZ> you will need to solve for defining such a coding system: it has 2
EZ> different encodings for the same character, one from hebrew-iso-8bit,
EZ> the other from cp862.  So you will need to decide how will Hebrew
EZ> characters be encoded when the file is saved.
>
> In the above definition of mix-hebrew, as iso-8859-8-sub is
> listed before cp862-sub, all Hebrew characters are encoded
> into bytes #xE0..#xFA even if they were originally decoded
> from bytes #x80..#x9A.
>
> If you don't like it, you must give up decoding bytes
> #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> and setup a display table to display them as Hebrew chars.
> It can be done by this code:

I think I solved this by using text properties.

It is still unfinished, but it works, and I would appreciate any comments.
There are some problems; see the end of this message.

Here is what I did (based on your advice).


(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  ;; :map is a flat vector [CODE-1 CHAR-1 CODE-2 CHAR-2 ...];
  ;; each of the 27 bytes maps to the character with the same code.
  :map (let ((map (make-vector 54 0))
             (ix 27))
         (while (> ix 0)
           (setq ix (1- ix))
           (aset map (+ ix ix)   (+ #x80 ix))
           (aset map (+ ix ix 1) (+ #x80 ix)))
         map)
  :supplementary-p t)


(define-charset 'graphic-MSDOS-subset
  "Graphic subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00)
  :supplementary-p t)


(define-charset 'hebrew-iso-8859-8-subset
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00)
  :supplementary-p t)


(define-coding-system 'hebrew-iso-with-8bit-bytes
  "The iso-8859-8 charset + bytes #x80-#xDF from CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii hebrew-iso-8859-8-subset hebrew-MSDOS-binary graphic-MSDOS-subset)
  :post-read-conversion 'hebrew-iso-with-8bit-post-read
  :pre-write-conversion 'hebrew-iso-with-8bit-pre-write
  :ascii-compatible-p t)


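(defun hebrew-iso-with-8bit-post-read (length)
  "Turn decoded bytes #x80-#x9A into Hebrew letters tagged Hebrew = DOS."
  (let ((src (concat "^" '[ #x80 ] "-" '[ #x9A ]))    ;; seems "^\200-\232" does not work
        (sv-pos (point))
        (max-pos (+ (point) length))
        chr)
    ;; Stop at MAX-POS so that text after the decoded region is never touched.
    (while (progn (skip-chars-forward src max-pos)
                  (< (point) max-pos))
      (setq chr (char-after))
      ;;      (message "At %d after char %d" (point) (char-after))
      (delete-char 1)
      (insert-char (+ chr #x550) 1)               ;; #x05D0 - #x80
      (add-text-properties (1- (point)) (point)
                           '(Hebrew DOS
                             face menu)))
    (goto-char sv-pos)
    ;; Characters are replaced one for one, so the count is unchanged.
    length))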


(defun hebrew-iso-with-8bit-pre-write (start end)
  "Map Hebrew letters tagged Hebrew = DOS back to bytes #x80-#x9A."
  (let* ((text (if (numberp start)
                   (buffer-substring start end)
                 start))                  ;; START may be a string
         (beg 0)
         (end (length text))
         va)
    (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
      (setq va (aref text beg))
      (and (>= va #x05D0)                 ;; alef (א)
           (<= va #x05EA)                 ;; tav (ת)
           (aset text beg (- va #x550)))
      (setq beg (1+ beg)))
    ;; Return with a different buffer current; its contents get encoded.
    (set-buffer (get-buffer-create " *heb-wrt*"))
    (delete-region (point-min) (point-max))
    (insert text)
    nil))
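
The #x550 constant used in both functions is just the distance between
the CP862 Hebrew bytes (#x80-#x9A) and the Unicode Hebrew letters
(#x05D0-#x05EA). As a minimal sketch of that arithmetic only, with
hypothetical helper names (`my-cp862-hebrew-offset',
`my-cp862-byte-to-hebrew', `my-hebrew-to-cp862-byte') that are not part
of the code above:

```elisp
;; Illustration only: these names are hypothetical and are not used by
;; the coding system defined above.
(defconst my-cp862-hebrew-offset (- #x05D0 #x80)
  "Distance from a CP862 Hebrew byte to its Unicode code point (#x550).")

(defun my-cp862-byte-to-hebrew (byte)
  "Return the Unicode Hebrew letter for CP862 BYTE, or BYTE unchanged."
  (if (and (>= byte #x80) (<= byte #x9A))
      (+ byte my-cp862-hebrew-offset)
    byte))

(defun my-hebrew-to-cp862-byte (char)
  "Return the CP862 byte for Hebrew letter CHAR, or CHAR unchanged."
  (if (and (>= char #x05D0) (<= char #x05EA))
      (- char my-cp862-hebrew-offset)
    char))
```

Anything outside the two 27-character ranges passes through unchanged,
which mirrors the range check in the pre-write function.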


There are some problems:

1. (describe-character-set 'hebrew-MSDOS-binary) exits with an error:
   Wrong type argument: char-or-string-p, [128 128 129 129 130 130 131 131 132 132 ...]
   The vector is the :map value.

2. The `:post-read-conversion' function must return a number; otherwise there is an error.
   There is nothing about this in the `define-coding-system' documentation.

3. The documentation for `write-region-annotate-functions' has:
    "The function should return a list of pairs of the form (POSITION . STRING),
    consisting of strings to be effectively inserted at the specified positions
    of the file being written (1 means to insert before the first byte written).
    The POSITIONs must be sorted into increasing order."
  This did not work at all. I had to use the alternate pathway:
    An annotation function can return with a different buffer current.
    Doing so removes the annotations returned by previous functions, and
    resets START and END to `point-min' and `point-max' of the new buffer.
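
A minimal sketch of that alternate pathway on its own, outside any
coding system. The names `my-annotate-upcase' and `my-annotate-demo',
the scratch buffer name, and the upcasing (a stand-in for a real
conversion) are all hypothetical; this is how I understand the
mechanism, not a definitive recipe:

```elisp
;; Illustration only: hypothetical names, ASCII text, trivial conversion.
(defun my-annotate-upcase (start end)
  "Hand `write-region' an upcased copy of the text in a scratch buffer."
  (let ((text (if (stringp start)
                  start
                (buffer-substring-no-properties start end))))
    (set-buffer (get-buffer-create " *my-annotate*"))
    (erase-buffer)
    (insert (upcase text))
    ;; No (POSITION . STRING) pairs; the new current buffer is the result.
    nil))

(defun my-annotate-demo ()
  "Write \"shalom\" through `my-annotate-upcase' and return the file text."
  (let ((write-region-annotate-functions '(my-annotate-upcase))
        (file (make-temp-file "annotate-demo")))
    (unwind-protect
        (progn
          (with-temp-buffer
            (insert "shalom")
            (write-region (point-min) (point-max) file nil 'silent))
          (with-temp-buffer
            (insert-file-contents file)
            (buffer-string)))
      (delete-file file))))
```

The function returns nil rather than a list of annotations, so the only
thing `write-region' has to go on is the change of current buffer.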

Thank you both. I will post again when I have finished all the details.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry