all messages for Emacs-related lists mirrored at yhetil.org
* Usage of standard-display-table in MSDOS
@ 2010-08-23 12:44 Kenichi Handa
  2010-08-24  5:34 ` Stephen J. Turnbull
  2010-08-27 10:24 ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-08-23 12:44 UTC (permalink / raw)
  To: emacs-devel

In msdos-initialize-window-system (of term/pc-win.el), I
found this code:

  ;; In multibyte mode, we want unibyte buffers to be displayed
  ;; using the terminal coding system, so that they display
  ;; correctly on the DOS terminal; in unibyte mode we want to see
  ;; all 8-bit characters verbatim.  In both cases, we want the
  ;; entire range of 8-bit characters to arrive at our display code
  ;; verbatim.
  (standard-display-8bit 127 255)

Is it really working in non-iso-8859-1 environment as
expected?  Note that 128..255 are latin-1 characters after
Emacs 23, not raw-bytes.  So, I think the above call will
make 8-bit bytes in unibyte buffer displayed as latin-1
characters, but as the terminal encoding system doesn't
support latin-1 chars in, for instance, greek environment,
just '?' will be displayed.

---
Kenichi Handa
handa@m17n.org



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Usage of standard-display-table in MSDOS
  2010-08-23 12:44 Usage of standard-display-table in MSDOS Kenichi Handa
@ 2010-08-24  5:34 ` Stephen J. Turnbull
  2010-08-24 11:13   ` Ehud Karni
  2010-08-27 10:24 ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Stephen J. Turnbull @ 2010-08-24  5:34 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa writes:
 > In msdos-initialize-window-system (of term/pc-win.el), I
 > found this code:
 > 
 >   ;; In multibyte mode, we want unibyte buffers to be displayed
 >   ;; using the terminal coding system, so that they display
 >   ;; correctly on the DOS terminal; in unibyte mode we want to see
 >   ;; all 8-bit characters verbatim.  In both cases, we want the
 >   ;; entire range of 8-bit characters to arrive at our display code
 >   ;; verbatim.
 >   (standard-display-8bit 127 255)
 > 
 > Is it really working in non-iso-8859-1 environment as
 > expected?  Note that 128..255 are latin-1 characters after
 > Emacs 23, not raw-bytes.  So, I think the above call will
 > make 8-bit bytes in unibyte buffer displayed as latin-1
 > characters, but as the terminal encoding system doesn't
 > support latin-1 chars in, for instance, greek environment,
 > just '?' will be displayed.

Hebrew and Cyrillic are other obvious candidates for testing here.
They seem to have more active participants on emacs-devel.





* Re: Usage of standard-display-table in MSDOS
  2010-08-24  5:34 ` Stephen J. Turnbull
@ 2010-08-24 11:13   ` Ehud Karni
  2010-08-24 16:51     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-24 11:13 UTC (permalink / raw)
  To: stephen; +Cc: eliz, emacs-devel, handa

On Tue, 24 Aug 2010 14:34:37 Stephen J. Turnbull wrote:
>  >
>  >   ;; In multibyte mode, we want unibyte buffers to be displayed
>  >   ;; using the terminal coding system, so that they display
>  >   ;; correctly on the DOS terminal; in unibyte mode we want to see
>  >   ;; all 8-bit characters verbatim.  In both cases, we want the
>  >   ;; entire range of 8-bit characters to arrive at our display code
>  >   ;; verbatim.
>  >   (standard-display-8bit 127 255)
>  >
>  > Is it really working in non-iso-8859-1 environment as
>  > expected?  Note that 128..255 are latin-1 characters after
>  > Emacs 23, not raw-bytes.  So, I think the above call will
>  > make 8-bit bytes in unibyte buffer displayed as latin-1
>  > characters, but as the terminal encoding system doesn't
>  > support latin-1 chars in, for instance, greek environment,
>  > just '?' will be displayed.
>
> Hebrew and Cyrillic are other obvious candidates for testing here.
> They seem to have more active participants on emacs-devel.

From my checks this does not work on text terminals (it really depends
on the LANG env variable). I had this code in Emacs 21.3:

(defun set-standard-display-table ()
    (setq standard-display-table (make-display-table))
    (standard-display-8bit 127 254))

I then set the DOS Hebrew chars (128-144) each to a vector:
    [ 169 <the corresponding UNIX Hebrew char> ]

Then visit a file (literally).

In Emacs 21.3 it works fine with any value of LANG: it shows the Hebrew
chars as it should, and Hebrew DOS (CP862) chars with a prefix.

In Emacs 23.1 it works only if the LANG is set to a Latin-1 value
(e.g. en_GB).

I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
setting the LANG to he_IL changes too many other things (for example,
it changes the `ls' output, which breaks dired).

The problem as I see it is that the characters in the vectors in the
display table undergo further translation and are not used "literally".

The use of UTF-8 (which works well on X) is not an option. Many of
the users have text terminals, and most of the data files viewed are
in iso-8859-8 or even Hebrew DOS (CP862).

I recently installed some Emacs stuff in an Israeli insurance company
and because of this problem I used 21.3 instead of a newer version.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-24 11:13   ` Ehud Karni
@ 2010-08-24 16:51     ` Eli Zaretskii
  2010-08-25 13:04       ` Ehud Karni
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-24 16:51 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel

> Date: Tue, 24 Aug 2010 14:13:46 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: handa@m17n.org, eliz@gnu.org, emacs-devel@gnu.org
> 
> I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
> setting the LANG to he_IL changes too many other things (for example,
> it changes the `ls' output, which breaks dired).

You could do

  M-x set-locale-environment RET he_IL RET

from inside Emacs, which I think will do what you want without
affecting `ls' etc. (unless you mean `ls' that is run from the Emacs
shell buffer).

> The problem as I see it is that the characters in the vectors in the
> display table undergo further translation and are not used "literally".

I don't understand what you are trying to say here.  Please elaborate
about "further translation" and "not used literally".




* Re: Usage of standard-display-table in MSDOS
  2010-08-24 16:51     ` Eli Zaretskii
@ 2010-08-25 13:04       ` Ehud Karni
  2010-08-25 18:09         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-25 13:04 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Tue, 24 Aug 2010 19:51:58 Eli Zaretskii wrote:
> >
> > I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
> > setting the LANG to he_IL changes too many other things (for example,
> > it changes the `ls' output, which breaks dired).
>
> You could do
>
>   M-x set-locale-environment RET he_IL RET
>
> from inside Emacs, which I think will do what you want without
> affecting `ls' etc. (unless you mean `ls' that is run from the Emacs
> shell buffer).

That fixes my problem. It does not change any env variable, so it is good
even for shells spawned from Emacs.


> > The problem as I see it is that the characters in the vectors in the
> > display table undergo further translation and are not used "literally".
>
> I don't understand what you are trying to say here.  Please elaborate
> about "further translation" and "not used literally".

The best way to understand it is with an example:
For the DOS Hebrew Aleph the standard-display-table is set like this:
    (aset standard-display-table 128 '[ 169 244 ] )
In Emacs 21.3 these exact characters were displayed (sent) to the text
terminal and appeared as prefix char + Aleph.
In 23.1 I see the prefix + ? (question mark).

The character `244' (Aleph) is being encoded in the current locale
and this inhibits its display as Aleph.

You can easily check it by the following prescription:

(setq standard-display-table (make-display-table))
(standard-display-8bit 128 254)
(set-locale-environment "en_GB")
(find-file-literally <a file with Hebrew (#xE0-#xFA) characters>)
    check how it is displayed - you see the Hebrew as it should.
    Now change the locale.
(set-locale-environment "he_IL")
    You see ? because the #xE0-#xFA codes are encoded in the Hebrew locale
    and are meaningless (instead of just being plain 8 bit).

The standard-display-table has not changed, but the meaning of the
8 bit numbers in the characters vectors has changed.


To solve my Hebrew display I have 2 possibilities:

1. Set the locale to some Latin-1 language (e.g. en_GB) and continue to
   work like I do in 21.3. It is simpler, but it is a kind of
   self-deception, and it will work only with 8 bit Hebrew fonts.

2. Set the locale to Hebrew and change the display table (entries #x80-
   #x9A - DOS Hebrew, and #xE0-#xFA - ISO-8859-8 Hebrew to UTF Hebrew)
   but then I have to set all the DOS graphic characters myself.

I'll go the 2nd way, but I'll appreciate something that will ease it,
i.e. a way to set the standard-display-table for all the non Hebrew
characters < 256 to something that will make it work like CP862.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-25 13:04       ` Ehud Karni
@ 2010-08-25 18:09         ` Eli Zaretskii
  2010-08-26 15:26           ` Ehud Karni
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-25 18:09 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel

> Date: Wed, 25 Aug 2010 16:04:56 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
> 
> For the DOS Hebrew Aleph the standard-display-table is set like this:
>     (aset standard-display-table 128 '[ 169 244 ] )
> In Emacs 21.3 these exact characters were displayed (sent) to the text
> terminal and appeared as prefix char + Aleph.
> In 23.1 I see the prefix + ? (question mark).
> 
> The character `244' (Aleph) is being encoded in the current locale
> and this inhibits its display as Aleph.

And why don't you just say "C-x RET t cp862 RET"?  That's what you
want -- to send cp862 codes to the terminal, right?





* Re: Usage of standard-display-table in MSDOS
  2010-08-25 18:09         ` Eli Zaretskii
@ 2010-08-26 15:26           ` Ehud Karni
  2010-08-26 16:43             ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-26 15:26 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Wed, 25 Aug 2010 21:09:46 Eli Zaretskii <eliz@gnu.org> wrote:
>
> And why don't you just say "C-x RET t cp862 RET"?  That's what you
> want -- to send cp862 codes to the terminal, right?

No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
to appear in Hebrew on BOTH text terminals and X.

In addition I want to use some of the graphic characters of the CP862
set, again both in text terminal and in X.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-26 15:26           ` Ehud Karni
@ 2010-08-26 16:43             ` Eli Zaretskii
  2010-08-27 13:35               ` Ehud Karni
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-26 16:43 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel

> Date: Thu, 26 Aug 2010 18:26:13 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
> 
> On Wed, 25 Aug 2010 21:09:46 Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > And why don't you just say "C-x RET t cp862 RET"?  That's what you
> > want -- to send cp862 codes to the terminal, right?
> 
> No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
> to appear in Hebrew on BOTH text terminals and X.

Sorry, I don't understand: what do you mean by "Hebrew of any kind"?

In Emacs 23 and later, there's only one kind of Hebrew: the Unicode
kind.  All the characters, including Hebrew, are internally
represented as their Unicode codepoints.  When Emacs visits a file
encoded in cp862, it converts the encoded characters into their
Unicode codepoints.  What is delivered to the screen is either some
encoding, like cp862 (in the case of a text terminal), or a glyph from
some font (on GUI terminals).  In both of these cases, Emacs
translates the Unicode codepoints to either the corresponding cp862
etc. codes, or to the codes of the characters in the font used to
display Hebrew.  All that's needed for Emacs to DTRT is (a) that Emacs
knows it is dealing with Hebrew characters, and (b) for text terminals
only, that the terminal encoding is set up according to the encoding
the terminal expects.

Now, what am I missing to understand why you needed to use display
tables?

> In addition I want to use some of the graphic characters of the CP862
> set, again both in text terminal and in X.

These graphic characters are part of Unicode as well (in the U+25XX
block), and Emacs 23 knows how to encode them in cp862, or any other
codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
see for yourself, it has a valid cp862 encoding.
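
For illustration, a minimal sketch of how one might check from Lisp which
box-drawing characters cp862 can encode, rather than trying them one by one
with "C-x 8 RET".  It assumes Emacs 23 or later and the standard
`encode-coding-char' function; the helper name
`my-cp862-encodable-box-chars' is hypothetical, not part of Emacs.

```elisp
;; Sketch: collect the characters of the Unicode box-drawing block
;; (U+2500..U+257F) that the cp862 coding system can encode.
;; `my-cp862-encodable-box-chars' is a hypothetical helper name.
(defun my-cp862-encodable-box-chars ()
  "Return an alist of (CHAR . BYTES) for box-drawing chars encodable in cp862."
  (let (result)
    (dotimes (i #x80)
      (let* ((ch (+ #x2500 i))
             ;; `encode-coding-char' returns nil when the coding
             ;; system cannot safely encode the character.
             (bytes (encode-coding-char ch 'cp862)))
        (when bytes
          (push (cons ch bytes) result))))
    (nreverse result)))
```

Evaluating (my-cp862-encodable-box-chars) returns only the characters that
have a cp862 encoding, together with the byte each one is encoded as.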




* Re: Usage of standard-display-table in MSDOS
  2010-08-23 12:44 Usage of standard-display-table in MSDOS Kenichi Handa
  2010-08-24  5:34 ` Stephen J. Turnbull
@ 2010-08-27 10:24 ` Eli Zaretskii
  2010-08-27 11:44   ` Kenichi Handa
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-27 10:24 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 23 Aug 2010 21:44:07 +0900
> 
> In msdos-initialize-window-system (of term/pc-win.el), I
> found this code:
> 
>   ;; In multibyte mode, we want unibyte buffers to be displayed
>   ;; using the terminal coding system, so that they display
>   ;; correctly on the DOS terminal; in unibyte mode we want to see
>   ;; all 8-bit characters verbatim.  In both cases, we want the
>   ;; entire range of 8-bit characters to arrive at our display code
>   ;; verbatim.
>   (standard-display-8bit 127 255)
> 
> Is it really working in non-iso-8859-1 environment as
> expected?  Note that 128..255 are latin-1 characters after
> Emacs 23, not raw-bytes.  So, I think the above call will
> make 8-bit bytes in unibyte buffer displayed as latin-1
> characters, but as the terminal encoding system doesn't
> support latin-1 chars in, for instance, greek environment,
> just '?' will be displayed.

It's quite possible that this doesn't work in Emacs 23 and later like
it did in older versions.  But to figure out what, if anything, is
needed instead, I would like first to understand better what you are
saying.

It sounds like you are saying that standard-display-8bit no longer
does what its doc string advertises:

    "Display characters in the range L to H literally."

The "literally" part is no longer true, is it?

And one other question: why do we do something similar in
standard-display-european-internal?  Specifically:

    (defun standard-display-european-internal ()
      ;; Actually set up direct output of non-ASCII characters.
      (standard-display-8bit (if (eq window-system 'pc) 128 160) 255)

(I'm asking about the case where window-system is _not_ `pc'.)
This is called in set-display-table-and-terminal-coding-system under
the following conditions:

  (let ((coding (get-language-info language-name 'unibyte-display)))
    (if (and coding
	     (or (not coding-system)
		 (coding-system-equal coding coding-system)))
	(standard-display-european-internal)




* Re: Usage of standard-display-table in MSDOS
  2010-08-27 10:24 ` Eli Zaretskii
@ 2010-08-27 11:44   ` Kenichi Handa
  2010-08-27 14:13     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Kenichi Handa @ 2010-08-27 11:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83aao8mjzx.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> It sounds like you are saying that standard-display-8bit no longer
> does what its doc string advertises:

>     "Display characters in the range L to H literally."

> The "literally" part is no longer true, is it?

What's the meaning of "literally" when a display table
element is [#xA0]?

Before Emacs 23, the character #xA0 represents the byte
0xA0.  But now it is a character representing a Unicode
character U+00A0, and #x3FFFA0 is the character representing
the byte 0xA0.

And, to "display characters literally", we have been encoding
characters by the terminal coding system.  Before Emacs 23,
the encoded result of #xA0 was always the byte 0xA0, but now
it depends on the terminal coding system.
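
The distinction can be seen directly from Lisp.  The following is only a
sketch, assuming Emacs 23 or later; `my-a0-encodings' is a hypothetical
name used for illustration.

```elisp
;; Sketch: the character #xA0 (U+00A0) versus the raw byte 0xA0.
;; `my-a0-encodings' is a hypothetical helper, not part of Emacs.
(defun my-a0-encodings ()
  "Return a list showing how U+00A0 and the raw byte 0xA0 are encoded."
  (let ((nbsp #xA0)                               ; the character U+00A0
        (raw (unibyte-char-to-multibyte #xA0)))   ; the raw byte 0xA0
    (list
     ;; Character code that represents the raw byte.
     (format "#x%X" raw)
     ;; U+00A0 encodes differently depending on the coding system.
     (encode-coding-string (string nbsp) 'iso-8859-1)
     (encode-coding-string (string nbsp) 'utf-8)
     ;; The raw-byte character is passed through as the byte itself.
     (encode-coding-string (string raw) 'utf-8))))
```

Here the first element is "#x3FFFA0", the two encodings of U+00A0 are the
single byte 0xA0 (iso-8859-1) and the two bytes 0xC2 0xA0 (utf-8), and the
raw-byte character encodes to the single byte 0xA0.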

> And one other question: why do we do something similar in
> standard-display-european-internal?  Specifically:

>     (defun standard-display-european-internal ()
>       ;; Actually set up direct output of non-ASCII characters.
>       (standard-display-8bit (if (eq window-system 'pc) 128 160) 255)

> (I'm asking about the case where window-system is _not_ `pc'.)
> This is called in set-display-table-and-terminal-coding-system under
> the following conditions:

>   (let ((coding (get-language-info language-name 'unibyte-display)))
>     (if (and coding
> 	     (or (not coding-system)
> 		 (coding-system-equal coding coding-system)))
> 	(standard-display-european-internal)

I don't know.  I didn't modify that part when I merged
the unicode branch.  I should have investigated the semantics of
display tables at that time.

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-08-26 16:43             ` Eli Zaretskii
@ 2010-08-27 13:35               ` Ehud Karni
  2010-08-27 16:30                 ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-27 13:35 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Thu, 26 Aug 2010 19:43:48 Eli Zaretskii wrote:
> > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> >
> > No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
> > to appear in Hebrew on BOTH text terminals and X.
>
> Sorry, I don't understand: what do you mean by "Hebrew of any kind"?
>
> In Emacs 23 and later, there's only one kind of Hebrew: the Unicode
> kind.  All the characters, including Hebrew, are internally
> represented as their Unicode codepoints.  When Emacs visits a file
> encoded in cp862, it converts the encoded characters into their
> Unicode codepoints.  What is delivered to the screen is either some
> encoding, like cp862 (in the case of a text terminal), or a glyph from
> some font (on GUI terminals).  In both of these cases, Emacs
> translates the Unicode codepoints to either the corresponding cp862
> etc. codes, or to the codes of the characters in the font used to
> display Hebrew.  All that's needed for Emacs to DTRT is (a) that Emacs
> knows it is dealing with Hebrew characters, and (b) for text terminals
> only, that the terminal encoding is set up according to the encoding
> the terminal expects.
>
> Now, what am I missing to understand why you needed to use display
> tables?

You are missing the point that most of my files are not "word-processor"
(or HTML/XML) files but are data files that are read with either ISO-8859-8
or no-conversion (binary) encoding.

Now, some of them have DOS Hebrew (#x80-9A) and graphic characters in
them, in ADDITION to UNIX Hebrew (#xE0-FA). I still want to see them as
Hebrew characters (so I can read them) but with a distinction between the
2 Hebrew types. I want to know the 8-bit encoding; it matters.

When I visit a file literally (i.e. no conversion) I still want to see
the Hebrew (and DOS graphic) characters as Hebrew and graphics, not as
an octal representation.

So I have to use a display table, and I want it to work for both text
terminals and X (or other windowed system - Mac, MS - which I myself
don't use).

> These graphic characters are part of Unicode as well (in the U+25XX
> block), and Emacs 23 knows how to encode them in cp862, or any other
> codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
> see for yourself, it has a valid cp862 encoding.

What I want is just a subset of this in my display table, so bytes in
the range #xB0-#xDF will be shown as is on text terminal and as the
CP862 glyphs on X (I am willing to have different display tables for
each case, I don't use text terminal and X on the same Emacs instance).

I know how to do it when the locale environment is set to "en_GB".
Can you instruct me how to do this when the locale environment is set
to "he_IL" ?


Just as a curiosity, sometimes I get files where the Hebrew is encoded
as the lowercase Latin letters and Aleph is represented by @ (this is
known as old-code and it is still used by some companies, even though
some other applications already use UTF-8 XML files).

Do you have a way to display it as Hebrew without a display table ?

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-27 11:44   ` Kenichi Handa
@ 2010-08-27 14:13     ` Eli Zaretskii
  2010-08-28  4:18       ` Kenichi Handa
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-27 14:13 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Aug 2010 20:44:16 +0900
> 
> >     "Display characters in the range L to H literally."
> 
> > The "literally" part is no longer true, is it?
> 
> What's the meaning of "literally" when a display table
> element is [#xA0]?

It means that a literal byte 0xA0 is sent to the terminal.

> Before Emacs 23, the character #xA0 represents the byte
> 0xA0.  But now it is a character representing a Unicode
> character U+00A0, and #x3FFFA0 is the character representing
> the byte 0xA0.
> 
> And, to "display characters literally", we have been encoding
> characters by the terminal coding system.  Before Emacs 23,
> the encoded result of #xA0 was always the byte 0xA0, but now
> it depends on the terminal coding system.

Which means, AFAIU, that "literally" is no longer possible.  At least
in the case of a multibyte buffer.

What about a unibyte buffer, though?  How do we display the characters
there?




* Re: Usage of standard-display-table in MSDOS
  2010-08-27 13:35               ` Ehud Karni
@ 2010-08-27 16:30                 ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-27 16:30 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel

> Date: Fri, 27 Aug 2010 16:35:40 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
> 
> You are missing the point that most of my files are not "word-processor"
> (or HTML/XML) files but are data files that are read with either ISO-8859-8
> or no-conversion (binary) encoding.
> 
> Now, some of them have DOS Hebrew (#x80-9A) and graphic characters in
> them, in ADDITION to UNIX Hebrew (#xE0-FA). I still want to see them as
> Hebrew characters (so I can read them) but with a distinction between the
> 2 Hebrew types. I want to know the 8-bit encoding; it matters.

So you basically have files that mix different encodings of Hebrew
characters, is that right?

If so, I would suggest indeed to set up the display table, but not as
you did it in older Emacsen.  What you need is to map those 8-bit
bytes to the Unicode codepoints of the corresponding Hebrew
characters.  That is, let the slot of eight-bit character #xA0, which
is represented in Emacs as #x3FFFA0, be set in the display table to
#x5d0 (the Unicode codepoint of Aleph).  Then you will see Aleph when
the file has #xA0, provided that you read the file with no-conversion.
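
A minimal sketch of such a setup follows.  It assumes Emacs 23 or later, the
byte ranges mentioned earlier in this thread (#x80-#x9A for DOS Hebrew,
#xE0-#xFA for ISO-8859-8 Hebrew), and that both ranges list the letters in
the same order as Unicode U+05D0..U+05EA.  The helper name
`my-hebrew-raw-byte-display' is hypothetical, and the sketch does not add
any prefix character to tell the two encodings apart.

```elisp
;; Sketch only: make raw 8-bit bytes display as Unicode Hebrew letters.
;; `my-hebrew-raw-byte-display' is a hypothetical helper, not part of Emacs.
(require 'disp-table)

(defun my-hebrew-raw-byte-display (first-byte)
  "Display the 27 raw bytes starting at FIRST-BYTE as U+05D0..U+05EA."
  (unless standard-display-table
    (setq standard-display-table (make-display-table)))
  (dotimes (i 27)
    ;; The slot index is the raw-byte character (#x3FFF80 and up),
    ;; not the Latin-1 character with the same numeric value.
    (aset standard-display-table
          (unibyte-char-to-multibyte (+ first-byte i))
          (vector (+ #x5D0 i)))))

(my-hebrew-raw-byte-display #x80)   ; DOS Hebrew (CP862) range
(my-hebrew-raw-byte-display #xE0)   ; ISO-8859-8 Hebrew range
```

To keep the two encodings distinguishable, the vector for the DOS range
could hold two characters (a marker followed by the letter) instead of one.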

> So I have to use a display table, and I want it to work for both text
> terminals and X (or other windowed system - Mac, MS - which I myself
> don't use).

If you set up the display table as I describe above, both X and text
terminals will work.  For text terminals, you will need to set
terminal-coding-system to some Hebrew capable encoding that these
terminals support.  For GUI displays, you need a font to be installed
that is capable of displaying Hebrew characters.

> > These graphic characters are part of Unicode as well (in the U+25XX
> > block), and Emacs 23 knows how to encode them in cp862, or any other
> > codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
> > see for yourself, it has a valid cp862 encoding.
> 
> What I want is just a subset of this in my display table, so bytes in
> the range #xB0-#xDF will be shown as is on text terminal and as the
> CP862 glyphs on X (I am willing to have different display tables for
> each case, I don't use text terminal and X on the same Emacs instance).

There should be no problem in using the same display table set up as
above on all types of terminals.

> I know how to do it when the locale environment is set to "en_GB".
> Can you instruct me how to do this when the locale environment is set
> to "he_IL" ?

The locale environment shouldn't have any effect on that.  All it does
is set defaults for certain coding-systems.  You will want to override
those defaults anyway, e.g. for using no-conversion when visiting
these files.  I don't see anything else that might interfere, do you?

> Just as a curiosity, sometimes I get files where the Hebrew is encoded
> as the lowercase Latin letters and Aleph is represented by @ (this is
> known as old-code and it is still used by some companies, even though
> some other applications already use UTF-8 XML files).
> 
> Do you have a way to display it as Hebrew without a display table ?

You could write your own coding-system, but I think display tables are
easier.




* Re: Usage of standard-display-table in MSDOS
  2010-08-27 14:13     ` Eli Zaretskii
@ 2010-08-28  4:18       ` Kenichi Handa
  2010-08-28  7:22         ` Eli Zaretskii
  2010-08-29 10:16         ` Ehud Karni
  0 siblings, 2 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-08-28  4:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > >     "Display characters in the range L to H literally."
> > 
> > > The "literally" part is no longer true, is it?
> > 
> > What's the meaning of "literally" when a display table
> > element is [#xA0]?

> It means that a literal byte 0xA0 is sent to the terminal.

From which document can we get that interpretation?  The
docstring of buffer-display-table says:

Each element should be a vector of
characters or nil.  The value nil means display the character in the
default fashion; otherwise, the characters from the vector are delivered
to the screen instead of the original character.

It only talks about "character".  Although it doesn't say how
to deliver a character to a terminal, the natural way is to
encode that character by the terminal coding system, or
display that character by the corresponding glyph of a font.

> > Before Emacs 23, the character #xA0 represents the byte
> > 0xA0.  But now it is a character representing a Unicode
> > character U+00A0, and #x3FFFA0 is the character representing
> > the byte 0xA0.
> > 
> > And, to "display characters literally", we have been encoding
> > characters by the terminal coding system.  Before Emacs 23,
> > the encoded result of #xA0 was always the byte 0xA0, but now
> > it depends on the terminal coding system.

> Which means, AFAIU, that "literally" is no longer possible.  At least
> in the case of a multibyte buffer.

> What about a unibyte buffer, though?  How do we display the characters
> there?

This is the way to display the character #xA0 (in a
multibyte buffer) and the character representing the byte
#xA0 (in both multibyte and unibyte buffers) by
sending the byte #xA0 to the terminal.

;; For NBSP (U+00A0)
(aset standard-display-table #xA0
      (vector (unibyte-char-to-multibyte #xA0)))
;; For byte #xA0.
(aset standard-display-table (unibyte-char-to-multibyte #xA0)
      (vector (unibyte-char-to-multibyte #xA0)))
(set-terminal-coding-system 'no-conversion)
(set-safe-terminal-coding-system-internal 'no-conversion)

The last two lines are currently necessary because of a bug
in term.c.  I'm going to fix it.

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-08-28  4:18       ` Kenichi Handa
@ 2010-08-28  7:22         ` Eli Zaretskii
  2010-08-30  2:24           ` Kenichi Handa
  2010-08-29 10:16         ` Ehud Karni
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-28  7:22 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 28 Aug 2010 13:18:02 +0900
> 
> In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> 
> > > >     "Display characters in the range L to H literally."
> > > 
> > > > The "literally" part is no longer true, is it?
> > > 
> > > What's the meaning of "literally" when a display table
> > > element is [#xA0]?
> 
> > It means that a literal byte 0xA0 is sent to the terminal.
> 
> From which document can we get that interpretation?

That's my understanding of the word "literally".  Plus,
standard-display-8bit worked like that in previous versions of Emacs.
If we mean for it to do something else, we should amend the docstring.

> (aset standard-display-table (unibyte-char-to-multibyte #xA0)
>       (vector (unibyte-char-to-multibyte #xA0)))

Shouldn't standard-display-8bit be modified to use this, instead of
what it does now?  It seems like it was previously used to work around
the terminal encoding, but that fire escape was plumbed in Emacs 23.
Perhaps we should reinstate that feature?

And there's still the question of what to do with the fragment in
standard-display-european-internal that uses standard-display-8bit.
Should it be removed, or should it be rewritten in some way?




* Re: Usage of standard-display-table in MSDOS
  2010-08-28  4:18       ` Kenichi Handa
  2010-08-28  7:22         ` Eli Zaretskii
@ 2010-08-29 10:16         ` Ehud Karni
  2010-08-29 11:21           ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-29 10:16 UTC (permalink / raw)
  To: handa; +Cc: eliz, emacs-devel

On Sat, 28 Aug 2010 13:18:02 Kenichi Handa wrote:
>
> ;; For NBSP (U+00A0)
> (aset standard-display-table #xA0
>       (vector (unibyte-char-to-multibyte #xA0)))

This does not work because `unibyte-char-to-multibyte' does not give
the right result in Emacs-23.1 (it works well on Emacs-21.3).
Sorry, I did not check on the latest Emacs.

I used the following to check it:

(defun check-multibyte-code (byte)
       (message "Byte: %02X (%d),  Char: %04X (%d)"
                byte byte
                (unibyte-char-to-multibyte byte)
                (unibyte-char-to-multibyte byte)))

(check-multibyte-code #xE0)


The result in 21.3 is correct:
    Byte: E0 (224),  Char: 0C60 (3168)

But on 23.1 I get:
    Byte: E0 (224),  Char: 3FFFE0 (4194272)
i.e. "literal" #xE0.

The output of describe-current-coding-system (Emacs-21.3, the
output of Emacs-23.1 is almost the same) is:

Coding system for saving this buffer:
  Not set locally, use the default.
Default coding system (for new files):
  8 -- hebrew-iso-8bit-unix
Coding system for keyboard input:
  nil
Coding system for terminal output:
  8 -- hebrew-iso-8bit
Defaults for subprocess I/O:
  decoding: 8 -- hebrew-iso-8bit-unix
  encoding: 8 -- hebrew-iso-8bit-unix

Priority order for recognizing coding systems when reading files:
  1. hebrew-iso-8bit
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. iso-2022-jp (alias: junet)
  4. iso-2022-7bit
  5. iso-2022-7bit-lock (alias: iso-2022-int-1)
  6. iso-2022-8bit-ss2
  7. emacs-mule
  8. raw-text
  9. japanese-shift-jis (alias: shift_jis sjis)
  10. chinese-big5 (alias: big5 cn-big5)
  11. no-conversion (alias: binary)
  12. mule-utf-8 (alias: utf-8)

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

  The followings are decoded correctly but recognized as iso-2022-7bit-lock:
    iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext iso-2022-jp-2 iso-2022-kr

Particular coding systems specified for certain file names:

  OPERATION	TARGET PATTERN		CODING SYSTEM(s)
  ---------	--------------		----------------
  File I/O      "\\.\\(reg\\|REG\\)$"   (raw-text-dos . raw-text-dos)
                "\\.t\\(bz2?\\)\\|\\([bz]2\\)\\'"
                                        (no-conversion . no-conversion)
                "\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.gz\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.tgz\\'"             (no-conversion . no-conversion)
                "\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"
                                        (no-conversion . no-conversion)
                "\\.elc\\'"             (emacs-mule . emacs-mule)
                "\\(\\`\\|/\\)loaddefs.el\\'"
                                        (raw-text . raw-text-unix)
                "\\.tar\\'"             (no-conversion . no-conversion)
                ""                      (hebrew-iso-8bit)
  Process I/O	nothing specified
  Network I/O	nothing specified

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 10:16         ` Ehud Karni
@ 2010-08-29 11:21           ` Eli Zaretskii
  2010-08-29 11:49             ` Ehud Karni
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-29 11:21 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel, handa

> Date: Sun, 29 Aug 2010 13:16:02 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: eliz@gnu.org, emacs-devel@gnu.org
> Reply-to: ehud@unix.mvs.co.il
> 
> > ;; For NBSP (U+00A0)
> > (aset standard-display-table #xA0
> >       (vector (unibyte-char-to-multibyte #xA0)))
> 
> This does not work because `unibyte-char-to-multibyte' does not give
> the right result in Emacs-23.1 (it works well on Emacs-21.3).
> Sorry, I did not check on latest Emacs
> 
> I used the following to check it:
> 
> (defun check-multibyte-code (byte)
>        (message "Byte: %02X (%d),  Char: %04X (%d)"
>                 byte byte
>                 (unibyte-char-to-multibyte byte)
>                 (unibyte-char-to-multibyte byte)))
> 
> (check-multibyte-code #xE0)
> 
> 
> The result in 21.3 is correct:
>     Byte: E0 (224),  Char: 0C60 (3168)
> 
> But on 23.1 I get:
>     Byte: E0 (224),  Char: 3FFFE0 (4194272)
> i.e. "literal" #xE0.

The last result is correct: 0x3FFFE0 is the internal representation of
0xE0 in Emacs 23.  Emacs 23 and later extend the Unicode code space
with these characters (and some others).

Why did you think it was incorrect?
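
For reference, the numbers in the quoted output can be checked
directly.  The following is a minimal sketch, assuming Emacs 23 or
later; the function name `my-raw-byte-info' is made up for
illustration and is not part of Emacs:

```elisp
;; Assumption: Emacs 23 or later, where raw bytes 0x80..0xFF are
;; represented as characters 0x3FFF80..0x3FFFFF.
(defun my-raw-byte-info (byte)
  "Return (RAW BACK CHARSET) for the 8-bit BYTE (128..255).
RAW is the character Emacs uses internally for the raw byte,
BACK is RAW converted back to a byte, and CHARSET is RAW's charset."
  (let ((raw (unibyte-char-to-multibyte byte)))
    (list raw
          (multibyte-char-to-unibyte raw)
          (char-charset raw))))

(my-raw-byte-info #xE0)
```

The first element should be 4194272 (#x3FFFE0), as in the 23.1
output quoted above, and converting it back should give 224 again.
No terminal or file coding system is consulted in either direction.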




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:21           ` Eli Zaretskii
@ 2010-08-29 11:49             ` Ehud Karni
  2010-08-29 13:06               ` Ehud Karni
  2010-08-29 14:04               ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Ehud Karni @ 2010-08-29 11:49 UTC (permalink / raw)
  To: eliz; +Cc: handa, emacs-devel

On Sun, 29 Aug 2010 07:21:26 Eli Zaretskii wrote:
>
> > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> >
> > > ;; For NBSP (U+00A0)
> > > (aset standard-display-table #xA0
> > >       (vector (unibyte-char-to-multibyte #xA0)))
> >
> > This does not work because `unibyte-char-to-multibyte' does not give
> > the right result in Emacs-23.1 (it works well on Emacs-21.3).
> >
> > I used the following to check it:
> >
> > (defun check-multibyte-code (byte)
> >        (message "Byte: %02X (%d),  Char: %04X (%d)"
> >                 byte byte
> >                 (unibyte-char-to-multibyte byte)
> >                 (unibyte-char-to-multibyte byte)))
> >
> > (check-multibyte-code #xE0)
> >
> >
> > The result in 21.3 is correct:
> >     Byte: E0 (224),  Char: 0C60 (3168)
> >
> > But on 23.1 I get:
> >     Byte: E0 (224),  Char: 3FFFE0 (4194272)
> > i.e. "literal" #xE0.
>
> The last result is correct: 0x3FFFE0 is the internal representation of
> 0xE0 in Emacs 23.  Emacs 23 and later extend the Unicode code space
> with these characters (and some others).
>
> Why did you think it was incorrect?

Because of my coding system (iso-8859, remember?) the #xE0 should be
displayed as Aleph, not as some 8-bit byte E0.

Instead of trying to understand my problem, you are telling me why
Emacs behaves in this way (which is of no use for Handa-san's suggestion:
a way to set the display table).  If you think this is the way it should
be, give the reason, not the technical details.


Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:49             ` Ehud Karni
@ 2010-08-29 13:06               ` Ehud Karni
  2010-08-29 13:50                 ` Eli Zaretskii
  2010-08-29 14:04               ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-29 13:06 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel, handa

On Sun, 29 Aug 2010 14:49:03 Ehud Karni wrote:
>
> On Sun, 29 Aug 2010 07:21:26 Eli Zaretskii wrote:
> >
> > > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> > >
> > > > ;; For NBSP (U+00A0)
> > > > (aset standard-display-table #xA0
> > > >       (vector (unibyte-char-to-multibyte #xA0)))
> > >
> > > The result in 21.3 is correct:
> > >     Byte: E0 (224),  Char: 0C60 (3168)
> > >
> > > But on 23.1 I get:
> > >     Byte: E0 (224),  Char: 3FFFE0 (4194272)
> > > i.e. "literal" #xE0.
> >
> > > The last result is correct: 0x3FFFE0 is the internal representation of
> > > 0xE0 in Emacs 23.  Emacs 23 and later extend the Unicode code space
> > > with these characters (and some others).
> >
> > Why did you think it was incorrect?
>
> Because of my coding system (iso-8859, remember ?) the #xE0 should be
> displayed as Aleph, not some 8 bit byte E0.

From another thread, I found Handa-san's suggestion to use `decode-char'.

So my check function now looks like this:

(defun check-multibyte-code (byte)
       (message "Byte: %02X (%d),  M-Char: %04X (%d),  D-Char: %04X (%d)"
                byte byte
                (unibyte-char-to-multibyte byte)
                (unibyte-char-to-multibyte byte)
                (decode-char 'iso-8859-8 byte)
                (decode-char 'iso-8859-8 byte)))

It fails for Emacs-21.3 because `decode-char' returns nil.
For Emacs-23.1 the result is:
    Byte: E0 (224),  M-Char: 3FFFE0 (4194272),  D-Char: 05D0 (1488)

So I can use `decode-char' in 23.1 and `unibyte-char-to-multibyte'
in 21.3 for building a display table.
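
That version split could be wrapped in a single helper.  A minimal
sketch, assuming the data is ISO-8859-8 and that checking
`emacs-major-version' is enough to tell the two cases apart; the
name `my-byte-to-display-char' is hypothetical:

```elisp
(defun my-byte-to-display-char (byte)
  "Return the character to show for BYTE of an ISO-8859-8 file.
On Emacs 23 and later, decode BYTE through the iso-8859-8 charset.
On older versions, or when the charset has no character for BYTE,
fall back to `unibyte-char-to-multibyte'."
  (or (and (>= emacs-major-version 23)
           (decode-char 'iso-8859-8 byte))
      (unibyte-char-to-multibyte byte)))

(my-byte-to-display-char #xE0)
```

On Emacs 23.1 this should give 1488 (#x05D0, Aleph), matching the
D-Char column above.  Bytes that iso-8859-8 leaves undefined come
back as raw-byte characters, so they still need their own display
table entries.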

Can you give the reasons for the changes in these functions?

I think it is bad practice to keep the function names while
changing how they work.  It breaks tested code.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 13:06               ` Ehud Karni
@ 2010-08-29 13:50                 ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-29 13:50 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel, handa

> Date: Sun, 29 Aug 2010 16:06:42 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: handa@m17n.org, emacs-devel@gnu.org
> Reply-to: ehud@unix.mvs.co.il
> 
> From another thread, I found Handa-san's suggestion to use `decode-char'.
> 
> So my check function now looks like this:
> 
> (defun check-multibyte-code (byte)
>        (message "Byte: %02X (%d),  M-Char: %04X (%d),  D-Char: %04X (%d)"
>                 byte byte
>                 (unibyte-char-to-multibyte byte)
>                 (unibyte-char-to-multibyte byte)
>                 (decode-char 'iso-8859-8 byte)
>                 (decode-char 'iso-8859-8 byte)))
> 
> It fails for Emacs-21.3 because `decode-char' returns nil.

IIRC, Emacs 21 supported only 'ucs as the 2nd arg of decode-char.

> Can you give the reasons to the changes in these functions ?

Two: (1) the switch to a Unicode-based internal representation, and,
as a result, (2) changes in the handling of raw eight-bit bytes.

> I think it is a bad practice to keep the function names while
> changing how they work. It breaks tested code.

I agree, but I think in this case there was no choice, unfortunately.

Anyway, I think the key to solving your problem is elsewhere.  I will
try to explain in a separate mail.




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:49             ` Ehud Karni
  2010-08-29 13:06               ` Ehud Karni
@ 2010-08-29 14:04               ` Eli Zaretskii
  2010-09-07 21:11                 ` Ehud Karni
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-29 14:04 UTC (permalink / raw)
  To: ehud; +Cc: handa, emacs-devel

> Date: Sun, 29 Aug 2010 14:49:03 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org, handa@m17n.org
> Reply-to: ehud@unix.mvs.co.il
> 
> Instead of trying to understand my problem, you are telling me why
> Emacs behaves in this way

Sorry, this wasn't the intent.  I simply didn't see the connection
between your original problem and the code you presented, so I
responded to your "this doesn't work in Emacs 23.1".

Let's back up a little: please tell me what the value of
buffer-file-coding-system is after you visit the offending files.

> > > > ;; For NBSP (U+00A0)
> > > > (aset standard-display-table #xA0
> > > >       (vector (unibyte-char-to-multibyte #xA0)))
> > >
> > > This does not work because `unibyte-char-to-multibyte' does not give
> > > the right result in Emacs-23.1 (it works well on Emacs-21.3).

Note that Handa-san recommended setting more than just one slot in
standard-display-table in Emacs 23 to solve similar problems:

  ;; For NBSP (U+00A0)
  (aset standard-display-table #xA0
	(vector (unibyte-char-to-multibyte #xA0)))
  ;; For byte #xA0.
  (aset standard-display-table (unibyte-char-to-multibyte #xA0)
	(vector (unibyte-char-to-multibyte #xA0)))
  (set-terminal-coding-system 'no-conversion)
  (set-safe-terminal-coding-system-internal 'no-conversion)

Did you set both slots of standard-display-table as shown above?

> Because of my coding system (iso-8859, remember ?) the #xE0 should be
> displayed as Aleph, not some 8 bit byte E0.

What encoding does the text terminal expect for Hebrew characters?




* Re: Usage of standard-display-table in MSDOS
  2010-08-28  7:22         ` Eli Zaretskii
@ 2010-08-30  2:24           ` Kenichi Handa
  2010-08-30  3:02             ` Eli Zaretskii
  2010-09-01  3:21             ` Kenichi Handa
  0 siblings, 2 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-08-30  2:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83y6brkxqe.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: emacs-devel@gnu.org
> > Date: Sat, 28 Aug 2010 13:18:02 +0900
> > 
> > In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > 
> > > > >     "Display characters in the range L to H literally."
> > > > 
> > > > > The "literally" part is no longer true, is it?
> > > > 
> > > > What's the meaning of "literally" when a display table
> > > > element is [#xA0]?
> > 
> > > It means that a literal byte 0xA0 is sent to the terminal.
> > 
> > From which document, can we get that interpretation?

> That's my understanding of the word "literally".

But, how do you apply that understanding to this element:
  [#x100]

> Plus,
> standard-display-8bit worked like that in previous
> versions of Emacs.  If we mean for it to do something
> else, we should amend the docstring.

The current behaviour of standard-display-8bit is the
natural consequence of the fact that we changed character
codes.  But, perhaps we should explain what "literally"
really means.

> > (aset standard-display-table (unibyte-char-to-multibyte #xA0)
> >       (vector (unibyte-char-to-multibyte #xA0)))

> Shouldn't standard-display-8bit be modified to use this, instead of
> what it does now?  It seems like it was previously used to work around
> the terminal encoding, but that fire escape was plugged in Emacs 23.
> Perhaps we should reinstate that feature?

Yes.  That's why I wrote:

handa> Should we change the above code and all other codes setting
handa> 0x80th..0xA0th elements of a display table?

eliz> Yes.  IMO, we should consistently use the codepoints of eight-bit
eliz> characters in all char-tables.

handa>Ok, if Yidong and Stefan agree too, I'll work on it.

I have not yet got any response but have started the work.

> And there's still the question of what to do with the
> fragment in standard-display-european-internal that uses
> standard-display-8bit.  Should it be removed, or should it
> be rewritten in some way?

The docstring of standard-display-european says it's
semi-obsolete.  But as long as we provide that function, we
should modify the current code to do what is expected.

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-08-30  2:24           ` Kenichi Handa
@ 2010-08-30  3:02             ` Eli Zaretskii
  2010-09-01  3:21             ` Kenichi Handa
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-30  3:02 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 30 Aug 2010 11:24:13 +0900
> 
> > > > > > "Display characters in the range L to H literally."
> > > > > 
> > > > > > The "literally" part is no longer true, is it?
> > > > > 
> > > > > What's the meaning of "literally" when a display table
> > > > > element is [#xA0]?
> > > 
> > > > It means that a literal byte 0xA0 is sent to the terminal.
> > > 
> > > From which document, can we get that interpretation?
> 
> > That's my understanding of the word "literally".
> 
> But, how do you apply that understanding to this element:
>   [#x100]

We could document that standard-display-8bit works only for arguments
less than 256.  We could even code it that way.  That would be
backward compatible, since this function was written when Emacs
supported only unibyte characters.
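
A minimal sketch of what such a restricted function might look
like, assuming Emacs 23 or later and keying the table by eight-bit
characters as discussed earlier in the thread.  The name
`my-standard-display-8bit' is hypothetical, and this is not the code
that was committed:

```elisp
(require 'disp-table)

(defun my-standard-display-8bit (l h)
  "Display bytes in the range L to H literally.
Hypothetical variant that accepts only byte values, i.e. 0..255."
  (when (or (< l 0) (> h 255) (> l h))
    (error "Range must lie within 0..255: %d..%d" l h))
  (or standard-display-table
      (setq standard-display-table (make-display-table)))
  (let ((byte l))
    (while (<= byte h)
      ;; For ASCII this is the byte itself; for 128..255 it is the
      ;; eight-bit character that stands for the raw byte.
      (let ((c (unibyte-char-to-multibyte byte)))
        (aset standard-display-table c (vector c)))
      (setq byte (1+ byte)))))
```

With this shape, (my-standard-display-8bit 128 255) fills the slots
for the raw-byte characters, and an element such as [#x100] can never
be produced, so the question of what "literally" means for it does
not arise.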




* Re: Usage of standard-display-table in MSDOS
  2010-08-30  2:24           ` Kenichi Handa
  2010-08-30  3:02             ` Eli Zaretskii
@ 2010-09-01  3:21             ` Kenichi Handa
  2010-09-01  9:20               ` Ehud Karni
  2010-09-01 23:33               ` Ehud Karni
  1 sibling, 2 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-09-01  3:21 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: eliz, ehud, emacs-devel

In article <tl7hbic3kj6.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

handa> Should we change the above code and all other codes setting
handa> 0x80th..0xA0th elements of a display table?

eliz> Yes.  IMO, we should consistently use the codepoints of eight-bit
eliz> characters in all char-tables.

handa>Ok, if Yidong and Stefan agree too, I'll work on it.

> I have not yet got any response but have started the work.

I've just committed the work to emacs-23 branch.  Ehud,
could you try it?

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-09-01  3:21             ` Kenichi Handa
@ 2010-09-01  9:20               ` Ehud Karni
  2010-09-01 23:33               ` Ehud Karni
  1 sibling, 0 replies; 36+ messages in thread
From: Ehud Karni @ 2010-09-01  9:20 UTC (permalink / raw)
  To: handa; +Cc: eliz, emacs-devel, handa

On Wed, 01 Sep 2010 12:21:17 Kenichi Handa wrote:
>
> handa> Should we change the above code and all other codes setting
> handa> 0x80th..0xA0th elements of a display table?
>
> eliz> Yes.  IMO, we should consistently use the codepoints of eight-bit
> eliz> characters in all char-tables.
>
> handa>Ok, if Yidong and Stefan agree too, I'll work on it.
>
> > I have not yet got any response but have started the work.
>
> I've just committed the work to emacs-23 branch.  Ehud,
> could you try it?

It will take me some time; I'll report after I check it.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-09-01  3:21             ` Kenichi Handa
  2010-09-01  9:20               ` Ehud Karni
@ 2010-09-01 23:33               ` Ehud Karni
  2010-09-02  5:19                 ` Eli Zaretskii
                                   ` (2 more replies)
  1 sibling, 3 replies; 36+ messages in thread
From: Ehud Karni @ 2010-09-01 23:33 UTC (permalink / raw)
  To: handa; +Cc: eliz, emacs-devel, handa

On Wed, 01 Sep 2010 12:21:17 Kenichi Handa wrote:
>
> handa> Should we change the above code and all other codes setting
> handa> 0x80th..0xA0th elements of a display table?
>
> I've just committed the work to emacs-23 branch.  Ehud,
> could you try it?

OK. I downloaded and compiled the emacs-23 branch.

I only tested the characters displayed in different language
environment on X and text terminal.

There are 2 big problems - display on text terminal and use of the
display table with `find-file-literally'.

My testing shows that the display table works well on X (in the
3 language environments tested), but VERY poorly on a text terminal.

I change the language environment with `set-locale-environment'.


Problem 1:
On a text terminal the language environment has a great influence on
the use of the display table: characters not in the language are
always displayed as ?.  So in the "C" locale, all characters > 127
are displayed as ?.
In the "he_IL" locale (= ISO-8859-8) characters in the range
191-223 and 251-255 are displayed as ?.
In the "en_GB" locale (= ISO-8859-1) the Hebrew characters
(#x5D0-#x5EA) are displayed as ?.

I really must use the "he_IL" locale because most of the files my users
view are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A),
but I want to see all the characters (#xB0-#xDF) literally (i.e. when a
byte in this range is displayed, its 8-bit value should be sent to
the terminal).


Problem 2:
When I use `find-file-literally' to visit a file, the display table is
mostly ignored, characters in #xA0-B2 are displayed in \OOO form, while
#xB3-DF are displayed as empty boxes (on X), whatever locale I use.
This is a change from the behavior of emacs-21.
Note: This can be controlled by `set-buffer-multibyte t', but then the
display is sometimes corrupted.

I attach a tar.bz2 file containing the following files:

1. test-heb.el - 2 functions: `display-hebrew' sets the display table.
                              `chars-list' - show characters #x20-#xFF.
2. motd - a file with many graphic characters.

3. 23-X-disp.png - display of #x20-#xFF on X (good, not dependent on
                   the locale)

4. 23-tty-C.png  - chars #x20-#xFF on text terminal with locale "C".
5. 23-tty-he.png - chars #x20-#xFF on text terminal with locale "he_IL".
6. 23-tty-en.png - chars #x20-#xFF on text terminal with locale "en_GB".

7. 21-motd-lit.png - `find-file-literally' of the motd on text terminal
                   in emacs 21.4 (good).
8. 23-X-motd.png - the motd on X - upper window: literally,
                                   lower window: regular find-file.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry

[-- Attachment #3: test-heb.tar.bz2 --]
[-- Type: application/X-bzip2-compressed, Size: 150165 bytes --]


* Re: Usage of standard-display-table in MSDOS
  2010-09-01 23:33               ` Ehud Karni
@ 2010-09-02  5:19                 ` Eli Zaretskii
  2010-09-02  5:20                 ` Kenichi Handa
  2010-09-02 12:32                 ` Kenichi Handa
  2 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-09-02  5:19 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel, handa

> Date: Thu, 2 Sep 2010 02:33:53 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: handa@m17n.org, eliz@gnu.org, emacs-devel@gnu.org
> Reply-to: ehud@unix.mvs.co.il
> 
> Problem 1:
> On a text terminal the language environment has a great influence on
> the use of the display table: characters not in the language are
> always displayed as ?.  So in the "C" locale, all characters > 127
> are displayed as ?.
> In the "he_IL" locale (= ISO-8859-8) characters in the range
> 191-223 and 251-255 are displayed as ?.
> In the "en_GB" locale (= ISO-8859-1) the Hebrew characters (#x5D0-
> #x5EA) are displayed as ?.

The locale affects the value of terminal-coding-system.  Other than
that, it shouldn't affect the issues that are important to you in the
context of what we discuss here.

You can control the value of terminal-coding-system with "C-x RET t",
if what set-locale-environment does is not good enough.  In
particular, I would try using cp862.
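
The same setting can be made from Lisp, for example in an init
file.  A minimal sketch, assuming the terminal really expects cp862;
the command name `my-use-cp862-terminal' is hypothetical:

```elisp
(defun my-use-cp862-terminal ()
  "Encode text sent to the terminal in cp862 (DOS Hebrew).
Programmatic counterpart of typing C-x RET t cp862 RET."
  (interactive)
  (set-terminal-coding-system 'cp862))
```

Calling (terminal-coding-system) afterwards shows which coding system
is in effect, which makes it easy to confirm that the locale did not
override the choice.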

> I really must use the "he_IL" locale because most of the files my users
> view are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A),
> but I want to see all the characters (#xB0-#xDF) literally (i.e. when a
> byte in this range is displayed, its 8-bit value should be sent to
> the terminal).

I thought the latest changes by Handa-san were supposed to do this.
Eight-bit-characters sent to the terminal should always produce the
corresponding 8-bit byte values, no matter which
terminal-coding-system is used.  It sounds like you say that didn't
work?

> Problem 2:
> When I use `find-file-literally' to visit a file, the display table is
> mostly ignored, characters in #xA0-B2 are displayed in \OOO form, while
> #xB3-DF are displayed as empty boxes (on X), whatever locale I use.
> This is a change from the behavior of emacs-21.
> Note: This can be controlled by `set-buffer-multibyte t', but then the
> display is sometimes corrupted.

Sounds like display of unibyte characters doesn't work according to
the display table?




* Re: Usage of standard-display-table in MSDOS
  2010-09-01 23:33               ` Ehud Karni
  2010-09-02  5:19                 ` Eli Zaretskii
@ 2010-09-02  5:20                 ` Kenichi Handa
  2010-09-04 22:54                   ` Ehud Karni
  2010-09-02 12:32                 ` Kenichi Handa
  2 siblings, 1 reply; 36+ messages in thread
From: Kenichi Handa @ 2010-09-02  5:20 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009012333.o81NXrRq016732@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

As for Problem 1, I'll reply later.

> Problem 2:
> When I use `find-file-literally' to visit a file,

My change was to make (standard-display-8bit 128 255) work
as in Emacs 21 for a unibyte buffer; i.e. when you visit a file
by specifying the no-conversion coding system or by using
find-file-literally.

> I attach a tar.bz2 file containing the following files:

> 1. test-heb.el - 2 functions: `display-hebrew' sets the display table.
>                               `chars-list' - show characters #x20-#xFF.

Please try the attached version of chars-list without any
other display-table setting.  Does it work?

---
Kenichi Handa
handa@m17n.org

;; -*- mode: emacs-lisp; coding: hebrew-iso-8bit-unix -*-
(defun chars-list ()
  "display all characters in range 0x20-0xFF"
  (interactive)
  (let ((svbuf (get-buffer-create "*Help*"))
	(ch 32))
    (with-current-buffer svbuf
      (erase-buffer)
      ;; Make this a unibyte buffer.
      (set-buffer-multibyte nil)
      ;; Make all 8-bit bytes (0x80..0xFF) displayed literally.
      (standard-display-8bit 128 255)
      (insert " List of all displayable characters:\n\n")
      (while (< ch 88)
	(let ((c ch))
	  (while (< c 256)
	    (insert (format " [%c]=%3dD,%3oO,%2xX"  c c c c))
	    (setq c (+ c 56))
	    (if (< c 256)
		(insert " "))))
	(insert "\n")
	(setq ch (1+ ch)))
      (goto-char (point-min)))
    (pop-to-buffer svbuf)))




* Re: Usage of standard-display-table in MSDOS
  2010-09-01 23:33               ` Ehud Karni
  2010-09-02  5:19                 ` Eli Zaretskii
  2010-09-02  5:20                 ` Kenichi Handa
@ 2010-09-02 12:32                 ` Kenichi Handa
  2010-09-04 23:32                   ` Ehud Karni
  2 siblings, 1 reply; 36+ messages in thread
From: Kenichi Handa @ 2010-09-02 12:32 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009012333.o81NXrRq016732@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

> Problem 1:
> On a text terminal the language environment has a great influence on
> the use of the display table: characters not in the language are
> always displayed as ?.  So in the "C" locale, all characters > 127
> are displayed as ?.
> In the "he_IL" locale (= ISO-8859-8) characters in the range
> 191-223 and 251-255 are displayed as ?.
> In the "en_GB" locale (= ISO-8859-1) the Hebrew characters (#x5D0-
> #x5EA) are displayed as ?.

> I really must use the "he_IL" locale because most of the files my users
> view are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A),
> but I want to see all the characters (#xB0-#xDF) literally (i.e. when a
> byte in this range is displayed, its 8-bit value should be sent to
> the terminal).

I had thought that you were reading files with
find-file-literally (so the buffers are unibyte) because you
first wrote the following:
> I had this code in Emacs 21.3:
> 
> (defun set-standard-display-table ()
>     (setq standard-display-table (make-display-table))
>     (standard-display-8bit 127 254))
> 
> I then set the DOS Hebrew chars (128-144) each to a vector:
>     [ 169 <the corresponding UNIX Hebrew char> ]
> 
> Then visit a file (literally).

But since you wrote "Problem 2: When I use
`find-file-literally' ...", "Problem 1" must be the case where
you don't use `find-file-literally', and a file is read into
a multibyte buffer decoded by some coding system, right?

Then, in the he_IL locale, with which coding system is your file
decoded?  C-h C RET shows that coding system near the top,
under the line "Coding system for saving this buffer:".

And I don't understand this part.

> I then set the DOS Hebrew chars (128-144) each to a vector:
>     [ 169 <the corresponding UNIX Hebrew char> ]

169 is not a "UNIX Hebrew char", i.e. not a Unicode
character code of a Hebrew char, nor a code-point of a
Hebrew character in iso-8859-8 character set.

In your mails, you mix up:
 (1) a code-point in a specific character set,
 (2) a character code in Emacs (that is, a Unicode character code),
 (3) a byte represented by Emacs' 8-bit characters,
and that makes it difficult to understand exactly what you
are saying.

Please write:
 For (1), "a character of code #xXX in XXX charset".
 For (2), just "U+XXXX".
 For (3), just "byte #xXX".
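
As an illustration of the three notions, here they are for the
Hebrew letter Aleph.  This is a minimal sketch, assuming Emacs 23 or
later; the function name `my-aleph-three-ways' is hypothetical:

```elisp
(defun my-aleph-three-ways ()
  "Return the three different numbers that can stand for Aleph."
  (let* ((code #xE0)                                ; code in iso-8859-8
         (unicode (decode-char 'iso-8859-8 code))   ; (2) U+05D0
         (raw (unibyte-char-to-multibyte code)))    ; (3) byte #xE0
    (list :charset-code (encode-char unicode 'iso-8859-8) ; (1) again
          :unicode unicode
          :raw-byte-char raw)))

(my-aleph-three-ways)
```

The first value should be #xE0 again, the second U+05D0, and the
third the eight-bit character #x3FFFE0.  Only the second is a Hebrew
character as far as Emacs is concerned; a display table slot keyed by
one of these numbers does not apply to the other two.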

And, you wrote "a small part have MSDOS Hebrew (#x80-#x9A)",
but #x9a is 154, not 144.  Is "144" above just a typo?

Perhaps the following is the best way for me to understand what
you want:

(1) First, make sample files and send them to me.
(2) Tell me exactly how you want to read those files:
    just C-x C-f FILENAME RET, or M-x find-file-literally ....,
    or C-x C-m c no-conversion RET C-x C-f FILENAME RET,
    or ...
(3) Show me, with an image, how they should be displayed on a
    terminal.

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-09-02  5:20                 ` Kenichi Handa
@ 2010-09-04 22:54                   ` Ehud Karni
  2010-09-06  1:30                     ` Kenichi Handa
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-09-04 22:54 UTC (permalink / raw)
  To: handa; +Cc: eliz, emacs-devel

On Thu, 02 Sep 2010 14:20:42 Kenichi Handa wrote:
>
> My change was to make (standard-display-8bit 128 255) work
> as in Emacs 21 for a unibyte buffer; i.e. when you visit a file
> by specifying the no-conversion coding system or by using
> find-file-literally.

OK, I misunderstood you before.  On a text terminal all the 8-bit bytes
are sent as is.  On X, most of the bytes are displayed as empty boxes
(i.e. no glyph).  I don't know how to set the display table to get
something more meaningful.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-09-02 12:32                 ` Kenichi Handa
@ 2010-09-04 23:32                   ` Ehud Karni
  2010-09-05  5:30                     ` Eli Zaretskii
  2010-09-06  5:14                     ` Kenichi Handa
  0 siblings, 2 replies; 36+ messages in thread
From: Ehud Karni @ 2010-09-04 23:32 UTC (permalink / raw)
  To: handa; +Cc: eliz, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: Type: text/plain, Size: 2800 bytes --]

On Thu, 02 Sep 2010 21:32:21 Kenichi Handa wrote:
>
> Then, in he_IL locale, by which coding-system your file is
> decoded?  C-h C RET shows that coding-system near the top
> under the line "Coding system for saving this buffer:".

The coding-system-for-read is hebrew-iso-8bit.

> And I don't understand this part.
>
> > I then set the DOS Hebrew chars (128-144) each to a vector:
> >     [ 169 <the corresponding UNIX Hebrew char> ]
>
> 169 is not a "UNIX Hebrew char", i.e. not a Unicode
> character code of a Hebrew char, nor a code-point of a
> Hebrew character in iso-8859-8 character set.

Yes, that's my problem, I have Hebrew in #xE0-#xFA (iso-8859-8)
but I have other 8 bit bytes (most of them are graphic shapes
from the cp862 set).

> And, you wrote "a small part have MSDOS Hebrew (#x80-#x9A)",
> but #x9a is 154, not 144.  Is "144" above just a typo?

Just a typo, it should be 154.

All my data files are 8bit bytes, so for me it is always, character =
byte (at least externally).


> Perhaps, the following is the best way to understand what
> you want:
>
> (1) You at first make sample files and give me them.
> (2) Tell me how you want read that file exactly.
>     Just C-x C-f FILENAME RET, or M-x find-file-literally ....,
>     or C-x C-m c no-conversion RET C-x C-f FILENAME RET,
>     or ...
> (3) Show me how it should be displayed on a terminal by an
>     image.

I attach a tar.bz2 file with 3 files:
1. lit1 - the sample file.
2. lit1-tty.png - how it should show on text terminal.
3. lit1-x.png   - how it should show on X.

I can do it if I read the file with the iso-latin-1 coding-system
and change the display table to show the Hebrew glyphs for the Hebrew
[#xE0-#xFA] bytes. But then the text is not made of Hebrew characters
(e.g. for the new bidi display). I want it the other way around: to read
it with hebrew-iso-8bit and to tweak the display table to show all
the bytes not belonging to the Hebrew set.

I had a similar problem a long time ago. In 2001 you suggested using
the following code:

  (make-coding-system
      'hebrew-iso-8bit 2 ?8
      "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)"
      '(ascii hebrew-iso8859-8 nil nil
              nil ascii-eol ascii-cntl nil nil nil nil nil t)
      '((safe-charsets ascii hebrew-iso8859-8 eight-bit-control)
        (mime-charset . iso-8859-8)))

Maybe I can define a new coding system that treats bytes #x80-#xFF
as legal characters and is recognized as a Hebrew variant.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry

[-- Attachment #3: lit1.tar.bz2 --]
[-- Type: application/X-bzip2-compressed, Size: 10044 bytes --]


* Re: Usage of standard-display-table in MSDOS
  2010-09-04 23:32                   ` Ehud Karni
@ 2010-09-05  5:30                     ` Eli Zaretskii
  2010-09-06  5:14                     ` Kenichi Handa
  1 sibling, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-09-05  5:30 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel, handa

> Date: Sun, 5 Sep 2010 02:32:43 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: eliz@gnu.org, emacs-devel@gnu.org
> Reply-to: ehud@unix.mvs.co.il
> 
> 1. lit1 - the sample file.
> 2. lit1-tty.png - how it should show on text terminal.
> 3. lit1-x.png   - how it should show on X.
> 
> I can do it if I read the file with the iso-latin-1 coding-system
> and change the display table to show the Hebrew glyphs for the Hebrew
> [#xE0-#xFA] bytes. But in this way it is not Hebrew characters (e.g.
> for the new bidi display). I want it the other way around, to read it
> with hebrew-iso-8bit and to tweak the display table to show all
> the bytes not belonging to the Hebrew set.

The file includes Hebrew characters encoded in both hebrew-iso-8bit
and cp862, as well as line-drawing characters from cp862.

Barring bugs in the display table handling, you should be able
eventually to set up standard-display-table to display all the Hebrew
characters as you'd expect to see them, and display the line-drawing
characters correctly as well.  (Judging by the sample file, I'd
suggest using cp862 rather than hebrew-iso-8bit, because many more
characters are from cp862.  However, you say elsewhere that most of
the characters in your files are hebrew-iso-8bit, so maybe the sample
file is not representative enough.)

But if you want all the Hebrew characters to be treated by Emacs as
such (e.g., for bidi display), no matter what's their encoding in the
file, you will have to define a coding-system that will decode them
all into Unicode codepoints of Hebrew characters.  There's a problem
you will need to solve for defining such a coding system: it has 2
different encodings for the same character, one from hebrew-iso-8bit,
the other from cp862.  So you will need to decide how Hebrew
characters will be encoded when the file is saved.

Alternatively, we could expose in Lisp the char-table used by the bidi
reordering engine for reordering characters, where you could change
the bidi class of the non-Hebrew characters that are displayed as
Hebrew.  Until now, there was no plausible use-case for changing that
table (and frankly, I'd prefer not to go there, as futzing with that
table could potentially cause trouble).




* Re: Usage of standard-display-table in MSDOS
  2010-09-04 22:54                   ` Ehud Karni
@ 2010-09-06  1:30                     ` Kenichi Handa
  0 siblings, 0 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-09-06  1:30 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009042254.o84MsEcf004615@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

> > My change was to make (standard-display-8bit 128 255) work
> > as Emacs 21 for a unibyte buffer; i.e. when you visit a file
> > by specifying no-conversion coding-system or by using
> > find-file-literally.

> OK, I misunderstood you before. For text terminal all the 8 bit bytes
> are sent as is. On X, most of the bytes are displayed as empty boxes
> (i.e. no glyph). I don't know how to set the display table to get
> something more meaningful.

This is the latest docstring of standard-display-8bit:

============================================================
standard-display-8bit is a compiled Lisp function in `disp-table.el'.

(standard-display-8bit L H)

Display characters representing raw bytes in the range L to H literally.

On a terminal display, each character in the range is displayed
by sending the corresponding byte directly to the terminal.

On a graphic display, each character in the range is displayed
using the default font by a glyph whose code is the corresponding
byte.

Note that ASCII printable characters (SPC to TILDA) are displayed
in the default way after this call.
============================================================

So, on X (or on any graphic display), whether it works as
expected or not depends on which font you use as the default
font.  Most TrueType fonts are not good in this respect.  X
core fonts for a legacy charset (e.g. -*-iso8859-8) are good.
Please tell me which font is the default for you (by moving
the cursor onto some Latin letter and typing C-u C-x =).

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-09-04 23:32                   ` Ehud Karni
  2010-09-05  5:30                     ` Eli Zaretskii
@ 2010-09-06  5:14                     ` Kenichi Handa
  1 sibling, 0 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-09-06  5:14 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009042332.o84NWhSA017839@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

> I attach a tar.bz2 file with 3 files:
> 1. lit1 - the sample file.
> 2. lit1-tty.png - how it should show on text terminal.
> 3. lit1-x.png   - how it should show on X.

> I can do it if I read the file with the iso-latin-1 coding-system
> and change the display table to show the Hebrew glyphs for the Hebrew
> [#xE0-#xFA] bytes. But in this way it is not Hebrew characters (e.g.
> for the new bidi display). I want it the other way around, to read it
> with hebrew-iso-8bit and to tweak the display table to show all
> the bytes not belonging to the Hebrew set.

Does it mean that you want bidi-reordering for the bytes
#xE0..#xFA (code-points of iso-8859-8) but bidi-reordering
is not necessary for the bytes #x80..#x9A (code-points of
cp862)?

But your file "lit1" contains #xE0..#xFA (code-points of
iso-8859-8) on the second to fourth lines in visual order.  If
bidi-reordering is applied to them, you'll get a different
view from lit1-tty.png and lit1-x.png.  Is that ok?

> I had similar problem a long time ago. In 2001 you suggested to use
> the following code:

>   (make-coding-system
>       'hebrew-iso-8bit 2 ?8
>       "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)"
>       '(ascii hebrew-iso8859-8 nil nil
>               nil ascii-eol ascii-cntl nil nil nil nil nil t)
>       '((safe-charsets ascii hebrew-iso8859-8 eight-bit-control)
>         (mime-charset . iso-8859-8)))

> May be I can define a new coding system that will have bytes #x80-#xFF
> as legal characters and be recognized as Hebrew variant.

This code will do that.  I think it's not difficult to
understand what the code is doing.

------------------------------------------------------------
(define-charset 'cp862-sub
  "Subset of CP862"
  :code-space [#x80 #xDF]
  :subset '(cp862 #x80 #xDF #x00))

(define-charset 'iso-8859-8-sub
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00))

(define-coding-system 'mix-hebrew
  "Mixture of ISO-8859-8 and CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii iso-8859-8-sub cp862-sub)
  :ascii-compatible-p t)
------------------------------------------------------------

Please try C-x C-m c mix-hebrew RET lit1 RET.

But, if you do that, you must consider the problem Eli wrote:

In article <E1Os7oU-0006m6-7X@fencepost.gnu.org>, Eli
Zaretskii <eliz@gnu.org> writes:

> But if you want all the Hebrew characters to be treated by Emacs as
> such (e.g., for bidi display), no matter what's their encoding in the
> file, you will have to define a coding-system that will decode them
> all into Unicode codepoints of Hebrew characters.  There's a problem
> you will need to solve for defining such a coding system: it has 2
> different encodings for the same character, one from hebrew-iso-8bit,
> the other from cp862.  So you will need to decide how will Hebrew
> characters be encoded when the file is saved.

In the above definition of mix-hebrew, as iso-8859-8-sub is
listed before cp862-sub, all Hebrew characters are encoded
into bytes #xE0..#xFA even if they were originally decoded
from bytes #x80..#x9A.
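
A small illustration of why the order matters.  This is only a
sketch: it assumes an Emacs 23 or later session in which the `cp862'
and `iso-8859-8' charsets are available, and the `my-...' variable
names are made up for the example.  Both byte ranges decode to the
same Unicode characters, so the byte that was read cannot be recovered
from the character alone:

```elisp
;; Both charsets map their first Hebrew code point to U+05D0 (alef).
(defvar my-alef-from-cp862 (decode-char 'cp862 #x80))
(defvar my-alef-from-iso (decode-char 'iso-8859-8 #xE0))

;; Encoding that character with the ISO-8859-8 charset gives byte
;; #xE0, whichever byte the character was originally decoded from.
(defvar my-alef-written (encode-char my-alef-from-cp862 'iso-8859-8))
```

Here my-alef-from-cp862 and my-alef-from-iso are the same character,
and my-alef-written is the ISO-8859-8 byte, not the CP862 one.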

If you don't like it, you must give up decoding bytes
#x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
and setup a display table to display them as Hebrew chars.
It can be done by this code:

------------------------------------------------------------
(define-charset 'cp862-sub
  "Subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00))

(define-charset 'iso-8859-8-sub
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00))

(define-coding-system 'mix-hebrew
  "Mixture of ISO-8859-8, CP862, and raw 8-bit bytes"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii iso-8859-8-sub cp862-sub eight-bit)
  :ascii-compatible-p t)

(require 'disp-table)
;; Display bytes #x80..#x9A as Hebrew chars (code-points #xE0..#xFA of
;; ISO-8859-8).
(dotimes (i #x1B)
  (aset standard-display-table
	(unibyte-char-to-multibyte (+ #x80 i))
	(vector (decode-char 'iso-8859-8 (+ #xE0 i)))))
------------------------------------------------------------

This display-table setting also works on a terminal as long as
you set the terminal coding system to mix-hebrew.
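
The index arithmetic in the dotimes loop can be checked without
touching the display table.  The following is only a sketch:
`my-byte-to-hebrew' is a hypothetical name, and it assumes the
`iso-8859-8' charset is available, as in the code above.

```elisp
;; Build an alist of (BYTE . HEBREW-CHAR) for bytes #x80..#x9A,
;; using the same offsets as the display-table loop.
(defvar my-byte-to-hebrew
  (let (pairs)
    (dotimes (i #x1B)                   ; #x1B = 27 Hebrew letters
      (push (cons (+ #x80 i)
                  (decode-char 'iso-8859-8 (+ #xE0 i)))
            pairs))
    (nreverse pairs)))
```

The first pair is (#x80 . #x5D0) and the last is (#x9A . #x5EA), so
the 27 slots cover exactly the Hebrew letters of CP862.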

---
Kenichi Handa
handa@m17n.org




* Re: Usage of standard-display-table in MSDOS
  2010-08-29 14:04               ` Eli Zaretskii
@ 2010-09-07 21:11                 ` Ehud Karni
  2010-09-09 11:57                   ` Kenichi Handa
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-09-07 21:11 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel, handa

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=iso-8859-8-i, Size: 6577 bytes --]

[ long post ]

On Sun, 29 Aug 2010 10:04:16 Eli Zaretskii wrote:
>
> Note that Handa-san recommended to set more than just one slot in
> standard-display-table in Emacs 23 to solve similar problems:

I have not fully solved it yet, but I think only minor details
remain now that I have defined a new coding system (see below).


On Mon, 06 Sep 2010 14:14:01 Kenichi Handa wrote:
>
> Does it mean that you want bidi-reordering for the bytes
> #xE0..#xFA (code-points of iso-8859-8) but bidi-reordering
> is not necessary for the bytes #x80..#x8A (code-points of
> cp862)?

No, I want bidi ordering (or not) for both iso-8859-8 and CP862 at
the SAME time.

> But, your file "lit1" contains #xE0..#xFA (code-points of
> iso-8859-8) at the second to 4th lines in visual order.  If
> bidi-reordering is applied on them, you'll get the different
> view than lit1-tty.png and lit1-x.png.  Is that ok?

This is just an example. Files used directly with cat (like /etc/motd)
must be in visual order. Other files used with GUI must have logical
order. The data files we edit each day are of both types.

EK> May be I can define a new coding system that will have bytes #x80-#xFF
EK> as legal characters and be recognized as Hebrew variant.
>
> This code will do that.  I think it's not difficult to
> understand what the code is doing.

[snip]

> But, if you do that, you must consider the problem Eli wrote:
>
EZ> But if you want all the Hebrew characters to be treated by Emacs as
EZ> such (e.g., for bidi display), no matter what's their encoding in the
EZ> file, you will have to define a coding-system that will decode them
EZ> all into Unicode codepoints of Hebrew characters.  There's a problem
EZ> you will need to solve for defining such a coding system: it has 2
EZ> different encodings for the same character, one from hebrew-iso-8bit,
EZ> the other from cp862.  So you will need to decide how will Hebrew
EZ> characters be encoded when the file is saved.
>
> In the above definition of mix-hebrew, as iso-8859-8-sub is
> listed before cp862-sub, all Hebrew characters are encoded
> into bytes #xE0..#xFA even if they were originally decoded
> from bytes #x80..#x9A.
>
> If you don't like it, you must give up decoding bytes
> #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> and setup a display table to display them as Hebrew chars.
> It can be done by this code:

I think I solved this by using text properties.

It is still unfinished, but it works, and I'll appreciate any comments.
There are some problems, see at the end.

Here is what I did (based on your advice).


(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  :map (let ((map (make-vector 54 0))
             (ix 27))
         (while (> ix 0)
	   (setq ix (1- ix))
           (aset map (+ ix ix)   (+ #x80 ix))
           (aset map (+ ix ix 1) (+ #x80 ix)))
         map)
  :supplementary-p t)


(define-charset 'graphic-MSDOS-subset
  "Graphic subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00)
  :supplementary-p t)


(define-charset 'hebrew-iso-8859-8-subset
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00)
  :supplementary-p t)


(define-coding-system 'hebrew-iso-with-8bit-bytes
  "The iso-8859-8 charset + bytes #x80-#xDF from CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii hebrew-iso-8859-8-subset hebrew-MSDOS-binary graphic-MSDOS-subset)
  :post-read-conversion 'hebrew-iso-with-8bit-post-read
  :pre-write-conversion 'hebrew-iso-with-8bit-pre-write
  :ascii-compatible-p t)


(defun hebrew-iso-with-8bit-post-read (length)
       (let ((src (concat "^" '[ #x80 ] "-" '[ #x9A ]))    ;; seems "^\200-\232" does not work
             (sv-pos (point))
             (max-pos (+ (point) length))
             chr)
           (while (and (skip-chars-forward src max-pos)
                       (setq chr (char-after)))
               ;;      (message "At %d after char %d" (point) (char-after))
               (delete-char 1)
               (insert-char (+ chr #x550) 1)               ;; #x05D0 - #x80
               (add-text-properties (1- (point)) (point)
                       `(Hebrew DOS
                         face menu))))
       0)


(defun hebrew-iso-with-8bit-pre-write (start end)
       (let* ((text (if (numberp start)
                      (buffer-substring start end)
                      start))
              (beg 0)
              (end (length text))
              va)
           (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
               (setq va (aref text beg))
               (and (>= va #x05D0)                 ;; א
                    (<= va #x05EA)                 ;; ת
                    (aset text beg (- va #x550)))
               (setq beg (1+ beg)))
           (set-buffer (get-buffer-create " *heb-wrt*"))
           (delete-region (point-min) (point-max))
           (insert text)
           nil))


There are some Problems:

1. (describe-character-set 'hebrew-MSDOS-binary) exit with error:
   Wrong type argument: char-or-string-p, [128 128 129 129 130 130 131 131 132 132 ...]
   The vector is the :map value.

2. The `:post-read-conversion' function must return a number otherwise there is an error.
   There is nothing about it in `define-coding-system' documentation.

3. The documentation for `write-region-annotate-functions' has:
    "The function should return a list of pairs of the form (POSITION . STRING),
    consisting of strings to be effectively inserted at the specified positions
    of the file being written (1 means to insert before the first byte written).
    The POSITIONs must be sorted into increasing order."
  This did not work at all. I had to use the alternate pathway:
    An annotation function can return with a different buffer current.
    Doing so removes the annotations returned by previous functions, and
    resets START and END to `point-min' and `point-max' of the new buffer.

Thank you both. I will post when I'll finish all the details.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry




* Re: Usage of standard-display-table in MSDOS
  2010-09-07 21:11                 ` Ehud Karni
@ 2010-09-09 11:57                   ` Kenichi Handa
  0 siblings, 0 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-09-09 11:57 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009072111.o87LBeU2009811@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

> > If you don't like it, you must give up decoding bytes
> > #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> > and setup a display table to display them as Hebrew chars.
> > It can be done by this code:

> I think I solved this by using text properties.

Ah, ummm, a kind of dirty hack, but, perhaps it's
unavoidable in your situation.

> It is still unfinished, but it works, and I'll appreciate any comments.
> There are some problems, see at the end.

> Here is what I did (based on your advice).


> (define-charset 'hebrew-MSDOS-binary
>   "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
>   :code-space [#x80 #x9A]
>   :map (let ((map (make-vector 54 0))
>              (ix 27))
>          (while (> ix 0)
> 	   (setq ix (1- ix))
>            (aset map (+ ix ix)   (+ #x80 ix))
>            (aset map (+ ix ix 1) (+ #x80 ix)))
>          map)
>   :supplementary-p t)

For that, you don't have to use :map; you can use
:code-offset like this:

(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  :code-offset #x80
  :supplementary-p t)
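
A quick way to see that the two definitions agree.  This is a sketch
only: it assumes the documented meaning of :map as a vector of
[CODE-1 CHAR-1 CODE-2 CHAR-2 ...] and of :code-offset as "index within
the code space plus the offset", and the `my-...' names are made up
for illustration.

```elisp
;; The vector built for :map in the original definition.
(defvar my-map-vector
  (let ((map (make-vector 54 0))
        (ix 27))
    (while (> ix 0)
      (setq ix (1- ix))
      (aset map (+ ix ix) (+ #x80 ix))      ; CODE-n
      (aset map (+ ix ix 1) (+ #x80 ix)))   ; CHAR-n
    map))

(defun my-code-offset-char (code)
  "Character for CODE under :code-space [#x80 #x9A], :code-offset #x80."
  (+ (- code #x80) #x80))     ; index within the code space + offset
```

Every CODE/CHAR pair in my-map-vector gives the same character as
my-code-offset-char, i.e. code-point #xNN maps to character #xNN.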

> (defun hebrew-iso-with-8bit-pre-write (start end)
>        (let* ((text (if (numberp start)
>                       (buffer-substring start end)
>                       start))
>               (beg 0)
>               (end (length text))
>               va)
>            (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
>                (setq va (aref text beg))
>                (and (>= va #x05D0)                 ;; א
>                     (<= va #x05EA)                 ;; ת
>                     (aset text beg (- va #x550)))
>                (setq beg (1+ beg)))
>            (set-buffer (get-buffer-create " *heb-wrt*"))
>            (delete-region (point-min) (point-max))
>            (insert text)
>            nil))

You don't have to make a working buffer.  You can directly
modify the current buffer because it's already a working
buffer managed by Emacs itself.
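
A minimal sketch of that suggestion, under the assumption stated
above that the function is called with the working buffer current.
`my-hebrew-pre-write' is a hypothetical name, and the `Hebrew' / `DOS'
property is the one used in the quoted code.

```elisp
(defun my-hebrew-pre-write (_start _end)
  "Turn chars tagged with Hebrew=DOS back into U+0080..U+009A, in place."
  (goto-char (point-min))
  (let (pos)
    ;; Find each character carrying the text property Hebrew=DOS.
    (while (setq pos (text-property-any (point) (point-max) 'Hebrew 'DOS))
      (goto-char pos)
      (let ((ch (char-after)))
        (if (and (>= ch #x05D0) (<= ch #x05EA))
            (progn
              (delete-char 1)
              ;; The inserted char carries no property, so the
              ;; search moves on to the next tagged character.
              (insert (- ch #x550)))
          (forward-char 1)))))
  nil)
```

Only characters that carry the property are converted; Hebrew
characters read from the ISO-8859-8 range are left untouched.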

> There are some Problems:

> 1. (describe-character-set 'hebrew-MSDOS-binary) exit with error:
>    Wrong type argument: char-or-string-p, [128 128 129 129 130 130 131 131 132 132 ...]
>    The vector is the :map value.

Ah, I'll fix it soon.

> 2. The `:post-read-conversion' function must return a number otherwise there is an error.
>    There is nothing about it in `define-coding-system' documentation.

I'm going to fix the docstring as follows.

`:post-read-conversion'

VALUE must be a function to call after some text is inserted and
decoded by the coding system itself and before any functions in
`after-insert-functions' are called.  This function is passed one
argument; the number of characters in the text to convert, with
point at the start of the text.  The function should leave point
the same, and return the new character count.
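
A sketch of a function that follows this contract (hypothetical name
`my-hebrew-post-read'; it assumes the bytes #x80..#x9A were decoded to
the characters U+0080..U+009A, as with the :code-offset charset above,
and it leaves out the `face' property of the quoted code):

```elisp
(defun my-hebrew-post-read (length)
  "Convert U+0080..U+009A in the next LENGTH chars to Hebrew letters.
Tag each converted character with the text property Hebrew=DOS."
  (save-excursion                       ; leave point where it was
    (let ((end (+ (point) length)))
      (while (< (point) end)
        (let ((ch (char-after)))
          (if (and (>= ch #x80) (<= ch #x9A))
              (progn
                (delete-char 1)
                (insert (propertize (string (+ ch #x550))
                                    'Hebrew 'DOS)))
            (forward-char 1))))))
  ;; Each replacement is one char for one char, so the count is unchanged.
  length)
```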


> 3. The documentation for `write-region-annotate-functions' has:
>     "The function should return a list of pairs of the form (POSITION . STRING),
>     consisting of strings to be effectively inserted at the specified positions
>     of the file being written (1 means to insert before the first byte written).
>     The POSITIONs must be sorted into increasing order."
>   This did not work at all. I had to use the alternate pathway:
>     An annotation function can return with a different buffer current.
>     Doing so removes the annotations returned by previous functions, and
>     resets START and END to `point-min' and `point-max' of the new buffer.

For this part, I'm going to fix the docstring as follows:

`:pre-write-conversion'

VALUE must be a function to call after all functions in
`write-region-annotate-functions' and `buffer-file-format' are
called, and before the text is encoded by the coding system
itself.  This function should convert the whole text in the
current buffer.  For backward compatibility, this function is
passed two arguments which can be ignored.

---
Kenichi Handa
handa@m17n.org




end of thread, other threads:[~2010-09-09 11:57 UTC | newest]

Thread overview: 36+ messages
2010-08-23 12:44 Usage of standard-display-table in MSDOS Kenichi Handa
2010-08-24  5:34 ` Stephen J. Turnbull
2010-08-24 11:13   ` Ehud Karni
2010-08-24 16:51     ` Eli Zaretskii
2010-08-25 13:04       ` Ehud Karni
2010-08-25 18:09         ` Eli Zaretskii
2010-08-26 15:26           ` Ehud Karni
2010-08-26 16:43             ` Eli Zaretskii
2010-08-27 13:35               ` Ehud Karni
2010-08-27 16:30                 ` Eli Zaretskii
2010-08-27 10:24 ` Eli Zaretskii
2010-08-27 11:44   ` Kenichi Handa
2010-08-27 14:13     ` Eli Zaretskii
2010-08-28  4:18       ` Kenichi Handa
2010-08-28  7:22         ` Eli Zaretskii
2010-08-30  2:24           ` Kenichi Handa
2010-08-30  3:02             ` Eli Zaretskii
2010-09-01  3:21             ` Kenichi Handa
2010-09-01  9:20               ` Ehud Karni
2010-09-01 23:33               ` Ehud Karni
2010-09-02  5:19                 ` Eli Zaretskii
2010-09-02  5:20                 ` Kenichi Handa
2010-09-04 22:54                   ` Ehud Karni
2010-09-06  1:30                     ` Kenichi Handa
2010-09-02 12:32                 ` Kenichi Handa
2010-09-04 23:32                   ` Ehud Karni
2010-09-05  5:30                     ` Eli Zaretskii
2010-09-06  5:14                     ` Kenichi Handa
2010-08-29 10:16         ` Ehud Karni
2010-08-29 11:21           ` Eli Zaretskii
2010-08-29 11:49             ` Ehud Karni
2010-08-29 13:06               ` Ehud Karni
2010-08-29 13:50                 ` Eli Zaretskii
2010-08-29 14:04               ` Eli Zaretskii
2010-09-07 21:11                 ` Ehud Karni
2010-09-09 11:57                   ` Kenichi Handa
