* Usage of standard-display-table in MSDOS
@ 2010-08-23 12:44 Kenichi Handa
  2010-08-24  5:34 ` Stephen J. Turnbull
  2010-08-27 10:24 ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-08-23 12:44 UTC (permalink / raw)
  To: emacs-devel

In msdos-initialize-window-system (of term/pc-win.el), I
found this code:

  ;; In multibyte mode, we want unibyte buffers to be displayed
  ;; using the terminal coding system, so that they display
  ;; correctly on the DOS terminal; in unibyte mode we want to see
  ;; all 8-bit characters verbatim.  In both cases, we want the
  ;; entire range of 8-bit characters to arrive at our display code
  ;; verbatim.
  (standard-display-8bit 127 255)

Is it really working in a non-iso-8859-1 environment as
expected?  Note that 128..255 are latin-1 characters after
Emacs 23, not raw-bytes.  So, I think the above call will
make 8-bit bytes in a unibyte buffer displayed as latin-1
characters, but as the terminal encoding system doesn't
support latin-1 chars in, for instance, a Greek environment,
just '?' will be displayed.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Usage of standard-display-table in MSDOS
From: Stephen J. Turnbull @ 2010-08-24  5:34 UTC
  To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa writes:

 > In msdos-initialize-window-system (of term/pc-win.el), I
 > found this code:
 >
 >   ;; In multibyte mode, we want unibyte buffers to be displayed
 >   ;; using the terminal coding system, so that they display
 >   ;; correctly on the DOS terminal; in unibyte mode we want to see
 >   ;; all 8-bit characters verbatim.  In both cases, we want the
 >   ;; entire range of 8-bit characters to arrive at our display code
 >   ;; verbatim.
 >   (standard-display-8bit 127 255)
 >
 > Is it really working in a non-iso-8859-1 environment as
 > expected?  Note that 128..255 are latin-1 characters after
 > Emacs 23, not raw-bytes.  So, I think the above call will
 > make 8-bit bytes in a unibyte buffer displayed as latin-1
 > characters, but as the terminal encoding system doesn't
 > support latin-1 chars in, for instance, a Greek environment,
 > just '?' will be displayed.

Hebrew and Cyrillic are other obvious candidates for testing here.
They seem to have more active participants on emacs-devel.
* Re: Usage of standard-display-table in MSDOS
From: Ehud Karni @ 2010-08-24 11:13 UTC
  To: stephen; +Cc: eliz, emacs-devel, handa

On Tue, 24 Aug 2010 14:34:37 Stephen J. Turnbull wrote:
>
>  >   ;; In multibyte mode, we want unibyte buffers to be displayed
>  >   ;; using the terminal coding system, so that they display
>  >   ;; correctly on the DOS terminal; in unibyte mode we want to see
>  >   ;; all 8-bit characters verbatim.  In both cases, we want the
>  >   ;; entire range of 8-bit characters to arrive at our display code
>  >   ;; verbatim.
>  >   (standard-display-8bit 127 255)
>  >
>  > Is it really working in a non-iso-8859-1 environment as
>  > expected?  Note that 128..255 are latin-1 characters after
>  > Emacs 23, not raw-bytes.  So, I think the above call will
>  > make 8-bit bytes in a unibyte buffer displayed as latin-1
>  > characters, but as the terminal encoding system doesn't
>  > support latin-1 chars in, for instance, a Greek environment,
>  > just '?' will be displayed.
>
> Hebrew and Cyrillic are other obvious candidates for testing here.
> They seem to have more active participants on emacs-devel.

From my checks this does not work on text terminals (it really depends
on the LANG env variable).

I had this code in Emacs 21.3:

    (defun set-standard-display-table ()
      (setq standard-display-table (make-display-table))
      (standard-display-8bit 127 254))

I then set the DOS Hebrew chars (128-154) each to a vector:
    [ 169 <the corresponding UNIX Hebrew char> ]
Then I visit a file (literally).

In Emacs 21.3 it works fine with any value of LANG: it shows the Hebrew
chars as they should be, and Hebrew DOS (CP862) chars with a prefix.
In Emacs 23.1 it works only if LANG is set to a Latin-1 value
(e.g. en_GB).

I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
setting LANG to he_IL changes too many other things (for example,
it changes the `ls' output, which breaks dired).

The problem as I see it is that the characters in the vectors in the
display table undergo further translation and are not used "literally".

The use of UTF-8 (which works well on X) is not an option.  Many of the
users have text terminals, and most of the data files viewed are in
iso-8859-8 or even Hebrew DOS (CP862).

I recently installed some Emacs stuff at an Israeli insurance company,
and because of this problem I used 21.3 instead of a newer version.

Ehud.


--
 Ehud Karni           Tel: +972-3-7966-561  /"\
 Mivtach - Simon      Fax: +972-3-7976-561  \ /  ASCII Ribbon Campaign
 Insurance agencies   (USA) voice mail and   X   Against   HTML   Mail
 http://www.mvs.co.il  FAX:  1-815-5509341  / \
 GnuPG: 98EA398D <http://www.keyserver.net/>    Better Safe Than Sorry
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-24 16:51 UTC
  To: ehud; +Cc: emacs-devel

> Date: Tue, 24 Aug 2010 14:13:46 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: handa@m17n.org, eliz@gnu.org, emacs-devel@gnu.org
>
> I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
> setting LANG to he_IL changes too many other things (for example,
> it changes the `ls' output, which breaks dired).

You could do

  M-x set-locale-environment RET he_IL RET

from inside Emacs, which I think will do what you want without
affecting `ls' etc. (unless you mean `ls' that is run from the Emacs
shell buffer).

> The problem as I see it is that the characters in the vectors in the
> display table undergo further translation and are not used "literally".

I don't understand what you are trying to say here.  Please elaborate
about "further translation" and "not used literally".
* Re: Usage of standard-display-table in MSDOS
From: Ehud Karni @ 2010-08-25 13:04 UTC
  To: eliz; +Cc: emacs-devel

On Tue, 24 Aug 2010 19:51:58 Eli Zaretskii wrote:
>
> > I want to see Hebrew (iso-8859-8) characters even when LANG=C, because
> > setting LANG to he_IL changes too many other things (for example,
> > it changes the `ls' output, which breaks dired).
>
> You could do
>
>   M-x set-locale-environment RET he_IL RET
>
> from inside Emacs, which I think will do what you want without
> affecting `ls' etc. (unless you mean `ls' that is run from the Emacs
> shell buffer).

That fixes my problem.  It does not change any env variable, so it is
good even for a shell spawned from Emacs.

> > The problem as I see it is that the characters in the vectors in the
> > display table undergo further translation and are not used "literally".
>
> I don't understand what you are trying to say here.  Please elaborate
> about "further translation" and "not used literally".

The best way to understand it is with an example:

For the DOS Hebrew Aleph the standard-display-table is set like this:
    (aset standard-display-table 128 '[ 169 224 ] )
In Emacs 21.3 these exact characters were displayed (sent) to the text
terminal and appeared as prefix char + Aleph.
In 23.1 I see the prefix + ? (question mark).

The character `224' (Aleph) is being encoded in the current locale,
and this prevents it from being displayed as Aleph.

You can easily check it with the following recipe:

    (setq standard-display-table (make-display-table))
    (standard-display-8bit 128 254)
    (set-locale-environment "en_GB")
    (find-file-literally <a file with Hebrew (#xE0-#xFA) characters>)

Check how it is displayed: you see the Hebrew as it should be.
Now change the locale:

    (set-locale-environment "he_IL")

You see ? because the #xE0-#xFA characters are encoded in the Hebrew
locale and are meaningless (instead of just being plain 8 bit).

The standard-display-table has not changed, but the meaning of the
8-bit numbers in the character vectors has changed.

To solve my Hebrew display I have 2 possibilities:

1. Set the locale to some Latin-1 language (e.g. en_GB) and continue to
   work like I do in 21.3.  It is simpler, but it is a kind of
   self-deception, and it will work only with 8-bit Hebrew fonts.

2. Set the locale to Hebrew and change the display table (entries #x80-
   #x9A - DOS Hebrew, and #xE0-#xFA - ISO-8859-8 Hebrew to UTF Hebrew),
   but then I have to set all the DOS graphic characters myself.

I'll go the 2nd way, but I'd appreciate something that would ease it,
i.e. a way to set the standard-display-table for all the non-Hebrew
characters < 256 to something that will make it work like CP862.

Ehud.
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-25 18:09 UTC
  To: ehud; +Cc: emacs-devel

> Date: Wed, 25 Aug 2010 16:04:56 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
>
> For the DOS Hebrew Aleph the standard-display-table is set like this:
>     (aset standard-display-table 128 '[ 169 224 ] )
> In Emacs 21.3 these exact characters were displayed (sent) to the text
> terminal and appeared as prefix char + Aleph.
> In 23.1 I see the prefix + ? (question mark).
>
> The character `224' (Aleph) is being encoded in the current locale,
> and this prevents it from being displayed as Aleph.

And why don't you just say "C-x RET t cp862 RET"?  That's what you
want -- to send cp862 codes to the terminal, right?
* Re: Usage of standard-display-table in MSDOS
From: Ehud Karni @ 2010-08-26 15:26 UTC
  To: eliz; +Cc: emacs-devel

On Wed, 25 Aug 2010 21:09:46 Eli Zaretskii <eliz@gnu.org> wrote:
>
> And why don't you just say "C-x RET t cp862 RET"?  That's what you
> want -- to send cp862 codes to the terminal, right?

No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
to appear in Hebrew on BOTH text terminals and X.

In addition I want to use some of the graphic characters of the CP862
set, again both in text terminal and in X.

Ehud.
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-26 16:43 UTC
  To: ehud; +Cc: emacs-devel

> Date: Thu, 26 Aug 2010 18:26:13 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
>
> On Wed, 25 Aug 2010 21:09:46 Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > And why don't you just say "C-x RET t cp862 RET"?  That's what you
> > want -- to send cp862 codes to the terminal, right?
>
> No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
> to appear in Hebrew on BOTH text terminals and X.

Sorry, I don't understand: what do you mean by "Hebrew of any kind"?

In Emacs 23 and later, there's only one kind of Hebrew: the Unicode
kind.  All the characters, including Hebrew, are internally
represented as their Unicode codepoints.  When Emacs visits a file
encoded in cp862, it converts the encoded characters into their
Unicode codepoints.  What is delivered to the screen is either some
encoding, like cp862 (in the case of a text terminal), or a glyph from
some font (on GUI terminals).  In both of these cases, Emacs
translates the Unicode codepoints to either the corresponding cp862
etc. codes, or to the codes of the characters in the font used to
display Hebrew.  All that's needed for Emacs to DTRT is (a) that Emacs
knows it is dealing with Hebrew characters, and (b) for text terminals
only, that the terminal encoding is set up according to the encoding
the terminal expects.

Now, what am I missing to understand why you needed to use display
tables?

> In addition I want to use some of the graphic characters of the CP862
> set, again both in text terminal and in X.

These graphic characters are part of Unicode as well (in the U+25XX
block), and Emacs 23 knows how to encode them in cp862, or any other
codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
see for yourself, it has a valid cp862 encoding.
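The decode-on-read, encode-on-output round trip described in this
message can be tried from `M-:' or the *scratch* buffer.  The following
is only a minimal sketch, assuming an Emacs 23 or later build in which
the `cp862' and `hebrew-iso-8bit' coding systems are defined; the
function name `my-aleph-round-trip' is hypothetical and not part of
Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-aleph-round-trip ()
  "Decode Aleph from two 8-bit encodings and encode it back for cp862.
Return a list (DOS-CODEPOINT UNIX-CODEPOINT CP862-BYTE)."
  (let ((dos-aleph  (decode-coding-string (unibyte-string #x80) 'cp862))
        (unix-aleph (decode-coding-string (unibyte-string #xE0)
                                          'hebrew-iso-8bit)))
    (list
     ;; Both byte values decode to the same internal character.
     (aref dos-aleph 0)
     (aref unix-aleph 0)
     ;; Encoding for a cp862 terminal gives back the cp862 byte.
     (aref (encode-coding-string unix-aleph 'cp862) 0))))
```

On such a build, `(my-aleph-round-trip)' is expected to return
(1488 1488 128): U+05D0 twice, then byte #x80.  Once the text is
decoded, the buffer no longer records which 8-bit encoding a letter
came from.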
* Re: Usage of standard-display-table in MSDOS
From: Ehud Karni @ 2010-08-27 13:35 UTC
  To: eliz; +Cc: emacs-devel

On Thu, 26 Aug 2010 19:43:48 Eli Zaretskii wrote:
> > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> >
> > No, I want Hebrew of any kind - DOS (CP862), UNIX (ISO-8859-8) and UTF -
> > to appear in Hebrew on BOTH text terminals and X.
>
> Sorry, I don't understand: what do you mean by "Hebrew of any kind"?
>
> In Emacs 23 and later, there's only one kind of Hebrew: the Unicode
> kind.  All the characters, including Hebrew, are internally
> represented as their Unicode codepoints.  When Emacs visits a file
> encoded in cp862, it converts the encoded characters into their
> Unicode codepoints.  What is delivered to the screen is either some
> encoding, like cp862 (in the case of a text terminal), or a glyph from
> some font (on GUI terminals).  In both of these cases, Emacs
> translates the Unicode codepoints to either the corresponding cp862
> etc. codes, or to the codes of the characters in the font used to
> display Hebrew.  All that's needed for Emacs to DTRT is (a) that Emacs
> knows it is dealing with Hebrew characters, and (b) for text terminals
> only, that the terminal encoding is set up according to the encoding
> the terminal expects.
>
> Now, what am I missing to understand why you needed to use display
> tables?

You are missing the point that most of my files are not "word-processor"
(or HTML/XML) files but data files that are read either as ISO-8859-8
or with no-conversion (binary) encoding.

Now, some of them have DOS Hebrew (#x80-9A) and graphic characters in
them, in ADDITION to UNIX Hebrew (#xE0-FA).  I still want to see them as
Hebrew characters (so I can read them) but with a distinction between
the 2 Hebrew types.  I want to know the 8-bit encoding; it matters.

When I visit a file literally (i.e. no conversion) I still want to see
the Hebrew (and DOS graphic) characters as Hebrew and graphics, not as
an octal representation.

So I have to use a display table, and I want it to work for both text
terminals and X (or other windowed systems - Mac, MS - which I myself
don't use).

> These graphic characters are part of Unicode as well (in the U+25XX
> block), and Emacs 23 knows how to encode them in cp862, or any other
> codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
> see for yourself, it has a valid cp862 encoding.

What I want is just a subset of this in my display table, so bytes in
the range #xB0-#xDF will be shown as is on a text terminal and as the
CP862 glyphs on X (I am willing to have different display tables for
each case; I don't use text terminal and X in the same Emacs instance).

I know how to do it when the locale environment is set to "en_GB".
Can you instruct me how to do this when the locale environment is set
to "he_IL"?

Just as a curiosity, sometimes I get files where the Hebrew is encoded
as the lower Latin letters and Aleph is represented by @ (this is
known as old-code and it is still used by some companies, even though
some other applications already use UTF-8 XML files).

Do you have a way to display it as Hebrew without a display table?

Ehud.
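A display table of the kind asked for here could be keyed on the
eight-bit (raw byte) characters and map them to Unicode Hebrew letters.
The following is only a sketch under stated assumptions: it assumes
Emacs 23 or later, it relies on the 27 letters Alef..Tav occupying
#x80-#x9A in CP862 and #xE0-#xFA in ISO-8859-8 in the same order as
U+05D0..U+05EA, and the function name `my-hebrew-raw-byte-display' and
its DOS-PREFIX argument are hypothetical, not part of Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-hebrew-raw-byte-display (&optional dos-prefix)
  "Display raw bytes of CP862 and ISO-8859-8 Hebrew letters as Hebrew.
If DOS-PREFIX is a character, show it before every letter that comes
from a CP862 byte, so that the two encodings stay distinguishable."
  (or standard-display-table
      (setq standard-display-table (make-display-table)))
  (dotimes (i 27)                        ; Alef..Tav, U+05D0..U+05EA
    (let ((letter (+ #x5D0 i)))
      ;; ISO-8859-8 bytes #xE0..#xFA, keyed by their eight-bit chars.
      (aset standard-display-table
            (unibyte-char-to-multibyte (+ #xE0 i))
            (vector letter))
      ;; CP862 bytes #x80..#x9A, optionally shown with a prefix.
      (aset standard-display-table
            (unibyte-char-to-multibyte (+ #x80 i))
            (if dos-prefix
                (vector dos-prefix letter)
              (vector letter))))))
```

A possible use is `(my-hebrew-raw-byte-display ?~)' followed by
`M-x find-file-literally'.  The CP862 graphic bytes (#xB0-#xDF) are
left out of the sketch; on a text terminal the terminal coding system
would still have to be one that can encode Hebrew.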
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-27 16:30 UTC
  To: ehud; +Cc: emacs-devel

> Date: Fri, 27 Aug 2010 16:35:40 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org
>
> You are missing the point that most of my files are not "word-processor"
> (or HTML/XML) files but data files that are read either as ISO-8859-8
> or with no-conversion (binary) encoding.
>
> Now, some of them have DOS Hebrew (#x80-9A) and graphic characters in
> them, in ADDITION to UNIX Hebrew (#xE0-FA).  I still want to see them as
> Hebrew characters (so I can read them) but with a distinction between
> the 2 Hebrew types.  I want to know the 8-bit encoding; it matters.

So you basically have files that mix different encodings of Hebrew
characters, is that right?

If so, I would suggest indeed to set up the display table, but not as
you did it in older Emacsen.  What you need is to map those 8-bit
bytes to the Unicode codepoints of the corresponding Hebrew
characters.  That is, let the slot of eight-bit character #xA0, which
is represented in Emacs as #x3FFFA0, be set in the display table to
#x5d0 (the Unicode codepoint of Aleph).  Then you will see Aleph when
the file has #xA0, provided that you read the file with no-conversion.

> So I have to use a display table, and I want it to work for both text
> terminals and X (or other windowed systems - Mac, MS - which I myself
> don't use).

If you set up the display table as I describe above, both X and text
terminals will work.  For text terminals, you will need to set
terminal-coding-system to some Hebrew capable encoding that these
terminals support.  For GUI displays, you need a font to be installed
that is capable of displaying Hebrew characters.

> > These graphic characters are part of Unicode as well (in the U+25XX
> > block), and Emacs 23 knows how to encode them in cp862, or any other
> > codepage that supports these characters.  Try "C-x 8 RET 2525 RET" and
> > see for yourself, it has a valid cp862 encoding.
>
> What I want is just a subset of this in my display table, so bytes in
> the range #xB0-#xDF will be shown as is on a text terminal and as the
> CP862 glyphs on X (I am willing to have different display tables for
> each case; I don't use text terminal and X in the same Emacs instance).

There should be no problem in using the same display table set up as
above on all types of terminals.

> I know how to do it when the locale environment is set to "en_GB".
> Can you instruct me how to do this when the locale environment is set
> to "he_IL"?

The locale environment shouldn't have any effect on that.  All it does
is set defaults for certain coding-systems.  You will want to override
those defaults anyway, e.g. for using no-conversion when visiting
these files.  I don't see anything else that might interfere, do you?

> Just as a curiosity, sometimes I get files where the Hebrew is encoded
> as the lower Latin letters and Aleph is represented by @ (this is
> known as old-code and it is still used by some companies, even though
> some other applications already use UTF-8 XML files).
>
> Do you have a way to display it as Hebrew without a display table?

You could write your own coding-system, but I think display tables are
easier.
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-27 10:24 UTC
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Mon, 23 Aug 2010 21:44:07 +0900
>
> In msdos-initialize-window-system (of term/pc-win.el), I
> found this code:
>
>   ;; In multibyte mode, we want unibyte buffers to be displayed
>   ;; using the terminal coding system, so that they display
>   ;; correctly on the DOS terminal; in unibyte mode we want to see
>   ;; all 8-bit characters verbatim.  In both cases, we want the
>   ;; entire range of 8-bit characters to arrive at our display code
>   ;; verbatim.
>   (standard-display-8bit 127 255)
>
> Is it really working in a non-iso-8859-1 environment as
> expected?  Note that 128..255 are latin-1 characters after
> Emacs 23, not raw-bytes.  So, I think the above call will
> make 8-bit bytes in a unibyte buffer displayed as latin-1
> characters, but as the terminal encoding system doesn't
> support latin-1 chars in, for instance, a Greek environment,
> just '?' will be displayed.

It's quite possible that this doesn't work in Emacs 23 and later like
it did in older versions.  But to figure out what, if anything, is
needed instead, I would like first to understand better what you are
saying.

It sounds like you are saying that standard-display-8bit no longer
does what its doc string advertises:

  "Display characters in the range L to H literally."

The "literally" part is no longer true, is it?

And one other question: why do we do something similar in
standard-display-european-internal?  Specifically:

  (defun standard-display-european-internal ()
    ;; Actually set up direct output of non-ASCII characters.
    (standard-display-8bit (if (eq window-system 'pc) 128 160) 255)

(I'm asking about the case where window-system is _not_ `pc'.)

This is called in set-display-table-and-terminal-coding-system under
the following conditions:

    (let ((coding (get-language-info language-name 'unibyte-display)))
      (if (and coding
               (or (not coding-system)
                   (coding-system-equal coding coding-system)))
          (standard-display-european-internal)
* Re: Usage of standard-display-table in MSDOS
From: Kenichi Handa @ 2010-08-27 11:44 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

In article <83aao8mjzx.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> It sounds like you are saying that standard-display-8bit no longer
> does what its doc string advertises:

>   "Display characters in the range L to H literally."

> The "literally" part is no longer true, is it?

What's the meaning of "literally" when a display table
element is [#xA0]?

Before Emacs 23, the character #xA0 represented the byte
0xA0.  But now it is a character representing the Unicode
character U+00A0, and #x3FFFA0 is the character representing
the byte 0xA0.

And, to "display characters literally", we have been encoding
characters by the terminal coding system.  Before Emacs 23,
the encoded result of #xA0 was always the byte 0xA0, but now
it depends on the terminal coding system.

> And one other question: why do we do something similar in
> standard-display-european-internal?  Specifically:

>   (defun standard-display-european-internal ()
>     ;; Actually set up direct output of non-ASCII characters.
>     (standard-display-8bit (if (eq window-system 'pc) 128 160) 255)

> (I'm asking about the case where window-system is _not_ `pc'.)

> This is called in set-display-table-and-terminal-coding-system under
> the following conditions:

>     (let ((coding (get-language-info language-name 'unibyte-display)))
>       (if (and coding
>                (or (not coding-system)
>                    (coding-system-equal coding coding-system)))
>           (standard-display-european-internal)

I don't know.  I didn't modify those parts when I merged the
unicode branch.  I should have investigated the semantics of
the display table at that time.

---
Kenichi Handa
handa@m17n.org
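The distinction drawn in this message, between the character #xA0
(U+00A0) and the character that stands for the raw byte 0xA0, can be
inspected directly.  This is a minimal sketch assuming Emacs 23 or
later; the function name `my-nbsp-vs-raw-byte' is hypothetical and not
part of Emacs.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-nbsp-vs-raw-byte ()
  "Contrast U+00A0 with the eight-bit character for byte 0xA0.
Return (RAW-CHAR BYTE LATIN-1-BYTES UTF-8-BYTES)."
  (let ((nbsp #xA0)                               ; U+00A0 NO-BREAK SPACE
        (raw  (unibyte-char-to-multibyte #xA0)))  ; stands for byte 0xA0
    (list
     raw                                  ; the eight-bit character code
     (multibyte-char-to-unibyte raw)      ; and the byte it stands for
     ;; What U+00A0 becomes on output depends on the coding system:
     ;; one byte in Latin-1, two bytes in UTF-8.
     (append (encode-coding-string (string nbsp) 'iso-8859-1) nil)
     (append (encode-coding-string (string nbsp) 'utf-8) nil))))
```

On such a build the call is expected to return
(4194208 160 (160) (194 160)), where 4194208 is #x3FFFA0.  Only the
eight-bit character is tied to a fixed byte; what U+00A0 turns into
depends on the coding system used for output.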
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-27 14:13 UTC
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Aug 2010 20:44:16 +0900
>
> >   "Display characters in the range L to H literally."
>
> > The "literally" part is no longer true, is it?
>
> What's the meaning of "literally" when a display table
> element is [#xA0]?

It means that a literal byte 0xA0 is sent to the terminal.

> Before Emacs 23, the character #xA0 represents the byte
> 0xA0.  But now it is a character representing a Unicode
> character U+00A0, and #x3FFFA0 is the character representing
> the byte 0xA0.
>
> And, to "display characters literally", we have been encoded
> characters by the terminal coding system.  Before Emacs 23,
> the encoded result of #xA0 is always the byte 0xA0, but now
> it depends on the terminal coding system.

Which means, AFAIU, that "literally" is no longer possible.  At least
in the case of a multibyte buffer.

What about a unibyte buffer, though?  How do we display the characters
there?
* Re: Usage of standard-display-table in MSDOS
From: Kenichi Handa @ 2010-08-28  4:18 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > >   "Display characters in the range L to H literally."
> >
> > > The "literally" part is no longer true, is it?
> >
> > What's the meaning of "literally" when a display table
> > element is [#xA0]?

> It means that a literal byte 0xA0 is sent to the terminal.

From which document can we get that interpretation?  The
docstring of buffer-display-table says:

    Each element should be a vector of characters or nil.
    The value nil means display the character in the default fashion;
    otherwise, the characters from the vector are delivered to the screen
    instead of the original character.

It only talks about "characters".  Although it doesn't say how
to deliver a character to a terminal, the natural way is to
encode that character by the terminal coding system, or to
display that character by the corresponding glyph of a font.

> > Before Emacs 23, the character #xA0 represents the byte
> > 0xA0.  But now it is a character representing a Unicode
> > character U+00A0, and #x3FFFA0 is the character representing
> > the byte 0xA0.
> >
> > And, to "display characters literally", we have been encoded
> > characters by the terminal coding system.  Before Emacs 23,
> > the encoded result of #xA0 is always the byte 0xA0, but now
> > it depends on the terminal coding system.

> Which means, AFAIU, that "literally" is no longer possible.  At least
> in the case of a multibyte buffer.

> What about a unibyte buffer, though?  How do we display the characters
> there?

This is the way to display the character #xA0 (in a
multibyte buffer) and the character representing the byte
#xA0 (in both multibyte and unibyte buffers) by sending
the byte #xA0 to the terminal.

  ;; For NBSP (U+00A0)
  (aset standard-display-table #xA0
        (vector (unibyte-char-to-multibyte #xA0)))
  ;; For byte #xA0.
  (aset standard-display-table (unibyte-char-to-multibyte #xA0)
        (vector (unibyte-char-to-multibyte #xA0)))
  (set-terminal-coding-system 'no-conversion)
  (set-safe-terminal-coding-system-internal 'no-conversion)

The last two lines are currently necessary because of a bug
in term.c.  I'm going to fix it.

---
Kenichi Handa
handa@m17n.org
* Re: Usage of standard-display-table in MSDOS
From: Eli Zaretskii @ 2010-08-28  7:22 UTC
  To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 28 Aug 2010 13:18:02 +0900
>
> In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
> > > >   "Display characters in the range L to H literally."
> > >
> > > > The "literally" part is no longer true, is it?
> > >
> > > What's the meaning of "literally" when a display table
> > > element is [#xA0]?
>
> > It means that a literal byte 0xA0 is sent to the terminal.
>
> From which document, can we get that interpretation?

That's my understanding of the word "literally".  Plus,
standard-display-8bit worked like that in previous versions of Emacs.
If we mean for it to do something else, we should amend the docstring.

>   (aset standard-display-table (unibyte-char-to-multibyte #xA0)
>         (vector (unibyte-char-to-multibyte #xA0)))

Shouldn't standard-display-8bit be modified to use this, instead of
what it does now?  It seems like it was previously used to work around
the terminal encoding, but that fire escape was plumbed in Emacs 23.
Perhaps we should reinstate that feature?

And there's still the question of what to do with the fragment in
standard-display-european-internal that uses standard-display-8bit.
Should it be removed, or should it be rewritten in some way?
* Re: Usage of standard-display-table in MSDOS 2010-08-28 7:22 ` Eli Zaretskii @ 2010-08-30 2:24 ` Kenichi Handa 2010-08-30 3:02 ` Eli Zaretskii 2010-09-01 3:21 ` Kenichi Handa 0 siblings, 2 replies; 36+ messages in thread From: Kenichi Handa @ 2010-08-30 2:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel In article <83y6brkxqe.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > From: Kenichi Handa <handa@m17n.org> > Cc: > emacs-devel@gnu.org > Date: Sat, 28 Aug 2010 13:18:02 > +0900 > > > > In article <837hjcm9cw.fsf@gnu.org>, Eli Zaretskii > <eliz@gnu.org> writes: > > > > > > > "Display characters in the range L to H > literally." > > > > > > > > > The "literally" part is no longer true, is it? > > > > > > > > What's the meaning of "literally" when a display > table > > > element is [#xA0]? > > > > > It means that a literal byte 0xA0 is sent to the > terminal. > > > > From which document, can we get that interpretation? > That's my understanding of the word "literally". But, how do you apply that understanding to this element: [#x100] > Plus, > standard-display-8bit worked like that in previous > versions of Emacs. If we mean for it to do something > else, we should amend the docstring. The current behaviour of standard-display-8bit is the natural consequence of the fact that we changed character codes. But, perhaps we should explain what "literally" really means. > > (aset standard-display-table (unibyte-char-to-multibyte > #xA0) > (vector (unibyte-char-to-multibyte #xA0))) > Shouldn't standard-display-8bit be modified to use this, instead of > what it does now? It seems like it was previously used to work around > the terminal encoding, but that fire escape was plumbed in Emacs 23. > Perhaps we should reinstate that feature? Yes. That's why I wrote: handa> Should we change the above code and all other codes setting handa> 0x80th..0xA0th elements of a display table? eliz> Yes. 
IMO, we should consistently use the codepoints of eight-bit eliz> characters in all char-tables. handa> Ok, if Yidong and Stefan agree too, I'll work on it. I have not yet got any response but have started the work. > And there's still the question of what to do with the > fragment in standard-display-european-internal that uses > standard-display-8bit. Should it be removed, or should it > be rewritten in some way? The docstring of standard-display-european says it's semi-obsolete. But, as long as we provide that function, we should modify the current code to do what is expected. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-08-30 2:24 ` Kenichi Handa @ 2010-08-30 3:02 ` Eli Zaretskii 2010-09-01 3:21 ` Kenichi Handa 1 sibling, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2010-08-30 3:02 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: emacs-devel@gnu.org > Date: Mon, 30 Aug 2010 11:24:13 +0900 > > > > > > > "Display characters in the range L to H literally." > > > > > > > > > > > The "literally" part is no longer true, is it? > > > > > > > > > > What's the meaning of "literally" when a display table > > > > > element is [#xA0]? > > > > > > > It means that a literal byte 0xA0 is sent to the terminal. > > > > > > From which document, can we get that interpretation? > > > That's my understanding of the word "literally". > > But, how do you apply that understanding to this element: > [#x100] We could document that standard-display-8bit works only for arguments less than 256. We could even code it that way. That would be backward compatible, since this function was written when Emacs supported only unibyte characters. ^ permalink raw reply [flat|nested] 36+ messages in thread
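A minimal sketch of the restriction suggested above, assuming a hypothetical helper `my-display-bytes-literally' (not part of Emacs, and not the actual change made to standard-display-8bit). It fills a display table passed as an argument instead of `standard-display-table', clamps the upper bound to 255, and stores bytes 128..255 under their raw-byte character codes:

```elisp
(require 'disp-table)

(defun my-display-bytes-literally (table l h)
  "In display TABLE, make bytes L..H display as themselves.
Hypothetical helper for illustration only; H is clamped to 255."
  (setq h (min h 255))
  (while (<= l h)
    ;; Bytes below 128 are ordinary ASCII codes; bytes 128..255 are
    ;; represented by the raw-byte characters of Emacs 23 and later.
    (let ((c (if (< l 128) l (unibyte-char-to-multibyte l))))
      (aset table c (vector c)))
    (setq l (1+ l)))
  table)

;; The upper bound 300 is deliberately out of range to show the clamping.
(defvar my-literal-table
  (my-display-bytes-literally (make-display-table) 127 300))
```

With this layout the Latin-1 character U+00A0 keeps its default display; only the raw byte #xA0 gets an entry.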
* Re: Usage of standard-display-table in MSDOS 2010-08-30 2:24 ` Kenichi Handa 2010-08-30 3:02 ` Eli Zaretskii @ 2010-09-01 3:21 ` Kenichi Handa 2010-09-01 9:20 ` Ehud Karni 2010-09-01 23:33 ` Ehud Karni 1 sibling, 2 replies; 36+ messages in thread From: Kenichi Handa @ 2010-09-01 3:21 UTC (permalink / raw) To: Kenichi Handa; +Cc: eliz, ehud, emacs-devel In article <tl7hbic3kj6.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: handa> Should we change the above code and all other codes setting handa> 0x80th..0xA0th elements of a display table? eliz> Yes. IMO, we should consistently use the codepoints of eight-bit eliz> characters in all char-tables. handa>Ok, if Yidong and Stefan agree too, I'll work on it. > I have not yet got any response but have started the work. I've just committed the work to emacs-23 branch. Ehud, could you try it? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-01 3:21 ` Kenichi Handa @ 2010-09-01 9:20 ` Ehud Karni 2010-09-01 23:33 ` Ehud Karni 1 sibling, 0 replies; 36+ messages in thread From: Ehud Karni @ 2010-09-01 9:20 UTC (permalink / raw) To: handa; +Cc: eliz, emacs-devel, handa On Wed, 01 Sep 2010 12:21:17 Kenichi Handa wrote: > > handa> Should we change the above code and all other codes setting > handa> 0x80th..0xA0th elements of a display table? > > eliz> Yes. IMO, we should consistently use the codepoints of eight-bit > eliz> characters in all char-tables. > > handa>Ok, if Yidong and Stefan agree too, I'll work on it. > > > I have not yet got any response but have started the work. > > I've just committed the work to emacs-23 branch. Ehud, > could you try it? It will take me some time; I'll report after I check it. Ehud. -- Ehud Karni Tel: +972-3-7966-561 /"\ Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-01 3:21 ` Kenichi Handa 2010-09-01 9:20 ` Ehud Karni @ 2010-09-01 23:33 ` Ehud Karni 2010-09-02 5:19 ` Eli Zaretskii ` (2 more replies) 1 sibling, 3 replies; 36+ messages in thread From: Ehud Karni @ 2010-09-01 23:33 UTC (permalink / raw) To: handa; +Cc: eliz, emacs-devel, handa [-- Attachment #1: Type: text/plain, Size: 0 bytes --] [-- Attachment #2: Type: text/plain, Size: 2931 bytes --] On Wed, 01 Sep 2010 12:21:17 Kenichi Handa wrote: > > handa> Should we change the above code and all other codes setting > handa> 0x80th..0xA0th elements of a display table? > > I've just committed the work to emacs-23 branch. Ehud, > could you try it? OK. I downloaded and compiled the emacs-23 branch. I only tested the characters displayed in different language environments on X and on a text terminal. There are 2 big problems - display on a text terminal and use of the display table with `find-file-literally'. My testing shows that the display table works well on X (in the 3 language environments tested), but VERY poorly on a text terminal. I change the language environment with `set-locale-environment'. Problem 1: On a text terminal the language environment has great influence on the use of the display table - characters not in the language are always displayed as ?. So in the "C" locale, all characters > 127 are displayed as ?. In the "he_IL" locale (= ISO-8859-8) characters in the range 191-223 and 251-255 are displayed as ?. In the "en_GB" locale (= ISO-8859-1) the Hebrew characters (#x5D0-#x5EA) are displayed as ?. I really must use "he_IL" because most of the files my users view are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A), but I want to see all the characters (#xB0-#xDF) literally (i.e. when a byte in this range is displayed, its 8 bit value should be sent to the terminal).
Problem 2: When I use `find-file-literally' to visit a file, the display table is mostly ignored, characters in #xA0-B2 are displayed in \OOO form, while #xB3-DF are displayed as empty boxes (on X), whatever locale I use. This is a change from the behavior of emacs-21. Note: This can be controlled by `set-buffer-multibyte t', but then the display is sometimes corrupted. I attach a tar.bz2 file containing the following files: 1. test-heb.el - 2 functions: `display-hebrew' sets the display table. `chars-list' - show characters #x20-#xFF. 2. motd - a file with many graphic characters. 3. 23-X-disp.png - display of #x20-#xFF on X (good, not dependent on the locale) 4. 23-tty-C.png - chars #x20-#xFF on text terminal with locale "C". 5. 23-tty-he.png - chars #x20-#xFF on text terminal with locale "he_IL". 6. 23-tty-en.png - chars #x20-#xFF on text terminal with locale "en_GB". 7. 21-motd-lit.png - `find-file-literally' of the motd on text terminal in emacs 21.4 (good). 8. 23-X-motd.png - the motd on X - upper window: literally, lower window: regular find-file. Ehud. -- Ehud Karni Tel: +972-3-7966-561 /"\ Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry [-- Attachment #3: test-heb.tar.bz2 --] [-- Type: application/X-bzip2-compressed, Size: 150165 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-01 23:33 ` Ehud Karni @ 2010-09-02 5:19 ` Eli Zaretskii 2010-09-02 5:20 ` Kenichi Handa 2010-09-02 12:32 ` Kenichi Handa 2 siblings, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2010-09-02 5:19 UTC (permalink / raw) To: ehud; +Cc: emacs-devel, handa > Date: Thu, 2 Sep 2010 02:33:53 +0300 > From: "Ehud Karni" <ehud@unix.mvs.co.il> > Cc: handa@m17n.org, eliz@gnu.org, emacs-devel@gnu.org > Reply-to: ehud@unix.mvs.co.il > > Problem 1: > On text terminal the language environment has great influence on the > use of the display table - characters not it the language - are > always displayed as ? . So in the "C" locale, all characters > 127 > are displayed as ?. > In the "he_IL" locale (= ISO-8859-8) characters in the range > 191-223 and 251-255 are displayed as ?. > In the "en_GB" locale (= ISO-8859-1) the Hebrew characters (#x5D0- > #x5EA) are displayed as ?. The locale affects the value of terminal-coding-system. Other than that, it shouldn't affect the issues that are important to you in the context of what we discuss here. You can control the value of terminal-coding-system with "C-x RET t", if what set-locale-environment does is not good enough. In particular, I would try using cp862. > I really must use the "he_IL" because most of the file my users view > are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A), but > I want to see all the characters (#xB0-#xDF) literally (i.e. when a > byte in this range is displayed, its 8 bit value should be sent to > the terminal. I thought the latest changes by Handa-san were supposed to do this. Eight-bit-characters sent to the terminal should always produce the corresponding 8-bit byte values, no matter which terminal-coding-system is used. It sounds like you say that didn't work? 
> Problem 2: > When I use `find-file-literally' to visit a file, the display table is > mostly ignored, characters in #xA0-B2 are displayed in \OOO form, while > #xB3-DF are displayed as empty boxes (on X), whatever locale I use. > This is a change from the behavior of emacs-21. > Note: This can be controlled by `set-buffer-multibyte t', but then the > display is sometimes corrupted. Sounds like display of unibyte characters doesn't work according to the display table? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-01 23:33 ` Ehud Karni 2010-09-02 5:19 ` Eli Zaretskii @ 2010-09-02 5:20 ` Kenichi Handa 2010-09-04 22:54 ` Ehud Karni 2010-09-02 12:32 ` Kenichi Handa 2 siblings, 1 reply; 36+ messages in thread From: Kenichi Handa @ 2010-09-02 5:20 UTC (permalink / raw) To: ehud; +Cc: eliz, emacs-devel In article <201009012333.o81NXrRq016732@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes: As for Problem 1, I'll reply later. > Problem 2: > When I use `find-file-literally' to visit a file, My change was to make (standard-display-8bit 128 255) work as in Emacs 21 for a unibyte buffer; i.e. when you visit a file by specifying no-conversion coding-system or by using find-file-literally. > I attach a tar.bz2 file containing the following files: > 1. test-heb.el - 2 functions: `display-hebrew' sets the display table. > `chars-list' - show characters #x20-#xFF. Please try the attached version of chars-list without any other display-table setting. Does it work? --- Kenichi Handa handa@m17n.org

;; -*- mode: emacs-lisp; coding: hebrew-iso-8bit-unix -*-
(defun chars-list ()
  "display all characters in range 0x20-0xFF"
  (interactive)
  (let ((svbuf (get-buffer-create "*Help*"))
        (ch 32))
    (with-current-buffer svbuf
      (erase-buffer)
      ;; Make this a unibyte buffer.
      (set-buffer-multibyte nil)
      ;; Make all 8-bit bytes (0x80..0xFF) displayed literally.
      (standard-display-8bit 128 255)
      (insert " List of all displayable characters:\n\n")
      (while (< ch 88)
        (let ((c ch))
          (while (< c 256)
            (insert (format " [%c]=%3dD,%3oO,%2xX" c c c c))
            (setq c (+ c 56))
            (if (< c 256)
                (insert " "))))
        (insert "\n")
        (setq ch (1+ ch)))
      (goto-char (point-min)))
    (pop-to-buffer svbuf)))

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-02 5:20 ` Kenichi Handa @ 2010-09-04 22:54 ` Ehud Karni 2010-09-06 1:30 ` Kenichi Handa 0 siblings, 1 reply; 36+ messages in thread From: Ehud Karni @ 2010-09-04 22:54 UTC (permalink / raw) To: handa; +Cc: eliz, emacs-devel On Thu, 02 Sep 2010 14:20:42 Kenichi Handa wrote: > > My change was to make (standard-display-8bit 128 255) work > as Emacs 21 for a unibyte buffer; i.e. when you visit a file > by specifying no-conversion coding-system or by using > find-file-literally. OK, I misunderstood you before. On a text terminal all the 8 bit bytes are sent as is. On X, most of the bytes are displayed as empty boxes (i.e. no glyph). I don't know how to set the display table to get something more meaningful. Ehud. -- Ehud Karni Tel: +972-3-7966-561 /"\ Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-04 22:54 ` Ehud Karni @ 2010-09-06 1:30 ` Kenichi Handa 0 siblings, 0 replies; 36+ messages in thread From: Kenichi Handa @ 2010-09-06 1:30 UTC (permalink / raw) To: ehud; +Cc: eliz, emacs-devel In article <201009042254.o84MsEcf004615@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes: > > My change was to make (standard-display-8bit 128 255) work > > as Emacs 21 for a unibyte buffer; i.e. when you visit a file > > by specifying no-conversion coding-system or by using > > find-file-literally. > OK, I misunderstood you before. For text terminal all the 8 bit bytes > are sent as is. ON x, most of the bytes are displayed as empty boxes > (i.e. no glyph). I don't know how to set the display table to get > something more meaningful. This is the latest docstring of standard-display-8bit:
============================================================
standard-display-8bit is a compiled Lisp function in `disp-table.el'.

(standard-display-8bit L H)

Display characters representing raw bytes in the range L to H literally.

On a terminal display, each character in the range is displayed by sending the corresponding byte directly to the terminal.

On a graphic display, each character in the range is displayed using the default font by a glyph whose code is the corresponding byte.

Note that ASCII printable characters (SPC to TILDA) are displayed in the default way after this call.
============================================================
So, on X (or on any graphic display), whether it works as expected or not depends on which font you use as the default font. Most TrueType fonts are not good in this respect. X core fonts of a legacy charset (e.g. -*-iso8859-8) are good. Please tell me which font is the default for you (by moving the cursor onto some Latin letter and typing C-u C-x =). --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-01 23:33 ` Ehud Karni 2010-09-02 5:19 ` Eli Zaretskii 2010-09-02 5:20 ` Kenichi Handa @ 2010-09-02 12:32 ` Kenichi Handa 2010-09-04 23:32 ` Ehud Karni 2 siblings, 1 reply; 36+ messages in thread From: Kenichi Handa @ 2010-09-02 12:32 UTC (permalink / raw) To: ehud; +Cc: eliz, emacs-devel In article <201009012333.o81NXrRq016732@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes: > Problem 1: > On text terminal the language environment has great influence on the > use of the display table - characters not it the language - are > always displayed as ? . So in the "C" locale, all characters > 127 > are displayed as ?. > In the "he_IL" locale (= ISO-8859-8) characters in the range > 191-223 and 251-255 are displayed as ?. > In the "en_GB" locale (= ISO-8859-1) the Hebrew characters (#x5D0- > #x5EA) are displayed as ?. > I really must use the "he_IL" because most of the file my users view > are in ISO-8859-8 and a small part have MSDOS Hebrew (#x80-#x9A), but > I want to see all the characters (#xB0-#xDF) literally (i.e. when a > byte in this range is displayed, its 8 bit value should be sent to > the terminal. I've thought that you are reading files by find-file-literally (thus buffers are unibyte) because you wrote below at first: > I had this code in Emacs 21.3: > > (defun set-standard-display-table () > (setq standard-display-table (make-display-table)) > (standard-display-8bit 127 254)) > > I then set the DOS Hebrew chars (128-144) each to a vector: > [ 169 <the corresponding UNIX Hebrew char> ] > > Then visit a file (literally). But, as you wrote "Problem 2: When I use `find-file-literally' ...", the "Problem 1" is the case that you don't use `find-file-literally', and a file is read into a multibyte buffer decoded by some coding-system, right? Then, in he_IL locale, by which coding-system your file is decoded? 
C-h C RET shows that coding-system near the top under the line "Coding system for saving this buffer:". And I don't understand this part. > I then set the DOS Hebrew chars (128-144) each to a vector: > [ 169 <the corresponding UNIX Hebrew char> ] 169 is not a "UNIX Hebrew char", i.e. not a Unicode character code of a Hebrew char, nor a code-point of a Hebrew character in iso-8859-8 character set. In your mails, you mix up: (1) a code-point in a specific character set, (2) a character code in Emacs (that is Unicode character code), (3) a byte represented by Emacs' 8-bit characters, and that makes it difficult to understand what exactly you are saying. Please write: For (1), "a character of code #xXX in XXX charset". For (2), just "U+XXXX". For (3), just "byte #xXX". And, you wrote "a small part have MSDOS Hebrew (#x80-#x9A)", but #x9a is 154, not 144. Is "144" above just a typo? Perhaps, the following is the best way to understand what you want: (1) You at first make sample files and give me them. (2) Tell me how you want to read that file exactly. Just C-x C-f FILENAME RET, or M-x find-file-literally ...., or C-x C-m c no-conversion RET C-x C-f FILENAME RET, or ... (3) Show me how it should be displayed on a terminal by an image. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-02 12:32 ` Kenichi Handa @ 2010-09-04 23:32 ` Ehud Karni 2010-09-05 5:30 ` Eli Zaretskii 2010-09-06 5:14 ` Kenichi Handa 0 siblings, 2 replies; 36+ messages in thread From: Ehud Karni @ 2010-09-04 23:32 UTC (permalink / raw) To: handa; +Cc: eliz, emacs-devel [-- Attachment #1: Type: text/plain, Size: 0 bytes --] [-- Attachment #2: Type: text/plain, Size: 2800 bytes --] On Thu, 02 Sep 2010 21:32:21 Kenichi Handa wrote: > > Then, in he_IL locale, by which coding-system your file is > decoded? C-h C RET shows that coding-system near the top > under the line "Coding system for saving this buffer:". The coding-system-for-read is hebrew-iso-8bit. > And I don't understand this part. > > > I then set the DOS Hebrew chars (128-144) each to a vector: > > [ 169 <the corresponding UNIX Hebrew char> ] > > 169 is not a "UNIX Hebrew char", i.e. not a Unicode > character code of a Hebrew char, nor a code-point of a > Hebrew character in iso-8859-8 character set. Yes, that's my problem, I have Hebrew in #xE0-#xFA (iso-8859-8) but I have other 8 bit bytes (most of them are graphic shapes from the cp862 set). > And, you wrote "a small part have MSDOS Hebrew (#x80-#x9A)", > but #x9a is 154, not 144. Is "144" above just a typo? Just a typo, it should be 154. All my data files are 8bit bytes, so for me it is always, character = byte (at least externally). > Perhaps, the following is the best way to understand what > you want: > > (1) You at first make sample files and give me them. > (2) Tell me how you want read that file exactly. > Just C-x C-f FILENAME RET, or M-x find-file-literally ...., > or C-x C-m c no-convesion RET C-x C-f FILENAME RET, > or ... > (3) Show me how it should be displayed on a terminal by an > image. I attach a tar.bz2 file with 3 files: 1. lit1 - the sample file. 2. lit1-tty.png - how it should show on text terminal. 3. lit1-x.png - how it should show on X. 
I can do it if I read the file with the iso-latin-1 coding-system and change the display table to show the Hebrew glyphs for the Hebrew [#xE0-#xFA] bytes. But in this way they are not Hebrew characters (e.g. for the new bidi display). I want it the other way around: to read it with hebrew-iso-8bit and to tweak the display table to show all the bytes not belonging to the Hebrew set. I had a similar problem a long time ago. In 2001 you suggested using the following code:

(make-coding-system
 'hebrew-iso-8bit 2 ?8
 "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)"
 '(ascii hebrew-iso8859-8 nil nil
   nil ascii-eol ascii-cntl nil nil nil nil nil t)
 '((safe-charsets ascii hebrew-iso8859-8 eight-bit-control)
   (mime-charset . iso-8859-8)))

Maybe I can define a new coding system that will have bytes #x80-#xFF as legal characters and be recognized as a Hebrew variant. Ehud. -- Ehud Karni Tel: +972-3-7966-561 /"\ Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry [-- Attachment #3: lit1.tar.bz2 --] [-- Type: application/X-bzip2-compressed, Size: 10044 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-09-04 23:32 ` Ehud Karni @ 2010-09-05 5:30 ` Eli Zaretskii 2010-09-06 5:14 ` Kenichi Handa 1 sibling, 0 replies; 36+ messages in thread From: Eli Zaretskii @ 2010-09-05 5:30 UTC (permalink / raw) To: ehud; +Cc: emacs-devel, handa > Date: Sun, 5 Sep 2010 02:32:43 +0300 > From: "Ehud Karni" <ehud@unix.mvs.co.il> > Cc: eliz@gnu.org, emacs-devel@gnu.org > Reply-to: ehud@unix.mvs.co.il > > 1. lit1 - the sample file. > 2. lit1-tty.png - how it should show on text terminal. > 3. lit1-x.png - how it should show on X. > > I can do it if I read the file with the iso-latin-1 coding-system > and change the display table to show the Hebrew glyphs for the Hebrew > [#xE0-#xFA] bytes. But in this way it is not Hebrew characters (e.g. > for the new bidi display). I want it the other way around, to read it > with hebrew-iso-8bit and to to tweak the display table to show all > the bytes not belonging to the Hebrew set. The file includes Hebrew characters encoded in both hebrew-iso-8bit and cp862, as well as line-drawing characters from cp862. Barring bugs in the display table handling, you should be able eventually to set up standard-display-table to display all the Hebrew characters as you'd expect to see them, and display the line-drawing characters correctly as well. (Judging by the sample file, I'd suggest to use cp862 rather than hebrew-iso-8bit, because much more characters are from cp862. However, you say elsewhere that most of the characters in your files are hebrew-iso-8bit, so maybe the sample file is not representative enough.) But if you want all the Hebrew characters to be treated by Emacs as such (e.g., for bidi display), no matter what's their encoding in the file, you will have to define a coding-system that will decode them all into Unicode codepoints of Hebrew characters. 
There's a problem you will need to solve for defining such a coding system: it has 2 different encodings for the same character, one from hebrew-iso-8bit, the other from cp862. So you will need to decide how will Hebrew characters be encoded when the file is saved. Alternatively, we could expose in Lisp the char-table used by the bidi reordering engine for reordering characters, where you could change the bidi class of the non-Hebrew characters that are displayed as Hebrew. Until now, there was no plausible use-case for changing that table (and frankly, I'd prefer not to go there, as futzing with that table could potentially cause trouble). ^ permalink raw reply [flat|nested] 36+ messages in thread
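The two-encodings problem described above can be sketched as follows. This is only an illustration, assuming the stock `hebrew-iso-8bit' and `cp862' coding systems of Emacs 23 or later; the helper `my-decode-byte' and the `my-aleph-...' variables are hypothetical names introduced here:

```elisp
(defun my-decode-byte (byte coding-system)
  "Decode the single BYTE with CODING-SYSTEM and return the character.
Hypothetical helper for illustration only."
  (aref (decode-coding-string (unibyte-string byte) coding-system) 0))

;; The letter Alef is byte #xE0 in ISO-8859-8 and byte #x80 in CP862.
(defvar my-aleph-from-iso (my-decode-byte #xE0 'hebrew-iso-8bit))
(defvar my-aleph-from-dos (my-decode-byte #x80 'cp862))

;; Once decoded, both are the same character, so encoding on save has
;; to pick one byte; here the ISO-8859-8 one is chosen.
(defvar my-aleph-saved-as
  (aref (encode-coding-string (string my-aleph-from-dos) 'hebrew-iso-8bit)
        0))
```

Both decoded values are U+05D0, so a file that used byte #x80 for Alef comes back as #xE0 after a round trip through such a coding system.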
* Re: Usage of standard-display-table in MSDOS 2010-09-04 23:32 ` Ehud Karni 2010-09-05 5:30 ` Eli Zaretskii @ 2010-09-06 5:14 ` Kenichi Handa 1 sibling, 0 replies; 36+ messages in thread From: Kenichi Handa @ 2010-09-06 5:14 UTC (permalink / raw) To: ehud; +Cc: eliz, emacs-devel In article <201009042332.o84NWhSA017839@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes: > I attach a tar.bz2 file with 3 files: > 1. lit1 - the sample file. > 2. lit1-tty.png - how it should show on text terminal. > 3. lit1-x.png - how it should show on X. > I can do it if I read the file with the iso-latin-1 coding-system > and change the display table to show the Hebrew glyphs for the Hebrew > [#xE0-#xFA] bytes. But in this way it is not Hebrew characters (e.g. > for the new bidi display). I want it the other way around, to read it > with hebrew-iso-8bit and to to tweak the display table to show all > the bytes not belonging to the Hebrew set. Does it mean that you want bidi-reordering for the bytes #xE0..#xFA (code-points of iso-8859-8) but bidi-reordering is not necessary for the bytes #x80..#x8A (code-points of cp862)? But, your file "lit1" contains #xE0..#xFA (code-points of iso-8859-8) at the second to 4th lines in visual order. If bidi-reordering is applied on them, you'll get a different view from lit1-tty.png and lit1-x.png. Is that ok? > I had similar problem a long time ago. In 2001 you suggested to use > the following code: > (make-coding-system > 'hebrew-iso-8bit 2 ?8 > "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)" > '(ascii hebrew-iso8859-8 nil nil > nil ascii-eol ascii-cntl nil nil nil nil nil t) > '((safe-charsets ascii hebrew-iso8859-8 eight-bit-control) > (mime-charset . iso-8859-8))) > May be I can define a new coding system that will have bytes #x80-#xFF > as legal characters and be recognized as Hebrew variant. This code will do that. I think it's not difficult to understand what the code is doing.
------------------------------------------------------------
(define-charset 'cp862-sub
  "Subset of CP862"
  :code-space [#x80 #xDF]
  :subset '(cp862 #x80 #xDF #x00))

(define-charset 'iso-8859-8-sub
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00))

(define-coding-system 'mix-hebrew
  "Mixture of ISO-8859-8 and CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii iso-8859-8-sub cp862-sub)
  :ascii-compatible-p t)
------------------------------------------------------------
Please try C-x C-m c mix-hebrew RET lit1 RET. But, if you do that, you must consider the problem Eli wrote: In article <E1Os7oU-0006m6-7X@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > But if you want all the Hebrew characters to be treated by Emacs as > such (e.g., for bidi display), no matter what's their encoding in the > file, you will have to define a coding-system that will decode them > all into Unicode codepoints of Hebrew characters. There's a problem > you will need to solve for defining such a coding system: it has 2 > different encodings for the same character, one from hebrew-iso-8bit, > the other from cp862. So you will need to decide how will Hebrew > characters be encoded when the file is saved. In the above definition of mix-hebrew, as iso-8859-8-sub is listed before cp862-sub, all Hebrew characters are encoded into bytes #xE0..#xFA even if they were originally decoded from bytes #x80..#x9A. If you don't like it, you must give up decoding bytes #x80..#x9A into Hebrew chars. You decode them as raw-bytes, and set up a display table to display them as Hebrew chars.
It can be done by this code:
------------------------------------------------------------
(define-charset 'cp862-sub
  "Subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00))

(define-charset 'iso-8859-8-sub
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00))

(define-coding-system 'mix-hebrew
  "Mixture of ISO-8859-8, CP862, and raw 8-bit bytes"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii iso-8859-8-sub cp862-sub eight-bit)
  :ascii-compatible-p t)

(require 'disp-table)

;; Display bytes #x80..#x9A as Hebrew chars (code-points #xE0..#xFA of
;; ISO-8859-8).
(dotimes (i #x1B)
  (aset standard-display-table
        (unibyte-char-to-multibyte (+ #x80 i))
        (vector (decode-char 'iso-8859-8 (+ #xE0 i)))))
------------------------------------------------------------
This display-table setting works also on terminal as far as you set terminal coding system to mix-hebrew. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
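To inspect what the display-table loop above stores without touching the global `standard-display-table', the same mapping can be built in a scratch table. This is a sketch only; `my-hebrew-table' and `my-hebrew-glyph-for-byte' are hypothetical names introduced for illustration:

```elisp
(require 'disp-table)

(defvar my-hebrew-table (make-display-table)
  "Scratch display table used only for this illustration.")

;; Raw bytes #x80..#x9A (27 of them) are shown as the Hebrew letters
;; that ISO-8859-8 places at code-points #xE0..#xFA.
(dotimes (i #x1B)
  (aset my-hebrew-table
        (unibyte-char-to-multibyte (+ #x80 i))
        (vector (decode-char 'iso-8859-8 (+ #xE0 i)))))

(defun my-hebrew-glyph-for-byte (byte)
  "Return the character `my-hebrew-table' displays for raw BYTE, or nil."
  (let ((entry (aref my-hebrew-table (unibyte-char-to-multibyte byte))))
    (and entry (aref entry 0))))
```

Evaluating (my-hebrew-glyph-for-byte #x80) gives the character that would be displayed for byte #x80; a byte outside #x80..#x9A, such as #x9B, has no entry and yields nil.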
* Re: Usage of standard-display-table in MSDOS 2010-08-28 4:18 ` Kenichi Handa 2010-08-28 7:22 ` Eli Zaretskii @ 2010-08-29 10:16 ` Ehud Karni 2010-08-29 11:21 ` Eli Zaretskii 1 sibling, 1 reply; 36+ messages in thread From: Ehud Karni @ 2010-08-29 10:16 UTC (permalink / raw) To: handa; +Cc: eliz, emacs-devel On Sat, 28 Aug 2010 13:18:02 Kenichi Handa wrote: > > ;; For NBSP (U+00A0) > (aset standard-display-table #xA0 > (vector (unibyte-char-to-multibyte #xA0))) This does not work because `unibyte-char-to-multibyte' does not give the right result in Emacs-23.1 (it works well on Emacs-21.3). Sorry, I did not check on the latest Emacs. I used the following to check it:

(defun check-multibyte-code (byte)
  (message "Byte: %02X (%d), Char: %04X (%d)"
           byte byte
           (unibyte-char-to-multibyte byte)
           (unibyte-char-to-multibyte byte)))

(check-multibyte-code #xE0)

The result in 21.3 is correct: Byte: E0 (224), Char: 0C60 (3168) But on 23.1 I get: Byte: E0 (224), Char: 3FFFE0 (4194272) i.e. "literal" #xE0. The output of describe-current-coding-system (Emacs-21.3, the output of Emacs-23.1 is almost the same) is: Coding system for saving this buffer: Not set locally, use the default. Default coding system (for new files): 8 -- hebrew-iso-8bit-unix Coding system for keyboard input: nil Coding system for terminal output: 8 -- hebrew-iso-8bit Defaults for subprocess I/O: decoding: 8 -- hebrew-iso-8bit-unix encoding: 8 -- hebrew-iso-8bit-unix Priority order for recognizing coding systems when reading files: 1. hebrew-iso-8bit 2. iso-latin-1 (alias: iso-8859-1 latin-1) 3. iso-2022-jp (alias: junet) 4. iso-2022-7bit 5. iso-2022-7bit-lock (alias: iso-2022-int-1) 6. iso-2022-8bit-ss2 7. emacs-mule 8. raw-text 9. japanese-shift-jis (alias: shift_jis sjis) 10. chinese-big5 (alias: big5 cn-big5) 11. no-conversion (alias: binary) 12.
mule-utf-8 (alias: utf-8) Other coding systems cannot be distinguished automatically from these, and therefore cannot be recognized automatically with the present coding system priorities. The followings are decoded correctly but recognized as iso-2022-7bit-lock: iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext iso-2022-jp-2 iso-2022-kr Particular coding systems specified for certain file names: OPERATION TARGET PATTERN CODING SYSTEM(s) --------- -------------- ---------------- File I/O "\\.\\(reg\\|REG\\)$" (raw-text-dos . raw-text-dos) "\\.t\\(bz2?\\)\\|\\([bz]2\\)\\'" (no-conversion . no-conversion) "\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'" (no-conversion . no-conversion) "\\.gz\\(~\\|\\.~[0-9]+~\\)?\\'" (no-conversion . no-conversion) "\\.tgz\\'" (no-conversion . no-conversion) "\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'" (no-conversion . no-conversion) "\\.elc\\'" (emacs-mule . emacs-mule) "\\(\\`\\|/\\)loaddefs.el\\'" (raw-text . raw-text-unix) "\\.tar\\'" (no-conversion . no-conversion) "" (hebrew-iso-8bit) Process I/O nothing specified Network I/O nothing specified Ehud. -- Ehud Karni Tel: +972-3-7966-561 /"\ Mivtach - Simon Fax: +972-3-7976-561 \ / ASCII Ribbon Campaign Insurance agencies (USA) voice mail and X Against HTML Mail http://www.mvs.co.il FAX: 1-815-5509341 / \ GnuPG: 98EA398D <http://www.keyserver.net/> Better Safe Than Sorry ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS 2010-08-29 10:16 ` Ehud Karni @ 2010-08-29 11:21 ` Eli Zaretskii 2010-08-29 11:49 ` Ehud Karni 0 siblings, 1 reply; 36+ messages in thread From: Eli Zaretskii @ 2010-08-29 11:21 UTC (permalink / raw) To: ehud; +Cc: emacs-devel, handa > Date: Sun, 29 Aug 2010 13:16:02 +0300 > From: "Ehud Karni" <ehud@unix.mvs.co.il> > Cc: eliz@gnu.org, emacs-devel@gnu.org > Reply-to: ehud@unix.mvs.co.il > > > ;; For NBSP (U+00A0) > > (aset standard-display-table #xA0 > > (vector (unibyte-char-to-multibyte #xA0))) > > This does not work because `unibyte-char-to-multibyte' does not give > the right result in Emacs-23.1 (it works well on Emacs-21.3). > Sorry, I did not check on latest Emacs > > I used the following to check it: > > (defun check-multibyte-code (byte) > (message "Byte: %02X (%d), Char: %04X (%d)" > byte byte > (unibyte-char-to-multibyte byte) > (unibyte-char-to-multibyte byte))) > > (check-multibyte-code #xE0) > > > The result in 21.3 is correct: > Byte: E0 (224), Char: 0C60 (3168) > > But on 23.1 I get: > Byte: E0 (224), Char: 3FFFE0 (4194272) > i.e. "literal" #xE0. The last result is correct: 0x3FFFE0 is the internal representation of the raw byte 0xE0 in Emacs 23. Emacs 23 and later extend the Unicode code space with these characters (and some others). Why did you think it was incorrect? ^ permalink raw reply [flat|nested] 36+ messages in thread
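The difference between a raw byte and a character decoded from a charset can be illustrated with a short sketch. It assumes Emacs 23 or later; `my-describe-byte' and `my-e0-info' are hypothetical names introduced here and are not part of Emacs:

```elisp
(defun my-describe-byte (byte)
  "Return a plist relating BYTE to its Emacs raw-byte character.
Hypothetical helper for illustration only."
  (let ((ch (unibyte-char-to-multibyte byte)))
    (list :raw-char ch
          ;; Converting back recovers the original byte.
          :back-to-byte (multibyte-char-to-unibyte ch)
          ;; Interpreting BYTE as an ISO-8859-8 code-point is a
          ;; separate operation and yields a Unicode character.
          :as-iso-8859-8 (decode-char 'iso-8859-8 byte))))

(defvar my-e0-info (my-describe-byte #xE0))
```

For byte #xE0 the raw-byte character is #x3FFFE0, which converts back to #xE0, while the ISO-8859-8 interpretation is U+05D0 (Hebrew letter Alef). `unibyte-char-to-multibyte' performs only the first conversion; it does not consult any coding system.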
* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:21 ` Eli Zaretskii
@ 2010-08-29 11:49 ` Ehud Karni
  2010-08-29 13:06 ` Ehud Karni
  2010-08-29 14:04 ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Ehud Karni @ 2010-08-29 11:49 UTC (permalink / raw)
  To: eliz; +Cc: handa, emacs-devel

On Sun, 29 Aug 2010 07:21:26 Eli Zaretskii wrote:
>
> > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> >
> > > ;; For NBSP (U+00A0)
> > > (aset standard-display-table #xA0
> > >       (vector (unibyte-char-to-multibyte #xA0)))
> >
> > This does not work because `unibyte-char-to-multibyte' does not give
> > the right result in Emacs-23.1 (it works well on Emacs-21.3).
> >
> > I used the following to check it:
> >
> > (defun check-multibyte-code (byte)
> >   (message "Byte: %02X (%d), Char: %04X (%d)"
> >            byte byte
> >            (unibyte-char-to-multibyte byte)
> >            (unibyte-char-to-multibyte byte)))
> >
> > (check-multibyte-code #xE0)
> >
> >
> > The result in 21.3 is correct:
> >     Byte: E0 (224), Char: 0C60 (3168)
> >
> > But on 23.1 I get:
> >     Byte: E0 (224), Char: 3FFFE0 (4194272)
> > i.e. "literal" #xE0.
>
> The last result is correct 0x3FFFE0 is the internal representation of
> 0xE0 in Emacs 23.  Emacs 23 and later extends the Unicode code space
> with these characters (and some others).
>
> Why did you think it was incorrect?

Because of my coding system (iso-8859, remember?) the #xE0 should be
displayed as Aleph, not some 8-bit byte E0.

Instead of trying to understand my problem, you are telling me why
Emacs behaves in this way (which is of no use for Handa-san's
suggestion: a way to set the display table).

If you think this is the way it should be, give the reason, not the
technical details.

Ehud.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:49 ` Ehud Karni
@ 2010-08-29 13:06 ` Ehud Karni
  2010-08-29 13:50 ` Eli Zaretskii
  2010-08-29 14:04 ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-08-29 13:06 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel, handa

On Sun, 29 Aug 2010 14:49:03 Ehud Karni wrote:
>
> On Sun, 29 Aug 2010 07:21:26 Eli Zaretskii wrote:
> >
> > > From: "Ehud Karni" <ehud@unix.mvs.co.il>
> > >
> > > > ;; For NBSP (U+00A0)
> > > > (aset standard-display-table #xA0
> > > >       (vector (unibyte-char-to-multibyte #xA0)))
> > >
> > > The result in 21.3 is correct:
> > >     Byte: E0 (224), Char: 0C60 (3168)
> > >
> > > But on 23.1 I get:
> > >     Byte: E0 (224), Char: 3FFFE0 (4194272)
> > > i.e. "literal" #xE0.
> >
> > The last result is correct 0x3FFFE0 is the internal representation of
> > 0xE0 in Emacs 23.  Emacs 23 and later extends the Unicode code space
> > with these characters (and some others).
> >
> > Why did you think it was incorrect?
>
> Because of my coding system (iso-8859, remember ?) the #xE0 should be
> displayed as Aleph, not some 8 bit byte E0.

From another thread, I found Handa-san's suggestion to use `decode-char'.

So my check function now looks like this:

(defun check-multibyte-code (byte)
  (message "Byte: %02X (%d), M-Char: %04X (%d), D-Char: %04X (%d)"
           byte byte
           (unibyte-char-to-multibyte byte)
           (unibyte-char-to-multibyte byte)
           (decode-char 'iso-8859-8 byte)
           (decode-char 'iso-8859-8 byte)))

It fails for Emacs-21.3 because `decode-char' returns nil.

For Emacs-23.1 the result is:
    Byte: E0 (224), M-Char: 3FFFE0 (4194272), D-Char: 05D0 (1488)

So I can use `decode-char' in 23.1 and `unibyte-char-to-multibyte' in
21.3 for building a display table.

Can you give the reasons for the changes in these functions?
I think it is bad practice to keep the function names while
changing how they work.  It breaks tested code.

Ehud.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS
  2010-08-29 13:06 ` Ehud Karni
@ 2010-08-29 13:50 ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-29 13:50 UTC (permalink / raw)
  To: ehud; +Cc: emacs-devel, handa

> Date: Sun, 29 Aug 2010 16:06:42 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: handa@m17n.org, emacs-devel@gnu.org
> Reply-to: ehud@unix.mvs.co.il
>
> From another thread, I found Handa san suggestion to use `decode-char'.
>
> So my my check function looks now like this:
>
> (defun check-multibyte-code (byte)
>   (message "Byte: %02X (%d), M-Char: %04X (%d), D-Char: %04X (%d)"
>            byte byte
>            (unibyte-char-to-multibyte byte)
>            (unibyte-char-to-multibyte byte)
>            (decode-char 'iso-8859-8 byte)
>            (decode-char 'iso-8859-8 byte)))
>
> It fails for Emacs-21.3 because `decode-char' returns nil.

IIRC, Emacs 21 supported only 'ucs as the 2nd arg of decode-char.

> Can you give the reasons to the changes in these functions ?

Two: (1) the switch to a Unicode-based internal representation, and, as
a result, (2) changes in the handling of raw eight-bit bytes.

> I think it is a bad practice to keep the function names while
> changing how they work. It breaks tested code.

I agree, but I think in this case there was no choice, unfortunately.

Anyway, I think the key to solving your problem is elsewhere.  I will
try to explain in a separate mail.

^ permalink raw reply [flat|nested] 36+ messages in thread
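The two views of byte #xE0 discussed in the messages above can be put side by side. This is only a minimal sketch, assuming Emacs 23 or later (where `decode-char' accepts charsets other than `ucs'); the helper name `my-byte-forms' is hypothetical and does not come from the thread.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-byte-forms (byte charset)
  "Return a list (RAW DECODED) for the 8-bit value BYTE.
RAW is the raw-byte character Emacs uses internally for BYTE.
DECODED is the character that BYTE denotes in CHARSET, or nil
if CHARSET has no character at that code point."
  (list (unibyte-char-to-multibyte byte)
        (decode-char charset byte)))

;; Per the results reported in the thread, this should yield
;; (#x3FFFE0 #x5D0): the raw byte E0 and HEBREW LETTER ALEF.
(my-byte-forms #xE0 'iso-8859-8)
```

The first element is what a display table sees for an undecoded byte; the second is what it would have to show for that byte to appear as Hebrew.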
* Re: Usage of standard-display-table in MSDOS
  2010-08-29 11:49 ` Ehud Karni
  2010-08-29 13:06 ` Ehud Karni
@ 2010-08-29 14:04 ` Eli Zaretskii
  2010-09-07 21:11 ` Ehud Karni
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2010-08-29 14:04 UTC (permalink / raw)
  To: ehud; +Cc: handa, emacs-devel

> Date: Sun, 29 Aug 2010 14:49:03 +0300
> From: "Ehud Karni" <ehud@unix.mvs.co.il>
> Cc: emacs-devel@gnu.org, handa@m17n.org
> Reply-to: ehud@unix.mvs.co.il
>
> Instead of trying to understand my problem, you are telling me why
> Emacs behaves in this way

Sorry, this wasn't the intent.  I simply didn't see the connection
between your original problem and the code you presented, so I
responded to your "this doesn't work in Emacs 23.1".

Let's back up a little: please tell me what the value of
buffer-file-coding-system is after you visit the offending files.

> > > > ;; For NBSP (U+00A0)
> > > > (aset standard-display-table #xA0
> > > >       (vector (unibyte-char-to-multibyte #xA0)))
> > >
> > > This does not work because `unibyte-char-to-multibyte' does not give
> > > the right result in Emacs-23.1 (it works well on Emacs-21.3).

Note that Handa-san recommended setting more than just one slot in
standard-display-table in Emacs 23 to solve similar problems:

  ;; For NBSP (U+00A0)
  (aset standard-display-table #xA0
        (vector (unibyte-char-to-multibyte #xA0)))
  ;; For byte #xA0.
  (aset standard-display-table (unibyte-char-to-multibyte #xA0)
        (vector (unibyte-char-to-multibyte #xA0)))
  (set-terminal-coding-system 'no-conversion)
  (set-safe-terminal-coding-system-internal 'no-conversion)

Did you set both slots of standard-display-table as shown above?

> Because of my coding system (iso-8859, remember ?) the #xE0 should be
> displayed as Aleph, not some 8 bit byte E0.

What encoding does the text terminal expect for Hebrew characters?

^ permalink raw reply [flat|nested] 36+ messages in thread
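The two-slot recipe quoted above can be extended from a single byte to a whole range. The following is a rough sketch only, assuming Emacs 23 or later and assuming the goal is to show bytes #xE0..#xFA as the ISO-8859-8 Hebrew letters; the function name `my-map-hebrew-bytes' is hypothetical, and nothing here was proposed in this form in the thread.

```elisp
(require 'disp-table)

;; Hypothetical helper, for illustration only.
(defun my-map-hebrew-bytes (table)
  "Make display table TABLE show bytes #xE0..#xFA as Hebrew letters.
For each byte, two slots are set, as in the NBSP example: the slot
indexed by the byte value itself and the slot indexed by the
corresponding raw-byte character.  Return TABLE."
  (let ((byte #xE0))
    (while (<= byte #xFA)
      ;; The glyph vector holds the Unicode Hebrew letter for BYTE.
      (let ((glyphs (vector (decode-char 'iso-8859-8 byte))))
        (aset table byte glyphs)
        (aset table (unibyte-char-to-multibyte byte) glyphs))
      (setq byte (1+ byte))))
  table)

;; Possible use (commented out, because it changes the display of
;; the running session):
;; (setq standard-display-table
;;       (my-map-hebrew-bytes (or standard-display-table
;;                                (make-display-table))))
```

As in the NBSP example, the slot indexed by the plain byte value is also the slot of the Latin-1 character with that code, so such a table is only sensible where those Latin-1 characters are not needed. Whether the terminal then shows Hebrew still depends on the terminal coding system, which this sketch does not touch.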
* Re: Usage of standard-display-table in MSDOS
  2010-08-29 14:04 ` Eli Zaretskii
@ 2010-09-07 21:11 ` Ehud Karni
  2010-09-09 11:57 ` Kenichi Handa
  0 siblings, 1 reply; 36+ messages in thread
From: Ehud Karni @ 2010-09-07 21:11 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel, handa

[ long post ]

On Sun, 29 Aug 2010 10:04:16 Eli Zaretskii wrote:
>
> Note that Handa-san recommended to set more than just one slot in
> standard-display-table in Emacs 23 to solve similar problems:

I have not fully solved it yet, but I think only minor details remain
now that I have defined a new coding system (see below).

On Mon, 06 Sep 2010 14:14:01 Kenichi Handa wrote:
>
> Does it mean that you want bidi-reordering for the bytes
> #xE0..#xFA (code-points of iso-8859-8) but bidi-reordering
> is not necessary for the bytes #x80..#x8A (code-points of
> cp862)?

No, I want bidi ordering (or not) for both iso-8859-8 and CP862 at the
SAME time.

> But, your file "lit1" contains #xE0..#xFA (code-points of
> iso-8859-8) at the second to 4th lines in visual order.  If
> bidi-reordering is applied on them, you'll get the different
> view than lit1-tty.png and lit1-x.png.  Is that ok?

This is just an example.  Files used directly with cat (like /etc/motd)
must be in visual order.  Other files used with GUI must have logical
order.  The data files we edit each day are of both types.

EK> Maybe I can define a new coding system that will have bytes #x80-#xFF
EK> as legal characters and be recognized as Hebrew variant.
>
> This code will do that.  I think it's not difficult to
> understand what the code is doing.
[snip]

> But, if you do that, you must consider the problem Eli wrote:
>
EZ> But if you want all the Hebrew characters to be treated by Emacs as
EZ> such (e.g., for bidi display), no matter what's their encoding in the
EZ> file, you will have to define a coding-system that will decode them
EZ> all into Unicode codepoints of Hebrew characters.  There's a problem
EZ> you will need to solve for defining such a coding system: it has 2
EZ> different encodings for the same character, one from hebrew-iso-8bit,
EZ> the other from cp862.  So you will need to decide how will Hebrew
EZ> characters be encoded when the file is saved.
>
> In the above definition of mix-hebrew, as iso-8859-8-sub is
> listed before cp862-sub, all Hebrew characters are encoded
> into bytes #xE0..#xFA even if they were originally decoded
> from bytes #x80..#x9A.
>
> If you don't like it, you must give up decoding bytes
> #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> and setup a display table to display them as Hebrew chars.
> It can be done by this code:

I think I solved this by using text properties.
It is still unfinished, but it works, and I'll appreciate any comments.
There are some problems, see at the end.

Here is what I did (based on your advice).

(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  :map (let ((map (make-vector 54 0))
             (ix 27))
         (while (> ix 0)
           (setq ix (1- ix))
           (aset map (+ ix ix) (+ #x80 ix))
           (aset map (+ ix ix 1) (+ #x80 ix)))
         map)
  :supplementary-p t)

(define-charset 'graphic-MSDOS-subset
  "Graphic subset of CP862"
  :code-space [#x9B #xDF]
  :subset '(cp862 #x9B #xDF #x00)
  :supplementary-p t)

(define-charset 'hebrew-iso-8859-8-subset
  "Subset of ISO-8859-8"
  :code-space [#xE0 #xFA]
  :subset '(iso-8859-8 #xE0 #xFA #x00)
  :supplementary-p t)

(define-coding-system 'hebrew-iso-with-8bit-bytes
  "The iso-8859-8 charset + bytes #x80-#xDF from CP862"
  :mnemonic ?H
  :coding-type 'charset
  :charset-list '(ascii hebrew-iso-8859-8-subset hebrew-MSDOS-binary graphic-MSDOS-subset)
  :post-read-conversion 'hebrew-iso-with-8bit-post-read
  :pre-write-conversion 'hebrew-iso-with-8bit-pre-write
  :ascii-compatible-p t)

(defun hebrew-iso-with-8bit-post-read (length)
  (let ((src (concat "^" '[ #x80 ] "-" '[ #x9A ])) ;; seems "^\200-\232" does not work
        (sv-pos (point))
        (max-pos (+ (point) length))
        chr)
    (while (and (skip-chars-forward src max-pos)
                (setq chr (char-after)))
      ;; (message "At %d after char %d" (point) (char-after))
      (delete-char 1)
      (insert-char (+ chr #x550) 1)     ;; #x05D0 - #x80
      (add-text-properties (1- (point)) (point) `(Hebrew DOS face menu))))
  0)

(defun hebrew-iso-with-8bit-pre-write (start end)
  (let* ((text (if (numberp start)
                   (buffer-substring start end)
                 start))
         (beg 0)
         (end (length text))
         va)
    (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
      (setq va (aref text beg))
      (and (>= va #x05D0)               ;; א
           (<= va #x05EA)               ;; ת
           (aset text beg (- va #x550)))
      (setq beg (1+ beg)))
    (set-buffer (get-buffer-create " *heb-wrt*"))
    (delete-region (point-min) (point-max))
    (insert text)
    nil))

There are some problems:

1. (describe-character-set 'hebrew-MSDOS-binary) exits with an error:
       Wrong type argument: char-or-string-p, [128 128 129 129 130 130 131 131 132 132 ...]
   The vector is the :map value.

2. The `:post-read-conversion' function must return a number, otherwise
   there is an error.  There is nothing about it in the
   `define-coding-system' documentation.

3. The documentation for `write-region-annotate-functions' has:
       "The function should return a list of pairs of the form (POSITION . STRING),
       consisting of strings to be effectively inserted at the specified positions
       of the file being written (1 means to insert before the first byte written).
       The POSITIONs must be sorted into increasing order."
   This did not work at all.  I had to use the alternate pathway:
       An annotation function can return with a different buffer current.
       Doing so removes the annotations returned by previous functions, and
       resets START and END to `point-min' and `point-max' of the new buffer.

Thank you both.  I will post again when I have finished all the details.

Ehud.

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Usage of standard-display-table in MSDOS
  2010-09-07 21:11 ` Ehud Karni
@ 2010-09-09 11:57 ` Kenichi Handa
  0 siblings, 0 replies; 36+ messages in thread
From: Kenichi Handa @ 2010-09-09 11:57 UTC (permalink / raw)
  To: ehud; +Cc: eliz, emacs-devel

In article <201009072111.o87LBeU2009811@beta.mvs.co.il>, "Ehud Karni" <ehud@unix.mvs.co.il> writes:

> > If you don't like it, you must give up decoding bytes
> > #x80..#x9A into Hebrew chars.  You decode them as raw-bytes,
> > and setup a display table to display them as Hebrew chars.
> > It can be done by this code:

> I think I solved this by using text properties.

Ah, ummm, a kind of dirty hack, but, perhaps it's unavoidable
in your situation.

> It is still unfinished, but it works, and I'll appreciate any comments.
> There are some problems, see at the end.

> Here is what I did (based on your advice).

> (define-charset 'hebrew-MSDOS-binary
>   "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
>   :code-space [#x80 #x9A]
>   :map (let ((map (make-vector 54 0))
>              (ix 27))
>          (while (> ix 0)
>            (setq ix (1- ix))
>            (aset map (+ ix ix) (+ #x80 ix))
>            (aset map (+ ix ix 1) (+ #x80 ix)))
>          map)
>   :supplementary-p t)

For that, you don't have to use :map but can use
:code-offset like this:

(define-charset 'hebrew-MSDOS-binary
  "Hebrew subset of CP862 (#x80-#x9A) with no-conversion"
  :code-space [#x80 #x9A]
  :code-offset #x80
  :supplementary-p t)

> (defun hebrew-iso-with-8bit-pre-write (start end)
>   (let* ((text (if (numberp start)
>                    (buffer-substring start end)
>                  start))
>          (beg 0)
>          (end (length text))
>          va)
>     (while (setq beg (text-property-any beg end 'Hebrew 'DOS text))
>       (setq va (aref text beg))
>       (and (>= va #x05D0)               ;; א
>            (<= va #x05EA)               ;; ת
>            (aset text beg (- va #x550)))
>       (setq beg (1+ beg)))
>     (set-buffer (get-buffer-create " *heb-wrt*"))
>     (delete-region (point-min) (point-max))
>     (insert text)
>     nil))

You don't have to make a working buffer.  You can directly
modify the current buffer because it's already a working
buffer managed by Emacs itself.

> There are some Problems:

> 1. (describe-character-set 'hebrew-MSDOS-binary) exit with error:
>        Wrong type argument: char-or-string-p, [128 128 129 129 130 130 131 131 132 132 ...]
>    The vector is the :map value.

Ah, I'll fix it soon.

> 2. The `:post-read-conversion' function must return a number otherwise there is an error.
>    There is nothing about it in `define-coding-system' documentation.

I'm going to fix the docstring like this:

    `:post-read-conversion'

    VALUE must be a function to call after some text is inserted and
    decoded by the coding system itself and before any functions in
    `after-insert-functions' are called.  This function is passed one
    argument; the number of characters in the text to convert, with
    point at the start of the text.  The function should leave point
    the same, and return the new character count.

> 3. The documentation for `write-region-annotate-functions' has:
>        "The function should return a list of pairs of the form (POSITION . STRING),
>        consisting of strings to be effectively inserted at the specified positions
>        of the file being written (1 means to insert before the first byte written).
>        The POSITIONs must be sorted into increasing order."
>    This did not work at all. I had to use the alternate pathway:
>        An annotation function can return with a different buffer current.
>        Doing so removes the annotations returned by previous functions, and
>        resets START and END to `point-min' and `point-max' of the new buffer.

For this part, I'm going to fix it like this:

    `:pre-write-conversion'

    VALUE must be a function to call after all functions in
    `write-region-annotate-functions' and `buffer-file-format' are
    called, and before the text is encoded by the coding system
    itself.  This function should convert the whole text in the
    current buffer.  For backward compatibility, this function is
    passed two arguments which can be ignored.

--- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 36+ messages in thread
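The two contracts described in the proposed docstrings above can be illustrated with a pair of conversion functions that work directly in the current buffer. This is a sketch under assumptions, not the code from the thread: the function names are hypothetical, the text property (Hebrew DOS) and the #x550 offset are borrowed from the code quoted above, and it is assumed that characters #x80..#x9A are what the charset with `:code-offset #x80' produces on decoding. Whether the text property is still present in the buffer Emacs hands to the pre-write function is not confirmed by the thread.

```elisp
;; Hypothetical names, for illustration only.
(defun my-hebrew-dos-post-read (length)
  "Turn characters #x80..#x9A in the next LENGTH characters into Hebrew.
Each converted character is tagged with the text property Hebrew = DOS.
Point is left where it was, and the character count is returned; the
count is unchanged because every replacement is one-for-one."
  (save-excursion
    (let ((end (+ (point) length)))
      (while (< (point) end)
        (let ((ch (char-after)))
          (if (and (>= ch #x80) (<= ch #x9A))
              (progn
                (delete-char 1)
                (insert (+ ch #x550))   ; #x05D0 - #x80
                (put-text-property (1- (point)) (point) 'Hebrew 'DOS))
            (forward-char 1))))))
  length)

(defun my-hebrew-dos-pre-write (_start _end)
  "Turn Hebrew letters tagged Hebrew = DOS back into #x80..#x9A.
Works on the whole current buffer; both arguments are ignored."
  (save-excursion
    (let ((pos (point-min)))
      (while (setq pos (text-property-any pos (point-max) 'Hebrew 'DOS))
        (let ((ch (char-after pos)))
          (when (and (>= ch #x05D0) (<= ch #x05EA))
            (goto-char pos)
            (delete-char 1)
            (insert (- ch #x550))))
        (setq pos (1+ pos)))))
  nil)
```

Compared with the quoted version, the post-read function stops at the end of the inserted text, restores point, and returns the character count, and the pre-write function needs no scratch buffer.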