* Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-04-30 12:29 UTC
To: Kenichi Handa; +Cc: emacs-devel

AFAICS, Hebrew diacriticals are not currently composed with
consonants, even in the pre-bidi display. Is that true?

If it's true, what will it take to construct the appropriate
composition rules for the diacriticals?

I tried to read composite.el and some of the files in lisp/language
that appear to use its facilities, but quickly got lost, probably
because I know too little about this matter. The ELisp manual also
doesn't document these facilities, unfortunately.

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-05 2:39 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <83mxwlw2c0.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> AFAICS, Hebrew diacriticals are not currently composed with
> consonants, even in the pre-bidi display. Is that true?

Yes.

> If it's true, what will it take to construct the appropriate
> composition rules for the diacriticals?

This will be ok.

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))

On Windows, I expect Uniscribe already has the capability to handle
that. On GNU/Linux, unfortunately, the m17n-db packages of the various
distributions are not yet updated, so you have to install a few
database files manually. I'll prepare those files soon so that
GNU/Linux users can test Hebrew composition easily.

> I tried to read composite.el and some of the files in lisp/language
> that appear to use its facilities, but quickly got lost, probably
> because I know too little about this matter. The ELisp manual also
> doesn't document these facilities, unfortunately.

Sorry for my laziness. Composition has many stakeholders, and I'm
still wondering how much to document. For instance, you have to
understand what an OpenType font can do and what it can't do. The
former part can be delegated to font-shape-gstring, but the latter
part must be implemented in Lisp. The characteristics of the terminal
(graphical or character) must also be taken into account. I'll
provide a proper internal document (i.e. for Emacs developers) soon.

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: David Kastrup @ 2010-05-05 15:49 UTC
To: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <83mxwlw2c0.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
>> AFAICS, Hebrew diacriticals are not currently composed with
>> consonants, even in the pre-bidi display. Is that true?
>
> Yes.
>
>> If it's true, what will it take to construct the appropriate
>> composition rules for the diacriticals?

Uh, with the current version of Emacs

GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola

I tried some copy&paste of Yiddish text including diacriticals into
Emacs, with the result that without bidi-display-reordering set, the
diacriticals were not composed with the letters, and with
bidi-display-reordering set, they were composed just fine.

So I am not sure whether you currently are discussing how to best fix
something that is not actually broken.

--
David Kastrup
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-05 20:51 UTC
To: David Kastrup; +Cc: emacs-devel

> From: David Kastrup <dak@gnu.org>
> Date: Wed, 05 May 2010 17:49:35 +0200
>
> Uh, with the current version of Emacs
>
> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>
> I tried some copy&paste of Yiddish text including diacriticals into
> Emacs, with the result that without bidi-display-reordering set, the
> diacriticals were not composed with the letters, and with
> bidi-display-reordering set, they were composed just fine.

Please show a recipe to reproduce this. Composition with reordering
is something that hasn't been coded yet (I think that's what Handa-san
is working on now), so I'm not sure how this could work. In any case,
compositions without reordering ought to work.
* Re: Composing Hebrew diacriticals
From: David Kastrup @ 2010-05-06 7:20 UTC
To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: David Kastrup <dak@gnu.org>
>> Date: Wed, 05 May 2010 17:49:35 +0200
>>
>> Uh, with the current version of Emacs
>>
>> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>>
>> I tried some copy&paste of Yiddish text including diacriticals into
>> Emacs, with the result that without bidi-display-reordering set, the
>> diacriticals were not composed with the letters, and with
>> bidi-display-reordering set, they were composed just fine.
>
> Please show a recipe to reproduce this. Compositions with reordering
> is something that wasn't coded yet (I think that's what Handa-san
> works on now), so I'm not sure how this could work. In any case,
> compositions without reordering ought to work.

Copy and paste from
<URL:http://en.wikipedia.org/wiki/Yiddish_language> into a buffer.
Pasting

the language is called לשון־אַשכּנז (loshn-ashkenaz

with bidi-display-reordering unset, I get

[Attachment: Screenshot.png, image/png, 548 bytes]

and with it being set, I get

[Attachment: Screenshot-1.png, image/png, 498 bytes]

As you can see, the aleph is composed with the "a" vowel underneath,
and the kaph gets a dagesh. So some composition is clearly happening.
Whether GTK+ is involved, I have no idea.

--
David Kastrup
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-06 0:45 UTC
To: David Kastrup; +Cc: emacs-devel

In article <87eihqqrg0.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:

> Uh, with the current version of Emacs
>
> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>
> I tried some copy&paste of Yiddish text including diacriticals into
> Emacs, with the result that without bidi-display-reordering set, the
> diacriticals were not composed with the letters, and with
> bidi-display-reordering set, they were composed just fine.

Does cursor movement treat a character and the combined diacritical
mark as one unit?

What is shown when you put the cursor on a character that is composed
with a diacritical mark and type C-u C-x =?

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: David Kastrup @ 2010-05-06 12:14 UTC
To: Kenichi Handa; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <87eihqqrg0.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:
>
>> Uh, with the current version of Emacs
>>
>> GNU Emacs 24.0.50.3 (i686-pc-linux-gnu, GTK+ Version 2.20.0) of 2010-05-05 on lola
>>
>> I tried some copy&paste of Yiddish text including diacriticals into
>> Emacs, with the result that without bidi-display-reordering set, the
>> diacriticals were not composed with the letters, and with
>> bidi-display-reordering set, they were composed just fine.
>
> Does cursor movement treat a character and the combined
> diacritical mark as one unit?

No.

> What is shown when you put a cursor on a character that is
> composed with a diacritical mark and type C-u C-x =?

Depends on where I put the cursor.

--
David Kastrup
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-06 13:01 UTC
To: David Kastrup; +Cc: emacs-devel

In article <876331p6r3.fsf@lola.goethe.zz>, David Kastrup <dak@gnu.org> writes:

> >> I tried some copy&paste of Yiddish text including diacriticals into
> >> Emacs, with the result that without bidi-display-reordering set, the
> >> diacriticals were not composed with the letters, and with
> >> bidi-display-reordering set, they were composed just fine.
> >
> > Does cursor movement treat a character and the combined
> > diacritical mark as one unit?

> No.

Then it seems that your Yiddish font just has special left and right
bearings for diacriticals that are suitable for the case where they
are drawn before (i.e. to the left of) a base character.

> > What is shown when you put a cursor on a character that is
> > composed with a diacritical mark and type C-u C-x =?

> Depends on where I put the cursor.

Of course. I just wanted to see some examples to check whether they
are really composed or not.

---
Kenichi Handa
handa@m17n.org
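The question of whether the characters are "really composed" can also be put to Emacs directly, rather than judged from a screenshot. The following is a minimal sketch, assuming that `find-composition` (from composite.el) reports automatic compositions as well as static ones, which is how it appears to behave in Emacs 23 and later. `my-composed-at-p` is a hypothetical helper name, not something from this thread.

```elisp
;; Hypothetical helper (not part of Emacs): report whether the character
;; at POS belongs to a composition.  Assumes `find-composition' also
;; sees automatic compositions made via `composition-function-table'.
(defun my-composed-at-p (&optional pos)
  "Return non-nil if the character at POS (default: point) is composed.
The value is whatever `find-composition' returns: a list that starts
with the buffer positions where the composition begins and ends."
  (find-composition (or pos (point))))
```

With point on the aleph of the pasted sample, evaluating `(my-composed-at-p)` with `M-:` and getting nil would be consistent with the explanation above: the stacked appearance comes from the font's bearings, not from a composition.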
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-05 18:01 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Wed, 05 May 2010 11:39:28 +0900
>
> > If it's true, what will it take to construct the appropriate
> > composition rules for the diacriticals?
>
> This will be ok.
>
> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
>
> On Windows, I expect Uniscribe already has the capability to
> handle that.

Thanks, but when I evaluate this expression and insert a consonant
followed by a diacritical, Emacs infloops. Do you think that this is
something specific to Uniscribe, or to how Emacs uses Uniscribe?
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-07 11:15 UTC
To: Eli Zaretskii; +Cc: emacs-devel

In article <83r5lqjkho.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Kenichi Handa <handa@m17n.org>
> > Cc: emacs-devel@gnu.org
> > Date: Wed, 05 May 2010 11:39:28 +0900
> >
> > > If it's true, what will it take to construct the appropriate
> > > composition rules for the diacriticals?
> >
> > This will be ok.
> >
> > (set-char-table-range
> >  composition-function-table '(#x591 . #x5F4)
> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> >
> > On Windows, I expect Uniscribe already has the capability to
> > handle that.

> Thanks, but when I evaluate this expression and insert a consonant
> followed by a diacritical, Emacs infloops. Do you think that this is
> something specific to Uniscribe, or to how Emacs uses Uniscribe?

I'm not sure. I don't see an infinite loop on GNU/Linux. But
investigating it on top of the current problematic code is a waste of
time. I'd like to concentrate on fixing the current code for the
moment.

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-08 12:51 UTC
To: Kenichi Handa; +Cc: emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 07 May 2010 20:15:03 +0900
>
> > > (set-char-table-range
> > >  composition-function-table '(#x591 . #x5F4)
> > >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> > >
> > > On Windows, I expect Uniscribe already has the capability to
> > > handle that.
>
> > Thanks, but when I evaluate this expression and insert a consonant
> > followed by a diacritical, Emacs infloops. Do you think that this is
> > something specific to Uniscribe, or to how Emacs uses Uniscribe?
>
> I'm not sure. I don't see infloops on GNU/Linux. But,
> investigating it based on the current problematic code is a
> waste of time.

I'm not sure what problematic code you have in mind. I see the
infinite loop without turning bidi-display-reordering on, and also in
Emacs 23.2.

In the debugger, I see that it loops inside
move_it_in_display_line_to, because set_iterator_to_next does not
advance to the next character. This happens because autocmp_chars and
composition_update_it produce a cmp_it structure with these contents:

(gdb) p *cmp_it
$62 = {
  stop_pos = 308,
  id = 4,
  ch = 1489,
  lookback = 0,
  nglyphs = 1,
  nchars = 0,
  nbytes = 0,
  from = 0,
  to = 1,
  width = 0
}

Since nchars and nbytes are zero, the iterator does not move.

The character whose codepoint is 1489 is U+05D1, HEBREW LETTER BET. I
typed that character, followed by U+05BC, HEBREW POINT DAGESH, after
evaluating the expression you suggested (see above). Emacs loops
after I type the DAGESH.

This code is almost 100% platform independent, except where
font-shape-gstring calls Uniscribe. Since you don't see the loop,
what could be the reason for that, if it's not something specific to
Uniscribe? Did I do something wrong?

Thanks.
* Re: Composing Hebrew diacriticals
From: Yair F. @ 2010-05-06 14:59 UTC
To: emacs-devel

Kenichi Handa <handa <at> m17n.org> writes:

> This will be ok.
>
> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
>

There are some restrictions on which characters are allowed to be
composed.

I've started working on composition, and will update when it's ready.
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-06 17:41 UTC
To: Yair F.; +Cc: emacs-devel

> From: Yair F. <yair.f.lists@gmail.com>
> Date: Thu, 6 May 2010 14:59:23 +0000 (UTC)
>
> I've started working on composition, and will update when it's ready.

Thanks.
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-07 0:48 UTC
To: Yair F.; +Cc: emacs-devel

In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:

> Kenichi Handa <handa <at> m17n.org> writes:
> > This will be ok.
> >
> > (set-char-table-range
> >  composition-function-table '(#x591 . #x5F4)
> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> >

> There are some restrictions on which characters are allowed to be composed.

Yes. If font_driver->shape is implemented correctly,
font-shape-gstring should compose a given sequence that contains
multiple grapheme clusters.

But I admit the above regexp is not optimal. For efficiency, the
regexp should match only a character sequence that is composed into
one grapheme cluster.

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: Yair F @ 2010-05-07 4:41 UTC
To: Kenichi Handa; +Cc: emacs-devel

On Fri, May 7, 2010 at 3:48 AM, Kenichi Handa <handa@m17n.org> wrote:
> In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:
>
>> Kenichi Handa <handa <at> m17n.org> writes:
>> > This will be ok.
>> >
>> > (set-char-table-range
>> >  composition-function-table '(#x591 . #x5F4)
>> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
>> >
>
>> There are some restrictions on which characters are allowed to be composed.
>

Something like:

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

and now hebrew-composable-pattern needs to be defined.

> Yes. If font_driver->shape is implemented correctly,
> font-shape-gstring should compose the given sequence with
> multiple grapheme clusters.
>
> But, I admit the above regexp is not optimal. For
> efficiency, the regexp should match only with a character
> sequence that is composed into one grapheme cluster.
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-07 6:23 UTC
To: Yair F; +Cc: emacs-devel

In article <x2hba5bff411005062141rbcadbcd5va8b1ead65f40aef8@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> Something like:

> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

> and now hebrew-composable-pattern needs to be defined.

If only diacritical marks are composed, and they can be placed on any
base character, it is better for efficiency to set that kind of rule
list only for the Hebrew diacriticals. So, the code will be something
like this:

(let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
      (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
  (dolist (elt hebrew-diacriticals-list)
    (set-char-table-range composition-function-table elt
                          (list (vector regexp 1 'font-shape-gstring)))))

Here "1" is for moving back one character to check matching with
REGEXP.

>> There are some restrictions on which characters are allowed to be composed.

If those restrictions are stricter, the regexp should vary for each
diacritical mark.

---
Kenichi Handa
handa@m17n.org
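The template above can be filled in as follows. This is only a sketch: every name prefixed `my-` is hypothetical, and the code point ranges are illustrative assumptions, not a vetted list of Hebrew marks and bases. The trailing `+` in the regexp is also an addition to the template, so that several marks after one base are covered by a single match. The function takes the char-table as an argument so it can be tried on a scratch table before touching `composition-function-table`.

```elisp
;; Assumed ranges of Hebrew combining marks, as (FROM . TO) pairs.
;; Illustrative only; adjust to the marks you actually want composed.
(defvar my-hebrew-mark-ranges
  '((#x0591 . #x05BD) (#x05BF . #x05BF) (#x05C1 . #x05C2) (#x05C7 . #x05C7))
  "Assumed code point ranges of Hebrew combining marks.")

;; One base letter (assumed to be U+05D0..U+05EA) followed by one or
;; more of the marks listed above.
(defvar my-hebrew-base-then-mark-regexp
  "[\u05D0-\u05EA][\u0591-\u05BD\u05BF\u05C1\u05C2\u05C7]+"
  "Assumed regexp: a Hebrew base letter followed by combining marks.")

(defun my-hebrew-install-rules (table)
  "Install a composition rule in char-table TABLE for each mark range.
Each rule has the form [PATTERN PREV-CHARS FUNC].  PREV-CHARS is 1
because the rule is looked up at a mark, while PATTERN starts one
character earlier, at the base letter.  Return TABLE."
  (dolist (range my-hebrew-mark-ranges)
    (set-char-table-range
     table range
     (list (vector my-hebrew-base-then-mark-regexp 1 'font-shape-gstring))))
  table)
```

To experiment without changing the display, call it on a fresh table, e.g. `(my-hebrew-install-rules (make-char-table nil))`, and inspect entries with `char-table-range`. Passing `composition-function-table` itself installs the rules for real.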
* Re: Composing Hebrew diacriticals
From: Yair F @ 2010-05-07 10:00 UTC
To: Kenichi Handa; +Cc: emacs-devel

On Fri, May 7, 2010 at 9:23 AM, Kenichi Handa <handa@m17n.org> wrote:

> If what composed are only diacritical marks, and they are
> placed on any base characters, it is better to set that kind
> of list only for hebrew diacriticals for efficiency. So,
> the code will be something like this:
>
> (let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...))
>       (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]"))
>   (dolist (elt hebrew-diacriticals-list)
>     (set-char-table-range composition-function-table elt
>                           (list (vector regexp 1 'font-shape-gstring)))))
>
> Here "1" is for moving back one character to check matching
> with REGEXP.
>
>>> There are some restrictions on which characters are allowed to be composed.
>
> If that restrictions are more rigid, regexp should vary for
> each diacritical mark.

This is the composition regexp. I added whitespace and comments for
readability:

\\(
 [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  ;; base
 [\u05BC\u05BF]?                ;; 0-1 marks of 1st class (dagesh)
 [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
 [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
 \u05D5                         ;; base
 \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
 [\u05B0-\u05BB\u05C7]?         ;; 0-1 marks of extended 3rd class (niqud)
 [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\|
 \u05E9                         ;; base
 \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
 [\u05C1\u05C2]?                ;; 0-1 marks of 2nd class (shin dot)
 [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
 [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
\\)

What would be the best way in this case?
In the most extreme case there are 6 marks attached to a base
character.
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-07 11:11 UTC
To: Yair F; +Cc: emacs-devel

In article <p2hba5bff411005070300lbc01fb06k9a753cf629b1b4c0@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> This is the composition regexp. I added whitespace and comments for
> readability:

> \\(
>  [\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]  ;; base
>  [\u05BC\u05BF]?                ;; 0-1 marks of 1st class (dagesh)
>  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
>  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
> \\|
>  \u05D5                         ;; base
>  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
>  [\u05B0-\u05BB\u05C7]?         ;; 0-1 marks of extended 3rd class (niqud)
>  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
> \\|
>  \u05E9                         ;; base
>  \u05BC?                        ;; 0-1 marks of 1st class (dagesh)
>  [\u05C1\u05C2]?                ;; 0-1 marks of 2nd class (shin dot)
>  [\u05B0-\u05B9\u05BB\u05C7]?   ;; 0-1 marks of 3rd class (niqud)
>  [\u0591-\u05AF\u05BD]*         ;; 0-2 (possibly 3) marks of 4th class
> \\)

> What would be the best way in this case?
> In the most extreme case there are 6 marks attached to a base
> character.

Thank you for the data. As I'm now busy fixing the display problem
of the bidi-composition interaction, I'll reply after that work is
finished.

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-07 9:28 UTC
To: Yair F; +Cc: emacs-devel, handa

> Date: Fri, 7 May 2010 07:41:54 +0300
> From: Yair F <yair.f.lists@gmail.com>
> Cc: emacs-devel@gnu.org
>
> On Fri, May 7, 2010 at 3:48 AM, Kenichi Handa <handa@m17n.org> wrote:
> > In article <loom.20100506T165338-12@post.gmane.org>, "Yair F." <yair.f.lists@gmail.com> writes:
> >
> >> Kenichi Handa <handa <at> m17n.org> writes:
> >> > This will be ok.
> >> >
> >> > (set-char-table-range
> >> >  composition-function-table '(#x591 . #x5F4)
> >> >  '(["[\u0591-\u05F4]+" 0 font-shape-gstring]))
> >> >
> >
> >> There are some restrictions on which characters are allowed to be composed.
> >
>
> Something like:
>
> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
>
> and now hebrew-composable-pattern needs to be defined.

Yair, does the above work for you? For me (on MS-Windows), it
infloops. I will try to find out where it loops later.
* Re: Composing Hebrew diacriticals
From: Yair F @ 2010-05-10 14:09 UTC
To: Eli Zaretskii; +Cc: emacs-devel, handa

> Yair, does the above work for you? For me (on MS-Windows), it
> infloops. I will try to find out where it loops later.
>

The code below sort-of works for me.
However, composition only occurs sometimes. I believe it depends on
the font being used, including the font size.
How can I debug it further?

(Kubuntu/GTK/XFT)

(defconst hebrew-composable-pattern
  (concat
   "\\("
   "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
                                   ;; base (except vav and shin)
   "[\u05BC\u05BF]?"               ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"  ;; 0-1 marks of 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\|"
   "\u05D5"                        ;; base (vav)
   "\u05BC?"                       ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?"        ;; 0-1 marks of extended 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\|"
   "\u05E9"                        ;; base (shin)
   "\u05BC?"                       ;; 0-1 marks of 1st class (dagesh)
   "[\u05C1\u05C2]?"               ;; 0-1 marks of 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"  ;; 0-1 marks of 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
   "\\)")
  "Regexp matching a composable sequence of Hebrew characters.")

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
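One part of "how can I debug it further" that does not depend on the font is the regexp itself: it can be exercised against sample strings with plain string matching. The sketch below copies the pattern from the message above under a hypothetical name (`my-hebrew-cluster-regexp`) so it can be evaluated on its own. `my-hebrew-cluster-length` is likewise a made-up helper, and it only shows what the pattern accepts, not what the font backend will actually shape.

```elisp
;; Copy of the pattern from the message above, under a hypothetical name
;; so that this sketch is self-contained.
(defconst my-hebrew-cluster-regexp
  (concat
   "\\("
   "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]" ; base (not vav/shin)
   "[\u05BC\u05BF]?"                                 ; 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?"                    ; 3rd class (vowel)
   "[\u0591-\u05AF\u05BD]*"                          ; 4th class (cantill.)
   "\\|"
   "\u05D5"                                          ; base (vav)
   "\u05BC?"
   "[\u05B0-\u05BB\u05C7]?"
   "[\u0591-\u05AF\u05BD]*"
   "\\|"
   "\u05E9"                                          ; base (shin)
   "\u05BC?"
   "[\u05C1\u05C2]?"                                 ; 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?"
   "[\u0591-\u05AF\u05BD]*"
   "\\)")
  "Copy of the composable-sequence pattern, for testing only.")

(defun my-hebrew-cluster-length (string)
  "Return how many characters at the start of STRING the pattern matches.
Return nil if STRING does not start with a base letter the pattern knows."
  (save-match-data
    ;; Anchor at the start of STRING so only a leading cluster counts.
    (when (string-match (concat "\\`" my-hebrew-cluster-regexp) string)
      (match-end 0))))

;; Bet + dagesh: both characters are matched.
(my-hebrew-cluster-length "\u05D1\u05BC")             ; => 2
;; Shin + dagesh + shin dot + qamats: all four are matched.
(my-hebrew-cluster-length "\u05E9\u05BC\u05C1\u05B8") ; => 4
;; Bet + sheva + dagesh: the match stops before the dagesh, because the
;; pattern expects the dagesh before the vowel.
(my-hebrew-cluster-length "\u05D1\u05B0\u05BC")       ; => 2
```

The last case shows that the pattern is sensitive to the order of the marks: a mark that arrives out of the expected class order ends the match, so the remaining marks would not be part of the same composition.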
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-11 0:25 UTC
To: Yair F; +Cc: eliz, emacs-devel

In article <AANLkTikXVDpVwaB6xGa_ktqw0fsQ7nXdvE9SZ_r0m6mM@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> > Yair, does the above work for you? For me (on MS-Windows), it
> > infloops. I will try to find out where it loops later.
> >

> The code below sort-of works for me.
> However, composition only occurs sometimes. I believe it depends on
> the font being used including font size.
> How can I debug it further?

I'm now working on fixing the problem of the bidi-composition
interaction. Please wait for a while.

---
Kenichi Handa
handa@m17n.org

> (Kubuntu/GTK/XFT)

> (defconst hebrew-composable-pattern
>   (concat
>    "\\("
>    "[\u05D0-\u05D4\u05D6-\u05E8\u05EA\u05F1-\u05F3]"
>                                    ;; base (except vav and shin)
>    "[\u05BC\u05BF]?"               ;; 0-1 marks of 1st class (dagesh)
>    "[\u05B0-\u05B9\u05BB\u05C7]?"  ;; 0-1 marks of 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\|"
>    "\u05D5"                        ;; base (vav)
>    "\u05BC?"                       ;; 0-1 marks of 1st class (dagesh)
>    "[\u05B0-\u05BB\u05C7]?"        ;; 0-1 marks of extended 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\|"
>    "\u05E9"                        ;; base (shin)
>    "\u05BC?"                       ;; 0-1 marks of 1st class (dagesh)
>    "[\u05C1\u05C2]?"               ;; 0-1 marks of 2nd class (shin dot)
>    "[\u05B0-\u05B9\u05BB\u05C7]?"  ;; 0-1 marks of 3rd class (vowel)
>    "[\u0591-\u05AF\u05BD]*"        ;; 0-2 (3?) marks of 4th class (cantill.)
>    "\\)")
>   "Regexp matching a composable sequence of Hebrew characters.")

> (set-char-table-range
>  composition-function-table '(#x591 . #x5F4)
>  (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-11 12:20 UTC
To: Kenichi Handa; +Cc: eliz, yair.f.lists, emacs-devel

In article <tl7d3x3cmja.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> I'm now working on fixing the problem of bidi-composition
> interaction. Please wait for a while.

It seems that my code has started to work. Attached is an image of
Emacs displaying the following line:

A "אֲעוֹלֵל 123 כַּגֶּפֶן" B.

I got the two Hebrew words from
http://www.zemer.co.il/song.asp?id=626&artist=157, and inserted "123"
between them just to test nesting. The Hebrew chars are displayed
with "dejavu sans mono", which has an OpenType table for Hebrew.

Could you please check the rendering (position of diacriticals, etc.)?

[Attachment: Hebrew sample, image/png, 61425 bytes]

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-11 16:22 UTC
To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: yair.f.lists@gmail.com, eliz@gnu.org, emacs-devel@gnu.org
> Date: Tue, 11 May 2010 21:20:57 +0900
>
> Could you please check the rendering (position of
> diacriticals, etc.)?

Looks okay to me.

Thanks!
* Re: Composing Hebrew diacriticals
From: Kenichi Handa @ 2010-05-12 8:04 UTC
To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel

In article <83pr12pfw6.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Could you please check the rendering (position of
> > diacriticals, etc.)?

> Looks okay to me.

> Thanks!

Thank you for the check. Now at least rendering with bidi and
composition is working. But I found that cursor positioning sometimes
fails even for text that doesn't involve composition. For instance,
when I have this text in a buffer where bidi-display-reordering is t,

----------
A א B.
בג
----------

and point is at (i.e. before) "A", typing C-e moves point to "ג".

This happens with the vanilla trunk code.

---
Kenichi Handa
handa@m17n.org
* Re: Composing Hebrew diacriticals
From: Eli Zaretskii @ 2010-05-12 17:35 UTC
To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org
> Date: Wed, 12 May 2010 17:04:09 +0900
>
> I found that cursor positioning sometimes fails even for text
> that doesn't involve composition. For instance, when I have this
> text in a buffer where bidi-display-reordering is t,
> ----------
> A א B.
> בג
> ----------
> and point is at (i.e. before) "A", typing C-e moves point to
> "ג".
>
> This happens with the vanilla trunk code.

Yes, cursor motion is a constant source of trouble with bidi
reordering. Too many places assume strictly unidirectional display
and rely on the fact that the first (leftmost) character of line N
follows the last character of line N-1 in the buffer's logical order.
It takes time to learn all of the hidden assumptions and fix them.
C-a and C-e are even trickier than C-f and C-b, because they move by
buffer positions, assuming that screen positions correspond to those
1:1.

I already refactored the related functions twice, and recently came to
the conclusion that I need to refactor them yet again, sigh...

Thanks for the test case, I will add it to my other tests.
* Re: Composing Hebrew diacriticals 2010-05-12 17:35 ` Eli Zaretskii @ 2010-05-12 19:05 ` Juanma Barranquero 2010-05-13 3:06 ` Eli Zaretskii 2010-05-13 0:42 ` Kenichi Handa 1 sibling, 1 reply; 88+ messages in thread From: Juanma Barranquero @ 2010-05-12 19:05 UTC (permalink / raw) To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel, Kenichi Handa On Wed, May 12, 2010 at 19:35, Eli Zaretskii <eliz@gnu.org> wrote: > Yes, cursor motion is a constant source of trouble with bidi > reordering. It's weird that it also happens in text with no bidi reordering whatsoever, just activating line-by-line-scrolling and moving over a control char (bug#6036). Juanma ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 19:05 ` Juanma Barranquero @ 2010-05-13 3:06 ` Eli Zaretskii 0 siblings, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-13 3:06 UTC (permalink / raw) To: Juanma Barranquero; +Cc: yair.f.lists, emacs-devel, handa > From: Juanma Barranquero <lekktu@gmail.com> > Date: Wed, 12 May 2010 21:05:20 +0200 > Cc: Kenichi Handa <handa@m17n.org>, yair.f.lists@gmail.com, emacs-devel@gnu.org > > On Wed, May 12, 2010 at 19:35, Eli Zaretskii <eliz@gnu.org> wrote: > > > Yes, cursor motion is a constant source of trouble with bidi > > reordering. > > It's weird that it also happens in text with no bidi reordering > whatsoever, just activating line-by-line-scrolling and moving over a > control char (bug#6036). That's a different, though related, problem. The cursor positioning code was completely rewritten in Emacs 24, to be able to support reordered lines, and that rewrite affects even lines that aren't actually reordered. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 17:35 ` Eli Zaretskii 2010-05-12 19:05 ` Juanma Barranquero @ 2010-05-13 0:42 ` Kenichi Handa 2010-05-14 8:10 ` Kenichi Handa 1 sibling, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-13 0:42 UTC (permalink / raw) To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel In article <83fx1xowfj.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > Yes, cursor motion is a constant source of trouble with bidi > reordering. Too many places assume strictly unidirectional display > and rely on the fact that the first (leftmost) character of line N > follows the last character of line N-1 in buffer's logical order. It > takes time to learn all of the hidden assumptions and fix them. C-a > and C-e are even trickier than C-f and C-b, because they move by > buffer positions assuming that screen positions correspond to those > 1:1. > I already refactored the related functions twice, and recently came to > a conclusion that I need to refactor them yet again, sigh... > Thanks for the test case, I will add it to my other tests. Ok, then, as soon as I conclude that at least the displaying looks correct, I'll commit my new code without testing it against cursor positioning. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-13 0:42 ` Kenichi Handa @ 2010-05-14 8:10 ` Kenichi Handa 2010-05-14 10:02 ` Eli Zaretskii [not found] ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com> 0 siblings, 2 replies; 88+ messages in thread From: Kenichi Handa @ 2010-05-14 8:10 UTC (permalink / raw) To: Kenichi Handa; +Cc: eliz, yair.f.lists, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1061 bytes --] In article <tl7r5lgbpit.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > Ok, then, as soon as I conclude that at least the displaying > looks correct, I'll commit my new code without testing it > against cursor positioning. I've just committed a fix. Eli, please check the comments of set_iterator_to_next, and verify that I'm doing the right thing. I have not yet committed proper codes for Hebrew composition. I'm now testing with this simple version. (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+")) (set-char-table-range composition-function-table '(#x591 . #x5C7) (list (vector pattern 1 'font-shape-gstring) ["[\u0591-\u05C7]" 0 font-shape-gstring])) (set-char-table-range composition-function-table #x5C0 nil) (set-char-table-range composition-function-table #x5C6 nil)) (set-fontset-font t 'hebrew '("dejavu sans" . "unicode-bmp")) On GNU/Linux, you have to build Emacs with libotf and m17n-flt libraries, and having the attached file in ~/.m17n.d/HEBR-FF.flt. --- Kenichi Handa handa@m17n.org [-- Attachment #2: HEBR-FF.flt --] [-- Type: text/plain, Size: 1398 bytes --] ;; HEBR-FF.flt -- Font Layout Table for Hebrew OTF font ;; Copyright (C) 2010 AIST (H15PRO112) ;; See the end for copying conditions. (font layouter hebr-ff nil (font (nil nil unicode-bmp) :otf=hebr+mark)) (category (0x0591 0x05BF ?h) (0x05C1 0x05C5 ?h) (0x05C7 ?h) (0x05D0 0x05F2 ?H)) (generator (0 (cond ("Hhh*" < :otf=hebr=ccmp+mark >) ("h" [ = ]) ("." 
=)))) ;; Copyright (C) 2010 ;; National Institute of Advanced Industrial Science and Technology (AIST) ;; Registration Number H15PRO112 ;; This file is part of the m17n database; a sub-part of the m17n ;; library. ;; The m17n library is free software; you can redistribute it and/or ;; modify it under the terms of the GNU Lesser General Public License ;; as published by the Free Software Foundation; either version 2.1 of ;; the License, or (at your option) any later version. ;; The m17n library is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;; Lesser General Public License for more details. ;; You should have received a copy of the GNU Lesser General Public ;; License along with the m17n library; if not, write to the Free ;; Software Foundation, Inc., 51 Franklin Street, Fifth Floor, ;; Boston, MA 02110-1301, USA. ;; Local Variables: ;; mode: lisp ;; End: ^ permalink raw reply [flat|nested] 88+ messages in thread
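As a rough way to sanity-check the simple rule quoted above without involving redisplay, the regexp can be tried against a few strings directly. This is only a sketch: the helper name my-hebrew-composable-p is hypothetical and not part of Emacs, the pattern is copied from the message above, and the leading \` anchor is an addition made here so that only the start of the string is tested.

```elisp
;; Hypothetical helper (not part of Emacs): test whether STRING starts
;; with a Hebrew base letter followed by at least one combining mark,
;; using the pattern from the message above.
(defun my-hebrew-composable-p (string)
  "Return t if STRING begins with a Hebrew base letter plus marks."
  (let ((pattern "\\`[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
    (and (string-match-p pattern string) t)))

(my-hebrew-composable-p "\u05E9\u05C2")  ; SHIN + SHIN DOT     => t
(my-hebrew-composable-p "\u05E9")        ; bare SHIN, no marks => nil
(my-hebrew-composable-p "\u05D1\u05C0")  ; BET + PASEQ (U+05C0 is not in the mark class) => nil
```

A match here says only that the rule's pattern would apply; whether the glyphs are then positioned correctly depends on the font and on font-shape-gstring.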
* Re: Composing Hebrew diacriticals 2010-05-14 8:10 ` Kenichi Handa @ 2010-05-14 10:02 ` Eli Zaretskii 2010-05-14 11:58 ` Kenichi Handa [not found] ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com> 1 sibling, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-14 10:02 UTC (permalink / raw) To: Kenichi Handa, Jason Rumney; +Cc: yair.f.lists, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: eliz@gnu.org, yair.f.lists@gmail.com, emacs-devel@gnu.org > Date: Fri, 14 May 2010 17:10:33 +0900 > > I've just committed a fix. > > Eli, please check the comments of set_iterator_to_next, and > verify that I'm doing the right thing. It looks okay at a first glance, thank you! In the HELLO buffer, the RLM character is not composed with the following parenthesis, though. Is this a separate problem? I will work on the issues you raised in the comments. For now, I have just one response: in this fragment from set_iterator_to_next: /* Update IT's char/byte positions to point the first character of the next grapheme cluster, or to the character visually after the current composition. */ #if 0 /* Is it ok to do this directly? */ IT_CHARPOS (*it) += it->cmp_it.nchars; IT_BYTEPOS (*it) += it->cmp_it.nbytes; #else /* Or do we have to call bidi_get_next_char_visually repeatedly (perhaps not to confuse some internal state of bidi_it)? At least we must do this if we have consumed all grapheme clusters in the current composition because the next character will be in the different bidi level. */ for (i = 0; i < it->cmp_it.nchars; i++) bidi_get_next_char_visually (&it->bidi_it); the "#else" part is doing TRT. You cannot jump to a different place in the buffer/string behind the back of bidi_get_next_char_visually, because that would violate the integrity of its internal cache, which must correspond to the buffer/string positions 1:1. > I have not yet committed proper codes for Hebrew > composition. I'm now testing with this simple version. 
> > (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+")) > (set-char-table-range > composition-function-table '(#x591 . #x5C7) > (list (vector pattern 1 'font-shape-gstring) > ["[\u0591-\u05C7]" 0 font-shape-gstring])) > (set-char-table-range > composition-function-table #x5C0 nil) > (set-char-table-range > composition-function-table #x5C6 nil)) Could you please look at the message I posted in http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html? I still see the infloop, with the current trunk, even when bidi-display-reordering is set to nil, after I type BET and DAGESH, as described in that message. What kind of problems in the information that Uniscribe returns to Emacs could cause such a loop? If I type a different diacritical after BET, like PATAH, there's no infloop, but the display is incorrect: I see both the isolated PATAH and the composed BAT+PATAH after it. Jason, could you help me with this? It looks like some Uniscribe-specific issue. TIA ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 10:02 ` Eli Zaretskii @ 2010-05-14 11:58 ` Kenichi Handa 2010-05-14 13:29 ` Eli Zaretskii 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-14 11:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel, jasonr In article <837hn64x96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Eli, please check the comments of set_iterator_to_next, and > > verify that I'm doing the right thing. > It looks okay at a first glance, thank you! > In the HELLO buffer, the RLM character is not composed with the > following parenthesis, though. Is this a separate problem? Currently Emacs' category of RLM (U+200F) is not '^' (Combining). So it is not composed with the previous base character. BTW, I don't know which is better; showing RLM (and the other control characters) to users or not. If we show it, the display is a little bit annoying. If we hide it, it gets very difficult for users to manually edit it. > I will work on the issues you raised in the comments. For now, I have > just one response: in this fragment from set_iterator_to_next: > /* Update IT's char/byte positions to point the first > character of the next grapheme cluster, or to the > character visually after the current composition. */ > #if 0 > /* Is it ok to do this directly? */ > IT_CHARPOS (*it) += it->cmp_it.nchars; > IT_BYTEPOS (*it) += it->cmp_it.nbytes; > #else > /* Or do we have to call bidi_get_next_char_visually > repeatedly (perhaps not to confuse some internal > state of bidi_it)? At least we must do this if we > have consumed all grapheme clusters in the current > composition because the next character will be in the > different bidi level. */ > for (i = 0; i < it->cmp_it.nchars; i++) > bidi_get_next_char_visually (&it->bidi_it); > the "#else" part is doing TRT. 
You cannot jump to a different place > in the buffer/string behind the back of bidi_get_next_char_visually, > because that would violate the integrity of its internal cache, which > must correspond to the buffer/string positions 1:1. Ok, I see. > > I have not yet committed proper codes for Hebrew > > composition. I'm now testing with this simple version. > > > > (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+")) > > (set-char-table-range > > composition-function-table '(#x591 . #x5C7) > > (list (vector pattern 1 'font-shape-gstring) > > ["[\u0591-\u05C7]" 0 font-shape-gstring])) > > (set-char-table-range > > composition-function-table #x5C0 nil) > > (set-char-table-range > > composition-function-table #x5C6 nil)) > Could you please look at the message I posted in > http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html? > I still see the infloop, with the current trunk, even when > bidi-display-reordering is set to nil, after I type BET and DAGESH, as > described in that message. What kind of problems in the information > that Uniscribe returns to Emacs could cause such a loop? I'm now investigating that problem. But, as I don't have Emacs on Windows now, all I can do is just guessing. To debug C code of Emacs on Windows, which environment (cygwin, mingw, ...) is good? It will be good that there's an instruction simply showing just one method by saying something like: (1) Download this and this and this... (with actual URLs). (2) Install this, then do this, then install this, then do this... (3) Build Emacs under this terminal program. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 11:58 ` Kenichi Handa @ 2010-05-14 13:29 ` Eli Zaretskii 2010-05-14 14:06 ` Eli Zaretskii 0 siblings, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-14 13:29 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel, jasonr > From: Kenichi Handa <handa@m17n.org> > Cc: jasonr@gnu.org, yair.f.lists@gmail.com, emacs-devel@gnu.org > Date: Fri, 14 May 2010 20:58:00 +0900 > > In article <837hn64x96.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > > Eli, please check the comments of set_iterator_to_next, and > > > verify that I'm doing the right thing. > > > It looks okay at a first glance, thank you! > > > In the HELLO buffer, the RLM character is not composed with the > > following parenthesis, though. Is this a separate problem? > > Currently Emacs' category of RLM (U+200F) is not '^' > (Combining). So it is not composed with the previous base > character. A separate problem, then. > BTW, I don't know which is better; showing RLM (and the > other control characters) to users or not. If we show it, > the display is a little bit annoying. If we hide it, it > gets very difficult for users to manually edit it. We need both, I think. But AFAIU, it would be easy to write a command that modifies the category you show above, and thus makes these characters appear and disappear. > > Could you please look at the message I posted in > > http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00251.html? > > I still see the infloop, with the current trunk, even when > > bidi-display-reordering is set to nil, after I type BET and DAGESH, as > > described in that message. What kind of problems in the information > > that Uniscribe returns to Emacs could cause such a loop? > > I'm now investigating that problem. Thanks. > But, as I don't have Emacs on Windows now, all I can do is > just guessing. > > To debug C code of Emacs on Windows, which environment > (cygwin, mingw, ...) is good? MinGW. 
(Cygwin is just a usual Unix build running on Windows using an X server, so I doubt if you will see there the same problem I do in the MinGW build, which uses Uniscribe to render characters.) > It will be good that there's an instruction simply showing just one > method by saying something like: > > (1) Download this and this and this... (with actual URLs). > (2) Install this, then do this, then install this, then do this... > (3) Build Emacs under this terminal program. This place may be a good starting point: http://ourcomments.org/Emacs/w32-build-emacs.html I use GCC, Binutils, GDB, and Make downloaded from the MinGW site (http://www.mingw.org/), `cp', `mv', and `rm' from the latest port of Coreutils from GnuWin32 (http://gnuwin32.sourceforge.net/). You will also need the MinGW Runtime (mingw-runtime-*.tar.gz) and the Windows API headers (w32api-*.tar.gz) from MinGW -- these are headers and libraries that are needed to compile and link against Windows system APIs. The XPM library, needed for displaying the tool bar, is available from ftp://ftp.gnu.org/gnu/emacs/windows, where you will also find a precompiled binary. I think that's about it. One other thing: to unpack .tar.gz archives on Windows, you can use the handy djtarnt.exe utility, which you can find here: ftp://ftp.gnu.org/old-gnu/emacs/windows/utilities/i386/ This utility supports only .tar.gz, not .tar.bz2. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 13:29 ` Eli Zaretskii @ 2010-05-14 14:06 ` Eli Zaretskii 0 siblings, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-14 14:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: jasonr, yair.f.lists, emacs-devel, handa > Date: Fri, 14 May 2010 16:29:09 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org, jasonr@gnu.org > > The XPM library, needed for displaying the tool bar, is > available from ftp://ftp.gnu.org/gnu/emacs/windows, where you will > also find a precompiled binary. Of course, the precompiled binary is not what you want, since you want to build the latest trunk. Sorry for any confusion. ^ permalink raw reply [flat|nested] 88+ messages in thread
[parent not found: <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>]
* Re: Composing Hebrew diacriticals [not found] ` <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com> @ 2010-05-15 2:14 ` Kenichi Handa 2010-05-15 21:35 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-15 2:14 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > Now most simple compositions (niqqud only) works. > I still see these issues: > 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot > (U+5C2) which breaks the composition. In my environment, the sequence U+5E9 U+5C2 is surely composed. Please show me the output of this command. % MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")' Mine is this: [FLT] (hebr-ff (dejavu sans) [FLT] (SOURCE 05E9 05C2) [FLT] (STAGE 0 "Hh" (05E9 05C2) [FLT] (SUBPART 0 [FLT] (COND [FLT] (REGEX "^Hhh*" "Hh" 2 [FLT] <0 [FLT] :otf=hebr=ccmp+mark 2>)))) [FLT] (RESULT (1299 1216 0 0))) > 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is > composed to the right of the letter instead of left. It seems that the "dejavu sans" font has neither a 'ccmp' nor a 'mark' feature applicable to that sequence. It is possible to detect such a case and adjust the glyph-string in Lisp code. I'll try to implement that kind of code. > 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into > presentation form (U+FB1D). This should only happen with specific > control (Either CGJ or ZWJ I'll check). I tried to show that sequence with gedit and firefox. Both show it with U+FB1D. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-15 2:14 ` Kenichi Handa @ 2010-05-15 21:35 ` Yair F 2010-05-17 4:35 ` Kenichi Handa 2010-05-17 13:53 ` Stefan Monnier 0 siblings, 2 replies; 88+ messages in thread From: Yair F @ 2010-05-15 21:35 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Sat, May 15, 2010 at 5:14 AM, Kenichi Handa <handa@m17n.org> wrote: > In article <AANLkTilcNB_ntRY_EVS9EyMrqS3GRAp3rHGiXL_3YZuR@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: >> Now most simple compositions (niqqud only) works. > >> I stll see these issues: >> 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot >> (U+5C2) which breaks the composition. > > In my environment, the sequence U+5E9 U+5C2 is surely > composed. Please show me the output of this command. > > % MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")' > > Mine is this; > > [FLT] (hebr-ff (dejavu sans) > [FLT] (SOURCE 05E9 05C2) > [FLT] (STAGE 0 "Hh" (05E9 05C2) > [FLT] (SUBPART 0 > [FLT] (COND > [FLT] (REGEX "^Hhh*" "Hh" 2 > [FLT] <0 > [FLT] :otf=hebr=ccmp+mark 2>)))) > [FLT] (RESULT (1299 1216 0 0))) Mine is [FLT] (hebr-ff (dejavu sans) [FLT] (SOURCE 05E9) [FLT] (STAGE 0 "H" (05E9) [FLT] (SUBPART 0 [FLT] (COND [FLT] (REGEX "^." "H" 1 [FLT] (COPY 0x5E9))))) [FLT] (RESULT (053D 1152 0 0))) > >> 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is >> composed to the right of the letter instead of left. > > It seems that "dejavu sans" font doesn't have 'ccmp' nor > 'mark' feature applicable to that sequence. > In that case, may I suggest switching to a different font? >> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into >> presentation form (U+FB1D). This should only happen with specific >> control (Either CGJ or ZWJ I'll check). > > I tried to show that sequence with gedit and and firefox. > Both show it with U+FB1D. Kate show this as well, Firefox doesn't. Nevertheless it's still a bug. Thank you for investing effort in Hebrew composition. 
Yair ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-15 21:35 ` Yair F @ 2010-05-17 4:35 ` Kenichi Handa 2010-05-17 17:32 ` Eli Zaretskii ` (2 more replies) 2010-05-17 13:53 ` Stefan Monnier 1 sibling, 3 replies; 88+ messages in thread From: Kenichi Handa @ 2010-05-17 4:35 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTin5fDs_thjY2I3S4anUDpgkqbzfZt2TBdzRo-8Y@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: >>> 1. Letter Shin (U+5E9) is not composed. It is followed by Shin-dot >>> (U+5C2) which breaks the composition. > > > > In my environment, the sequence U+5E9 U+5C2 is surely > > composed. Please show me the output of this command. > > > > % MDEBUG_FLT=3 emacs --eval '(message "\u05E9\u05C2")' [...] > Mine is > [FLT] (hebr-ff (dejavu sans) > [FLT] (SOURCE 05E9) > [FLT] (STAGE 0 "H" (05E9) > [FLT] (SUBPART 0 > [FLT] (COND > [FLT] (REGEX "^." "H" 1 > [FLT] (COPY 0x5E9))))) > [FLT] (RESULT (053D 1152 0 0))) Are you using the same setting as mine which I wrote as below in the previous mail? > I have not yet committed proper codes for Hebrew > composition. I'm now testing with this simple version. > > (let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+")) > (set-char-table-range > composition-function-table '(#x591 . #x5C7) > (list (vector pattern 1 'font-shape-gstring) > ["[\u0591-\u05C7]" 0 font-shape-gstring])) > (set-char-table-range > composition-function-table #x5C0 nil) > (set-char-table-range > composition-function-table #x5C6 nil)) >>> 2. Letter Lamed (U+5DC) Followed by Holam (U+5B9) The point is >>> composed to the right of the letter instead of left. > > > > It seems that "dejavu sans" font doesn't have 'ccmp' nor > > 'mark' feature applicable to that sequence. > In that case, may I suggest switching to a different font? It's possible to write such a code (which will slow down the display), but if you don't have a better font, we anyway have to use the available font. 
By the way, do you have a better font than 'dejavu sans' for Hebrew? >>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into >>> presentation form (U+FB1D). This should only happen with specific >>> control (Either CGJ or ZWJ I'll check). > > > > I tried to show that sequence with gedit and and firefox. > > Both show it with U+FB1D. > Kate show this as well, Firefox doesn't. Nevertheless it's still a bug. Then what is the correct rendering of the sequence "\u05D9\u05B4"? Applying only `mark' feature? With dejavu sans font, U+FB1D has lower dot above baseline, but just applying `mark' feature results in displaying dot below baseline. See the attached image. The left one is the composition with `ccmp' GSUB feature (i.e. into U+FB1D) for the sequence "\u05D9\u200D\u05B4", and the right one is the composition with `mark' GPOS feature for the sequence "\u05D9\u05B4". Is this the correct rendering? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-17 4:35 ` Kenichi Handa @ 2010-05-17 17:32 ` Eli Zaretskii 2010-05-18 0:36 ` Kenichi Handa 2010-05-17 21:08 ` Yair F 2010-05-18 7:29 ` Composing Hebrew diacriticals Eli Zaretskii 2 siblings, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-17 17:32 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Mon, 17 May 2010 13:35:42 +0900 > Cc: emacs-devel@gnu.org > > See the attached image. I didn't receive any attachments. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-17 17:32 ` Eli Zaretskii @ 2010-05-18 0:36 ` Kenichi Handa 0 siblings, 0 replies; 88+ messages in thread From: Kenichi Handa @ 2010-05-18 0:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: yair.f.lists, emacs-devel [-- Attachment #1: Type: text/plain, Size: 247 bytes --] In article <83ljbibfiy.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > From: Kenichi Handa <handa@m17n.org> > > > > See the attached image. > I didn't receive any attachments. Oops, sorry, here it is. --- Kenichi Handa handa@m17n.org [-- Attachment #2: hebrew.png --] [-- Type: image/png, Size: 901 bytes --] ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-17 4:35 ` Kenichi Handa 2010-05-17 17:32 ` Eli Zaretskii @ 2010-05-17 21:08 ` Yair F 2010-05-20 2:09 ` Kenichi Handa 2010-05-18 7:29 ` Composing Hebrew diacriticals Eli Zaretskii 2 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-17 21:08 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Mon, May 17, 2010 at 7:35 AM, Kenichi Handa <handa@m17n.org> wrote: > In article <AANLkTin5fDs_thjY2I3S4anUDpgkqbzfZt2TBdzRo-8Y@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > Are you using the same setting as mine which I wrote as > below in the previous mail? > No. I was using this: ;; For automatic composition. (defconst hebrew-composable-pattern (concat "\\(" "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]" ;; base (allows rafe) "[\u05BC\u05BF]?" ;; 0-1 marks of 1st class (dagesh/rafe) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "\u05D5" ;; base (vav) "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05B0-\u05BB\u05C7]?" ;; 0-1 marks of extended 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "\u05E9" ;; base (shin) "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05C1\u05C2]?" ;; 0-1 marks of 2nd class (shin dot) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "[\u05F1-\u05F3]" ;; base (yidish ligatures) "[\u05B0-\u05B9\u05BB\u05C7]?" 
;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\)") "Regexp matching a composable sequence of Hebrew characters.") (set-char-table-range composition-function-table '(#x591 . #x5F4) (list (vector hebrew-composable-pattern 0 'font-shape-gstring))) With your changes there is some composition. But this word doesn't compose properly: עַשֶּׁשֶׁת First Shin (u+05E9) composes with Dagesh (u+05BC), This shin-dot (u+05C1) isn't visible, the Segol (u+05B6) Goes under the previous base letter. what-cursor-position gives this: display: composed to form "שֶּׁ" (see below) Composed with the following character(s) "ֶּׁ" using this font: xft:-unknown-DejaVu Sans-normal-normal-normal-*-23-*-*-*-*-0-iso10646-1 by these glyphs: [0 3 0 4786 18 2 16 13 0 nil] [0 3 1473 1311 0 15 17 16 -14 nil] [0 3 1462 1300 0 5 11 -1 6 nil] MDEBUG_FLT=3 emacs - --eval ' (message "\u05E9\u05BC\u05C1\u05B6")' [FLT] (hebr-ff (dejavu sans) [FLT] (SOURCE 05E9 05BC 05C1 05B6) [FLT] (STAGE 0 "Hhhh" (05E9 05BC 05C1 05B6) [FLT] (SUBPART 0 [FLT] (COND [FLT] (REGEX "^Hhh*" "Hhhh" 4 [FLT] <0 [FLT] :otf=hebr=ccmp+mark 4>)))) [FLT] (RESULT (12B2 1152 0 0) (051F 0 0 0) (0514 0 0 0))) > By the way, do you have a better font than 'dejavu sans' for > Hebrew? There are 2 major options: Fonts from culmus package (http://culmus.sourceforge.net/): Miriam Mono (blends with couier), David (Serif), Nachlieli (OpenOffice default) or most of the others. The other option is using msttcorefonts. I'll try to approach Dejavu designers as well. > >>>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into >>>> presentation form (U+FB1D). This should only happen with specific >>>> control (Either CGJ or ZWJ I'll check). > Then what is the correct rendering of the sequence "\u05D9\u05B4"? Hiriq should be rendered below the baseline like under all other letters. > See the attached image. I'm sorry but the attachment was lost. Can you please resent it? 
I ^ permalink raw reply [flat|nested] 88+ messages in thread
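One detail of the longer pattern above may be worth checking, independently of the font issues. Emacs regexps take the first alternative that matches rather than the longest one, and U+05E9 (shin) falls inside the \u05E5-\u05EA range of the "allows rafe" alternative, which comes before the shin-specific one. The sketch below reduces the pattern to just those two alternatives; the names my-hebrew-two-alternatives, my-hebrew-shin-only and my-hebrew-match-length are hypothetical, and whether this actually explains any of the rendering reported in this thread has not been verified.

```elisp
;; Reduced to the "allows rafe" and "shin" alternatives of the pattern
;; above, in the same order as in the original.
(defconst my-hebrew-two-alternatives
  (concat "\\("
          "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]" ; base (allows rafe)
          "[\u05BC\u05BF]?"              ; dagesh/rafe
          "[\u05B0-\u05B9\u05BB\u05C7]?" ; niqqud
          "[\u0591-\u05AF\u05BD]*"       ; 4th class marks
          "\\|"
          "\u05E9"                       ; base (shin)
          "\u05BC?"                      ; dagesh
          "[\u05C1\u05C2]?"              ; shin dot / sin dot
          "[\u05B0-\u05B9\u05BB\u05C7]?" ; niqqud
          "[\u0591-\u05AF\u05BD]*"       ; 4th class marks
          "\\)"))

;; The shin alternative on its own, for comparison.
(defconst my-hebrew-shin-only
  (concat "\u05E9" "\u05BC?" "[\u05C1\u05C2]?"
          "[\u05B0-\u05B9\u05BB\u05C7]?" "[\u0591-\u05AF\u05BD]*"))

(defun my-hebrew-match-length (regexp string)
  "Return the length of the match of REGEXP at the start of STRING, or nil."
  (when (eq (string-match regexp string) 0)
    (match-end 0)))

;; SHIN + DAGESH + SHIN DOT + SEGOL
(my-hebrew-match-length my-hebrew-two-alternatives
                        "\u05E9\u05BC\u05C1\u05B6") ; => 2 (stops before SHIN DOT)
(my-hebrew-match-length my-hebrew-shin-only
                        "\u05E9\u05BC\u05C1\u05B6") ; => 4 (whole cluster)
```

If the shorter match turns out to matter, moving the shin alternative first, or narrowing the earlier range so that it excludes U+05E9, would be two possible adjustments.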
* Re: Composing Hebrew diacriticals 2010-05-17 21:08 ` Yair F @ 2010-05-20 2:09 ` Kenichi Handa 2010-05-25 1:45 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-20 2:09 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTinYS8B06ev3fWapc4eRqy_4Q2UQ7UPQggJRdmnw@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > > By the way, do you have a better font than 'dejavu sans' for > > Hebrew? > There are 2 major options: Fonts from culmus package > (http://culmus.sourceforge.net/): > Miriam Mono (blends with couier), David (Serif), Nachlieli (OpenOffice > default) or most of the others. > The other option is using msttcorefonts. I downloaded the culmus fonts. It seems the "David CLM" font is quite good. And, while testing with that font, I found a bug in the current Emacs code. I'll fix it soon. Please wait for a while. > I'll try to approach Dejavu designers as well. Thank you. Another problem I found with Dejavu Sans is that the lookup tables implementing the "mark" feature for Hebrew don't have the proper LookupFlag to ignore previous combining marks (GDEF class 3). I think that's why it can't compose two or more diacriticals correctly. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-20 2:09 ` Kenichi Handa @ 2010-05-25 1:45 ` Kenichi Handa 2010-05-25 20:56 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-25 1:45 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > I downloaded culmus fonts. It seems "David CLM" font is > quite good. And, while testing with that font, I found a > bug in the current Emacs code. I'll fix it soon. Please > wait for a while. I've just committed fixes. Please try the latest code. With the David CLM font, font-shape-gstring alone works well. But with the Dejavu Sans font, as it lacks several OTF capabilities, we have to adjust the result of font-shape-gstring. For that, I implemented hebrew-shape-gstring. So now Emacs should be able to use both fonts. I have not yet adopted your code for setting up composition-function-table. It requires some modifications. For instance, we should register hebrew-shape-gstring only for combining characters, so that Hebrew text without diacriticals still displays quickly. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
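The idea of registering the shaper only for combining characters could look roughly like the following. This is a sketch assembled from the simple rule posted earlier in the thread with hebrew-shape-gstring substituted for font-shape-gstring; it is an assumption about the shape of the final setup, not the code that was committed. The rule is keyed on the marks and looks back one character to pick up the base letter, so the base letters U+05D0..U+05F2 get no entry from this form.

```elisp
;; Sketch only: key the rule on the combining marks, with a lookback of
;; one character so that the preceding base letter is part of the match.
(let ((pattern "[\u05D0-\u05F2][\u0591-\u05BF\u05C1-\u05C5\u05C7]+"))
  (set-char-table-range
   composition-function-table '(#x591 . #x5C7)
   (list (vector pattern 1 'hebrew-shape-gstring)))
  ;; PASEQ (U+05C0) and NUN HAFUKHA (U+05C6) are not combining marks,
  ;; so clear their entries again, as in the earlier message.
  (set-char-table-range composition-function-table #x5C0 nil)
  (set-char-table-range composition-function-table #x5C6 nil))

;; Inspect what is now registered for DAGESH (U+05BC).
(char-table-range composition-function-table #x5BC)
```

Note that evaluating this changes the global composition-function-table for the running session.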
* Re: Composing Hebrew diacriticals 2010-05-25 1:45 ` Kenichi Handa @ 2010-05-25 20:56 ` Yair F 2010-05-26 0:36 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-25 20:56 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Tue, May 25, 2010 at 4:45 AM, Kenichi Handa <handa@m17n.org> wrote: > In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > I've just committed fixes. Please try the latest code. > > With David CLM font, just font-shape-gstring work well. I was not able to compose with this code (revision 100441) with any of the CLM or MS fonts. MDEBUG_FLT doesn't produce any output. Instead, this appears in the *Messages* buffer: Error during redisplay: (wrong-type-argument arrayp nil) [148 times] > But > with Dejavu Sans font, as it lacks several OTF capabilities Dejavu Sans still composes exactly as in my last report. > I have not yet adopted your code for setting up > composition-function-table. It requires some modifications. > For instance, we should register hebrew-shape-gstring only > for combining characters for fast displaying of a Hebrew > text that doesn't have diacriticals. Good, it needs further refinement. In addition, I found out that scholars sometimes compose Hebrew letters with diacritical marks from the U+0300 block as well. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-25 20:56 ` Yair F @ 2010-05-26 0:36 ` Kenichi Handa 2010-05-26 4:37 ` Yair F 2010-05-26 13:28 ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier 0 siblings, 2 replies; 88+ messages in thread
From: Kenichi Handa @ 2010-05-26 0:36 UTC (permalink / raw)
To: Yair F; +Cc: emacs-devel

In article <AANLkTil_LKUI8n6kQvbj6CIDtPi3UQnyfDFp1dPunxOy@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> On Tue, May 25, 2010 at 4:45 AM, Kenichi Handa <handa@m17n.org> wrote:
> > In article <tl7r5l79vdi.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> > I've just committed fixes. Please try the latest code.
> >
> > With David CLM font, just font-shape-gstring work well.

> I was not able to compose with this code (revision 100441) with any of
> CLM or MS fonts.
> MDEBUG_FLT doesn't produce output. Instead this appear in the *messages* buffer

> Error during redisplay: (wrong-type-argument arrayp nil) [148 times]

Ummm, strange. I don't see that message. I can see correct Hebrew composing with this:

% emacs -Q
ESC : (setq-default bidi-display-reordering t) RET
ESC : (set-fontset-font t 'hebrew '("David CLM" . "unicode-bmp"))

Please run Emacs under gdb, set breakpoint at Fsignal, and find out which code causes the above error.

% cd .../emacs/src
% gdb emacs
(gdb) br Fsignal
(gdb) run -Q

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-26 0:36 ` Kenichi Handa @ 2010-05-26 4:37 ` Yair F 2010-05-26 6:00 ` Kenichi Handa 2010-05-26 13:28 ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier 1 sibling, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-26 4:37 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel

On Wed, May 26, 2010 at 3:36 AM, Kenichi Handa <handa@m17n.org> wrote:
> Please run Emacs under gdb, set breakpoint at Fsignal, and
> find out which code causes the above error.

It's been a while since I last ran gdb, and I have never used it on Emacs. If you need more information, please specify the commands.

(gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t) (setq debug-on-error t) (message "\u05E9\u05C2"))'
Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t) (setq debug-on-error t) (message "\u05E9\u05C2"))'
[Thread debugging using libthread_db enabled]

Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
1686 register struct handler *allhandlers = handlerlist;
(gdb) bt
#0 Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686
#1 0x08301401 in xsignal (error_symbol=142100970, data=143920942) at eval.c:1796
#2 0x08301467 in xsignal2 (error_symbol=142100970, arg1=142101594, arg2=142070194) at eval.c:1820
#3 0x082dea27 in wrong_type_argument (predicate=0, value=5) at data.c:118
#4 0x082e612e in Faref (array=142070194, idx=0) at data.c:2215
#5 0x0837c054 in Fbyte_code (bytestr=140298249, vector=140343389, maxdepth=8) at bytecode.c:980
#6 0x08305d62 in funcall_lambda (fun=140343341, nargs=1, arg_vector=0xbfffa9a0) at eval.c:3260
#7 0x08305786 in apply_lambda (fun=140343341, args=140362406, eval_flag=1) at eval.c:3184
#8 0x083035d1 in Feval (form=140362398) at eval.c:2456
#9 0x083001a8 in Flet (args=140362590) at eval.c:1072
#10 0x08302c6b in Feval (form=140362382) at eval.c:2361
#11 0x082fe7c0 in Fprogn (args=140363030) at eval.c:416
#12 0x08305c53 in funcall_lambda (fun=140363046, nargs=1, arg_vector=0xbfffad94) at eval.c:3253
#13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
#14 0x0837afb2 in Fbyte_code (bytestr=140347985, vector=140348053, maxdepth=24) at bytecode.c:680
#15 0x08305d62 in funcall_lambda (fun=140347925, nargs=5, arg_vector=0xbfffb1e0) at eval.c:3260
#16 0x08305541 in Ffuncall (nargs=6, args=0xbfffb1dc) at eval.c:3119
#17 0x083010a8 in internal_condition_case_n (bfun=0x8304689 <Ffuncall>, nargs=6, args=0xbfffb1dc, handlers=142070218, hfun=0x808f2b6 <safe_eval_handler>) at eval.c:1656
#18 0x0808f358 in safe_call (nargs=6, args=0xbfffb1dc) at xdisp.c:2454
#19 0x083b2c39 in autocmp_chars (rule=144490029, charpos=1, bytepos=1, limit=3, win=0x8b1dc70, face=0x8b10940, string=142070194) at composite.c:991
#20 0x083b57d4 in composition_reseat_it (cmp_it=0xbfffdab8, charpos=1, bytepos=1, endpos=3, w=0x8b1dc70, face=0x8b10940, string=142070194) at composite.c:1321
#21 0x0809ddf5 in next_element_from_buffer (it=0xbfffd600) at xdisp.c:6984
#22 0x0809a219 in get_next_display_element (it=0xbfffd600) at xdisp.c:5886
#23 0x0809e56a in move_it_in_display_line_to (it=0xbfffd600, to_charpos=3, to_x=-1, op=MOVE_TO_POS) at xdisp.c:7246
#24 0x0809fcdd in move_it_to (it=0xbfffd600, to_charpos=3, to_x=-1, to_y=-1, to_vpos=-1, op=8) at xdisp.c:7747
#25 0x080a5704 in resize_mini_window (w=0x8b1dc70, exact_p=0) at xdisp.c:9242
#26 0x080a4ed0 in display_echo_area_1 (a1=145874032, a2=142070194, a3=0, a4=0) at xdisp.c:9105
#27 0x080a3aba in with_echo_area_buffer (w=0x8b1dc70, which=0, fn=0x80a4eaa <display_echo_area_1>, a1=145874032, a2=142070194, a3=0, a4=0) at xdisp.c:8892
#28 0x080a4e78 in display_echo_area (w=0x8b1dc70) at xdisp.c:9073
#29 0x080a6a13 in echo_area_display (update_frame_p=1) at xdisp.c:9671
#30 0x080a2716 in message3_nolog (m=145559521, nbytes=4, multibyte=1) at xdisp.c:8568
#31 0x080a21e4 in message3 (m=145559521, nbytes=4, multibyte=1) at xdisp.c:8503
#32 0x082f68ff in Fmessage (nargs=1, args=0xbfffe070) at editfns.c:3418
#33 0x08302e13 in Feval (form=143921278) at eval.c:2387
#34 0x082fe7c0 in Fprogn (args=143921262) at eval.c:416
#35 0x08302c6b in Feval (form=143921350) at eval.c:2361
#36 0x08304fbf in Ffuncall (nargs=2, args=0xbfffe320) at eval.c:3073
#37 0x0837afb2 in Fbyte_code (bytestr=140089481, vector=140089957, maxdepth=40) at bytecode.c:680
#38 0x08305d62 in funcall_lambda (fun=140089453, nargs=1, arg_vector=0xbfffe614) at eval.c:3260
#39 0x08305541 in Ffuncall (nargs=2, args=0xbfffe610) at eval.c:3119
#40 0x0837afb2 in Fbyte_code (bytestr=140072329, vector=140073261, maxdepth=28) at bytecode.c:680
#41 0x08305d62 in funcall_lambda (fun=140072309, nargs=0, arg_vector=0xbfffe8f4) at eval.c:3260
#42 0x08305541 in Ffuncall (nargs=1, args=0xbfffe8f0) at eval.c:3119
#43 0x0837afb2 in Fbyte_code (bytestr=140069033, vector=140069253, maxdepth=24) at bytecode.c:680
#44 0x08305d62 in funcall_lambda (fun=140069013, nargs=0, arg_vector=0xbfffeb20) at eval.c:3260
#45 0x08305786 in apply_lambda (fun=140069013, args=142070194, eval_flag=1) at eval.c:3184
#46 0x083035d1 in Feval (form=142425222) at eval.c:2456
#47 0x0823350a in top_level_2 () at keyboard.c:1365
#48 0x08300d82 in internal_condition_case (bfun=0x82334f7 <top_level_2>, handlers=142100922, hfun=0x8232fe7 <cmd_error>) at eval.c:1510
#49 0x08233541 in top_level_1 () at keyboard.c:1373
#50 0x083007bf in internal_catch (tag=142099042, func=0x823350c <top_level_1>, arg=142070194) at eval.c:1246
#51 0x0823347a in command_loop () at keyboard.c:1328
#52 0x082326ee in recursive_edit_1 () at keyboard.c:950
#53 0x08232c0b in Frecursive_edit () at keyboard.c:1012
#54 0x0823088c in main (argc=6, argv=0xbffff324) at emacs.c:1801

Lisp Backtrace:
"lgstring-header" (0xbfffa9a0)
"let" (0xbfffac2c)
"hebrew-shape-gstring" (0xbfffad94)
"auto-compose-chars" (0xbfffb1e0)
"message" (0xbfffe070)
"progn" (0xbfffe22c)
"eval" (0xbfffe324)
"command-line-1" (0xbfffe614)
"command-line" (0xbfffe8f4)
"normal-top-level" (0xbfffeb20)

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-26 4:37 ` Yair F @ 2010-05-26 6:00 ` Kenichi Handa 2010-05-26 16:12 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-26 6:00 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTilIzGMNuvACJucMikm5z0U-kik58W-5ezcB-X1G@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > On Wed, May 26, 2010 at 3:36 AM, Kenichi Handa <handa@m17n.org> wrote: > > Please run Emacs under gdb, set breakpoint at Fsignal, and > > find out which code causes the above error. > It's been a while since I last run gdb and never for Emacs. If you > need more information please specify the commands. > (gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default > bidi-display-reordering t) (setq debug-on-error t) (message > "\u05E9\u05C2"))' > Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn > 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t) > (setq debug-on-error t) (message "\u05E9\u05C2"))' > [Thread debugging using libthread_db enabled] > Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686 > 1686 register struct handler *allhandlers = handlerlist; > (gdb) bt > #0 Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686 [...] > #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130 Please show me the result of this: (gdb) up 13 (gdb) pp args[1] --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-26 6:00 ` Kenichi Handa @ 2010-05-26 16:12 ` Yair F 2010-05-27 7:27 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-26 16:12 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Wed, May 26, 2010 at 9:00 AM, Kenichi Handa <handa@m17n.org> wrote: >> (gdb) run -Q -fn 'David CLM-20' --eval '(progn (setq-default >> bidi-display-reordering t) (setq debug-on-error t) (message >> "\u05E9\u05C2"))' >> Starting program: /home/yair/emacs/src/emacs/trunk/src/emacs -Q -fn >> 'David CLM-20' --eval '(progn (setq-default bidi-display-reordering t) >> (setq debug-on-error t) (message "\u05E9\u05C2"))' >> [Thread debugging using libthread_db enabled] > >> Breakpoint 3, Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686 >> 1686 register struct handler *allhandlers = handlerlist; >> (gdb) bt >> #0 Fsignal (error_symbol=142100970, data=143920942) at eval.c:1686 > [...] >> #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130 > > Please show me the result of this: > > (gdb) up 13 > (gdb) pp args[1] #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130 3130 val = funcall_lambda (fun, numargs, args + 1); (gdb) pp args[1] [[#<font-object "-bitstream-David CLM-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1"> 1513 1474] nil [0 0 1513 141 15 0 16 15 1 nil] [1 1 1474 113 0 3 6 17 -13 nil] nil nil nil nil nil nil] (gdb) ^ permalink raw reply [flat|nested] 88+ messages in thread
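[Editorial sketch] The dump printed by `pp args[1]` above is easier to read once the decimal values are labelled. The following is a minimal Python sketch, not part of the thread. It assumes each glyph vector uses the field order FROM, TO, CHAR, CODE, WIDTH, LBEARING, RBEARING, ASCENT, DESCENT, ADJUSTMENT; that order and the `FIELDS` labels are assumptions for illustration and are not stated in the messages.

```python
import unicodedata

# Assumed field order of one glyph vector inside the gstring (hypothetical labels).
FIELDS = ("from", "to", "char", "code", "width",
          "lbearing", "rbearing", "ascent", "descent", "adjustment")

# The two glyph vectors from the gdb output, with Lisp nil written as None.
GLYPHS = [
    [0, 0, 1513, 141, 15, 0, 16, 15, 1, None],
    [1, 1, 1474, 113, 0, 3, 6, 17, -13, None],
]

def describe(glyph):
    """Label the fields of one glyph vector and name its character."""
    info = dict(zip(FIELDS, glyph))
    info["codepoint"] = "U+%04X" % info["char"]
    info["name"] = unicodedata.name(chr(info["char"]))
    return info

if __name__ == "__main__":
    for glyph in GLYPHS:
        info = describe(glyph)
        print(info["codepoint"], info["name"],
              "glyph code", info["code"], "adjustment", info["adjustment"])
```

Under that assumed layout, the two entries are the characters passed to `message` in the test command ("\u05E9\u05C2"), and the last slot of both glyph vectors is nil.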
* Re: Composing Hebrew diacriticals 2010-05-26 16:12 ` Yair F @ 2010-05-27 7:27 ` Kenichi Handa 2010-05-27 21:59 ` Yair F 0 siblings, 1 reply; 88+ messages in thread
From: Kenichi Handa @ 2010-05-27 7:27 UTC (permalink / raw)
To: Yair F; +Cc: emacs-devel

In article <AANLkTinqUAXWipZRQoYQMnnTSWbSJ5rIppF4EjPlfx27@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes:

> > (gdb) up 13
> > (gdb) pp args[1]

> #13 0x08305606 in Ffuncall (nargs=2, args=0xbfffad90) at eval.c:3130
> 3130 val = funcall_lambda (fun, numargs, args + 1);
> (gdb) pp args[1]
> [[#<font-object "-bitstream-David
> CLM-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1"> 1513 1474] nil [0
> 0 1513 141 15 0 16 15 1 nil] [1 1 1474 113 0 3 6 17 -13 nil] nil nil
> nil nil nil nil]
> (gdb)

This indicates that, for some reason, the "shape" method of the font driver didn't work correctly. Please set a breakpoint at Ffont_shape_gstring and step through it with gdb's n(ext) command to find out why the function doesn't reach its final statement:

L4600: return composition_gstring_put_cache (gstring, XINT (n));

By the way, the Lisp error was caused by a bug in hebrew-shape-gstring. I've committed a fix.

---
Kenichi Handa
handa@m17n.org

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-27 7:27 ` Kenichi Handa @ 2010-05-27 21:59 ` Yair F 2010-05-28 0:42 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-27 21:59 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Thu, May 27, 2010 at 10:27 AM, Kenichi Handa <handa@m17n.org> wrote: > This indicates that somehow the "shape" method of a font > driver didn't work correctly. Please set the break point at > Ffont_shape_gstring, and investigate why that function > doesn't reach this last code by stepping through the code > using n(ext) command of gdb: Ffont_shape_gstring fails because of otf == 0 p *ftfont_info $5 = { font = { size = 1075838994, next = 0x8bef578, props = {142227066, 145465234, 145465186, 142070194, 142222906, 102720, 102528, 102656, 108, 142070194, 142070194, 0, 144727838, 142070194, 145453985, 145453969, 143750657, 145465258}, max_width = 19, pixel_size = 27, height = 29, space_width = 10, average_width = 12, min_width = 10, ascent = 22, descent = 7, underline_thickness = 1, underline_position = 2, vertical_centering = 0, encoding_type = 200 '\310', baseline_offset = 0, relative_compose = 0, default_ascent = 0, font_encoder = 0x8b8b0e0, driver = 0x8771fa0, encoding_charset = -1, repertory_charset = -1 }, maybe_otf = 0, otf = 0x0, ft_size = 0x8ae9010, index = 35, matrix = { xx = 0, xy = 38, yx = 147122784, yy = 39 } } ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-27 21:59 ` Yair F @ 2010-05-28 0:42 ` Kenichi Handa 2010-06-01 8:58 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-05-28 0:42 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTilqW4R3C5RsSWEXkRMUOR6JH0pppYIMclqnqwAq@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > On Thu, May 27, 2010 at 10:27 AM, Kenichi Handa <handa@m17n.org> wrote: > > This indicates that somehow the "shape" method of a font > > driver didn't work correctly. Please set the break point at > > Ffont_shape_gstring, and investigate why that function > > doesn't reach this last code by stepping through the code > > using n(ext) command of gdb: > Ffont_shape_gstring fails because of otf == 0 > p *ftfont_info > $5 = { [...] > maybe_otf = 0, > otf = 0x0, Then please find why maybe_otf and otf are set to zero by stepping through the code of ftfont_get_otf which is called from ftfont_shape. BTW, are you sure that your Emacs is compiled with libotf? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-28 0:42 ` Kenichi Handa @ 2010-06-01 8:58 ` Yair F 0 siblings, 0 replies; 88+ messages in thread From: Yair F @ 2010-06-01 8:58 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote: > Then please find why maybe_otf and otf are set to zero by > stepping through the code of ftfont_get_otf which is called > from ftfont_shape. ftfont_get_otf sets otf only if maybe_otf != 0. maybe_otf is initialized from ft_face->face_flags in xftfont_open. For David CLM maybe_otf = 0 because ft_face->face_flags = 2577. For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649. ^ permalink raw reply [flat|nested] 88+ messages in thread
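[Editorial sketch] The two `face_flags` values reported above can be decoded bit by bit. The Python sketch below is not part of the thread. It uses the `FT_FACE_FLAG_*` bit positions from the FreeType API reference, and it assumes that `maybe_otf` is derived by masking `face_flags` with `FT_FACE_FLAG_SFNT` (value 8). That assumption matches the reported values 0 and 8, but the message itself only says that `maybe_otf` is "initialized from" `face_flags`.

```python
# FreeType face flag bits (FT_FACE_FLAG_*), in ascending bit order.
FT_FACE_FLAGS = {
    "SCALABLE": 1 << 0,
    "FIXED_SIZES": 1 << 1,
    "FIXED_WIDTH": 1 << 2,
    "SFNT": 1 << 3,
    "HORIZONTAL": 1 << 4,
    "VERTICAL": 1 << 5,
    "KERNING": 1 << 6,
    "FAST_GLYPHS": 1 << 7,
    "MULTIPLE_MASTERS": 1 << 8,
    "GLYPH_NAMES": 1 << 9,
    "EXTERNAL_STREAM": 1 << 10,
    "HINTER": 1 << 11,
}

def decode(face_flags):
    """Return the names of all flag bits set in face_flags."""
    return [name for name, bit in FT_FACE_FLAGS.items() if face_flags & bit]

def maybe_otf(face_flags):
    """Assumed derivation: keep only the SFNT bit."""
    return face_flags & FT_FACE_FLAGS["SFNT"]

if __name__ == "__main__":
    for font, flags in (("David CLM", 2577), ("DejaVu Sans", 2649)):
        print(font, flags, decode(flags), "maybe_otf =", maybe_otf(flags))
```

Read this way, the David CLM face that FreeType opened does not carry the SFNT flag, while the DejaVu Sans face does. One possible reading is that the installed David CLM file is not in an SFNT-based format (TrueType/OpenType), but the thread does not confirm this.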
* Enabling bidi (was: Composing Hebrew diacriticals) 2010-05-26 0:36 ` Kenichi Handa 2010-05-26 4:37 ` Yair F @ 2010-05-26 13:28 ` Stefan Monnier 2010-05-26 17:14 ` Eli Zaretskii 1 sibling, 1 reply; 88+ messages in thread From: Stefan Monnier @ 2010-05-26 13:28 UTC (permalink / raw) To: emacs-devel > % emacs -Q > ESC : (setq-default bidi-display-reordering t) RET I think it's time we set bidi-display-reordering to t by default. Any objection? Stefan ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Enabling bidi (was: Composing Hebrew diacriticals) 2010-05-26 13:28 ` Enabling bidi (was: Composing Hebrew diacriticals) Stefan Monnier @ 2010-05-26 17:14 ` Eli Zaretskii 2010-05-27 4:13 ` Enabling bidi Stefan Monnier 0 siblings, 1 reply; 88+ messages in thread
From: Eli Zaretskii @ 2010-05-26 17:14 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 26 May 2010 09:28:13 -0400
>
> > % emacs -Q
> > ESC : (setq-default bidi-display-reordering t) RET
>
> I think it's time we set bidi-display-reordering to t by default.
> Any objection?

There are still 2 problems I'd like to fix before we do this:

 . Annoying unnecessary scrolls when moving cursor through continued lines (for one of its manifestations, see http://lists.gnu.org/archive/html/emacs-devel/2010-05/msg00634.html)

 . Cursor positioning problems with truncated lines (cursor gets stuck at end of lines truncated on the left)

These problems affect anyone who turns on the bidi reordering, even if all the characters in the buffer are strict L2R and no reordering actually happens.

The first problem will almost certainly be fixed this weekend (I already know where's the bug and how to fix that). The second one needs to be debugged, but I hope to get it nailed by this coming Sunday as well.

However, if you think these problems are not grave enough to delay, go ahead and turn it on, and let's see how much I screwed up.

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Enabling bidi 2010-05-26 17:14 ` Eli Zaretskii @ 2010-05-27 4:13 ` Stefan Monnier 2010-05-27 17:43 ` Eli Zaretskii 0 siblings, 1 reply; 88+ messages in thread From: Stefan Monnier @ 2010-05-27 4:13 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > However, if you think these problems are not grave enough to delay, go > ahead and turn it on, and let's see how much I screwed up. If you still have enough work to do as is, then we can leave it nil for now. But please set it to t as soon as you're done with such "obvious problems". Stefan ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Enabling bidi 2010-05-27 4:13 ` Enabling bidi Stefan Monnier @ 2010-05-27 17:43 ` Eli Zaretskii 0 siblings, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-27 17:43 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: emacs-devel@gnu.org > Date: Thu, 27 May 2010 00:13:01 -0400 > > If you still have enough work to do as is, then we can leave it nil > for now. But please set it to t as soon as you're done with such > "obvious problems". Will do. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-17 4:35 ` Kenichi Handa 2010-05-17 17:32 ` Eli Zaretskii 2010-05-17 21:08 ` Yair F @ 2010-05-18 7:29 ` Eli Zaretskii 2 siblings, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-18 7:29 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Mon, 17 May 2010 13:35:42 +0900 > Cc: emacs-devel@gnu.org > > >>> 3. Letter Yod (U+5D9) composed with Hiriq (U+5B4) is composed into > >>> presentation form (U+FB1D). This should only happen with specific > >>> control (Either CGJ or ZWJ I'll check). > > > > > > I tried to show that sequence with gedit and and firefox. > > > Both show it with U+FB1D. > > > Kate show this as well, Firefox doesn't. Nevertheless it's still a bug. > > Then what is the correct rendering of the sequence "\u05D9\u05B4"? > Applying only `mark' feature? With dejavu sans font, U+FB1D > has lower dot above baseline, but just applying `mark' > feature results in displaying dot below baseline. > > See the attached image. The left one is the composition > with `ccmp' GSUB feature (i.e. into U+FB1D) for the sequence > "\u05D9\u200D\u05B4", and the right one is the composition > with `mark' GPOS feature for the sequence "\u05D9\u05B4". > Is this the correct rendering? IMHO, U+FB1D looks quite ugly with this font, but I guess that's not really the issue here. Otherwise, looks good to me, although I'm far from being an expert on Hebrew typesetting. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-15 21:35 ` Yair F 2010-05-17 4:35 ` Kenichi Handa @ 2010-05-17 13:53 ` Stefan Monnier 1 sibling, 0 replies; 88+ messages in thread From: Stefan Monnier @ 2010-05-17 13:53 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, Kenichi Handa >> It seems that "dejavu sans" font doesn't have 'ccmp' nor >> 'mark' feature applicable to that sequence. > In that case, may I suggest switching to a different font? More constructive would be to first report the problem so that the font can be fixed (assuming the problem is indeed in the font data rather than in Emacs's use of it). Stefan ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 8:04 ` Kenichi Handa 2010-05-12 17:35 ` Eli Zaretskii @ 2010-05-19 17:23 ` Eli Zaretskii 1 sibling, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-19 17:23 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org > Date: Wed, 12 May 2010 17:04:09 +0900 > > I found that cursor positioning sometimes failed even for a text > that doesn't involve composition. For instance, when I have this > text in a buffer or bidi-display-reordering t, > ---------- > A א B. > בג > ---------- > and point is at (i.e. before) "A", typing C-e moves point to > "ג". Cursor motion problems aside, this turned out to be a genuine bug in move-end-of-line: it assumed that the first character in every line necessarily follows the newline which ends the previous line. This is, of course, false when characters are reordered for display under bidi-display-reordering. I think I fixed it now (revno 100369). ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-11 12:20 ` Kenichi Handa 2010-05-11 16:22 ` Eli Zaretskii @ 2010-05-11 21:40 ` Yair F 2010-05-12 3:15 ` Eli Zaretskii 1 sibling, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-11 21:40 UTC (permalink / raw)
To: Kenichi Handa; +Cc: eliz, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3011 bytes --]

Is there a way to get what-cursor-position to show composition?

On Tue, May 11, 2010 at 3:20 PM, Kenichi Handa <handa@m17n.org> wrote:
> Could you please check the rendering (position of
> diacriticals, etc.)?

I see some problems, some of them not present in your image; see my attached example.

In the phrase: שָׁלוֹם לְמִשְׁתַּמְּשֵׁי אִמַאקְס
The Holam point above Vav is not centred above the letter.
The Dagesh points in the Tav and Mem letters are not centred but sit too close to the left side of the letter.

In the phrase: A "אֲעוֹלֵל 123 כַּגֶּפֶן" B.
The Hataf-Patah point under the letter Alef is not centred, but is placed slightly to the left of where it should be. This is also visible in your sample.

The last word: עַשֶּׁשֶׁת
The Dagesh point on the first (right) Shin is not visible at all. (e.g. http://he.wiktionary.org/wiki/%D7%A2%D7%A9%D7%A9%D7%AA)

The font used is xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1
Font version 2.29

When these samples are rendered correctly, you might want to try more complex ones such as http://www.mechon-mamre.org/c/ct/c2831.htm (V12 has a challenging example).

I now use a refined composition regexp:

(defconst hebrew-composable-pattern
  (concat
   "\\("
   "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base
   "\u05BC?" ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]" ;; base (allows rafe)
   "[\u05BC\u05BF]?" ;; 0-1 marks of 1st class (dagesh/rafe)
   "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05D5" ;; base (vav)
   "\u05BC?" ;; 0-1 marks of 1st class (dagesh)
   "[\u05B0-\u05BB\u05C7]?" ;; 0-1 marks of extended 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "\u05E9" ;; base (shin)
   "\u05BC?" ;; 0-1 marks of 1st class (dagesh)
   "[\u05C1\u05C2]?" ;; 0-1 marks of 2nd class (shin dot)
   "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class
   "\\|"
   "[\u05F1-\u05F3]" ;; base (Yiddish ligatures)
   "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud)
   "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class
   "\\)")
  "Regexp matching a composable sequence of Hebrew characters.")

(set-char-table-range
 composition-function-table '(#x591 . #x5F4)
 (list (vector hebrew-composable-pattern 0 'font-shape-gstring)))

[-- Attachment #2: c.png --]
[-- Type: image/png, Size: 19876 bytes --]

^ permalink raw reply [flat|nested] 88+ messages in thread
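[Editorial sketch] To see what one branch of the pattern above accepts, here is a minimal Python sketch, not part of the thread. It transcribes only the shin branch into Python's `re` syntax; the names `SHIN_CLUSTER` and `matched_length` are hypothetical, and Emacs's own regexp engine and composition machinery are not involved.

```python
import re

# Shin branch of hebrew-composable-pattern, transcribed for Python's re module.
SHIN_CLUSTER = re.compile(
    "\u05E9"                        # base (shin)
    "\u05BC?"                       # 0-1 marks of 1st class (dagesh)
    "[\u05C1\u05C2]?"               # 0-1 marks of 2nd class (shin dot / sin dot)
    "[\u05B0-\u05B9\u05BB\u05C7]?"  # 0-1 marks of 3rd class (niqqud)
    "[\u0591-\u05AF\u05BD]*"        # marks of 4th class
)

def matched_length(text):
    """Number of leading characters of text covered by the shin branch."""
    match = SHIN_CLUSTER.match(text)
    return len(match.group(0)) if match else 0

if __name__ == "__main__":
    samples = {
        "shin + dagesh + shin dot + segol": "\u05E9\u05BC\u05C1\u05B6",
        "shin + shin dot + qamats": "\u05E9\u05C1\u05B8",
        "shin + qamats + shin dot": "\u05E9\u05B8\u05C1",
    }
    for label, text in samples.items():
        print(label, "->", matched_length(text), "of", len(text))
```

The third sample shows that the branch is order-sensitive: a shin dot that follows the niqqud is left outside the match, so text whose marks are stored in a different order would only be partially covered by this branch.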
* Re: Composing Hebrew diacriticals 2010-05-11 21:40 ` Yair F @ 2010-05-12 3:15 ` Eli Zaretskii 2010-05-12 15:11 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-12 3:15 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, handa > Date: Wed, 12 May 2010 00:40:59 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: eliz@gnu.org, emacs-devel@gnu.org > > Is there a way to get what-cursor-position show composition? Try "C-u C-x =". > I see some problems, some not present in your image see my example attached: How did you get the images you posted? I don't think Handa-san posted his changes to Emacs yet, at least I didn't see them. Did I miss something? ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 3:15 ` Eli Zaretskii @ 2010-05-12 15:11 ` Yair F 2010-05-12 17:43 ` Eli Zaretskii 2010-05-13 0:29 ` Kenichi Handa 0 siblings, 2 replies; 88+ messages in thread
From: Yair F @ 2010-05-12 15:11 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel, handa

On Wed, May 12, 2010 at 6:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Wed, 12 May 2010 00:40:59 +0300
>> From: Yair F <yair.f.lists@gmail.com>
>> Cc: eliz@gnu.org, emacs-devel@gnu.org
>>
>> Is there a way to get what-cursor-position show composition?
>
> Try "C-u C-x =".
>

This only shows the information for the character after point, whether it's a base character or a composed mark. It does not tell whether composition occurred.

>> I see some problems, some not present in your image see my example attached:
>
> How did you get the images you posted? I don't think Handa-san posted
> his changes to Emacs yet, at least I didn't see them. Did I miss
> something?
>

Sorry, my mistake. I thought that Handa-san had already committed his work.

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 15:11 ` Yair F @ 2010-05-12 17:43 ` Eli Zaretskii 2010-05-12 22:01 ` Yair F 2010-05-13 0:29 ` Kenichi Handa 1 sibling, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-12 17:43 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, handa > Date: Wed, 12 May 2010 18:11:16 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: handa@m17n.org, emacs-devel@gnu.org > > On Wed, May 12, 2010 at 6:15 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> Date: Wed, 12 May 2010 00:40:59 +0300 > >> From: Yair F <yair.f.lists@gmail.com> > >> Cc: eliz@gnu.org, emacs-devel@gnu.org > >> > >> Is there a way to get what-cursor-position show composition? > > > > Try "C-u C-x =". > > > This would only show the information for the character after the point > whether it's a based or composed mark. But it does not provide > information if composition occurred. Sorry, I don't understand what you mean. If "C-u C-x =" shows you something like this: display: composed to form "ດີ" (see below) Composed with the following character(s) "ີ" using this font: uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 by these glyphs: [0 1 3732 1091 8 1 7 12 3 nil] [0 1 3732 1117 0 0 8 12 3 [-8 1 0]] then you know that character at point was composed and that the composition did happen. What would you like to see instead? ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 17:43 ` Eli Zaretskii @ 2010-05-12 22:01 ` Yair F 2010-05-13 17:14 ` Eli Zaretskii 0 siblings, 1 reply; 88+ messages in thread
From: Yair F @ 2010-05-12 22:01 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel, handa

On Wed, May 12, 2010 at 8:43 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> Sorry, I don't understand what you mean. If "C-u C-x =" shows you
> something like this:
>
> display: composed to form "ດີ" (see below)
>
> Composed with the following character(s) "ີ" using this font:
> uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
> by these glyphs:
> [0 1 3732 1091 8 1 7 12 3 nil]
> [0 1 3732 1117 0 0 8 12 3 [-8 1 0]]
>
> then you know that character at point was composed and that the
> composition did happen. What would you like to see instead?
>

For Hebrew, the display is a bit different (no composition info):

display: by this font (glyph code)
xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x53D)

Character code properties: customize what to show
name: HEBREW LETTER SHIN
...

display: by this font (glyph code)
xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x51F)

Character code properties: customize what to show
name: HEBREW POINT SHIN DOT
...

display: by this font (glyph code)
xft:-unknown-DejaVu Sans-normal-normal-normal-*-27-*-*-*-*-0-iso10646-1 (#x516)

Character code properties: customize what to show
name: HEBREW POINT QAMATS

While a Latin test gives:

display: composed to form "Ȧ" (see below)

Composed with the following character(s) "̇" using this font:
xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-27-*-*-*-m-0-iso10646-1
by these glyphs:
[0 1 65 36 16 1 16 20 0 nil]
[0 1 775 652 16 7 10 20 -17 [-16 -6 0]]

Character code properties: customize what to show
name: LATIN CAPITAL LETTER A

^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 22:01 ` Yair F @ 2010-05-13 17:14 ` Eli Zaretskii 2010-05-13 19:46 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-13 17:14 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, handa > Date: Thu, 13 May 2010 01:01:38 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: handa@m17n.org, emacs-devel@gnu.org > > For Hebrew, the diplay is a bit different (no composition info): IIUC, this means no composition took place. Why did you expect a composition? If this is in stock Emacs 24.0.50, then there are no compositions defined for any of the Hebrew characters out of the box. This is why we need your work. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-13 17:14 ` Eli Zaretskii @ 2010-05-13 19:46 ` Yair F 2010-05-13 19:56 ` Eli Zaretskii 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-13 19:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, handa [-- Attachment #1: Type: text/plain, Size: 867 bytes --] On Thu, May 13, 2010 at 8:14 PM, Eli Zaretskii <eliz@gnu.org> wrote: >> Date: Thu, 13 May 2010 01:01:38 +0300 >> From: Yair F <yair.f.lists@gmail.com> >> Cc: handa@m17n.org, emacs-devel@gnu.org >> >> For Hebrew, the diplay is a bit different (no composition info): > > IIUC, this means no composition took place. Why did you expect a > composition? If this is in stock Emacs 24.0.50, then there are no > compositions defined for any of the Hebrew characters out of the box. > This is why we need your work. > Something strange happens here as these characters *are* composed (Shin+shin dot+qamats). One more thing: In the test case attached the Latin composition sometimes occurs and sometimes not. I haven't been able to identify why. All of this applies to current trunk built with the attached lisp/languages/hebrew.el (Kubuntu/gtk/xtf) [-- Attachment #2: hebrew-sample2.txt --] [-- Type: text/plain, Size: 143 bytes --] שָׁלוֹם לְמִשְׁתַּמְּשֵׁי אִמַאקְס A "אֲעוֹלֵל 123 כַּגֶּפֶן" B. עַשֶּׁשֶׁת Ȧ [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: hebrew.el --] [-- Type: text/x-emacs-lisp; name="hebrew.el", Size: 5304 bytes --] ;;; hebrew.el --- support for Hebrew -*- coding: iso-2022-7bit; no-byte-compile: t -*- ;; Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 ;; Free Software Foundation, Inc. 
;; Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, ;; 2005, 2006, 2007, 2008, 2009, 2010 ;; National Institute of Advanced Industrial Science and Technology (AIST) ;; Registration Number H14PRO021 ;; Copyright (C) 2003 ;; National Institute of Advanced Industrial Science and Technology (AIST) ;; Registration Number H13PRO009 ;; Keywords: multilingual, Hebrew ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version. ;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>. ;;; Commentary: ;; For Hebrew, the character set ISO8859-8 is supported. ;; See http://www.ecma.ch/ecma1/STAND/ECMA-121.HTM. ;; Windows-1255 is also supported. ;;; Code: (define-coding-system 'hebrew-iso-8bit "ISO 2022 based 8-bit encoding for Hebrew (MIME:ISO-8859-8)." :coding-type 'charset :mnemonic ?8 :charset-list '(iso-8859-8) :mime-charset 'iso-8859-8) (define-coding-system-alias 'iso-8859-8 'hebrew-iso-8bit) ;; These are for Explicit and Implicit directionality information, as ;; defined in RFC 1556. We don't yet support directional information ;; in bidi languages, so these aliases are a lie, especially as far as ;; iso-8859-8-e is concerned. FIXME. 
(define-coding-system-alias 'iso-8859-8-e 'hebrew-iso-8bit) (define-coding-system-alias 'iso-8859-8-i 'hebrew-iso-8bit) (set-language-info-alist "Hebrew" '((charset iso-8859-8) (coding-priority hebrew-iso-8bit) (coding-system hebrew-iso-8bit windows-1255 cp862) (nonascii-translation . iso-8859-8) (input-method . "hebrew") (unibyte-display . hebrew-iso-8bit) (sample-text . "Hebrew ^[,Hylem^[(B") (documentation . "Right-to-left writing is not yet supported."))) (set-language-info-alist "Windows-1255" '((coding-priority windows-1255) (coding-system windows-1255) (documentation . "\ Support for Windows-1255 encoding, e.g. for Yiddish. Right-to-left writing is not yet supported."))) (define-coding-system 'windows-1255 "windows-1255 (Hebrew) encoding (MIME: WINDOWS-1255)" :coding-type 'charset :mnemonic ?h :charset-list '(windows-1255) :mime-charset 'windows-1255) (define-coding-system-alias 'cp1255 'windows-1255) (define-coding-system 'cp862 "DOS codepage 862 (Hebrew)" :coding-type 'charset :mnemonic ?D :charset-list '(cp862) :mime-charset 'cp862) (define-coding-system-alias 'ibm862 'cp862) ;; For automatic composition. (defconst hebrew-composable-pattern (concat "\\(" "[\u05D6-\u05D9\u05DC-\u05E2\u05E5-\u05E8]" ;; base "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "[\u05D0-\u05D4\u05DA\u05DB\u05E4\u05E5-\u05EA]" ;; base (allows rafe) "[\u05BC\u05BF]?" ;; 0-1 marks of 1st class (dagesh/rafe) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "\u05D5" ;; base (vav) "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05B0-\u05BB\u05C7]?" ;; 0-1 marks of extended 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "\u05E9" ;; base (shin) "\u05BC?" ;; 0-1 marks of 1st class (dagesh) "[\u05C1\u05C2]?" 
;; 0-1 marks of 2nd class (shin dot) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\|" "[\u05F1-\u05F3]" ;; base (Yiddish ligatures) "[\u05B0-\u05B9\u05BB\u05C7]?" ;; 0-1 marks of 3rd class (niqqud) "[\u0591-\u05AF\u05BD]*" ;; 0-2 (possibly 3) marks of 4th class "\\)") "Regexp matching a composable sequence of Hebrew characters.") ;;; Handa-san suggested this; it still needs to be understood. ;; (let ((hebrew-diacriticals-list '((FROM1 . TO1) (FROM2 . TO2) ...)) ;; (regexp "[..HEBREW_BASE_CHARS..][..HEBREW_DIACRITICALS..]")) ;; (dolist (elt hebrew-diacriticals-list) ;; (set-char-table-range composition-function-table elt ;; (list (vector regexp 1 'font-shape-gstring))))) (set-char-table-range composition-function-table '(#x591 . #x5F4) (list (vector hebrew-composable-pattern 0 'font-shape-gstring))) (provide 'hebrew) ;; arch-tag: 3ca04f32-3f1e-498e-af46-8267498ba5d9 ;;; hebrew.el ends here ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-13 19:46 ` Yair F @ 2010-05-13 19:56 ` Eli Zaretskii 2010-05-13 20:08 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Eli Zaretskii @ 2010-05-13 19:56 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, handa > Date: Thu, 13 May 2010 22:46:03 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: handa@m17n.org, emacs-devel@gnu.org > > Something strange happens here as these characters *are* composed > (Shin+shin dot+qamats). When you move cursor through that "composed" character, does Emacs treat them as a single character? That is, is a single C-f enough to move all the way past Shin+shin dot+qamats? If not, then there is no composition; rather, the diacriticals are displayed with a proper offset so that they _appear_ to be composed with the base character (shin, in this case). ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-13 19:56 ` Eli Zaretskii @ 2010-05-13 20:08 ` Yair F 2010-05-14 2:35 ` Miles Bader 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-05-13 20:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, handa On Thu, May 13, 2010 at 10:56 PM, Eli Zaretskii <eliz@gnu.org> wrote: > If not, then there is no > composition; rather, the diacriticals are displayed with a proper > offset so that they _appear_ to be composed with the base character > (shin, in this case). > You are right. However, this is somewhat counter-intuitive and makes editing difficult. Leaving bidi aside, consider this example: d̊̂ (d+ring above+circumflex). The user would like to remove the ring component. How can it be done? Also, when a composition occurs, what-cursor-position gives Unicode information only for the base character and not for the other composed components. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-13 20:08 ` Yair F @ 2010-05-14 2:35 ` Miles Bader 2010-05-14 10:45 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Miles Bader @ 2010-05-14 2:35 UTC (permalink / raw) To: Yair F; +Cc: Eli Zaretskii, handa, emacs-devel Yair F <yair.f.lists@gmail.com> writes: > Consider this example: d̊̂ (d+ring above+circumflex) The user would > like to remove the ring component. How can it be done? _Editing_ commands (e.g., DEL) work on the individual components, even for composed characters -- so if you put d̊̂ in a buffer, position the cursor after it, and hit DEL, it will remove just the last accent, leaving "d̊". This duality works pretty well I think, and seems intuitive. > Also, when a composition occurs, what-cursor-position gives Unicode > information only for the base character and not for the other composed > components. Use "C-u C-x =" to see more info. -Miles -- 自らを空にして、心を開く時、道は開かれる ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 2:35 ` Miles Bader @ 2010-05-14 10:45 ` Yair F 2010-05-14 13:05 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 88+ messages in thread From: Yair F @ 2010-05-14 10:45 UTC (permalink / raw) To: Miles Bader; +Cc: Eli Zaretskii, handa, emacs-devel On Fri, May 14, 2010 at 5:35 AM, Miles Bader <miles@gnu.org> wrote: > Yair F <yair.f.lists@gmail.com> writes: >> Consider this example: d̊̂ (d+ring above+circumflex) The user would >> like to remove the ring component. How can it be done? > > _Editing_ commands (e.g., DEL) work on the individual components, even > for composed characters -- so if you put d̊̂ in a buffer, position the > cursor after it, and hit DEL, it will remove just the last accent, > leaving "d̊". This duality works pretty well I think, and seems > intuitive. > I still fail to understand how to directly access and edit individual components. In the example above, how can the ring component be edited directly without editing the circumflex? A base character composed with 3 marks is not that infrequent. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 10:45 ` Yair F @ 2010-05-14 13:05 ` Eli Zaretskii 2010-05-14 13:15 ` Kenichi Handa 2010-05-15 0:46 ` Miles Bader 2 siblings, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-05-14 13:05 UTC (permalink / raw) To: Yair F; +Cc: handa, emacs-devel, miles > Date: Fri, 14 May 2010 13:45:41 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, handa@m17n.org, emacs-devel@gnu.org > > In the example above, how can the ring component be edited directly > without editing the circumflex? > > A base character composed with 3 marks is not that infrequent. Well, at least in Hebrew, everyday written language almost never uses marks at all, so I think 3 marks would be limited to Biblical Hebrew or maybe poetry (and I'd be thrilled to know that Hebrew-writing poets start using Emacs for their craft ;-). If worse comes to worst, you can always disable auto-composition-mode, edit the text, then re-enable it. In the long run, it would be nice to have specialized commands for moving inside composed characters, but I suspect that making it happen would need non-trivial changes to the display engine. Volunteers are welcome to code that, but I personally have much more important things on my plate for the foreseeable future. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 10:45 ` Yair F 2010-05-14 13:05 ` Eli Zaretskii @ 2010-05-14 13:15 ` Kenichi Handa 2010-05-15 0:46 ` Miles Bader 2 siblings, 0 replies; 88+ messages in thread From: Kenichi Handa @ 2010-05-14 13:15 UTC (permalink / raw) To: Yair F; +Cc: eliz, emacs-devel, miles In article <AANLkTikiwXfiL0WUjkkSUQw9SbGYNYGLg-6rO_LhNWWe@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > On Fri, May 14, 2010 at 5:35 AM, Miles Bader <miles@gnu.org> wrote: > > Yair F <yair.f.lists@gmail.com> writes: >>> Consider this example: d̊̂ (d+ring above+circumflex) The user would >>> like to remove the ring component. How can it be done? > > > > _Editing_ commands (e.g., DEL) work on the individual components, even > > for composed characters -- so if you put d̊̂ in a buffer, position the > > cursor after it, and hit DEL, it will remove just the last accent, > > leaving "d̊". This duality works pretty well I think, and seems > > intuitive. > > > I still fail to understand how to directly access and edit individual components. > In the example above, how can the ring component be edited directly > without editing the circumflex? > A base character composed with 3 marks is not that infrequent. I used to use this code for debugging. It allows you to put point within a composition. However, I found that although it does move point, the screen is currently not updated properly. I'll debug it after I finish more urgent tasks. (defun forward-char-intrusive () (interactive) (setq disable-point-adjustment t) (forward-char 1)) (defun backward-char-intrusive () (interactive) (setq disable-point-adjustment t) (forward-char -1)) (global-set-key (kbd "C-S-f") 'forward-char-intrusive) (global-set-key (kbd "C-S-b") 'backward-char-intrusive) --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-14 10:45 ` Yair F 2010-05-14 13:05 ` Eli Zaretskii 2010-05-14 13:15 ` Kenichi Handa @ 2010-05-15 0:46 ` Miles Bader 2 siblings, 0 replies; 88+ messages in thread From: Miles Bader @ 2010-05-15 0:46 UTC (permalink / raw) To: Yair F; +Cc: Eli Zaretskii, emacs-devel, handa Yair F <yair.f.lists@gmail.com> writes: > I still fail to understand how to directly access and edit individual components. > In the example above, how can the ring component be edited directly > without editing the circumflex? Not so far as I know (though it can be done simply enough in elisp), but as far as I can see, it's really not very important. You could easily write an elisp function to split apart the composed components to allow easier editing, e.g., by putting a space after the base character (a user could then re-compose them by deleting the space). However, given the shortness of such sequences in all but the rarest cases, I expect most users simply wouldn't bother to use such a function, and would probably just delete the accents and re-add them if they were incorrect. -Miles -- Next to fried food, the South has suffered most from oratory. -- Walter Hines Page ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-05-12 15:11 ` Yair F 2010-05-12 17:43 ` Eli Zaretskii @ 2010-05-13 0:29 ` Kenichi Handa 1 sibling, 0 replies; 88+ messages in thread From: Kenichi Handa @ 2010-05-13 0:29 UTC (permalink / raw) To: Yair F; +Cc: eliz, emacs-devel In article <AANLkTimTBtuObr0aS7_cJ-XCCKJRW5e0rNT4HyGdAJUN@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: >>> Is there a way to get what-cursor-position show composition? > > > > Try "C-u C-x =". > > > This would only show the information for the character after the point > whether it's a base or composed mark. Then it means that the diacritical mark is not really composed with the previous base character but is just displayed with a proper x-offset (lbearing). >>> I see some problems, some not present in your image see my example attached: > > > > How did you get the images you posted? I don't think Handa-san posted > > his changes to Emacs yet, at least I didn't see them. Did I miss > > something? > > > Sorry, it is my mistake. I thought that Handa-san had already committed his work. Sorry for not writing that clearly. I need some more work to commit the code. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
[parent not found: <tl7fx0v9nra.fsf@m17n.org>]
* Re: Composing Hebrew diacriticals [not found] <tl7fx0v9nra.fsf@m17n.org> @ 2010-06-15 11:02 ` Kenichi Handa 2010-06-24 6:33 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-06-15 11:02 UTC (permalink / raw) To: emacs-devel; +Cc: yair.f.lists In article <AANLkTinkfapIXNSnij20psfpKU1ZKS-6wJsVIDbVaQ7i@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote: > > Then please find why maybe_otf and otf are set to zero by > > stepping through the code of ftfont_get_otf which is called > > from ftfont_shape. > ftfont_get_otf sets otf only if maybe_otf != 0. > maybe_otf is initialized from ft_face->face_flags in xftfont_open. > For David CLM maybe_otf = 0 because ft_face->face_flags = 2577. > For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649. That's very strange. Perhaps your David CLM font is different from mine. In freetype.h, FT_FACE_FLAG_SFNT is explained as this: /* FT_FACE_FLAG_SFNT :: */ /* Indicates that the face uses the `sfnt' storage scheme. For */ /* now, this means TrueType and OpenType. */ So, if the font doesn't have this flag set, it means that the font is surely not OTF. This is some info about my David CLM font. 
% ls -l DavidCLM-Medium.ttf -rw-r--r-- 1 handa handa 24156 2010-06-15 09:48 DavidCLM-Medium.ttf % fc-list 'david clm' capability :capability=otlayout\:hebr % od -t x1 DavidCLM-Medium.ttf |head 0000000 00 01 00 00 00 10 01 00 00 04 00 00 46 46 54 4d 0000020 4f 58 4a 2a 00 00 5e 40 00 00 00 1c 47 44 45 46 0000040 08 87 07 9c 00 00 50 24 00 00 00 6e 47 50 4f 53 0000060 c3 06 cd 7e 00 00 55 34 00 00 09 0a 47 53 55 42 0000100 48 82 52 49 00 00 50 94 00 00 04 9e 4f 53 2f 32 0000120 89 5b 2c ee 00 00 01 88 00 00 00 56 63 6d 61 70 0000140 ae 86 db a7 00 00 05 3c 00 00 02 0a 63 76 74 20 0000160 00 28 02 f8 00 00 07 48 00 00 00 04 67 61 73 70 0000200 ff ff 00 03 00 00 50 1c 00 00 00 08 67 6c 79 66 0000220 62 9d 8f 85 00 00 08 fc 00 00 3c 34 68 65 61 64 --- Kenichi Handa handa@m17n.org PS. I got WiFi (WiMAX) now, and the Internet access has been much improved. :-) ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-15 11:02 ` Kenichi Handa @ 2010-06-24 6:33 ` Kenichi Handa 2010-06-25 10:16 ` Eli Zaretskii 2010-06-28 16:40 ` Yair F 0 siblings, 2 replies; 88+ messages in thread From: Kenichi Handa @ 2010-06-24 6:33 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel Yair, could you please check your David CLM font with these commands? % ls -l DavidCLM-Medium.ttf % fc-list 'david clm' capability % od -t x1 DavidCLM-Medium.ttf |head --- Kenichi Handa handa@m17n.org PS. I left the hospital yesterday. :-) In article <tl7eig8pnim.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > In article <AANLkTinkfapIXNSnij20psfpKU1ZKS-6wJsVIDbVaQ7i@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > > On Fri, May 28, 2010 at 3:42 AM, Kenichi Handa <handa@m17n.org> wrote: > > > Then please find why maybe_otf and otf are set to zero by > > > stepping through the code of ftfont_get_otf which is called > > > from ftfont_shape. > > ftfont_get_otf sets otf only if maybe_otf != 0. > > maybe_otf is initialized from ft_face->face_flags in xftfont_open. > > For David CLM maybe_otf = 0 because ft_face->face_flags = 2577. > > For Dejavu Sans maybe_otf = 8 because ft_face->face_flags = 2649. > That's very strange. Perhaps your David CLM font is > different from mine. > In freetype.h, FT_FACE_FLAG_SFNT is explained as this: > /* FT_FACE_FLAG_SFNT :: */ > /* Indicates that the face uses the `sfnt' storage scheme. For */ > /* now, this means TrueType and OpenType. */ > So, if the font doesn't have this flag set, it means that > the font is surely not OTF. > This is some info about my David CLM font. 
> % ls -l DavidCLM-Medium.ttf > -rw-r--r-- 1 handa handa 24156 2010-06-15 09:48 DavidCLM-Medium.ttf > % fc-list 'david clm' capability > :capability=otlayout\:hebr > % od -t x1 DavidCLM-Medium.ttf |head > 0000000 00 01 00 00 00 10 01 00 00 04 00 00 46 46 54 4d > 0000020 4f 58 4a 2a 00 00 5e 40 00 00 00 1c 47 44 45 46 > 0000040 08 87 07 9c 00 00 50 24 00 00 00 6e 47 50 4f 53 > 0000060 c3 06 cd 7e 00 00 55 34 00 00 09 0a 47 53 55 42 > 0000100 48 82 52 49 00 00 50 94 00 00 04 9e 4f 53 2f 32 > 0000120 89 5b 2c ee 00 00 01 88 00 00 00 56 63 6d 61 70 > 0000140 ae 86 db a7 00 00 05 3c 00 00 02 0a 63 76 74 20 > 0000160 00 28 02 f8 00 00 07 48 00 00 00 04 67 61 73 70 > 0000200 ff ff 00 03 00 00 50 1c 00 00 00 08 67 6c 79 66 > 0000220 62 9d 8f 85 00 00 08 fc 00 00 3c 34 68 65 61 64 > --- > Kenichi Handa > handa@m17n.org > PS. I got WiFi (WiMAX) now, and the Internet access has > been much improved. :-) ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-24 6:33 ` Kenichi Handa @ 2010-06-25 10:16 ` Eli Zaretskii 2010-06-28 16:40 ` Yair F 1 sibling, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-06-25 10:16 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Thu, 24 Jun 2010 15:33:06 +0900 > Cc: yair.f.lists@gmail.com, emacs-devel@gnu.org > > PS. I left the hospital yesterday. :-) Glad to hear that. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-24 6:33 ` Kenichi Handa 2010-06-25 10:16 ` Eli Zaretskii @ 2010-06-28 16:40 ` Yair F 2010-06-29 8:07 ` Kenichi Handa 1 sibling, 1 reply; 88+ messages in thread From: Yair F @ 2010-06-28 16:40 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel Sorry for the late response. Apparently the Culmus fonts are Type 1: /usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font text (DavidCLM-Medium 0.101) But the MS fonts are TTF, and they don't compose either. /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data On Thu, Jun 24, 2010 at 9:33 AM, Kenichi Handa <handa@m17n.org> wrote: > Yair, could you please check your David CLM font with these > commands? > > % ls -l DavidCLM-Medium.ttf > % fc-list 'david clm' capability > % od -t x1 DavidCLM-Medium.ttf |head > > --- > Kenichi Handa > handa@m17n.org > > PS. I left the hospital yesterday. :-) This is the best news! ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-28 16:40 ` Yair F @ 2010-06-29 8:07 ` Kenichi Handa 2010-06-29 18:57 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-06-29 8:07 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTilenRSGCRXJNj8TtdXqUlyoBOuk-PGld8geCah1@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > Sorry for the late response. > Apparently the Culmus fonts are Type 1: > /usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font > text (DavidCLM-Medium 0.101) How did you install that font? I downloaded culmus-0.104.tar.gz from this page: http://sourceforge.net/projects/culmus/files/culmus/0.104/ and extracted DavidCLM-Medium.ttf from that tarball, and put it under ~/.fonts. Please try that (and uninstall the above Type 1 font), and check whether Emacs can use the TrueType version of that font correctly. > But the MS fonts are TTF, and they don't compose either. > /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data I tried that font too. That font doesn't have OpenType tables for the Hebrew script. % fc-list arial family capability Arial Arial:capability=otlayout\:arab But the function hebrew-shape-gstring has workaround code for such fonts, and in my environment, Hebrew diacriticals are definitely composed (although the positioning is not optimal). --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-29 8:07 ` Kenichi Handa @ 2010-06-29 18:57 ` Yair F 2010-06-30 5:27 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-06-29 18:57 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1806 bytes --] On Tue, Jun 29, 2010 at 11:07 AM, Kenichi Handa <handa@m17n.org> wrote: > In article <AANLkTilenRSGCRXJNj8TtdXqUlyoBOuk-PGld8geCah1@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > >> Sorry for the late response. >> Apparently the Culmus fonts are Type 1: >> /usr/share/fonts/X11/Type1/DavidCLM-Medium.pfa: PostScript Type 1 font >> text (DavidCLM-Medium 0.101) > > How did you install that font? I downloaded > culmus-0.104.tar.gz from this page: This is from the culmus package on Ubuntu (and Debian, as well as most other distributions). I would assume most Hebrew speakers on X-based platforms will have these two packages installed. Most Hebrew-based remixes package it. > http://sourceforge.net/projects/culmus/files/culmus/0.104/ > and extracted DavidCLM-Medium.ttf from that tarball, and put > it under ~/.fonts. > > Please try that (and uninstall the above Type 1 font), and > check whether Emacs can use the TrueType version of that font > correctly. I tried with Keter-YG, which is IMO the best Hebrew font, and indeed the rendering looks OK with my sample (see attached). This font comes from culmus-ancient. The "problem" with that font is that it does indeed have an ancient look. > >> But the MS fonts are TTF, and they don't compose either. >> /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data > > I tried that font too. That font doesn't have OpenType > tables for the Hebrew script. > > % fc-list arial family capability > Arial > Arial:capability=otlayout\:arab > > But the function hebrew-shape-gstring has workaround code > for such fonts, and in my environment, Hebrew diacriticals > are definitely composed (although the positioning is not > optimal). 
I would say that the positioning is not sufficient. See the attached renderings of the same file. [-- Attachment #2: arial.png --] [-- Type: image/png, Size: 23621 bytes --] [-- Attachment #3: keter.png --] [-- Type: image/png, Size: 29097 bytes --] ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-29 18:57 ` Yair F @ 2010-06-30 5:27 ` Kenichi Handa [not found] ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com> 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-06-30 5:27 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTintXoyqvqO5Mqqbyci-AKuBqMYRyp7TBnVUKT-Z@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: >>> But the MS fonts are TTF, and they don't compose either. >>> /usr/share/fonts/truetype/msttcorefonts/Arial.ttf: TrueType font data > > > > I tried that font too. That font doesn't have OpenType > > tables for the Hebrew script. > > > > % fc-list arial family capability > > Arial > > Arial:capability=otlayout\:arab > > > > But the function hebrew-shape-gstring has workaround code > > for such fonts, and in my environment, Hebrew diacriticals > > are definitely composed (although the positioning is not > > optimal). > I would say that the positioning is not sufficient. See the attached renderings of the same file. Comparing images of different fonts in a script that is unfamiliar to me is very difficult. Please tell me exactly which character sequences require more than positioning, and show me images of only those sequences. Anyway, for fonts that don't have OpenType tables for the Hebrew script, we can do nothing other than artificially adjust glyph positions. Have you seen any other application render Hebrew well with that Arial font? --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
[parent not found: <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>]
* Fwd: Composing Hebrew diacriticals [not found] ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com> @ 2010-06-30 21:48 ` Yair F 2010-07-01 5:59 ` Miles Bader 2010-07-01 5:52 ` Kenichi Handa 1 sibling, 1 reply; 88+ messages in thread From: Yair F @ 2010-06-30 21:48 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2226 bytes --] Here is a shorter version for the list ---------- Forwarded message ---------- From: Yair F <yair.f.lists@gmail.com> Date: Thu, Jul 1, 2010 at 12:28 AM Subject: Re: Composing Hebrew diacriticals To: Kenichi Handa <handa@m17n.org> Cc: emacs-devel@gnu.org I apologize for the size of this message. > Comparing images of different font of unfamiliar (for me) > script is very difficult. Please tell me exactly what > character sequence requires more than positioning, and show > me images of only that sequence. Sorry about that. Please find attached hebrew-sample2.txt, the source file. Arial-anottated.png is this file displayed in Emacs with the Arial font. The numbers in red refer to the following comments; the general flow is top to bottom, right to left: 1. The Shin dot should be rendered near the right leg. Currently it is rendered above the centre leg, which is unreadable. 2. All points below should be horizontally centred relative to the base letter. Currently it seems that they are aligned to the left. The exception to this rule is letters that have a single downward leg, such as ו, ר, ד, ז; for these letters the points should be rendered directly under the leg. 3. The Shva point touches Qof's leg. The result is unreadable. 4. The Dagesh point is hidden within the Shin letter. 5. This is not Hebrew, but the combining dot above should be composed with the letter A. 6. The Holam point should be to the left of the leg, not the right. The result is unreadable. 7. The Shuruq point should be to the left of the vav letter, not the right. The result is unreadable. For reference on correct rendering I also attach the same file rendered with Keter YG. 
> > Anyway, for fonts that don't have OpenType tables for Hebrew > script, we can do nothing other than artificially adjusting > glyph position. Have you seen any other application > rendering Hebrew well with that Arial font? OpenOffice and Firefox render Hebrew points correctly. The poetry site you mentioned, http://www.zemer.co.il/song.asp?id=393, uses David and is rendered correctly. Kate (using Pango?) also renders better with Arial and David-CLM. It has some other issues, but the result is mostly readable. See the attached sample under Kate. [-- Attachment #2: hebrew-sample2.txt --] [-- Type: text/plain, Size: 339 bytes --] שָׁלוֹם לְמִשְׁתַּמְּשֵׁי אִמַאקְס A "אֲעוֹלֵל 123 כַּגֶּפֶן" B. עַשֶּׁשֶׁת Ȧ לֹא שַׁרְתִּי לָךְ אַרְצִי, וְלֹא פֵּאַרְתִּי שְׁמֵךְ רַק קוֹל תְּרוּעַת הַגִּיל בְּיוֹם יִגַּהּ הָאוֹר [-- Attachment #3: Arial-anottated.png --] [-- Type: image/png, Size: 50244 bytes --] ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-06-30 21:48 ` Fwd: " Yair F @ 2010-07-01 5:59 ` Miles Bader 0 siblings, 0 replies; 88+ messages in thread From: Miles Bader @ 2010-07-01 5:59 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel Just FYI, the Emacs rendering of your sample text looks correct in my Gnus buffer, using the Truetype version of "Lucida Sans". [by "correct" I mean, (1) it handles all the points you describe correctly, and (2) everything looks "nice".] -Miles -- Suburbia: where they tear out the trees and then name streets after them. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals [not found] ` <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com> 2010-06-30 21:48 ` Fwd: " Yair F @ 2010-07-01 5:52 ` Kenichi Handa 2010-07-01 20:30 ` Yair F 1 sibling, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-07-01 5:52 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 9138 bytes --] In article <AANLkTim3sQzyJ4YQkOzfRHCFhztgLG-CA2vlM84lbwoq@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > Sorry about that. Please find attached hebrew-sample2.txt, the source file. > Arial-anottated.png is this file displayed in Emacs with the Arial font. > The numbers in red refer to the following comments; the general flow is > top to bottom, right to left: > 1. The Shin dot should be rendered near the right leg. Currently it is > rendered above the centre leg, which is unreadable. > 2. All points below should be horizontally centred relative to the > base letter. Currently it seems that they are aligned to the left. > The exception to this rule is letters that have a single downward leg, > such as ו, ר, ד, ז; for these letters the points should be rendered > directly under the leg. > 3. The Shva point touches Qof's leg. The result is unreadable. > 4. The Dagesh point is hidden within the Shin letter. > 5. This is not Hebrew, but the combining dot above should be composed > with the letter A. > 6. The Holam point should be to the left of the leg, not the right. The result is > unreadable. > 7. The Shuruq point should be to the left of the vav letter, not the right. > The result is unreadable. All of those are glyph positioning problems and can be improved by adding more code to hebrew-shape-gstring. > > Anyway, for fonts that don't have OpenType tables for Hebrew > > script, we can do nothing other than artificially adjusting > > glyph position. Have you seen any other application > > rendering Hebrew well with that Arial font? > Openoffice and Firefox correctly render Hebrew points. ??? 
When I open your hebrew-sample2.txt with oowriter and specify the Arial font, the rendering is almost (exactly?) the same as that of Emacs (see the attached image). I confirmed that Firefox (and all applications using Pango/harfbuzz, e.g. gedit) does render Hebrew better with Arial. By reading the code of Pango, I found that it has a fallback shaping engine that is used for fonts with no Hebrew GPOS OpenType tables. Here's the excerpt from pango/module/hebrew-shaper.c. You'll see that it checks various character combinations and adjusts glyph offsets accordingly. But the code has many magic numbers (e.g. 3.5, 0.7, 0.5, 1/3, 3/5, ...). I think it's a dirty & ad-hoc hack. Theoretically, it is possible to do the same thing in the function hebrew-shape-gstring. But is it really worth doing that? Isn't it enough to tell Hebrew users to use properly designed OpenType fonts? ============================================================ void hebrew_shaper_get_cluster_kerning(gunichar *cluster, gint cluster_length, PangoRectangle ink_rect[], /* input and output */ gint width[], gint x_offset[], gint y_offset[]) { int i; int base_ink_x_offset, base_ink_y_offset, base_ink_width, base_ink_height; gunichar base_char = cluster[0]; x_offset[0] = 0; y_offset[0] = 0; if (cluster_length == 1) { /* Make lone 'vav dot' have zero width */ if (base_char == UNI_SHIN_DOT || base_char == UNI_SIN_DOT || base_char == UNI_HOLAM ) { x_offset[0] = -ink_rect[0].x - ink_rect[0].width; width[0] = 0; } return; } base_ink_x_offset = ink_rect[0].x; base_ink_y_offset = ink_rect[0].y; base_ink_width = ink_rect[0].width; base_ink_height = ink_rect[0].height; /* Do heuristics */ for (i=1; i<cluster_length; i++) { int gl = cluster[i]; x_offset[i] = 0; y_offset[i] = 0; /* Check if it is a point */ if (gl < 0x5B0 || gl >= 0x05D0) continue; /* Center dot of VAV */ if (gl == UNI_MAPIQ && base_char == UNI_VAV) { x_offset[i] = base_ink_x_offset - ink_rect[i].x; /* If VAV is a vertical bar without a roof, then 
we need to make room for the dot by increasing the cluster width. But how can I check if that is the case?? */ /* This is wild, but it does the job of differentiating between two M$ fonts... Base the decision on the aspect ratio of the vav... */ if (base_ink_height > base_ink_width * 3.5) { int j; double space = 0.7; double kern = 0.5; /* Shift all characters to make place for the mapiq */ for (j=0; j<i; j++) x_offset[j] += ink_rect[i].width*(1+space-kern); width[cluster_length-1] += ink_rect[i].width*(1+space-kern); x_offset[i] -= ink_rect[i].width*(kern); } } /* Dot over SHIN */ else if (gl == UNI_SHIN_DOT && base_char == UNI_SHIN) { x_offset[i] = base_ink_x_offset + base_ink_width - ink_rect[i].x - ink_rect[i].width; } /* Dot over SIN */ else if (gl == UNI_SIN_DOT && base_char == UNI_SHIN) { x_offset[i] = base_ink_x_offset - ink_rect[i].x; } /* VOWEL DOT above to any other character than SHIN or VAV should stick out a bit to the left. */ else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM) && base_char != UNI_SHIN && base_char != UNI_VAV) { x_offset[i] = base_ink_x_offset -ink_rect[i].x - ink_rect[i].width * 3/ 2; } /* VOWELS under resh or vav are right aligned, if they are narrower than the characters. Otherwise they are centered. 
*/ else if ((base_char == UNI_VAV || base_char == UNI_RESH || base_char == UNI_YOD || base_char == UNI_DALED ) && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) || gl == UNI_QUBUTS) && ink_rect[i].width < base_ink_width ) { x_offset[i] = base_ink_x_offset + base_ink_width - ink_rect[i].x - ink_rect[i].width; } /* VOWELS under FINAL KAF are offset centered and offset in y */ else if ((base_char == UNI_FINAL_KAF ) && ((gl >= UNI_SHEVA && gl <= UNI_QAMATS) || gl == UNI_QUBUTS)) { /* x are at 1/3 to take into accoun the stem */ x_offset[i] = base_ink_x_offset - ink_rect[i].x + base_ink_width * 1/3 - ink_rect[i].width/2; /* Center in y */ y_offset[i] = base_ink_y_offset - ink_rect[i].y + base_ink_height * 1/2 - ink_rect[i].height/2; } /* MAPIQ in PE or FINAL PE */ else if (gl == UNI_MAPIQ && (base_char == UNI_PE || base_char == UNI_FINAL_PE)) { x_offset[i]= base_ink_x_offset - ink_rect[i].x + base_ink_width * 2/3 - ink_rect[i].width/2; /* Another option is to offset the MAPIQ in y... glyphs->glyphs[cluster_start_idx+i].geometry.y_offset -= base_ink_height/5; */ } /* MAPIQ in SHIN should be moved a bit to the right */ else if (gl == UNI_MAPIQ && base_char == UNI_SHIN) { x_offset[i]= base_ink_x_offset - ink_rect[i].x + base_ink_width * 3/5 - ink_rect[i].width/2; } /* MAPIQ in YUD is right aligned */ else if (gl == UNI_MAPIQ && base_char == UNI_YOD) { x_offset[i]= base_ink_x_offset - ink_rect[i].x; /* Lower left in y */ y_offset[i] = base_ink_y_offset - ink_rect[i].y + base_ink_height - ink_rect[i].height*1.75; if (base_ink_height > base_ink_width * 2) { int j; double space = 0.7; double kern = 0.5; /* Shift all cluster characters to make space for mapiq */ for (j=0; j<i; j++) x_offset[j] += ink_rect[i].width*(1+space-kern); width[cluster_length-1] += ink_rect[i].width*(1+space-kern); } } /* VOWEL DOT next to any other character */ else if ((gl == UNI_SIN_DOT || gl == UNI_HOLAM) && (base_char != UNI_VAV)) { x_offset[i] = base_ink_x_offset -ink_rect[i].x; } /* Move nikud of taf a 
bit ... */ else if (base_char == UNI_TAV && gl == UNI_MAPIQ) { x_offset[i] = base_ink_x_offset - ink_rect[i].x + base_ink_width * 5/8 - ink_rect[i].width/2; } /* Move center dot of characters with a right stem and no left stem. */ else if (gl == UNI_MAPIQ && (base_char == UNI_BET || base_char == UNI_DALED || base_char == UNI_KAF || base_char == UNI_GIMMEL )) { x_offset[i] = base_ink_x_offset - ink_rect[i].x + base_ink_width * 3/8 - ink_rect[i].width/2; } /* Right align wide nikud under QOF */ else if (base_char == UNI_QOF && ( (gl >= UNI_HATAF_SEGOL && gl <= UNI_HATAF_QAMATZ) || (gl >= UNI_TSERE && gl<= UNI_QAMATS) || (gl == UNI_QUBUTS))) { x_offset[i] = base_ink_x_offset + base_ink_width - ink_rect[i].x - ink_rect[i].width; } /* Center by default */ else { x_offset[i] = base_ink_x_offset - ink_rect[i].x + base_ink_width/2 - ink_rect[i].width/2; } } } ============================================================ > The poetry site > you mentioned http://www.zemer.co.il/song.asp?id=3D393 uses David and > being correctly rendered. > Kate (using pango?) also better render using Arial, David-CLM. It has > some other issues though, but the result is mostly readable. As Kate is a KDE application, I think it's not using Pango. But, if it renders Hebrew with Arial well, it (or rendering module of KDE/Qt) should have the similar ad-hoc code. --- Kenichi Handa handa@m17n.org [-- Attachment #2: oowriter-arial.png --] [-- Type: image/png, Size: 79797 bytes --] ^ permalink raw reply [flat|nested] 88+ messages in thread
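Editorial sketch: most branches in the Pango excerpt above reduce to three horizontal placements of a mark's ink box relative to the base's ink box: centered (the default), right-aligned (shin dot over shin), and left-aligned (sin dot over shin). The following is a minimal, self-contained illustration of that arithmetic only. The InkRect type and the three function names are hypothetical stand-ins, not Pango or Emacs API, and all arithmetic is integer, as in the excerpt.

```c
/* Hypothetical stand-in for PangoRectangle: the ink box of one glyph,
   in whatever integer units the renderer uses. */
typedef struct {
    int x, y, width, height;
} InkRect;

/* Default case in the excerpt: center the mark's ink over the base's ink. */
int center_offset(InkRect base, InkRect mark)
{
    return base.x - mark.x + base.width / 2 - mark.width / 2;
}

/* Shin dot over shin: align the mark's right edge with the base's right edge. */
int right_align_offset(InkRect base, InkRect mark)
{
    return base.x + base.width - mark.x - mark.width;
}

/* Sin dot over shin: align the mark's left edge with the base's left edge. */
int left_align_offset(InkRect base, InkRect mark)
{
    return base.x - mark.x;
}
```

For a base whose ink starts at x=1 with width 10 and a mark whose ink starts at x=0 with width 2, these give offsets of 5, 9 and 1 respectively. The remaining branches of the excerpt replace the 1/2 factor with per-letter fractions (1/3, 3/5, 5/8, ...), which are the magic numbers objected to above.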
* Re: Composing Hebrew diacriticals 2010-07-01 5:52 ` Kenichi Handa @ 2010-07-01 20:30 ` Yair F 2010-07-02 7:51 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Yair F @ 2010-07-01 20:30 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Thu, Jul 1, 2010 at 8:52 AM, Kenichi Handa <handa@m17n.org> wrote: > All those are glyph positioning problems and can be improved > by adding more code to hebrew-shape-gstring. What other problems do you expect? So far I see no other problems regarding bidi or compositions. > ??? When I open your hebrew-sample2.txt with oowriter, and > specify Arial font, the rendering is almost (exactly?) the > same as that of Emacs (see the attached image). You are right. Maybe it was with a special Hebrew oo version that I don't have now, or maybe on another OS. Current oo practice is "use proper fonts" :( > I think it's a dirty & > ad-hoc hack. > > Theoretically, it is possible to do the same thing in the > function hebrew-shape-gstring. But, is it really worth > doing that? Isn't it enough to tell Hebrew users to use > properly designed OpenType fonts? The sad answer on free systems is that there are nearly no such fonts. The common answer to "Why is Hebrew so ugly on Linux?" is "Install Culmus and msttcorefonts". I guess that is the reason for the tweaks you mentioned. > > As Kate is a KDE application, I think it's not using Pango. > But, if it renders Hebrew with Arial well, it (or rendering > module of KDE/Qt) should have the similar ad-hoc code. Maybe; as you can see, I don't know much about rendering engines. An additional and possibly less ugly path is to use presentation forms when available (U+FB20). There are additional forms in the private use area. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-07-01 20:30 ` Yair F @ 2010-07-02 7:51 ` Kenichi Handa 2010-07-12 8:17 ` Kenichi Handa 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-07-02 7:51 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTil7SZlvtMBLmfz3DG_wHKKki72LwSIITx53w0tf@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > On Thu, Jul 1, 2010 at 8:52 AM, Kenichi Handa <handa@m17n.org> wrote: > > All those are glyph positioning problems and can be improved > > by adding more code to hebrew-shape-gstring. > What other problems do you expect? Sorry, I just misread what you wrote, "I would say that the positioning is not sufficient", as "there should be more work other than positioning". > > I think it's a dirty & > > ad-hoc hack. > > > > Theoretically, it is possible to do the same thing in the > > function hebrew-shape-gstring. But, is it really worth > > doing that? Isn't it enough to tell Hebrew users to use > > properly designed OpenType fonts? > The sad answer on free systems is that there are nearly no such fonts. > The common answer to "Why is Hebrew so ugly on Linux?" is "Install > Culmus and msttcorefonts". > I guess that is the reason for the tweaks you mentioned. Sigh... > An additional and possibly less ugly path is to use presentation forms > when available (U+FB20). There are additional forms in the private use > area. Hmmm, that seems to be a practical approach, provided that the presentation forms cover most of the frequently used character combinations. I'll try to implement it. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-07-02 7:51 ` Kenichi Handa @ 2010-07-12 8:17 ` Kenichi Handa 2010-07-12 21:10 ` Yair F 0 siblings, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-07-12 8:17 UTC (permalink / raw) To: Kenichi Handa; +Cc: yair.f.lists, emacs-devel In article <tl7hbki48zt.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes: > > An additional and possibly less ugly path is to use presentation forms > > when available (U+FB20). There are additional forms in the private use > > area. > Hmmm, that seems to be a practical approach, provided that > the presentation forms cover most of the frequently used > character combinations. I'll try to implement it. I've just committed the code to do that. I tested with the Arial font, and it seems that most of the points you listed are solved now, except for this: 5. This is not Hebrew, but the combining dot above should be composed with the letter A. It seems that the Arial font doesn't have a glyph for #x307. When you set both the default font and the font for #x307 to "dejavu sans mono", #x307 is composed with the preceding "A". --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
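Editorial sketch: the presentation-form approach discussed above amounts to replacing a base letter plus one point with a single precomposed code point when the font has a glyph for it. The committed Emacs code is not shown in this thread, so the following is only an illustration of the lookup step, written in C to match the Pango excerpt. The HebrewPair type and the function hebrew_presentation_form are hypothetical names; the table is a small subset of the Hebrew presentation forms in the Unicode block starting at U+FB1D, using their canonical decompositions. A real shaper would also have to check that the selected font actually has a glyph for the returned code point, and fall back to positioning otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* One base letter + one point, and the precomposed presentation form. */
typedef struct {
    uint32_t base;
    uint32_t mark;
    uint32_t composed;
} HebrewPair;

/* Subset of the Unicode Hebrew presentation forms (illustrative only). */
static const HebrewPair hebrew_pairs[] = {
    { 0x05D9, 0x05B4, 0xFB1D },  /* yod + hiriq */
    { 0x05E9, 0x05C1, 0xFB2A },  /* shin + shin dot */
    { 0x05E9, 0x05C2, 0xFB2B },  /* shin + sin dot */
    { 0x05D0, 0x05B7, 0xFB2E },  /* alef + patah */
    { 0x05D0, 0x05B8, 0xFB2F },  /* alef + qamats */
    { 0x05D1, 0x05BC, 0xFB31 },  /* bet + dagesh */
    { 0x05D5, 0x05BC, 0xFB35 },  /* vav + dagesh (shuruq) */
    { 0x05D5, 0x05B9, 0xFB4B },  /* vav + holam */
};

/* Return the presentation form for BASE followed by MARK,
   or 0 if this table has no entry for the pair. */
uint32_t hebrew_presentation_form(uint32_t base, uint32_t mark)
{
    size_t n = sizeof hebrew_pairs / sizeof hebrew_pairs[0];
    for (size_t i = 0; i < n; i++) {
        if (hebrew_pairs[i].base == base && hebrew_pairs[i].mark == mark)
            return hebrew_pairs[i].composed;
    }
    return 0;
}
```

A pair with no standard presentation form, such as qof + sheva from item 3 of the list, returns 0 here, so such clusters would still need explicit glyph positioning.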
* Re: Composing Hebrew diacriticals 2010-07-12 8:17 ` Kenichi Handa @ 2010-07-12 21:10 ` Yair F 2010-07-13 4:11 ` Kenichi Handa 2010-07-13 12:01 ` Eli Zaretskii 0 siblings, 2 replies; 88+ messages in thread From: Yair F @ 2010-07-12 21:10 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Mon, Jul 12, 2010 at 11:17 AM, Kenichi Handa <handa@m17n.org> wrote: > I've just committed the code to do that. I tested with the > Arial font, and it seems that most of the points you listed > are solved now, except for this: Now it's much, much better. Thank you! Here are some more improvements needed: The placement of the Holam (05B9) point seems to be top-center. It should be top-left instead. Specifically, for the Lamed (05DC) base letter it should be to the left of the top vertical leg. Some fonts have a presentation form for that at E804. The Sheva (05B0) and Qamats (05B8) points should be shifted above the baseline to an approximately center-center position when composed with Final Kaf (05DA). Again, some fonts pre-compose these at E802 and E803 respectively. Currently I'm trying to hunt-down a problem when sometimes when transient-mode is active some characters suddenly stop composing. Once I get a recipe, I'll let you know. Thanks again to you and Eli, Yair ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-07-12 21:10 ` Yair F @ 2010-07-13 4:11 ` Kenichi Handa 2010-07-13 4:47 ` Yair F 2010-07-13 12:01 ` Eli Zaretskii 1 sibling, 1 reply; 88+ messages in thread From: Kenichi Handa @ 2010-07-13 4:11 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel In article <AANLkTimYSLAA4UTDjZ2MF5NhqOxtX_m7_oqQ6TanAMZl@mail.gmail.com>, Yair F <yair.f.lists@gmail.com> writes: > Now it's much, much better. Thank you! > Here are some more improvements needed: > The placement of the Holam (05B9) point seems to be top-center. It should > be top-left instead. > Specifically, for the Lamed (05DC) base letter it should be to the left of > the top vertical leg. > Some fonts have a presentation form for that at E804. But E804 is in a Private Use Area, and there's no way to check whether the glyph there (if any) is a Hebrew glyph or not. Or is there any consensus among Hebrew font designers? > Currently I'm trying to hunt-down a problem when sometimes when > transient-mode is > active some characters suddenly stop composing. Once I get a recipe, > I'll let you know. I see. --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-07-13 4:11 ` Kenichi Handa @ 2010-07-13 4:47 ` Yair F 0 siblings, 0 replies; 88+ messages in thread From: Yair F @ 2010-07-13 4:47 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel On Tue, Jul 13, 2010 at 7:11 AM, Kenichi Handa <handa@m17n.org> wrote: > But E804 is in a Private Use Area, and there's no way to > check whether the glyph there (if any) is a Hebrew glyph or not. By glyph name? > > Or is there any consensus among Hebrew font designers? It is available in some fonts, including some of those that don't give enough information to do proper rendering. ^ permalink raw reply [flat|nested] 88+ messages in thread
* Re: Composing Hebrew diacriticals 2010-07-12 21:10 ` Yair F 2010-07-13 4:11 ` Kenichi Handa @ 2010-07-13 12:01 ` Eli Zaretskii 1 sibling, 0 replies; 88+ messages in thread From: Eli Zaretskii @ 2010-07-13 12:01 UTC (permalink / raw) To: Yair F; +Cc: emacs-devel, handa > Date: Tue, 13 Jul 2010 00:10:04 +0300 > From: Yair F <yair.f.lists@gmail.com> > Cc: emacs-devel@gnu.org > > Currently I'm trying to hunt-down a problem when sometimes when > transient-mode is active some characters suddenly stop composing. Is this in a buffer that's bidi-reordered for display? If so, does the problem go away if you turn off bidi-display-reordering? When bidi reordering is in effect, both face resolution and character composition need to examine buffer text backwards, because text properties and character compositions are still defined in logical order. It's possible that face resolution somehow interferes with character composition in that case. Let me know if I can help. ^ permalink raw reply [flat|nested] 88+ messages in thread