* Re: Re: Arabic support
From: mhibti @ 2010-09-07 15:08 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel

You are right!

After invoking emacs -q I could do the following:
- copy from emacs to another application (it worked).
- copy Arabic text from two different applications to emacs:
  it works correctly, except that tashkeel seems lost when the source includes it.
  But after verification, if I try to mark the region in question the tashkeel appears :)

In my dot emacs I found what may be the cause of my problem.

 '(selection-coding-system (quote utf-8-dos))
 '(unify-8859-on-decoding-mode t)
 '(unify-8859-on-encoding-mode t)

Thanks

----- Original Message -----
From: "Eli Zaretskii" <eliz@gnu.org>
To: mhibti@free.fr
Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
Sent: Tuesday, 7 September 2010 06:39:53 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: Re: [emacs-bidi] Re: Arabic support

> Date: Tue, 7 Sep 2010 05:34:04 +0200 (CEST)
> From: mhibti@free.fr
> Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
>
> It is simple. When copying a text in arabic (see pictures) and
> pasting in the emacs buffer, the result is a series of "?"
> characters (see resu.jpg).

From which application did you copy the Arabic text?  Do you see the
same problems with other applications, or just with this one?

Does it work to copy/paste from one Emacs instance to another?  (I
mean actually invoking "emacs -Q" twice, not pasting from one Emacs
frame to another in the same session.)

Does it work to copy FROM Emacs to another application that is not
Emacs?
* Re: Re: Arabic support
From: Eli Zaretskii @ 2010-09-13 6:40 UTC
To: mhibti, handa; +Cc: emacs-bidi, emacs-devel

> Date: Tue, 7 Sep 2010 17:08:01 +0200 (CEST)
> From: mhibti@free.fr
> Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
>
> - copy Arabic text from two different applications to emacs:
>   it works correctly, except that tashkeel seems lost when the source includes it.
>   But after verification, if I try to mark the region in question the tashkeel appears :)

It's probably some bad interaction between compositions and the
handling of faces in the bidirectional display.

Perhaps Handa-san could take a look at xdisp.c:handle_stop_backwards
and how it is called inside next_element_from_buffer -- there might be
some bugs there whereby only part of the composed sequence is redrawn
when the region is extended or contracted.

> In my dot emacs I found what may be the cause of my problem.
>
>  '(selection-coding-system (quote utf-8-dos))

This one is your problem: you should never do that on MS-Windows.

>  '(unify-8859-on-decoding-mode t)
>  '(unify-8859-on-encoding-mode t)

These are obsolete, as everything is always unified in Emacs 24.
* Re: Re: Arabic support
From: Kenichi Handa @ 2010-09-16 2:07 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel

In article <83eicy5epd.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> It's probably some bad interaction between compositions and the
> handling of faces in the bidirectional display.

I found that the problem is that the current composition for
Arabic requires that a whole word be composed.  So, if
there's a face change within a word, the Arabic composition
function is given just a partial word, and that results in
incorrect Arabic shaping.  This is a difficult problem,
and I need time to find a solution.

---
Kenichi Handa
handa@m17n.org
* Re: Re: Arabic support
From: Kenichi Handa @ 2010-09-22 3:54 UTC
To: Kenichi Handa; +Cc: emacs-devel, emacs-bidi

In article <tl77himmofd.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <83eicy5epd.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > It's probably some bad interaction between compositions and the
> > handling of faces in the bidirectional display.

> I found that the problem is that the current composition for
> Arabic requires that a whole word be composed.  So, if
> there's a face change within a word, the Arabic composition
> function is given just a partial word, and that results in
> incorrect Arabic shaping.  This is a difficult problem,
> and I need time to find a solution.

I've just installed a fix to trunk.

---
Kenichi Handa
handa@m17n.org
* Re: Re: Arabic support
From: Eli Zaretskii @ 2010-09-22 7:33 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, handa

> From: Kenichi Handa <handa@m17n.org>
> Cc: eliz@gnu.org, emacs-bidi@gnu.org, mhibti@free.fr, emacs-devel@gnu.org
> Date: Wed, 22 Sep 2010 12:54:26 +0900
>
> In article <tl77himmofd.fsf@m17n.org>, Kenichi Handa <handa@m17n.org> writes:
>
> > In article <83eicy5epd.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > > It's probably some bad interaction between compositions and the
> > > handling of faces in the bidirectional display.
>
> > I found that the problem is that the current composition for
> > Arabic requires that a whole word be composed.  So, if
> > there's a face change within a word, the Arabic composition
> > function is given just a partial word, and that results in
> > incorrect Arabic shaping.  This is a difficult problem,
> > and I need time to find a solution.
>
> I've just installed a fix to trunk.

Thanks.
* Re: Arabic support
From: Thamer Mahmoud @ 2010-09-22 12:27 UTC
To: emacs-bidi; +Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> I found that the problem is that the current composition for
>> Arabic requires that a whole word be composed.  So, if
>> there's a face change within a word, the Arabic composition
>> function is given just a partial word, and that results in
>> incorrect Arabic shaping.  This is a difficult problem,
>> and I need time to find a solution.
>
> I've just installed a fix to trunk.

I can confirm that the issue with unshaped glyphs while highlighting
words is now fixed. Thanks.

However, long Arabic strings still have unshaped middle parts and a bad
margin. See the attached screenshot, which is the output of M-30-<BAA>
in an empty buffer.

Also, the following code produces duplicate strings, compared to when
auto-composition-mode is off.

(let ()
  (setq bidi-display-reordering t)
  (insert "\n\n")
  (insert "كمنت")
  (insert "ببببببببببببببببببب"))

[Attachment: screenshot (image/png)]

--
Thamer
* Re: Re: Arabic support
From: Kenichi Handa @ 2010-09-27 5:56 UTC
To: Thamer Mahmoud; +Cc: emacs-bidi, emacs-devel

In article <87aananet1.fsf@zemblan.newkuwait.org>, Thamer Mahmoud <thamer.mahmoud@gmail.com> writes:

> However, long Arabic strings still have unshaped middle parts and bad
> margin. See the attached screenshot which is the output of M-30-<BAA>
> in an empty buffer.

Ah, I found what is wrong.  In "struct glyph", we now have
only 4 bits to store indices into an LGSTRING.

    struct
    {
      /* Flag to tell if the composition is automatic or not.  */
      unsigned automatic : 1;
      /* ID of the composition.  */
      unsigned id : 23;
      /* Start and end indices of glyphs of the composition.  */
      unsigned from : 4;
      unsigned to : 4;
    } cmp;

So, we could handle at most 16 glyphs in one composition.

I've just installed a fix to remove that restriction
(theoretically we still have a restriction of at most
0x7FFFFFFF glyphs in one composition).

---
Kenichi Handa
handa@m17n.org
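The 4-bit limit described in this message can be illustrated with a small C sketch. It copies only the field widths quoted above; the type name `cmp_fields` and the helper `stored_to_index` are assumptions made for this illustration and are not Emacs identifiers. Assigning an out-of-range value to an unsigned bit-field keeps only the low bits, so an index of 16 or more cannot be read back intact, which is consistent with a 30-character run (indices up to 29) being shaped incorrectly.

```c
/* Illustrative copy of the composition fields quoted in the message.
   The name cmp_fields is an assumption for this sketch, not an Emacs type.  */
struct cmp_fields
{
  unsigned automatic : 1;   /* automatic composition or not */
  unsigned id : 23;         /* ID of the composition */
  unsigned from : 4;        /* start glyph index: only 0..15 fit */
  unsigned to : 4;          /* end glyph index: only 0..15 fit */
};

/* Store INDEX in the 4-bit `to' field and return what can be read back.
   For an unsigned bit-field the stored value is INDEX modulo 16.  */
unsigned
stored_to_index (unsigned index)
{
  struct cmp_fields c = { 0, 0, 0, 0 };
  c.to = index;
  return c.to;
}
```

With these widths, index 15 survives the round trip, while index 29 (the last glyph of a 30-glyph composition) would come back as 13.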
* Arabic support
From: Kenichi Handa @ 2010-08-26 1:10 UTC
To: emacs-devel

I've just committed changes to trunk for Arabic shaping.  If
there are any Arabic users on this list, please check the
display of Arabic text.  On a GNU/Linux system, you must
compile Emacs with libotf and m17n-lib (the configure script
should detect them automatically).

---
Kenichi Handa
handa@m17n.org
* Re: Arabic support
From: Eli Zaretskii @ 2010-08-27 9:56 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Date: Thu, 26 Aug 2010 10:10:05 +0900
>
> I've just committed changes to trunk for Arabic shaping.  If
> there're any Arabic users in this list, please check the
> displaying of Arabic text.  On GNU/Linux system, you must
> compile Emacs with libotf and m17n-lib (configure script
> should detect them automatically).

Thanks.  However, today's build behaves very strangely in a GUI
session on MS-Windows.  For starters, cursor motion seems to jump
across many characters in the "Arabic" line of etc/HELLO.  For
example, typing C-f in that line, I first move one character at a time
across "Arabic", as expected, then the cursor jumps to the right paren
of the leftmost parenthesized part, again as expected, and then I see
the following strange behavior:

  . C-f moves one character to the left, to buffer position 758, as
    expected.
  . the next C-f jumps across many characters on the screen and lands
    on position 764.
  . another C-f jumps to what is reported as position 765, but on the
    screen those are several characters, maybe 5 or 6.
  . another C-f moves to the left paren at position 766, as expected.
  . yet another C-f moves to position 767, but on the screen the
    cursor jumps back into one of the characters it jumped across
    when it landed on position 765 two C-f keypresses earlier.
  . if I type C-b 4 times from this point, I enter a "trap", whereby
    typing C-b jumps between two characters, whose buffer positions
    are 764 and 765.  The only way to get out of the trap is with C-a
    or C-e or C-f.

I don't read Arabic, so I cannot really say whether any of this is
expected behavior.  (The "trap" with C-b is certainly not the expected
behavior.)

Do you see anything similar on X?
* Re: Arabic support
From: Kenichi Handa @ 2010-08-30 7:47 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel

In article <83bp8oml9c.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Thanks.  However, today's build behaves very strangely in
> a GUI session on MS-Windows.  For starters, cursor motion
> seems to jump across many characters in the "Arabic" line
> of etc/HELLO.  For example, typing C-f in that line, I
> first move one character at a time across "Arabic", as
> expected, then the cursor jumps to the right paren of the
> leftmost parenthesized part, again as expected, and then I
> see the following strange behavior:

I can't see that strange behaviour on GNU/Linux.  Amit
Aronovitch <aronovitch@gmail.com> also reported that
rendering and cursor movement are ok on Debian.  So, I
suspect that the problem is specific to Windows.

In Emacs, bidi reordering is done by Emacs itself, so the
`shape' method of a font backend should not reorder glyphs.
But, perhaps the Uniscribe backend reorders Arabic text, right?

---
Kenichi Handa
handa@m17n.org
* Re: Arabic support
From: Eli Zaretskii @ 2010-08-30 14:06 UTC
To: Kenichi Handa, jasonr; +Cc: emacs-bidi, emacs-devel

> From: Kenichi Handa <handa@m17n.org>
> Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Mon, 30 Aug 2010 16:47:08 +0900
>
> I can't see that strange behaviour on GNU/Linux.  Amit
> Aronovitch <aronovitch@gmail.com> also reported that
> rendering and cursor movement are ok on Debian.  So, I
> suspect that the problem is specific to Windows.

Looks like that, yes.

> In Emacs, bidi reordering is done by Emacs itself, so the `shape'
> method of font backend should not reorder glyphs.  But, perhaps
> Uniscribe backend reorders Arabic text, right?

No, not AFAIK.  We call the ScriptItemize API of Uniscribe with NULL
as the 4th and 5th arguments, which AFAIU should disable reordering.
Perhaps Jason could chime in and tell if I'm right here.

Btw, does the current code support Arabic ligatures and shaping on
GNU/Linux?
* Re: Arabic support
From: Kenichi Handa @ 2010-09-01 2:17 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, jasonr

In article <E1Oq50d-0006YC-8u@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > In Emacs, bidi reordering is done by Emacs itself, so the `shape'
> > method of font backend should not reorder glyphs.  But, perhaps
> > Uniscribe backend reorders Arabic text, right?

> No, not AFAIK.  We call the ScriptItemize API of Uniscribe with NULL
> as the 4th and 5th arguments, which AFAIU should disable reordering.
> Perhaps Jason could chime in and tell if I'm right here.

I read the function uniscribe_shape roughly.  It has this
code:

    for (i = 0; i < nitems; i++)
      {
        int nglyphs, nchars_in_run, rtl = items[i].a.fRTL ? -1 : 1;
    [...]
        if (SUCCEEDED (result))
          {
            int j, nclusters, from, to;

            from = rtl > 0 ? 0 : nchars_in_run - 1;

Doesn't it mean uniscribe_shape reorders glyphs?

> Btw, does the current code support Arabic ligatures and shaping on
> GNU/Linux?

I don't know about ligatures, but at least these should be
supported by libotf and m17n-lib with OpenType fonts.

  o glyph substitution of consonants depending on where it
    is; beginning, middle, or end of a word.

  o glyph positioning of vowels

---
Kenichi Handa
handa@m17n.org
* Re: Arabic support
From: Eli Zaretskii @ 2010-09-01 6:11 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Wed, 01 Sep 2010 11:17:03 +0900
>
> I read the function uniscribe_shape roughly.  It has this
> code:
>
>     for (i = 0; i < nitems; i++)
>       {
>         int nglyphs, nchars_in_run, rtl = items[i].a.fRTL ? -1 : 1;
>     [...]
>         if (SUCCEEDED (result))
>           {
>             int j, nclusters, from, to;
>
>             from = rtl > 0 ? 0 : nchars_in_run - 1;
>
> Doesn't it mean uniscribe_shape reorders glyphs?

This reorders a single LGSTRING, according to my reading.  Isn't an
LGSTRING a single grapheme cluster, rather than several distinct
characters?

Btw, where's the documentation of LGSTRING?  The commentary to
uniscribe_shape says to look in font-make-gstring, but I cannot find
that, neither as function nor as variable.  In general, everything
about compositions and lgstrings needs a lot more documentation.
* Re: Arabic support
From: Kenichi Handa @ 2010-09-01 7:08 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, jasonr

In article <E1OqgXc-0001rS-49@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Doesn't it mean uniscribe_shape reorders glyphs?

> This reorders a single LGSTRING, according to my reading.  Isn't an
> LGSTRING a single grapheme cluster, rather than several distinct
> characters?

No, LGSTRING may contain multiple grapheme clusters.  In the
case of Arabic, we make an LGSTRING for one Arabic word, then
shape it (otherwise, the shaper can't know where in a word a
consonant appears).  So, usually an LGSTRING contains multiple
grapheme clusters for Arabic.  Glyphs constituting a
grapheme cluster have the same value in LGLYPH_FROM (G) and
LGLYPH_TO (G), where G is an LGLYPH given by LGSTRING_GLYPH
(LGSTRING, IDX).

> Btw, where's the documentation of LGSTRING?  The commentary to
> uniscribe_shape says to look in font-make-gstring, but I cannot find
> that, neither as function nor as variable.  In general, everything
> about compositions and lgstrings needs a lot more of documentation.

I renamed font-make-gstring to composition-get-gstring and
moved the code to composite.c.  The above macros for
accessing LGSTRING and LGLYPH are in composite.h.

---
Kenichi Handa
handa@m17n.org
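The rule stated in this message, that glyphs of one grapheme cluster carry the same FROM and TO values, can be sketched in C as follows. This is only an illustration under stated assumptions: `struct glyph_span` and `count_clusters` are hypothetical names standing in for the LGLYPH_FROM and LGLYPH_TO accessors, and the sketch further assumes that glyphs of one cluster are adjacent in the array.

```c
#include <stddef.h>

/* Hypothetical stand-in for one glyph of an LGSTRING: only the FROM and
   TO character indices are modelled here.  */
struct glyph_span
{
  int from;
  int to;
};

/* Count grapheme clusters, assuming glyphs of the same cluster are
   adjacent and share identical FROM and TO values.  */
size_t
count_clusters (const struct glyph_span *g, size_t n)
{
  size_t clusters = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || g[i].from != g[i - 1].from || g[i].to != g[i - 1].to)
      clusters++;
  return clusters;
}
```

For example, a seven-glyph word in which a base letter and its combining mark both carry FROM 2 and TO 3 would be counted as six clusters; those index values are made up for the example rather than taken from an actual shaper run.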
* Re: Arabic support
From: Eli Zaretskii @ 2010-09-01 17:55 UTC
To: Kenichi Handa; +Cc: emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Wed, 01 Sep 2010 16:08:50 +0900
>
> No, LGSTRING may contain multiple grapheme clusters.  In the
> case of arabic, we make LGSTRING for one Arabic word then
> shape it (otherwise, the shaper can't know where in a word a
> consonant appears).  So, usually LGSTRING contains multiple
> grapheme clusters for Arabic.

I indeed see under a debugger that the variable rtl gets a negative
value when HELLO is displayed, which means uniscribe_shape tries to
reorder the glyphs, which is wrong, because they are already reordered
by xdisp.c.

But there's something else at work here, because even if I force rtl
to be always 1, the display is still wrong and only slightly
different.

Also, it looks like uniscribe_shape is repeatedly called from
font-shape-gstring to shape the same text that is progressively
shortened.  For example, the first call will be with a 7-character
string whose contents is

  {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

The next call is with a 6-character string whose contents is

  {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}

then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.
Note that the first 7-character string is the first word of the Arabic
greeting, properly bidi-reordered for display.

Is this series of calls expected?
* Re: Arabic support
From: Jason Rumney @ 2010-09-02 2:13 UTC
To: Eli Zaretskii; +Cc: emacs-devel, Kenichi Handa

On 02/09/2010 01:55, Eli Zaretskii wrote:
> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is
>
>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}
>
> The next call is with a 6-character string whose contents is
>
>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}
>
> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.
>

This seems consistent with the output that is produced, which seems to
repeat sequences within the text.

Regarding the double reordering, when I originally wrote the code for
uniscribe, I did not understand how the bidi support for Emacs was
going to work, so I included some bidi reordering in the uniscribe
handling.  Probably this needs to be reviewed now.
* Re: Arabic support
From: Eli Zaretskii @ 2010-09-02 11:53 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Wed, 01 Sep 2010 16:08:50 +0900
>
> LGSTRING may contain multiple grapheme clusters.  In the
> case of arabic, we make LGSTRING for one Arabic word then
> shape it (otherwise, the shaper can't know where in a word a
> consonant appears).

Where can I find the code which decides how to break text into
LGSTRINGs?  I'd like to see such code for both Arabic and Hebrew,
unless it's the same code.

For example, can characters like digits or other neutrals be included
in the same LGSTRING with Arabic and Hebrew?  Or will an LGSTRING
always include characters from one script only?

I'm asking because it's possible that we will need to modify
w32uniscribe.c to reorder R2L characters before we pass them to the
Uniscribe ScriptShape API, to let it see the characters in the logical
order it expects them.  That's if it turns out that Uniscribe cannot
otherwise shape them correctly.

TIA
* Re: Arabic support
From: Kenichi Handa @ 2010-09-02 13:01 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, jasonr

In article <E1Or8Lz-0004if-He@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Where can I find the code which decides how to break text into
> LGSTRINGs?  I'd like to see such code for both Arabic and Hebrew,
> unless it's the same code.

A not-yet-shaped LGSTRING is created by autocmp_chars
(composite.c) from a character sequence matching with a
regular expression PATTERN stored in a
composition-function-table.  This pattern is
"[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
and a more complicated regex for Hebrew
(lisp/language/hebrew.el).

> For example, can characters like digits or other neutrals be included
> in the same LGSTRING with Arabic and Hebrew?  Or will an LGSTRING
> always include characters from one script only?

LGSTRING always includes characters of the same font.  So,
even if you wrote PATTERN to include the other neutrals, if
a user's font setting (or environment) decides to use a
different font for those neutrals, they are not included in
LGSTRING.  By default, Emacs tries to use the same font for
characters in the same script.

In addition, even if you set up fonts to use the same font
for, for instance, Hebrew and those neutrals, the "shape" method
of a font-backend may not support them.  In that case, the
composition fails anyway.

> I'm asking because it's possible that we will need to modify
> w32uniscribe.c to reorder R2L characters before we pass them to the
> Uniscribe ScriptShape API, to let it see the characters in the logical
> order it expects them.  That's if it turns out that Uniscribe cannot
> otherwise shape them correctly.

???  Currently characters and glyphs in LGSTRING are always
in logical order.  A "shape" method should also shape that
LGSTRING in logical order.

---
Kenichi Handa
handa@m17n.org
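The Arabic pattern quoted in this message, "[\u0600-\u06FF]+", selects a maximal run of code points in the Arabic block. A minimal C sketch of that matching step is given below. It is not the code in composite.c; the function name `arabic_run_length` and the use of a plain array of code points are assumptions made for this illustration, and the font check described in the message is left out.

```c
#include <stddef.h>

/* Return the length of the maximal run of code points in the range
   U+0600..U+06FF that starts at CP[START], or 0 if CP[START] is outside
   that range (or START is past the end of the array).  */
size_t
arabic_run_length (const unsigned *cp, size_t n, size_t start)
{
  size_t i = start;
  while (i < n && cp[i] >= 0x0600 && cp[i] <= 0x06FF)
    i++;
  return i - start;
}
```

Applied to the seven code points of the greeting word discussed in this thread followed by a space, the run starting at index 0 covers the whole word and stops at the space, so the word would become one candidate for shaping.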
* Re: Arabic support
From: Eli Zaretskii @ 2010-09-02 14:04 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Thu, 02 Sep 2010 22:01:07 +0900
>
> A not-yet-shaped LGSTRING is created by autocmp_chars
> (composite.c) from a character sequence matching with a
> regular expression PATTERN stored in a
> composition-function-table.  This pattern is
> "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> and a more complicated regex for Hebrew
> (lisp/language/hebrew.el).

Thanks.  So character compositions are used not only to compose
several characters into one glyph, but also to break text into
individually shaped chunks, is that right?

If so, auto-composition-mode cannot be turned off for scripts that
need this kind of "grouped shaping" without degrading the presentation
of these scripts to the point of illegibility?

> > I'm asking because it's possible that we will need to modify
> > w32uniscribe.c to reorder R2L characters before we pass them to the
> > Uniscribe ScriptShape API, to let it see the characters in the logical
> > order it expects them.  That's if it turns out that Uniscribe cannot
> > otherwise shape them correctly.
>
> ???  Currently characters and glyphs in LGSTRING are always
> in logical order.

See my mail from yesterday, where I describe that I see in GDB that
Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
order:

  http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

That is why I asked the question in the first place.  What am I
missing?
* Re: Arabic support
From: Kenichi Handa @ 2010-09-03 1:00 UTC
To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, jasonr

In article <E1OrAPF-0000Gn-K7@fencepost.gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > A not-yet-shaped LGSTRING is created by autocmp_chars
> > (composite.c) from a character sequence matching with a
> > regular expression PATTERN stored in a
> > composition-function-table.  This pattern is
> > "[\u0600-\u06FF]+" for Arabic (lisp/language/misc-lang.el),
> > and a more complicated regex for Hebrew
> > (lisp/language/hebrew.el).

> Thanks.  So character compositions are used not only to compose
> several characters into one glyph, but also to break text into
> individually shaped chunks, is that right?

Yes.

> If so, auto-composition-mode cannot be turned off for scripts that
> need this kind of "grouped shaping" without degrading the presentation
> of these scripts to the point of illegibility?

Yes.  And auto-composition-mode cannot be turned off for any
scripts that it is not enough to display glyphs
corresponding to characters; they are all Indics, some East
Asians, Arabic, Hebrew, etc.  In this respect, Arabic is not
special.  Even for some Indics, LGSTRING may contain
multibyte grapheme clusters.

> > > I'm asking because it's possible that we will need to modify
> > > w32uniscribe.c to reorder R2L characters before we pass them to the
> > > Uniscribe ScriptShape API, to let it see the characters in the logical
> > > order it expects them.  That's if it turns out that Uniscribe cannot
> > > otherwise shape them correctly.
> >
> > ???  Currently characters and glyphs in LGSTRING are always
> > in logical order.

> See my mail from yesterday, where I describe that I see in GDB that
> Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual
> order:

>   http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html

In this mail, you wrote:

> Also, it looks like uniscribe_shape is repeatedly called from
> font-shape-gstring to shape the same text that is progressively
> shortened.  For example, the first call will be with a 7-character
> string whose contents is

>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645}

and this character sequence is surely in logical order.  So
I don't know why you think uniscribe_shape is given an
LGSTRING in visual order.

> The next call is with a 6-character string whose contents is

>   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}

> then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.
> Note that the first 7-character string is the first word of the Arabic
> greeting, properly bidi-reordered for display.

> Is this series of calls expected?

No.  I don't know why that happens on Windows.  On Ubuntu,
when I visit a file that contains only these lines:

------------------------------------------------------------
Arabic			السّلام

;;; Local Variables:
;;; bidi-display-reordering: t
;;; End:
------------------------------------------------------------

font-shape-gstring is called just once.

As the lgstring is getting shorter each time, it seems that
composition fails each time.

autocmp_chars is mainly called from composition_reseat_it.
Could you please trace the code after the first call of
autocmp_chars, and find why Emacs decides that a
composition fails.

---
Kenichi Handa
handa@m17n.org
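The call pattern discussed in this message (one shaping call when composition succeeds, a series of progressively shorter calls when it keeps failing) can be modelled with a small C sketch. This is a simplified model and not the logic of composite.c: the callback type `shape_fn`, the function `compose_longest_prefix`, and the two toy shapers are hypothetical names introduced only to show how repeated failure would produce the 7, 6, 5, ... sequence of string lengths that was observed.

```c
#include <stddef.h>

/* Hypothetical shaper callback: returns nonzero if the first LEN code
   points of CP can be shaped as one composition.  */
typedef int (*shape_fn) (const unsigned *cp, size_t len);

/* Try the longest candidate first and drop one trailing character after
   each failure.  Return the length that was finally composed, or 0 if
   every attempt failed.  *ATTEMPTS receives the number of shaper calls.  */
size_t
compose_longest_prefix (const unsigned *cp, size_t n, shape_fn shape,
                        size_t *attempts)
{
  *attempts = 0;
  for (size_t len = n; len > 0; len--)
    {
      (*attempts)++;
      if (shape (cp, len))
        return len;
    }
  return 0;
}

/* Toy shaper that rejects everything, standing in for a failing backend.  */
int
shaper_always_fails (const unsigned *cp, size_t len)
{
  (void) cp;
  (void) len;
  return 0;
}

/* Toy shaper that accepts everything, standing in for a working backend.  */
int
shaper_always_succeeds (const unsigned *cp, size_t len)
{
  (void) cp;
  (void) len;
  return 1;
}
```

Under this model a working shaper is called once for the 7-character word, while a shaper that always fails is called seven times with lengths 7 down to 1, which is why tracing the first failure is the useful next step.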
* Re: Arabic support
From: Eli Zaretskii @ 2010-09-03 9:16 UTC
To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr

> From: Kenichi Handa <handa@m17n.org>
> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org
> Date: Fri, 03 Sep 2010 10:00:02 +0900
>
> > If so, auto-composition-mode cannot be turned off for scripts that
> > need this kind of "grouped shaping" without degrading the presentation
> > of these scripts to the point of illegibility?
>
> Yes.  And auto-composition-mode cannot be turned off for any
> scripts that it is not enough to display glyphs
> corresponding to characters; they are all Indics, some East
> Asians, Arabic, Hebrew, etc.

Are you sure Hebrew belongs to this list?  What Hebrew characters need
to be shaped together, but still displayed as separate glyphs (as
opposed to the diacriticals which are composed into the same glyph
with the base character)?

> > The next call is with a 6-character string whose contents is
>
> >   {0x627, 0x644, 0x633, 0x651, 0x644, 0x627}
>
> > then a 5-character string {0x627, 0x644, 0x633, 0x651, 0x644}, etc.
>
> As the lgstring is getting shorter each time, it seems that
> composition fails each time.
>
> autocmp_chars is mainly called from composition_reseat_it.
> Could you please trace the code after the first call of
> autocmp_chars, and find why Emacs decides that a
> composition fails.

Will do.  Thanks.
* Re: Arabic support 2010-09-03 9:16 ` Eli Zaretskii @ 2010-09-03 10:18 ` David Kastrup 2010-09-03 11:08 ` Kenichi Handa 1 sibling, 0 replies; 31+ messages in thread From: David Kastrup @ 2010-09-03 10:18 UTC (permalink / raw) To: emacs-bidi; +Cc: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Kenichi Handa <handa@m17n.org> >> Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org >> Date: Fri, 03 Sep 2010 10:00:02 +0900 >> >> > If so, auto-composition-mode cannot be turned off for scripts that >> > need this kind of "grouped shaping" without degrading the presentation >> > of these scripts to the point of illegibility? >> >> Yes. And auto-composition-mode cannot be turned off for any >> scripts that it is not enough to display glyphs >> corresponding to characters; they are all Indics, some East >> Asians, Arabic, Hebrew, etc. > > Are you sure Hebrew belongs to this list? What Hebrew characters need > to be shaped together, but still displayed as separate glyphs (as > opposed to the diacriticals which are composed into the same glyph > with the base character)? I'd think that the letter combinations tsvey vovn וו and tsvey yudn יי (in Yiddish likely represented with their own characters װ and ײ, also of interest ױ) might call for common shaping in more sophisticated fonts. But I have no actual clue. -- David Kastrup ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-09-03 9:16 ` Eli Zaretskii 2010-09-03 10:18 ` David Kastrup @ 2010-09-03 11:08 ` Kenichi Handa 2010-09-03 14:54 ` Eli Zaretskii 1 sibling, 1 reply; 31+ messages in thread From: Kenichi Handa @ 2010-09-03 11:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, jasonr In article <83r5hbi3ur.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > Yes. And auto-composition-mode cannot be turned off for any > > scripts that it is not enough to display glyphs > > corresponding to characters; they are all Indics, some East > > Asians, Arabic, Hebrew, etc. > Are you sure Hebrew belongs to this list? What Hebrew characters need > to be shaped together, but still displayed as separate glyphs (as > opposed to the diacriticals which are composed into the same glyph > with the base character)? ??? I didn't write such a thing. What I listed are scripts "that it is not enough to display glyphs corresponding to characters". More precisely, "... that it is not enough to display glyphs corresponding to characters at normal positions suggested by each glyph metrics.". --- Kenichi Handa handa@m17n.org ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-09-03 11:08 ` Kenichi Handa @ 2010-09-03 14:54 ` Eli Zaretskii 0 siblings, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2010-09-03 14:54 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-bidi, jasonr, emacs-devel > From: Kenichi Handa <handa@m17n.org> > Date: Fri, 03 Sep 2010 20:08:55 +0900 > Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org > > In article <83r5hbi3ur.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > > > > Yes. And auto-composition-mode cannot be turned off for any > > > scripts that it is not enough to display glyphs > > > corresponding to characters; they are all Indics, some East > > > Asians, Arabic, Hebrew, etc. > > > Are you sure Hebrew belongs to this list? What Hebrew characters need > > to be shaped together, but still displayed as separate glyphs (as > > opposed to the diacriticals which are composed into the same glyph > > with the base character)? > > ??? I didn't write such a thing. What I listed are scripts > "that it is not enough to display glyphs corresponding to > characters". More precisely, "... that it is not enough to > display glyphs corresponding to characters at normal > positions suggested by each glyph metrics.". Sorry for my misunderstanding. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-09-03 1:00 ` Kenichi Handa 2010-09-03 9:16 ` Eli Zaretskii @ 2010-09-03 13:25 ` Eli Zaretskii 2010-09-04 7:13 ` Eli Zaretskii 2010-09-04 15:29 ` Eli Zaretskii 2 siblings, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2010-09-03 13:25 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr > From: Kenichi Handa <handa@m17n.org> > Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org > Date: Fri, 03 Sep 2010 10:00:02 +0900 > > > > > I'm asking because it's possible that we will need to modify > > > > w32uniscribe.c to reorder R2L characters before we pass them to the > > > > Uniscribe ScriptShape API, to let it see the characters in the logical > > > > order it expects them. That's if it turns out that Uniscribe cannot > > > > otherwise shape them correctly. > > > > > > ??? Currently characters and glyphs in LGSTRING are always > > > in logical order. > > > See my mail from yesterday, where I describe that I see in GDB that > > Arabic characters in LGSTRINGs arrive to uniscribe_shape in visual > > order: > > > http://lists.gnu.org/archive/html/emacs-devel/2010-09/msg00029.html > > In this mail, you wrote: > > > Also, it looks like uniscribe_shape is repeatedly called from > > font-shape-gstring to shape the same text that is progressively > > shortened. For example, the first call will be with a 7-character > > string whose contents is > > > {0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645} > > and this character sequence is surely in logical order. So > I don't know why you think uniscribe_shape is given a > LGSTRING of visual order. Sorry, you are right. I got fooled by the fact that the end of the string is almost a mirror image of its beginning. There's something I'm missing in how character compositions and font shaping work together with bidi reordering. I need to understand that to figure out what, if anything, needs to be fixed in uniscribe_shape to get it to work correctly. 
So let me describe how the bidi reordering works and my understanding of how it interacts with character compositions, and ask you to correct any inaccuracies and fill in the blanks. Thanks in advance. There are two use-cases that bidi reordering supports. The first one is reordering in left-to-right paragraphs, containing mostly L2R text with embedded R2L characters. I will call this "the L2R paragraph" case. The other use-case is reordering in right-to-left paragraphs, which typically almost entirely consist of R2L characters with embedded L2R letters, digits, and other characters that are displayed left to right. I call this "the R2L paragraph" case. For L2R paragraphs, runs of R2L characters are delivered in reverse order (ignoring for the moment complications caused by directional override control characters). When the bidi iterator bumps into an R2L character, it scans forward until the end of the run, then begins to go back delivering the characters, thus reversing them on display. When the run of R2L characters is exhausted, the iterator jumps to the end of the run and resumes its normal forward scan. For R2L paragraphs, runs of R2L characters are delivered in their buffer's logical order, without reversing them. L2R characters in such paragraphs _are_ reversed, by the same process of scanning forward past them, then delivering them back to front. This produces a mirror image of the line as it should be displayed, wherein the character to be displayed the rightmost is the first glyph we produce. To mirror the line into its correct order, the PRODUCE_GLYPHS macro, which calls the produce_glyphs method of the terminal-specific redisplay interface, _prepends_ each new glyph to those already produced for the glyph row, rather than appending them in the L2R paragraph case. 
To illustrate, if we have a buffer with the following contents (capital letters represent R2L characters):

  ABCD foo

then the bidi iterator will produce the characters in this order:

  ABCD oof

and then PRODUCE_GLYPHS will mirror them into

  foo DCBA

which is the correct visual order.

Note that in both cases, the glyph row generated by the above procedure is drawn from left to right by the terminal-specific method that delivers glyphs to the glass. That method draws glyphs one by one in the order they are stored in the glyph row. No reordering happens on this level, and in fact this level is totally ignorant about the text directionality.

Enter character compositions. During the buffer scan that delivers characters to PRODUCE_GLYPHS, if the next character to be delivered is a composed character, then composition_reseat_it and next_element_from_composition are called. If they succeed in composing the character with one or more following characters, the whole sequence of characters that were composed is recorded in the glyph row as a single element of type IT_COMPOSITION. This single element is expanded into the actual font glyphs when the glyph row is drawn by the terminal-specific draw_glyphs method. The bidi reordering treats this single element as if it were a single glyph, and thus does not reorder its font glyphs. So this single element winds up in the glyph row in the position corresponding to the first character of the composed sequence.

The question is: in what order should the font glyphs be held in the LGSTRING returned by the font driver's `shape' method?

Let's take an example. Suppose we have a L2R paragraph in a buffer with these contents:

  foobar ABCDE

and suppose that "ABCDE" will be shaped by the font driver's `shape' method into a logical-order sequence of glyphs "XYZ". Since this is a L2R paragraph, and since no reordering will happen to "XYZ" when it is delivered to the glass, it must be stored in the LGSTRING in the visual order, i.e. "ZYX", with X being the first character to be read and the rightmost to display, Y the second, etc.

Now suppose we have a R2L paragraph:

  ABCDE foobar

The mirroring of the glyph row in PRODUCE_GLYPHS will now produce

  foobar XYZ

because it treats "XYZ" as a single element. Again, no reordering will happen to "XYZ" when it is drawn on the terminal. So again, we need "XYZ" to be stored in visual order, i.e. "ZYX".

You say that the contents of LGSTRING passed to the `shape' method are in logical order. The conclusion from the above seems to be that we need to have the `shape' method reorder the shaped glyphs into visual order. Is that what happens with the libotf driver? Does it indeed reorder R2L glyphs it returns after reshaping? If not, how does a reshaped sequence of glyphs wind up correctly on display?

Even if everything I said above is correct, there are complications. ABCDE could be inside an embedding with a left-to-right override (LRO), like this:

  foobar LRO ABCDE PDF

This should be displayed as

  foobar ABCDE

i.e., "ABCDE" is not reordered, but displayed in the logical order, as forced by LRO. Therefore, the reshaped "XYZ" should also be displayed left to right:

  foobar XYZ

But, if I understand correctly how composition works, the auto-composed sequence in this case will still be just "XYZ", without the LRO and PDF control characters. So the `shape' method of the font driver will still see just "XYZ" in the LGSTRING, without the control characters, and will reorder "XYZ", which is incorrect.

If we need the `shape' method to reorder glyphs, then in order for it to do its job correctly, we need to give it the entire bidi context of the string we are asking it to reshape. In the above example, we need to tell it about the override directive, i.e. pass it "ABCDE" with surrounding LRO and PDF controls. This flies in the face of the current design, which separates reordering from glyph shaping.
So the conclusion is that we need the `shape' method to return the reshaped glyphs in the logical order, and then reorder them afterwards. If this is correct, we need to make 2 changes:

  . change the interface to the `shape' method, so that the reshaped LGSTRING holds glyphs in the logical order

  . modify fill_gstring_glyph_string to reorder glyphs when it puts them into a glyph_string structure

Am I missing something?

^ permalink raw reply [flat|nested] 31+ messages in thread
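The delivery order described in the message above (runs against the paragraph direction are handed over back to front, and R2L paragraphs prepend glyphs to the row) can be modelled with a few lines of Python. This is only a toy sketch, not Emacs code: the names `is_r2l`, `deliver` and `glyph_row` are made up for illustration, capitals stand for R2L characters as in the message, and the assumption that spaces and other neutrals simply follow the paragraph direction is a simplification of the real bidi algorithm. Compositions and directional overrides are ignored.

```python
from itertools import groupby


def is_r2l(ch, paragraph_r2l):
    """Toy classification: capitals are R2L, lowercase letters are L2R.

    Assumption: anything else (spaces, digits) follows the paragraph
    direction, which is much simpler than real bidi resolution.
    """
    if ch.isupper():
        return True
    if ch.islower():
        return False
    return paragraph_r2l


def deliver(text, paragraph_r2l):
    """Return the characters in the order the toy iterator delivers them."""
    out = []
    for run_r2l, run in groupby(text, key=lambda ch: is_r2l(ch, paragraph_r2l)):
        chars = list(run)
        # A run whose direction differs from the paragraph direction is
        # scanned to its end and then delivered back to front.
        if run_r2l != paragraph_r2l:
            chars.reverse()
        out.extend(chars)
    return "".join(out)


def glyph_row(text, paragraph_r2l):
    """Return the glyph row in the order it is drawn, left to right."""
    row = []
    for ch in deliver(text, paragraph_r2l):
        if paragraph_r2l:
            row.insert(0, ch)  # R2L paragraph: prepend, mirroring the line
        else:
            row.append(ch)     # L2R paragraph: append as usual
    return "".join(row)


if __name__ == "__main__":
    print(deliver("ABCD foo", paragraph_r2l=True))
    print(glyph_row("ABCD foo", paragraph_r2l=True))
    print(glyph_row("foobar ABCDE", paragraph_r2l=False))
```

Running the script on the examples from the message lets one compare the model's delivery order and final glyph row with the orders given in the text.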
* Re: Arabic support 2010-09-03 13:25 ` Eli Zaretskii @ 2010-09-04 7:13 ` Eli Zaretskii 2010-09-06 6:04 ` Kenichi Handa 0 siblings, 1 reply; 31+ messages in thread From: Eli Zaretskii @ 2010-09-04 7:13 UTC (permalink / raw) To: handa; +Cc: emacs-bidi, jasonr, emacs-devel > Date: Fri, 03 Sep 2010 16:25:49 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: emacs-bidi@gnu.org, emacs-devel@gnu.org, jasonr@gnu.org > > Am I missing something? I think I found what I was missing. This part: > During the buffer scan that delivers characters to PRODUCE_GLYPHS, if > the next character to be delivered is a composed character, then > composition_reseat_it and next_element_from_composition are called. > If they succeed in composing the character with one or more following > characters, the whole sequence of characters that were composed is > recorded in the glyph row as a single element of type IT_COMPOSITION. > This single element is expanded into the actual font glyphs when the > glyph row is drawn by the terminal-specific draw_glyphs method. The > bidi reordering treats this single element as if it were a single > glyph, and thus does not reorder its font glyphs. So this single > element winds up in the glyph row in the position corresponding to the > first character of the composed sequence. is inaccurate, and therefore leads to incorrect conclusions. A (hopefully) more correct description is this: During the buffer scan that delivers characters to PRODUCE_GLYPHS, if the next character to be delivered is a composed character, then composition_reseat_it and next_element_from_composition are called. If they succeed in composing the character with one or more following characters, the whole sequence of characters that were composed is recorded in the `struct composition_it' object that is part of the buffer iterator. The composed sequence could produce one or more font glyphs (called "grapheme clusters") on the screen.
Each of these grapheme clusters is then delivered to PRODUCE_GLYPHS in the direction corresponding to the current bidi scan direction. In particular, if the bidi iterator currently scans the buffer backwards, the grapheme clusters are delivered back to front. This reorders the grapheme clusters as appropriate for the current bidi context. If this is correct, then the conclusion is that the font driver's `shape' method should return the grapheme clusters in LGSTRING in logical order; they will be reordered correctly by next_element_from_composition, composition_reseat_it, and set_iterator_to_next, as described above. Did I get it right this time? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-09-04 7:13 ` Eli Zaretskii @ 2010-09-06 6:04 ` Kenichi Handa 0 siblings, 0 replies; 31+ messages in thread From: Kenichi Handa @ 2010-09-06 6:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-bidi, jasonr, emacs-devel In article <83d3suhtg4.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes: > I think I found what I was missing. This part: [...] > is inaccurate, and therefore leads to incorrect conclusions. A > (hopefully) more correct description is this: > During the buffer scan that delivers characters to PRODUCE_GLYPHS, > if the next character to be delivered is a composed character, then > composition_reseat_it and next_element_from_composition are called. > If they succeed in composing the character with one or more following > characters, the whole sequence of characters that were composed is > recorded in the `struct composition_it' object that is part of the > buffer iterator. The composed sequence could produce one or more > font glyphs (called "grapheme clusters") on the screen. Each of > these grapheme clusters is then delivered to PRODUCE_GLYPHS in the > direction corresponding to the current bidi scan direction. In > particular, if the bidi iterator currently scans the buffer > backwards, the grapheme clusters are delivered back to front. This > reorders the grapheme clusters as appropriate for the current bidi > context. > If this is correct, Yes. > then the conclusion is that the font driver's > `shape' method should return the grapheme clusters in LGSTRING in > logical order; they will be reordered correctly by > next_element_from_composition, composition_reseat_it, and > set_iterator_to_next, as described above. > Did I get it right this time? Yes. One additional comment. A grapheme cluster in LGSTRING may contain multiple glyphs, and the order of those glyphs depends on a font backend, or even on a font, and are given to `draw' method of a font backend without reordering.
It's the responsibility of 'shape' method to produce those glyphs in the order that 'draw' method expects. For instance, if LGSTRING has these LGLYPHS in this order:

  G0: (glyph for a char at position N)
  G1: (first glyph for chars at position N+1 to N+2)
  G2: (second glyph for chars at position N+1 to N+2)
  G3: (first glyph for chars at position N+3 to N+4)
  G4: (second glyph for chars at position N+3 to N+4)
  G5: (third glyph for chars at position N+3 to N+4)
  G6: (glyph for a char at position N+5)

and we are producing glyphs backward, 'draw' method is given glyphs in this order:

  glyphs from G6 to G6 (both inclusive)
  glyphs from G3 to G5 (both inclusive)
  glyphs from G1 to G2 (both inclusive)
  glyphs from G0 to G0 (both inclusive)

--- Kenichi Handa handa@m17n.org

^ permalink raw reply [flat|nested] 31+ messages in thread
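The G0..G6 example above can be written down as a small Python sketch: glyphs are grouped into grapheme clusters by the character range they cover, the clusters are visited back to front when glyphs are produced backward, and the glyphs inside each cluster keep the order the shaper gave them. The tuple layout `(name, from, to)` and the function names are assumptions made for this illustration, with positions N..N+5 written as 0..5; real LGSTRING objects are Lisp vectors with more fields.

```python
from itertools import groupby

# Toy LGSTRING: one (glyph name, FROM, TO) tuple per glyph, in logical order.
# FROM and TO are the first and last character index the glyph's cluster covers.
LGSTRING = [
    ("G0", 0, 0),
    ("G1", 1, 2), ("G2", 1, 2),
    ("G3", 3, 4), ("G4", 3, 4), ("G5", 3, 4),
    ("G6", 5, 5),
]


def clusters(lgstring):
    """Group adjacent glyphs that cover the same character range."""
    return [
        [name for name, _, _ in group]
        for _, group in groupby(lgstring, key=lambda g: (g[1], g[2]))
    ]


def draw_order(lgstring, backward):
    """Return the clusters in the order they would be handed to drawing.

    Only the order of clusters depends on the scan direction; the glyphs
    inside a cluster are never reordered.
    """
    result = clusters(lgstring)
    if backward:
        result.reverse()
    return result


if __name__ == "__main__":
    for cluster in draw_order(LGSTRING, backward=True):
        print(cluster[0], "to", cluster[-1])
```

The point of keeping the inner lists untouched is the one made in the message: the glyph order within a cluster is private to the font backend, so only whole clusters may be moved.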
* Re: Arabic support 2010-09-03 1:00 ` Kenichi Handa 2010-09-03 9:16 ` Eli Zaretskii 2010-09-03 13:25 ` Eli Zaretskii @ 2010-09-04 15:29 ` Eli Zaretskii 2 siblings, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2010-09-04 15:29 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-bidi, emacs-devel, jasonr > From: Kenichi Handa <handa@m17n.org> > Cc: jasonr@gnu.org, emacs-bidi@gnu.org, emacs-devel@gnu.org > Date: Fri, 03 Sep 2010 10:00:02 +0900 > > As the lgstring is getting shorter each time, it seems that > composition fails each time. No, the composition was succeeding. The problem was that uniscribe_shape tried to reorder the grapheme clusters returned by ScriptShape and ScriptPlace, and the result was that the FROM and TO members of the LGSTRING object were not as set_iterator_to_next expected. This caused the iterator to fail to skip the characters that were already composed; instead, it would move only one character ahead. Thanks to all the hints and useful information in this thread, I think I have succeeded in fixing the code in uniscribe_shape, so now the display of Arabic looks okay to me. Arabic input also seems to work; at least Emacs no longer crashes. People who actually speak Arabic, please check the latest development code to see that it indeed works correctly. ^ permalink raw reply [flat|nested] 31+ messages in thread
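The symptom reported earlier in the thread (the shaper being called on a 7-character string, then 6, then 5, and so on) and the cause given above can be tied together in a deliberately crude Python model. Everything here is hypothetical: `shape` is a stand-in that returns one (FROM, TO) pair per character, and the acceptance check in `scan_backward` is an assumed simplification, not the actual logic of set_iterator_to_next. Only the seven code points are taken from the thread.

```python
# The 7 code points quoted in the thread, in logical order.
RUN = [0x627, 0x644, 0x633, 0x651, 0x644, 0x627, 0x645]


def shape(chars, reorder_clusters):
    """Hypothetical shaper: one (FROM, TO) pair per character.

    With reorder_clusters set, the pairs come back reversed, imitating a
    shaper that reorders clusters instead of leaving them in logical order.
    """
    pairs = [(i, i) for i in range(len(chars))]
    return pairs[::-1] if reorder_clusters else pairs


def scan_backward(run, reorder_clusters):
    """Scan a run back to front; return the length of each text shaped."""
    shaped_lengths = []
    pos = len(run) - 1
    while pos >= 0:
        chars = run[: pos + 1]
        pairs = shape(chars, reorder_clusters)
        shaped_lengths.append(len(chars))
        # Assumed check: the iterator only skips the whole composition if
        # the cluster ranges run from index 0 up to the current position.
        ranges_ok = pairs[0][0] == 0 and pairs[-1][1] == pos
        if ranges_ok:
            pos = -1   # skip everything that was composed
        else:
            pos -= 1   # fall back to moving a single character
    return shaped_lengths


if __name__ == "__main__":
    print(scan_backward(RUN, reorder_clusters=False))
    print(scan_backward(RUN, reorder_clusters=True))
```

In this model, logical-order output needs a single shaping call, while reordered output triggers one call per character on ever shorter text, which is the pattern seen in GDB.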
* Re: Arabic support 2010-08-30 14:06 ` Eli Zaretskii 2010-09-01 2:17 ` Kenichi Handa @ 2010-09-02 13:48 ` Jason Rumney 2010-09-02 14:49 ` Eli Zaretskii 1 sibling, 1 reply; 31+ messages in thread From: Jason Rumney @ 2010-09-02 13:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-bidi, emacs-devel, Kenichi Handa Eli Zaretskii <eliz@gnu.org> writes: > No, not AFAIK. We call the ScriptItemize API of Uniscribe with NULL > as the 4th and 5th arguments, which AFAIU should disable reordering. > Perhaps Jason could chime in and tell if I'm right here. The documentation seems to imply that, but it looks like items[i].a.fRTL is being set anyway according to how uniscribe thinks the direction should be. As well as removing the code that takes notice of the rtl flag and tries to reverse the output, you will probably have to set items[i].a.fLogicalOrder to 1 before calling ScriptShape to ensure logical order output from ScriptShape. -- Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-09-02 13:48 ` Jason Rumney @ 2010-09-02 14:49 ` Eli Zaretskii 0 siblings, 0 replies; 31+ messages in thread From: Eli Zaretskii @ 2010-09-02 14:49 UTC (permalink / raw) To: Jason Rumney; +Cc: emacs-bidi, emacs-devel, handa > From: Jason Rumney <jasonr@gnu.org> > Cc: Kenichi Handa <handa@m17n.org>, emacs-bidi@gnu.org, emacs-devel@gnu.org > Date: Thu, 02 Sep 2010 21:48:29 +0800 > > Eli Zaretskii <eliz@gnu.org> writes: > > > No, not AFAIK. We call the ScriptItemize API of Uniscribe with NULL > > as the 4th and 5th arguments, which AFAIU should disable reordering. > > Perhaps Jason could chime in and tell if I'm right here. > > The documentation seems to imply that, but it looks like items[i].a.fRTL > is being set anyway according to how uniscribe thinks the direction > should be. My interpretation of this is that the fRTL flag is set according to the explicit directionality of the character deduced solely from its codepoint, e.g. it is TRUE for Hebrew and Arabic letters and FALSE for the rest. By contrast, a "full Unicode bidirectional analysis" that ScriptItemize is advertised to perform when these arguments are non-NULL includes the full implementation of UAX#9, under which embeddings and implicit levels can affect the fRTL flag for characters whose inherent attributes would say otherwise. But that's a guess; the MS documentation is not very explicit on this, to say the least. > As well as removing the code that takes notice of the rtl flag and tries > to reverse the output, you will probably have to set > items[i].a.fLogicalOrder to 1 before calling ScriptShape to ensure > logical order output from ScriptShape. Right, thanks for the hint. However, given what Handa-san wrote, I'm now utterly confused regarding the issue of ordering between Emacs and Uniscribe. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Arabic support 2010-08-27 9:56 ` Eli Zaretskii 2010-08-30 7:47 ` Kenichi Handa @ 2010-09-06 13:45 ` Thamer Mahmoud 1 sibling, 0 replies; 31+ messages in thread From: Thamer Mahmoud @ 2010-09-06 13:45 UTC (permalink / raw) To: emacs-devel; +Cc: emacs-bidi [-- Attachment #1: Type: text/plain, Size: 1812 bytes --] >> From: Kenichi Handa <handa@m17n.org> >> Date: Thu, 26 Aug 2010 10:10:05 +0900 >> >> I've just committed changes to trunk for Arabic shaping. If >> there're any Arabic users in this list, please check the >> displaying of Arabic text. On GNU/Linux system, you must >> compile Emacs with libotf and m17n-lib (configure script >> should detect them automatically). > Thanks for working on this. Here is my take: * Attached are two screenshots showing the Arabic line from the HELLO file rendered by gedit and Emacs using the same font (Nazli-20 from ttf-farsiweb). Notice that in Emacs not all fonts have their LAM and ALIF properly replaced by the LAM-ALIF ligature. Also the diacritics (SHADDA) appear lower and less legible for the same font. * The third attachment shows that when highlighting a region of an Arabic word, the cursor at the edges of the visible selection "breaks" the shaping and reshapes the characters around it into their isolated form. This creates a wave-effect of moving characters with some visible artifacts and bad indentation issues. * While the cursor is at a composed character (e.g., SEEN+SHADDA), pressing C-p moves point unexpectedly to the beginning of the current line. * I do at least see one "trap" with C-p, although it is hard to reproduce. You can try moving 4 or 5 lines below the Arabic line in the HELLO file, then move upward using 4-5 C-p and get the cursor at the SEEN+SHADDA. After which any further C-p jumps between SEEN and LAM-ALIF, never going to the previous line. * For those using Debian (Squeeze), I had to install not just the libm17n and libm17n-dev packages, but also m17n-db.
It seems that the configure script doesn't detect or know about the status of (the Debian-specific) m17n-db. Thanks again, Thamer [-- Attachment #2: Emacs rendering --] [-- Type: image/png, Size: 3296 bytes --] [-- Attachment #3: Gedit rendering --] [-- Type: image/png, Size: 4864 bytes --] [-- Attachment #4: Select Region --] [-- Type: image/png, Size: 2628 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread