* how reliable is rendering of complex scripts? From: Werner LEMBERG @ 2015-10-02 5:39 UTC To: emacs-devel Folks, I wonder how reliably Emacs displays complex scripts like Devanagari or Arabic. For example, the maintainers of the HarfBuzz library did extensive comparisons of the rendering results with the MS engine to iron out zillions of small buglets in OpenType handling. AFAIK, Emacs relies on the m17n libraries, at least on GNU/Linux (no idea about other environments), controlling the OpenType handling (partially?) with Lisp code – are there test suites to compare the results? Werner
* Re: how reliable is rendering of complex scripts? 2015-10-02 5:39 how reliable is rendering of complex scripts? Werner LEMBERG @ 2015-10-02 7:30 ` Eli Zaretskii 2015-10-04 4:39 ` Werner LEMBERG 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2015-10-02 7:30 UTC (permalink / raw) To: Werner LEMBERG; +Cc: Kenichi Handa, emacs-devel (CC'ing Handa-san, who wrote most of the relevant display code in Emacs, and also the m17n shaping engine libraries.) > Date: Fri, 02 Oct 2015 07:39:00 +0200 (CEST) > From: Werner LEMBERG <wl@gnu.org> > > I wonder how reliable emacs displays complex scripts like Devanagari > or Arabic. AFAIK, no one has ever performed a study about this, let alone repeated it when the relevant standards changed. > For example, the maintainers of the HarfBuzz library did extensive > comparisons of the rendering results with the MS engine to iron out > zillions of small buglets in OpenType handling. At least on MS-Windows, Emacs uses the MS engine directly, so some of similar buglets should not affect us on Windows. > AFAIK, Emacs relies on the m17n libraries, at least on GNU/Linux (no > idea about other environments), controlling the OpenType handling > (partially?) with Lisp code That is correct. And while the shaping engines, like libm17n-flt and Uniscribe, are beyond the scope of Emacs maintenance, the supporting Lisp and C code is on our table. However, we currently lack a maintainer in that area (have been lacking for a long time), so I guess we are not up to speed with the latest developments. I'm talking first and foremost about the definitions of character-composition patterns, which tell Emacs which sequences of characters should be rendered as a single grapheme cluster. There's a lot to do in this area for various languages. > are there test suites to compare the results? There's a test suite for bidirectional display, but it only tests the reordering of characters for display, not the shaping. There's nothing else, AFAIK. 
If you, or someone else, can work on adding one, that'd be great.
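To make the notion of a "character-composition pattern" a bit more concrete, here is a rough Python sketch of the grouping idea. It is only an illustration and is not how Emacs or libm17n-flt decides on compositions: the rule used (a base character, its combining marks, and any letter that follows a virama-type sign) is a simplifying assumption, and `rough_clusters` is a made-up name.

```python
import unicodedata

# Canonical combining class 9 is the class assigned to virama-type signs.
VIRAMA_CCC = 9


def rough_clusters(text):
    """Split text into approximate clusters.

    A cluster is a base character, any combining marks that follow it,
    and any character that directly follows a virama.  This is a
    deliberate simplification of real composition rules.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        after_virama = (
            bool(clusters)
            and unicodedata.combining(clusters[-1][-1]) == VIRAMA_CCC
        )
        if clusters and (is_mark or after_virama):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# KA + VIRAMA + SSA + VOWEL SIGN I form one conjunct; MA starts a new cluster.
sample = "\u0915\u094d\u0937\u093f\u092e"
print(rough_clusters(sample))
```

A real pattern table has to encode far more per-script knowledge than this, which is presumably why it needs someone who knows the scripts.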
* Re: how reliable is rendering of complex scripts? 2015-10-02 7:30 ` Eli Zaretskii @ 2015-10-04 4:39 ` Werner LEMBERG 2015-10-04 7:07 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Werner LEMBERG @ 2015-10-04 4:39 UTC (permalink / raw) To: eliz; +Cc: handa, emacs-devel >> I wonder how reliable emacs displays complex scripts like >> Devanagari or Arabic. > > AFAIK, no one has ever performed a study about this, let alone > repeated it when the relevant standards changed. This essentially means that Emacs developers wait for users to report bad renderings, right? >> For example, the maintainers of the HarfBuzz library did extensive >> comparisons of the rendering results with the MS engine to iron out >> zillions of small buglets in OpenType handling. > > At least on MS-Windows, Emacs uses the MS engine directly, so some > of similar buglets should not affect us on Windows. Well, this makes Emacs on MS-Windows really superior to other platforms in this area, which is less than ideal... I mean `superior' in the sense that the rendering results on MS-Windows are well tested and can be trusted in general, something that is missing otherwise. >> AFAIK, Emacs relies on the m17n libraries, at least on GNU/Linux (no >> idea about other environments), controlling the OpenType handling >> (partially?) with Lisp code > > That is correct. And while the shaping engines, like libm17n-flt > and Uniscribe, are beyond the scope of Emacs maintenance, the > supporting Lisp and C code is on our table. However, we currently > lack a maintainer in that area (have been lacking for a long time), > so I guess we are not up to speed with the latest developments. I'm > talking first and foremost about the definitions of > character-composition patterns, which tell Emacs which sequences of > characters should be rendered as a single grapheme cluster. There's > a lot to do in this area for various languages. 
Given that HarfBuzz is very mature today, and that it has been extensively tested against Windows rendering results, and that it also contains a large corpus of test cases for complex ligatures together with a simple test TTY program (`hb-shape'), I suggest that someone (probably Ken'ichi) write a similar test program for libm17n so that diffing would be possible. https://github.com/behdad/harfbuzz/tree/master/test/shaping https://github.com/behdad/harfbuzz/tree/master/util >> are there test suites to compare the results? > > There's a test suite for bidirectional display, but it only tests > the reordering of characters for display, not the shaping. There's > nothing else, AFAIK. If you, or someone else, can work on adding > one, that'd be great. In case someone is working on this issue, asking the HarfBuzz developers for assistance might be a good thing. I guess that they have even larger corpora that could probably be provided for testing purposes. Werner
* Re: how reliable is rendering of complex scripts? 2015-10-04 4:39 ` Werner LEMBERG @ 2015-10-04 7:07 ` Eli Zaretskii 2015-10-04 8:09 ` Werner LEMBERG 2015-10-04 18:06 ` John Wiegley 0 siblings, 2 replies; 15+ messages in thread From: Eli Zaretskii @ 2015-10-04 7:07 UTC (permalink / raw) To: Werner LEMBERG; +Cc: handa, emacs-devel > Date: Sun, 04 Oct 2015 06:39:52 +0200 (CEST) > Cc: emacs-devel@gnu.org, handa@gnu.org > From: Werner LEMBERG <wl@gnu.org> > > >> I wonder how reliable emacs displays complex scripts like > >> Devanagari or Arabic. > > > > AFAIK, no one has ever performed a study about this, let alone > > repeated it when the relevant standards changed. > > This essentially means that Emacs developers wait for users to report > bad renderings, right? No, it means Emacs developers lack resources, knowledge, and motivated individuals aboard to do that. If you or someone else volunteers, that'd be great. Where we do have resources, we are generally ahead of HarfBuzz: e.g., the implementation of UAX#9 compliant with Unicode 6.3 - 8.0 was in Emacs many moons before it was anywhere else in the Free Software world, including HarfBuzz. > >> For example, the maintainers of the HarfBuzz library did extensive > >> comparisons of the rendering results with the MS engine to iron out > >> zillions of small buglets in OpenType handling. > > > > At least on MS-Windows, Emacs uses the MS engine directly, so some > > of similar buglets should not affect us on Windows. > > Well, this makes Emacs on MS-Windows really superior to other > platforms in this area, which is less than ideal... I mean `superior' > in the sense that the rendering results on MS-Windows are well tested > and can be trusted in general, something that is missing otherwise. That's not entirely true, because, as I said, part of the data and algorithms needed for complex script layout is in Emacs, and is used on all supported platforms, not just by the Windows build. 
> Given that HarfBuzz is very mature today, and that it has been > extensively tested against Windows rendering results, and that it also > contains a large corpus of test cases for complex ligatures together > with a simple test TTY program (`hb-shape'), I suggest that someone > (probably Ken'ichi) writes a similar test program for libm17n so that > diffing would be possible. > > https://github.com/behdad/harfbuzz/tree/master/test/shaping > https://github.com/behdad/harfbuzz/tree/master/util Thanks for the pointer. I will see if we can extract something from there to see how Emacs displays those scripts. > >> are there test suites to compare the results? > > > > There's a test suite for bidirectional display, but it only tests > > the reordering of characters for display, not the shaping. There's > > nothing else, AFAIK. If you, or someone else, can work on adding > > one, that'd be great. > > In case someone is working on this issue, asking the HarfBuzz > developer for assistance might be a good thing. I guess that they > have even larger corpora that could be probably provided for testing > purposes. No one is working on this, to the best of my knowledge. Once again, we lack individuals on board who understand these issues well enough to do that. A single bug in libm17n-flt reported a few months ago, discovered in a script whose support is not even in Emacs, was fixed by Handa-san, and that's about all the activity that I remember. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? 2015-10-04 7:07 ` Eli Zaretskii @ 2015-10-04 8:09 ` Werner LEMBERG 2015-10-04 9:12 ` Eli Zaretskii 2015-10-04 18:06 ` John Wiegley 1 sibling, 1 reply; 15+ messages in thread From: Werner LEMBERG @ 2015-10-04 8:09 UTC (permalink / raw) To: eliz; +Cc: handa, emacs-devel >> This essentially means that Emacs developers wait for users to >> report bad renderings, right? > > No, it means Emacs developers lack resources, knowledge, and > motivated individuals aboard to do that. If you or someone else > volunteers, that'd be great. I can only indirectly volunteer, namely by maintaining FreeType :-/ >> Well, this makes Emacs on MS-Windows really superior to other >> platforms in this area, which is less than ideal... I mean >> `superior' in the sense that the rendering results on MS-Windows >> are well tested and can be trusted in general, something that is >> missing otherwise. > > That's not entirely true, because, as I said, part of the data and > algorithms needed for complex script layout is in Emacs, and is used > on all supported platforms, not just by the Windows build. OK. >> https://github.com/behdad/harfbuzz/tree/master/test/shaping >> https://github.com/behdad/harfbuzz/tree/master/util > > I will see if we can extract something from there to see how Emacs > displays those scripts. The nice thing about `hb-shape' is that it doesn't output graphics but a textual representation of the GSUB and GPOS manipulation results (including reordering of the input). In other words, Emacs could use the same output format in a special test mode, making comparisons very simple – and automated! Werner ^ permalink raw reply [flat|nested] 15+ messages in thread
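As a rough illustration of why a textual glyph dump makes automated comparison simple, here is a small Python sketch that parses two such dumps and reports the positions where they differ. The line format assumed here, `[name=cluster@dx,dy+advance|...]` with the `@dx,dy` part optional, is my understanding of the default `hb-shape' output and should be checked against the actual tool. The glyph names and numbers in the sample strings are invented, and `parse_shaped` and `diff_shaped` are hypothetical helper names.

```python
import re

# One glyph record: name, optional =cluster, optional @dx,dy, optional +advance.
GLYPH_RE = re.compile(
    r"^(?P<name>[^=@+]+)"
    r"(?:=(?P<cluster>\d+))?"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"(?:\+(?P<adv>-?\d+)(?:,(?P<yadv>-?\d+))?)?$"
)


def parse_shaped(line):
    """Parse one '[a=0+500|b=1+600]' style line into a list of tuples
    (name, cluster, dx, dy, advance); missing numbers default to 0."""
    body = line.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a glyph list: {line!r}")
    inner = body[1:-1]
    if not inner:
        return []
    glyphs = []
    for item in inner.split("|"):
        m = GLYPH_RE.match(item)
        if m is None:
            raise ValueError(f"cannot parse glyph record: {item!r}")
        glyphs.append((
            m["name"],
            int(m["cluster"] or 0),
            int(m["dx"] or 0),
            int(m["dy"] or 0),
            int(m["adv"] or 0),
        ))
    return glyphs


def diff_shaped(reference, candidate):
    """Return (index, reference_glyph, candidate_glyph) for each mismatch."""
    ref, cand = parse_shaped(reference), parse_shaped(candidate)
    diffs = []
    for i in range(max(len(ref), len(cand))):
        a = ref[i] if i < len(ref) else None
        b = cand[i] if i < len(cand) else None
        if a != b:
            diffs.append((i, a, b))
    return diffs


# Invented sample data: the second engine picked a different first glyph.
reference = "[uni0647.fina=1+500|uni0628.init=0@0,-20+600]"
candidate = "[uni0647=1+520|uni0628.init=0@0,-20+600]"
for index, ref_glyph, cand_glyph in diff_shaped(reference, candidate):
    print(index, ref_glyph, cand_glyph)
```

The point of the sketch is only that once both engines emit the same textual form, the comparison itself is trivial; producing that form from the Emacs side is the hard part.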
* Re: how reliable is rendering of complex scripts? 2015-10-04 8:09 ` Werner LEMBERG @ 2015-10-04 9:12 ` Eli Zaretskii 2015-10-04 9:40 ` Werner LEMBERG 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2015-10-04 9:12 UTC (permalink / raw) To: Werner LEMBERG; +Cc: handa, emacs-devel > Date: Sun, 04 Oct 2015 10:09:03 +0200 (CEST) > Cc: emacs-devel@gnu.org, handa@gnu.org > From: Werner LEMBERG <wl@gnu.org> > > >> https://github.com/behdad/harfbuzz/tree/master/test/shaping > >> https://github.com/behdad/harfbuzz/tree/master/util > > > > I will see if we can extract something from there to see how Emacs > > displays those scripts. > > The nice thing about `hb-shape' is that it doesn't output graphics but > a textual representation of the GSUB and GPOS manipulation results > (including reordering of the input). In other words, Emacs could use > the same output format in a special test mode, making comparisons very > simple – and automated! Too bad. It means these tests cannot be easily used by Emacs, because (1) the shaping engine is not part of Emacs, and (2) to have something even approximately close, Someone™ will have to add code to composite.c, composite.el, etc. to produce such a textual description, and for that, that Someone™ will have to study and understand the code there enough to write the additions. Unless Handa-san will find time to do the main parts of this, of course. Perhaps a good first step would be for someone to produce pictures of the rendered texts from those tests (using Harfbuzz or anything else that can be used as reference), and then we could compare that with what Emacs produces for the same texts, and see how good or bad we are doing. Thanks. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? 2015-10-04 9:12 ` Eli Zaretskii @ 2015-10-04 9:40 ` Werner LEMBERG 2015-10-04 9:57 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Werner LEMBERG @ 2015-10-04 9:40 UTC (permalink / raw) To: eliz; +Cc: handa, emacs-devel >> The nice thing about `hb-shape' is that it doesn't output graphics >> but a textual representation of the GSUB and GPOS manipulation >> results (including reordering of the input). In other words, Emacs >> could use the same output format in a special test mode, making >> comparisons very simple – and automated! > > Too bad. Hmm. I consider this as a benefit, since comparison of images is much harder, especially if the exact environment differs. In particular, for two identical, correct GSUB/GPOS representations of a text string, you can easily get different rendering results for GDI and DWrite ClearType (on Windows), normal grayscale rendering and auto-hinting from FreeType, grayscale rendering on older Windows versions and Mac OS X, different color filters used on different platforms etc., etc. > It means these tests cannot be easily used by Emacs, because [...] Well, if you need graphic output, use `hb-view', which creates images of text strings. > Perhaps a good first step would be for someone to produce pictures > of the rendered texts from those tests (using Harfbuzz or anything > else that can be used as reference), and then we could compare that > with what Emacs produces for the same texts, and see how good or bad > we are doing. This is *exactly* what should be avoided IMHO. It's both far too much work and too imprecise. Werner ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? From: Eli Zaretskii @ 2015-10-04 9:57 UTC To: Werner LEMBERG; +Cc: handa, emacs-devel > Date: Sun, 04 Oct 2015 11:40:17 +0200 (CEST) > Cc: emacs-devel@gnu.org, handa@gnu.org > From: Werner LEMBERG <wl@gnu.org> > > > Perhaps a good first step would be for someone to produce pictures > > of the rendered texts from those tests (using Harfbuzz or anything > > else that can be used as reference), and then we could compare that > > with what Emacs produces for the same texts, and see how good or bad > > we are doing. > > This is *exactly* what should be avoided IMHO. It's both far too much > work and too imprecise. Not if it's the only practical alternative (besides doing nothing).
* Re: how reliable is rendering of complex scripts? 2015-10-04 7:07 ` Eli Zaretskii 2015-10-04 8:09 ` Werner LEMBERG @ 2015-10-04 18:06 ` John Wiegley 2015-10-04 19:45 ` Eli Zaretskii 1 sibling, 1 reply; 15+ messages in thread From: John Wiegley @ 2015-10-04 18:06 UTC (permalink / raw) To: emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > No one is working on this, to the best of my knowledge. Once again, we lack > individuals on board who understand these issues well enough to do that. Just FYI, I use Emacs to write in Arabic script fairly often (chatting in Persian via ERC). I haven't noticed any specific rendering or right-to-left issues yet, but do count me as someone with knowledge of and concern for these issues. I know nothing about the other non-Latin scripts, however, or what needs they might have. John ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? From: Eli Zaretskii @ 2015-10-04 19:45 UTC To: John Wiegley; +Cc: emacs-devel > From: "John Wiegley" <johnw@newartisans.com> > Date: Sun, 04 Oct 2015 11:06:32 -0700 > > Just FYI, I use Emacs to write in Arabic script fairly often (chatting in > Persian via ERC). I haven't noticed any specific rendering or right-to-left > issues yet, but do count me as someone with knowledge of and concern for these > issues. Thanks. One of the things that has always bothered me is the set of composition rules for Arabic and Persian (see the end of lisp/language/misc-lang.el). Perhaps you could take a look at them, in particular the ZWJ and ZWNJ related rules, and other similar stuff. I think someone said in the past we lack some rules there. Also, the Harfbuzz test suite includes several text files that are supposed to exercise Arabic and Persian shaping, so if you know what the correct display is in all of the cases there, perhaps you could see if Emacs displays them correctly.
* Re: how reliable is rendering of complex scripts? 2015-10-04 19:45 ` Eli Zaretskii @ 2015-10-04 21:43 ` John Wiegley 2015-10-05 6:05 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: John Wiegley @ 2015-10-04 21:43 UTC (permalink / raw) To: emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > One of the things that always bothered me are the composition rules for > Arabic and Persian (see the end of lisp/language/misc-lang.el). Perhaps you > could take a look at them, in particular the ZWJ and ZWNJ related rules, and > other similar stuff. I think someone said in the past we lack some rules > there. Hmm, it looks fine here, but maybe I don't quite understand yet. Z and W are always "final" letters, making the J stand separate in ZWJ. The connective between N and J in ZWNJ looks as I'd expect. John ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? 2015-10-04 21:43 ` John Wiegley @ 2015-10-05 6:05 ` Eli Zaretskii 2015-10-05 18:18 ` John Wiegley 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2015-10-05 6:05 UTC (permalink / raw) To: John Wiegley; +Cc: emacs-devel > From: John Wiegley <johnw@newartisans.com> > Date: Sun, 04 Oct 2015 14:43:12 -0700 > > >>>>> Eli Zaretskii <eliz@gnu.org> writes: > > > One of the things that always bothered me are the composition rules for > > Arabic and Persian (see the end of lisp/language/misc-lang.el). Perhaps you > > could take a look at them, in particular the ZWJ and ZWNJ related rules, and > > other similar stuff. I think someone said in the past we lack some rules > > there. > > Hmm, it looks fine here, but maybe I don't quite understand yet. Z and W are > always "final" letters, making the J stand separate in ZWJ. The connective > between N and J in ZWNJ looks as I'd expect. I meant ZWJ and ZWNJ the characters, U+200C and U+200D. Or did I misunderstand you? ^ permalink raw reply [flat|nested] 15+ messages in thread
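For reference, the two controls can be looked up by code point with Python's standard `unicodedata` module. This is only a lookup sketch; `describe` is a made-up helper name.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (ZWNJ, ZWJ):
    print(describe(ch))
```

So U+200C is the non-joiner (ZWNJ) and U+200D is the joiner (ZWJ).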
* Re: how reliable is rendering of complex scripts? 2015-10-05 6:05 ` Eli Zaretskii @ 2015-10-05 18:18 ` John Wiegley 2015-10-05 19:22 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: John Wiegley @ 2015-10-05 18:18 UTC (permalink / raw) To: emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > I meant ZWJ and ZWNJ the characters, U+200C and U+200D. Or did I > misunderstand you? Ah, it was I who misunderstood, Eli. I don't use these characters when typing in Emacs, only when preparing texts in layout applications. The ZWNJ seems to work fine, but the ZWJ does not produce the behavior documented here: http://persian.nmelrc.org/persianword/zwj.htm That is, ZWJ does not extend the heh into a "heh do chesm" (heh with two eyes), it just adds space before or after the final-form (circular) heh. John ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? 2015-10-05 18:18 ` John Wiegley @ 2015-10-05 19:22 ` Eli Zaretskii 2015-10-06 16:17 ` Eli Zaretskii 0 siblings, 1 reply; 15+ messages in thread From: Eli Zaretskii @ 2015-10-05 19:22 UTC (permalink / raw) To: John Wiegley, Kenichi Handa; +Cc: emacs-devel > From: John Wiegley <johnw@newartisans.com> > Date: Mon, 05 Oct 2015 11:18:57 -0700 > > >>>>> Eli Zaretskii <eliz@gnu.org> writes: > > > I meant ZWJ and ZWNJ the characters, U+200C and U+200D. Or did I > > misunderstand you? > > Ah, it was I who misunderstood, Eli. I don't use these characters when typing > in Emacs, only when preparing texts in layout applications. > > The ZWNJ seems to work fine, but the ZWJ does not produce the behavior > documented here: > > http://persian.nmelrc.org/persianword/zwj.htm > > That is, ZWJ does not extend the heh into a "heh do chesm" (heh with two > eyes), it just adds space before or after the final-form (circular) heh. I tried to fix that, but I'm not sure it helped. Maybe I simply don't understand how to write entries for composition-function-table. Perhaps Handa-san could help. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: how reliable is rendering of complex scripts? From: Eli Zaretskii @ 2015-10-06 16:17 UTC To: johnw, handa; +Cc: emacs-devel > Date: Mon, 05 Oct 2015 22:22:15 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: emacs-devel@gnu.org > > > The ZWNJ seems to work fine, but the ZWJ does not produce the behavior > > documented here: > > > > http://persian.nmelrc.org/persianword/zwj.htm > > > > That is, ZWJ does not extend the heh into a "heh do chesm" (heh with two > > eyes), it just adds space before or after the final-form (circular) heh. > > I tried to fix that, but I'm not sure it helped. Maybe I simply don't > understand how to write entries for composition-function-table. > Perhaps Handa-san could help. I found a subtle bug in the Emacs bidi reordering engine that affected this. After fixing it, the display is more reasonable in these cases. The only remaining problem is the display of sequences where the zero-width controls are not between two Arabic letters, e.g. a U+0647 followed by ZWJ and a newline, in a left-to-right paragraph, a relatively rare situation (it does work correctly in R2L paragraphs). The special requirements for treating ZWJ and ZWNJ during the bidi reordering are hard to implement, and the advice in the UBA is incompatible with the design of the Emacs display engine. I will have to think about this some more. One thing that still bothers me is that even though the shaping of the Arabic letters is clearly affected by ZWJ and ZWNJ after the fix, the "C-u C-x =" command doesn't say there was a character composition there. Perhaps Handa-san could explain why that is. Is it a bug?
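The problematic sequence mentioned above (U+0647, then ZWJ, then a newline) can be built and inspected with Python's `unicodedata` module. The sketch below only prints the bidirectional class of each character; it does not reproduce anything the Emacs display engine does, and `bidi_classes` is a made-up helper name.

```python
import unicodedata

HEH = "\u0647"   # ARABIC LETTER HEH
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def bidi_classes(text):
    """Return (code point, bidirectional class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


# The case from the message: heh followed by ZWJ and a newline.
trailing_zwj = HEH + ZWJ + "\n"
print(bidi_classes(trailing_zwj))
```

Both controls come out as class BN (boundary neutral), while the letter is AL and the newline is B. As far as I understand it, the UBA handles BN characters by treating them as if they were removed before reordering, which would be the advice that clashes with shaping needing them in place.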