On Thu, Dec 13, 2018 at 3:31 PM Khaled Hosny wrote:
>
> The HarfBuzz rendering of Arabic is the correct one in this screenshot.
>

Thanks. So here's the status so far:

Rendering of Namaste as seen in C-h h (M-x view-hello-file):

|          | harfbuzz | m17n    |
|----------+----------+---------|
| Hindi    | correct  | correct |
| Gujarati | wrong    | correct |
| Arabic   | correct  | wrong   |

> For debugging such rendering differences, the actual font used by
> Emacs for a given part of the text needs to be known,

I am using the Mukta Vaani font for Gujarati. It is a free font and can be
downloaded from https://ektype.in/mukta-vaani.html.

The string being rendered is "નમસ્તે". By placing the cursor on each of
those characters and doing C-u C-x = (on the m17n build), I get:

(1) ન

             position: 1610 of 3509 (46%), column: 32
            character: ન (displayed as ન) (codepoint 2728, #o5250, #xaa8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3968
               script: gujarati
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa8" or "C-x 8 RET GUJARATI LETTER NA"
          buffer code: #xE0 #xAA #xA8
            file code: #xE0 #xAA #xA8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x234)

Character code properties: customize what to show
  name: GUJARATI LETTER NA
  general-category: Lo (Letter, Other)
  decomposition: (2728) ('ન')

There are text properties here:
  charset              mule-unicode-0100-24ff

(2) મ

             position: 1611 of 3509 (46%), column: 33
            character: મ (displayed as મ) (codepoint 2734, #o5256, #xaae)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x396E
               script: gujarati
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aae" or "C-x 8 RET GUJARATI LETTER MA"
          buffer code: #xE0 #xAA #xAE
            file code: #xE0 #xAA #xAE (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x239)

Character code properties: customize what to show
  name: GUJARATI LETTER MA
  general-category: Lo (Letter, Other)
  decomposition: (2734) ('મ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3) સ્તે

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: composed to form "સ્તે" (see below)

Composed with the following character(s) "્તે" using this font:
  xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
by these glyphs:
  [0 3 0 645 8 0 11 11 0 [0 0 8]]
  [0 3 2724 560 11 1 11 11 1 nil]
  [0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff

=====

On the harfbuzz build, the "સ્તે" part is different. I can place the cursor
separately on સ્ and તે, do C-u C-x =, and I get:

(3.1) સ્

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x241)

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3.2) તે

             position: 1614 of 3509 (46%), column: 35
            character: ત (displayed as ત) (codepoint 2724, #o5244, #xaa4)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3964
               script: gujarati
               syntax: w which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa4" or "C-x 8 RET GUJARATI LETTER TA"
          buffer code: #xE0 #xAA #xA4
            file code: #xE0 #xAA #xA4 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x230)

Character code properties: customize what to show
  name: GUJARATI LETTER TA
  general-category: Lo (Letter, Other)
  decomposition: (2724) ('ત')

There are text properties here:
  charset              mule-unicode-0100-24ff

> then the text and
> the font can be checked against vanilla HarfBuzz (e.g. using the hb-view
> command line tool); if it gives the same rendering then it is either a
> HarfBuzz or font issue, if not then it is a bug in the HarfBuzz
> integration code in Emacs.
>
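To make the comparison with vanilla HarfBuzz easier to reproduce, here is a minimal Python sketch (not part of Emacs or HarfBuzz) that lists the code points of the string, so they can be checked against the describe-char output above, and builds an hb-view command line for it. The helper names `describe` and `hb_view_command`, the font file name `MuktaVaani-Regular.ttf`, and the output file name are my own assumptions; only the code points, the UTF-8 bytes, and the hb-view tool itself come from this thread.

```python
import unicodedata

# The string from the HELLO file, spelled out by code point so the example
# does not depend on how this message is encoded.
NAMASTE = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"


def describe(text):
    """Return (hex code point, Unicode name, UTF-8 bytes) for each character."""
    rows = []
    for ch in text:
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), utf8))
    return rows


def hb_view_command(font_file, text, output_file="namaste.png"):
    """Build (but do not run) an hb-view command line for the font and text."""
    return ["hb-view", f"--output-file={output_file}", font_file, text]


if __name__ == "__main__":
    for code_point, name, utf8 in describe(NAMASTE):
        print(f"{code_point}  {name:<28} {utf8}")
    # Hypothetical font path: adjust to wherever Mukta Vaani is installed.
    print(" ".join(hb_view_command("MuktaVaani-Regular.ttf", NAMASTE)))
```

The last four code points (SA, VIRAMA, TA, VOWEL SIGN E) are the ones the m17n build composes into a single cluster, so that is the part of the hb-view image worth comparing against the two Emacs builds.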