From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#50951: Fwd: bug#50951: 28.0.50; Urdu text is not displayed correctly
Date: Sat, 02 Oct 2021 15:18:28 +0300
Message-ID: <83sfxjbox7.fsf@gnu.org>
References: <83mtnsc63i.fsf@gnu.org>
To: Rah Guzar
Cc: 50951@debbugs.gnu.org

> From: Rah Guzar
> Date: Sat, 2 Oct 2021 13:43:47 +0200
>
> Let us consider the word نہیں
>
> It is composed of four letters.
> I will use the character field from `describe-char` for each of them below:
> 1) ن‎ (displayed as ن‎) (codepoint 1606, #o3106, #x646)
> 2) ہ‎ (displayed as ہ‎) (codepoint 1729, #o3301, #x6c1)
> 3) ی‎ (displayed as ی‎) (codepoint 1740, #o3314, #x6cc)
> 4) ں‎ (displayed as ں‎) (codepoint 1722, #o3272, #x6ba)
>
> It should be displayed with all 4 characters joined together; instead they are all displayed individually.

What font displays them individually? You should be able to tell that if you type "C-u C-x =" on one of these characters. For me, they display joined together.

> If I change to `NotoNastaliqUrdu` this word is displayed correctly. But there is a problem with حرف
>
> It consists of three letters:
> 1) ح‎ (displayed as ح‎) (codepoint 1581, #o3055, #x62d)
> 2) ر‎ (displayed as ر‎) (codepoint 1585, #o3061, #x631)
> 3) ف‎ (displayed as ف‎) (codepoint 1601, #o3101, #x641)
>
> The first two characters should be joined and the last one should be on its own.

This seems to be the case.

> But the two groups are rendered on top of each other, making it illegible.
>
> > So isn't this a matter of finding a proper font, in particular given
> > the "Nastaliq vs Naskh" issues? NotoNastaliqUrdu is not the only font
> > supporting Nastaliq, so perhaps other fonts fare better?
>
> My knowledge here is very deficient, but my impression is that Nastaliq and Naskh are styles and shouldn't affect composition.
> NotoNastaliqUrdu was the only Urdu font available from my distro. LibreOffice, which also uses HarfBuzz, renders it correctly, so I didn't try another font at first. Like Emacs, LibreOffice also uses a Naskh font by default, but all the characters are joined properly.
>
> I did try some fonts from https://urdufonts.net/ after your suggestions and they render correctly.
> Specifically, the fonts I tried were:
> Jameel Noori Nastaleeq Regular
> Alvi Nastaleeq
> Zohra Unicode
> Manzor Unicode
>
> I didn't notice a problem with any of them except a very minor one for the last two, which have visible boundaries where glyphs are joined.

So would it be correct to say that using a proper font solves the problem?

> > Since Urdu uses the Arabic characters, Emacs uses character
> > composition rules for Arabic when displaying this text. Do you know
> > if the composition rules for Urdu are different?
>
> I think using Arabic composition rules might be part of the problem. The Urdu alphabet is a superset of the Arabic alphabet, and if I don't set a font specifically designed for Urdu, the words where some characters should be joined but aren't always seem to include a character like ہ, which is in the Urdu alphabet but not in Arabic.

I don't think the problem is with compositions, because in the 2 examples you described above, there are no character compositions. Moreover, our pattern for asking HarfBuzz to shape Arabic text is this:

  "[\u0600-\u074F\u200C\u200D]+"

which includes all of the characters, including U+06C1, which you say causes problems.

You could try setting current-iso639-language to the symbol 'ur' (without the quotes); that should tell HarfBuzz to shape the text as appropriate for Urdu.

But I think the real problem is with the font, not with shaping.
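The following is a minimal Python sketch that checks the point above: every character of both example words falls inside the quoted shaping pattern. It is an illustration only, not code from Emacs. Using Python's `re` instead of Emacs's own regexp engine is a convenience assumption. For these plain code-point ranges the two should agree. The sketch also prints each character's code point in the decimal/octal/hex form that `describe-char` shows. The helper names `describe` and `covered_by_pattern` are hypothetical.

```python
import re
import unicodedata

# In a normal (non-raw) Python string the \uXXXX escapes become the literal
# characters, so this class matches the same code points as the quoted pattern.
ARABIC_SHAPING_RE = re.compile("[\u0600-\u074F\u200C\u200D]+")

def describe(word):
    """Hypothetical helper: (char, decimal, octal, hex, Unicode name) per
    character, mimicking the codepoint fields printed by `describe-char`."""
    return [(ch, ord(ch), f"#o{ord(ch):o}", f"#x{ord(ch):x}", unicodedata.name(ch))
            for ch in word]

def covered_by_pattern(word):
    """Hypothetical helper: True if the whole word is a single match of the pattern."""
    return ARABIC_SHAPING_RE.fullmatch(word) is not None

# The two words from the report, written with escapes to avoid hidden marks.
NAHIN = "\u0646\u06c1\u06cc\u06ba"  # نہیں
HARF = "\u062d\u0631\u0641"         # حرف

for word in (NAHIN, HARF):
    for row in describe(word):
        print(*row)
    print("covered:", covered_by_pattern(word))
```

If both words report `covered: True`, the range check agrees that U+06C1 is inside the shaping pattern. This says nothing about how a particular font shapes or positions the glyphs, which is where the remaining problem is suspected to lie.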