From: Rah Guzar
Newsgroups: gmane.emacs.bugs
Subject: bug#50951: Fwd: bug#50951: 28.0.50; Urdu text is not displayed correctly
Date: Sat, 2 Oct 2021 13:43:47 +0200
References: <83mtnsc63i.fsf@gnu.org>
To: 50951@debbugs.gnu.org

I forgot to reply-all, so my reply didn't go to the mailing list.
Sorry about that; I am forwarding it to the mailing list now.

---------- Forwarded message ---------
From: Rah Guzar <aikrahguzar@gmail.com>
Date: Sat, Oct 2, 2021 at 1:40 PM
Subject: Re: bug#50951: 28.0.50; Urdu text is not displayed correctly
To: Eli Zaretskii <eliz@gnu.org>


Hi,

Thanks a lot for the reply.

On Sat, Oct 2, 2021 at 8:07 AM Eli Zaretskii <eliz@gnu.org> wrote:

> -10-01T21:49:10,611532571+02:00.png
>
> Can you give a few specific examples of characters that should be
> joined, but aren't?  Please name the characters and also give they
> positions relative to the beginning of this text, as I don't read
> Urdu, so the images are useless for me without some additional data
> and explanations.

Let us consider the word نہیں

It is composed of four letters. I will use the character field from
`describe-char` for each of them below:
1) ن (displayed as ن) (codepoint 1606, #o3106, #x646)
2) ہ (displayed as ہ) (codepoint 1729, #o3301, #x6c1)
3) ی (displayed as ی) (codepoint 1740, #o3314, #x6cc)
4) ں (displayed as ں) (codepoint 1722, #o3272, #x6ba)

It should be displayed with all four characters joined together;
instead, they are all displayed individually.
If I change the font to `NotoNastaliqUrdu`, this word is displayed
correctly. But there is a problem with حرف

It consists of three letters:
1) ح (displayed as ح) (codepoint 1581, #o3055, #x62d)
2) ر (displayed as ر) (codepoint 1585, #o3061, #x631)
3) ف (displayed as ف) (codepoint 1601, #o3101, #x641)

The first two characters should be joined and the last one should be
on its own. This seems to be the case, but the two groups are rendered
on top of each other, making the word illegible.
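For anyone who wants to check the code points listed above outside
Emacs, here is a minimal Python sketch. It is only an illustration: the
helper name `describe` is made up, and it mimics just the
"codepoint N, #oN, #xN" part of the `describe-char` output, relying on
Python's standard `unicodedata` module for the character names.

```python
import unicodedata


def describe(ch):
    """Format one character roughly like the `describe-char` character field.

    Hypothetical helper for illustration only; this is not an Emacs API.
    """
    cp = ord(ch)
    return f"codepoint {cp}, #o{cp:o}, #x{cp:x}"


# The two words discussed above, written as escapes so the file stays ASCII.
words = {
    "first word": "\u0646\u06c1\u06cc\u06ba",
    "second word": "\u062d\u0631\u0641",
}

for label, word in words.items():
    print(label)
    for i, ch in enumerate(word, start=1):
        # Print the Unicode name next to the decimal, octal and hex values.
        print(f"  {i}) {unicodedata.name(ch)} ({describe(ch)})")
```

The Unicode names it prints (for example ARABIC LETTER HEH GOAL for
U+06C1) may make it easier to refer to the individual letters without
reading Urdu.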

> So isn't this a matter of finding a proper font, in particularly given
> the "Nastaliq vs Naskh" issues?  NotoNastaliqUrdu is not the only font
> supporting Nastaliq, so perhaps other fonts fare better?

My knowledge here is very limited, but my impression is that Nastaliq
and Naskh are styles and shouldn't affect composition.
NotoNastaliqUrdu was the only Urdu font available from my distro.
LibreOffice, which also uses HarfBuzz, renders it correctly, so I
didn't try another font at first. Like Emacs, LibreOffice also uses a
Naskh font by default, but all the characters are joined properly.

I did try some fonts from https://urdufonts.net/ after your suggestion,
and they render correctly. Specifically, the fonts I tried were:
Jameel Noori Nastaleeq Regular
Alvi Nastaleeq
Zohra Unicode
Manzor Unicode

I didn't notice a problem with any of them, except a very minor one
with the last two, which have visible boundaries where glyphs are
joined.

> Since Urdu uses the Arabic characters, Emacs uses character
> composition rules for Arabic when displaying this text.  Do you know
> if the composition rules for Urdu are different?

I think using Arabic composition rules might be part of the problem.
The Urdu alphabet is a superset of the Arabic alphabet, and if I don't
set a font specifically designed for Urdu, the words where some
characters should be joined but aren't always seem to include a
character like ہ, which is in the Urdu alphabet but not in the Arabic
one.

> Also, which version of HarfBuzz do you have installed?

It is 2.9.1.

Please let me know if you need any more information.

Thanks a lot again.