From: Behdad Esfahbod
Newsgroups: gmane.emacs.bugs
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Thu, 20 Dec 2018 15:45:50 -0500
To: Eli Zaretskii
Cc: Khaled Hosny, 33729@debbugs.gnu.org, Mohammad Nasirifar, kaushal.modi@gmail.com
In-Reply-To: <8336qsc9fl.fsf@gnu.org>
References: <20181213203102.GF2244@macbook.localdomain> <83h8fghcpo.fsf@gnu.org>
 <20181214075056.GI2244@macbook.localdomain> <8336r0h1cb.fsf@gnu.org>
 <20181214110316.GK2244@macbook.localdomain> <83y38sfcme.fsf@gnu.org>
 <83tvjgf7ux.fsf@gnu.org> <83mup4du5z.fsf@gnu.org> <8336qsc9fl.fsf@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:153651

Sounds good to me.

On Thu, Dec 20, 2018 at 1:58 PM Eli Zaretskii <eliz@gnu.org> wrote:

> Ping!  Could someone on the Harfbuzz team please comment on the
> thoughts below?  Khaled, Mohammad, Behdad?
>
> > Date: Mon, 17 Dec 2018 17:55:52 +0200
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: dr.khaled.hosny@gmail.com, behdad@behdad.org, 33729@debbugs.gnu.org,
> >  far.nasiri.m@gmail.com, kaushal.modi@gmail.com
> >
> > > From: Glenn Morris <rgm@gnu.org>
> > > Cc: far.nasiri.m@gmail.com, dr.khaled.hosny@gmail.com,
> > >  behdad@behdad.org, 33729@debbugs.gnu.org, kaushal.modi@gmail.com
> > > Date: Sun, 16 Dec 2018 19:30:00 -0500
> > >
> > > > After some thinking, my conclusion is that we should import the
> > > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > > from uni_script to get the correct script information to Harfbuzz.
> > > >
> > > > Patches implementing that are welcome.
> > >
> > > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > > first example, the following takes
> > > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > > as input and outputs lines of the form "(gujr . gujarati)".
> > >
> > > The aliases are so that the RHS matches charscript.el.
> > >
> > > If this is not right, please clarify exactly what the inputs and output
> > > should be.
> >
> > Thanks.
> >
> > It turns out I didn't have this figured out completely, and your
> > proposal forced me to dig some more into the relevant parts of Unicode
> > and Emacs.  I found a few additional issues and considerations; for at
> > least some of them I'd like to hear the opinions of the Harfbuzz
> > developers.
> >
> > Here are the issues:
> >
> >  . Contrary to my original thoughts, I now tend to think that a
> >    separate char-table, say char-iso15924-tag-table, that maps
> >    character codepoints directly to the script tags, is a better
> >    solution:
> >     - it will allow a faster lookup, obviously
> >     - the subdivision of characters into scripts, as shown in
> >       Unicode's Scripts.txt, is slightly different from what
> >       char-script-table does, so a simple mapping from Emacs scripts
> >       to ISO 15924 script tags will not do.  For example, many
> >       characters Emacs puts into 'latin' or 'symbol' scripts are in
> >       the Common script according to Scripts.txt, and similarly for
> >       the Inherited script.  I imagine this is important for
> >       Harfbuzz.
> >
> >  . Whether to produce the character-to-script-tag mapping using the
> >    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
> >    canonical ISO 15924 tags from https://unicode.org/iso15924/,
> >    depends on whether the slight differences mentioned in
> >    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
> >    for Harfbuzz.  For example, ISO 15924 has separate tags for the
> >    Fraktur and Gaelic varieties of the Latin script: does this
> >    distinction matter for Harfbuzz?
> >
> >  . Does Harfbuzz handle the issues mentioned in
> >    https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
> >    particular the use case of decomposed characters which yield a
> >    different script than their precomposed variants?  This use case is
> >    quite common in handling of character compositions, so it's
> >    important to understand its implications before we decide on the
> >    implementation.
> >
> > To summarize, unless the Harfbuzz guys advise differently, I'd prefer
> > processing Scripts.txt and PropertyValueAliases.txt into a list
> > similar to the one we produce in charscript.el, then generate a
> > char-table from that list.
> >
> > Thanks again for working on this.
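
For readers following the thread, the approach summarized above (process
PropertyValueAliases.txt and Scripts.txt, then look up a tag per
codepoint) can be sketched roughly as below.  This is only an
illustration in Python, not the awk script or the Emacs char-table
discussed in the thread.  The names parse_script_aliases,
parse_script_ranges, ScriptTable and alist_lines are hypothetical, the
sample data are short excerpts written in the UCD file formats, and the
lowercase/hyphen normalization in alist_lines is an assumption that does
not reproduce the aliases needed to match charscript.el.

```python
import bisect

# Illustrative excerpts in the UCD file formats; a real run would read the
# full PropertyValueAliases.txt and Scripts.txt instead.
ALIASES_SAMPLE = """\
sc ; Gujr ; Gujarati
sc ; Latn ; Latin
sc ; Zinh ; Inherited ; Qaai
sc ; Zyyy ; Common
"""

SCRIPTS_SAMPLE = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0300..036F    ; Inherited # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X
0A81..0A82    ; Gujarati # Mn   [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA
0A83          ; Gujarati # Mc       GUJARATI SIGN VISARGA
"""


def parse_script_aliases(text):
    """Map long script names (e.g. 'Gujarati') to four-letter tags ('Gujr')."""
    tags = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        # Only the 'sc' (Script) property rows are relevant here.
        if len(fields) >= 3 and fields[0] == "sc":
            tags[fields[2]] = fields[1]
    return tags


def parse_script_ranges(text, tags):
    """Return a sorted list of (first, last, tag) codepoint ranges."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # skip blank and comment-only lines
        span, name = (f.strip() for f in data.split(";"))
        first, _, last = span.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), tags[name]))
    ranges.sort()
    return ranges


class ScriptTable:
    """Codepoint-to-tag lookup by binary search over sorted ranges."""

    def __init__(self, ranges):
        self._ranges = ranges
        self._starts = [first for first, _, _ in ranges]

    def lookup(self, codepoint, default="Zzzz"):
        # 'Zzzz' is the code for an unknown/uncoded script.
        i = bisect.bisect_right(self._starts, codepoint) - 1
        if i >= 0 and self._ranges[i][0] <= codepoint <= self._ranges[i][1]:
            return self._ranges[i][2]
        return default


def alist_lines(tags):
    """Emit lines of the form '(gujr . gujarati)' (naive normalization)."""
    return [
        "({} . {})".format(tag.lower(), name.lower().replace("_", "-"))
        for name, tag in sorted(tags.items())
    ]


tags = parse_script_aliases(ALIASES_SAMPLE)
table = ScriptTable(parse_script_ranges(SCRIPTS_SAMPLE, tags))
print(table.lookup(0x0A82))
print("\n".join(alist_lines(tags)))
```

Keying the table on codepoints rather than on Emacs script names is what
lets Common and Inherited characters keep their own tags, which is the
distinction raised in the first issue above.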

-- 
behdad
http://behdad.org/