From: Eli Zaretskii
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Thu, 20 Dec 2018 20:58:06 +0200
Message-ID: <8336qsc9fl.fsf@gnu.org>
In-reply-to: <83mup4du5z.fsf@gnu.org> (message from Eli Zaretskii on Mon, 17 Dec 2018 17:55:52 +0200)
To: dr.khaled.hosny@gmail.com, behdad@behdad.org, far.nasiri.m@gmail.com
Cc: 33729@debbugs.gnu.org, kaushal.modi@gmail.com

Ping!  Could someone on the Harfbuzz team please comment on the
thoughts below?  Khaled, Mohammad, Behdad?

> Date: Mon, 17 Dec 2018 17:55:52 +0200
> From: Eli Zaretskii
> Cc: dr.khaled.hosny@gmail.com, behdad@behdad.org, 33729@debbugs.gnu.org,
>  far.nasiri.m@gmail.com, kaushal.modi@gmail.com
>
> > From: Glenn Morris
> > Cc: far.nasiri.m@gmail.com, dr.khaled.hosny@gmail.com, behdad@behdad.org, 33729@debbugs.gnu.org, kaushal.modi@gmail.com
> > Date: Sun, 16 Dec 2018 19:30:00 -0500
> >
> > > After some thinking, my conclusion is that we should import the
> > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > from uni_script to get the correct script information to Harfbuzz.
> > >
> > > Patches implementing that are welcome.
> >
> > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > first example, the following takes
> > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > as input and outputs lines of the form "(gujr . gujarati)".
> >
> > The aliases are so that the RHS matches charscript.el.
> >
> > If this is not right, please clarify exactly what the inputs and output
> > should be.
>
> Thanks.
>
> It turns out I didn't have this figured out completely, and your
> proposal forced me to dig some more into the relevant parts of Unicode
> and Emacs.  I found a few additional issues and considerations; for at
> least some of them I'd like to hear the opinions of the Harfbuzz
> developers.
>
> Here are the issues:
>
>  . Contrary to my original thoughts, I now tend to think that a
>    separate char-table, say char-iso15924-tag-table, that maps
>    character codepoints directly to the script tags, is a better
>    solution:
>      - it will allow a faster lookup, obviously
>      - the subdivision of characters into scripts, as shown in
>        Unicode's Scripts.txt, is slightly different from what
>        char-script-table does, so a simple mapping from Emacs scripts
>        to ISO 15924 script tags will not do.  For example, many
>        characters Emacs puts into 'latin' or 'symbol' scripts are in
>        the Common script according to Scripts.txt, and similarly for
>        the Inherited script.  I imagine this is important for
>        Harfbuzz.
>
>  . Whether to produce the character-to-script-tag mapping using the
>    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
>    canonical ISO 15924 tags from https://unicode.org/iso15924/,
>    depends on whether the slight differences mentioned in
>    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
>    for Harfbuzz.  For example, ISO 15924 has separate tags for the
>    Fraktur and Gaelic varieties of the Latin script: does this
>    distinction matter for Harfbuzz?
>
>  . Does Harfbuzz handle the issues mentioned in
>    https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
>    particular the use case of decomposed characters which yield a
>    different script than their precomposed variants?  This use case is
>    quite common in handling of character compositions, so it's
>    important to understand its implications before we decide on the
>    implementation.
>
> To summarize, unless the Harfbuzz guys advise differently, I'd prefer
> processing Scripts.txt and PropertyValueAliases.txt into a list
> similar to the one we produce in charscript.el, and then generating a
> char-table from that list.
>
> Thanks again for working on this.
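
To make the proposal above concrete, here is a rough sketch in Python
(not the awk or Emacs Lisp that would actually go into admin/unidata)
of turning Scripts.txt plus PropertyValueAliases.txt into a
codepoint-to-tag lookup.  Everything in it is an assumption made for
illustration: the function names are hypothetical, the two input
strings are tiny hand-picked samples in the UCD line format rather
than the full files, and a sorted range list stands in for a real
char-table.

```python
import bisect
import unicodedata

# Hand-picked sample lines in the Scripts.txt format (NOT the full file).
SCRIPTS_TXT = """\
# Sample only; the real Scripts.txt has many more ranges.
0020          ; Common
0041..005A    ; Latin
0061..007A    ; Latin
00D8..00F6    ; Latin
0300..036F    ; Inherited
0A93..0AA8    ; Gujarati
"""

# Hand-picked sample lines in the PropertyValueAliases.txt format.
ALIASES_TXT = """\
sc ; Gujr ; Gujarati
sc ; Latn ; Latin
sc ; Zinh ; Inherited ; Qaai
sc ; Zyyy ; Common
"""


def parse_aliases(text):
    """Map long script names ('Gujarati') to 4-letter tags ('Gujr')."""
    tags = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            tags[fields[2]] = fields[1]
    return tags


def parse_scripts(text, tags):
    """Return a sorted list of (start, end, tag) codepoint ranges."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment
        cps, script = [f.strip() for f in data.split(";")]
        start, _, end = cps.partition("..")
        ranges.append((int(start, 16), int(end or start, 16), tags[script]))
    ranges.sort()
    return ranges


def script_tag(ranges, starts, cp, default="Zzzz"):
    """Look up the tag for codepoint CP; unlisted codepoints are unknown."""
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


tags = parse_aliases(ALIASES_TXT)
ranges = parse_scripts(SCRIPTS_TXT, tags)
starts = [r[0] for r in ranges]

print(script_tag(ranges, starts, 0x0A97))  # GUJARATI LETTER GA -> Gujr

# Precomposed vs. decomposed: the combining accent is Inherited.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)
print([script_tag(ranges, starts, ord(c)) for c in precomposed])  # ['Latn']
print([script_tag(ranges, starts, ord(c)) for c in decomposed])   # ['Latn', 'Zinh']
```

The last two lines illustrate the kind of difference asked about
above: the decomposed sequence contains a mark whose script is
Inherited, so a per-character lookup alone no longer says "Latin"
for every character of the cluster.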