From: Eli Zaretskii
To: Glenn Morris
Cc: dr.khaled.hosny@gmail.com, behdad@behdad.org, 33729@debbugs.gnu.org,
 far.nasiri.m@gmail.com, kaushal.modi@gmail.com
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with
 Harfbuzz enabled (renders fine using m17n)
Date: Mon, 17 Dec 2018 17:55:52 +0200
Message-ID: <83mup4du5z.fsf@gnu.org>
References: <20181213203102.GF2244@macbook.localdomain> <83h8fghcpo.fsf@gnu.org>
 <20181214075056.GI2244@macbook.localdomain> <8336r0h1cb.fsf@gnu.org>
 <20181214110316.GK2244@macbook.localdomain> <83y38sfcme.fsf@gnu.org>
 <83tvjgf7ux.fsf@gnu.org>
In-reply-to: (message from Glenn Morris on Sun, 16 Dec 2018 19:30:00 -0500)

> From: Glenn Morris
> Cc: far.nasiri.m@gmail.com, dr.khaled.hosny@gmail.com, behdad@behdad.org,
>  33729@debbugs.gnu.org, kaushal.modi@gmail.com
> Date: Sun, 16 Dec 2018 19:30:00 -0500
>
> > After some thinking, my conclusion is that we should import the
> > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > similar to admin/unidata/blocks.awk to generate an alist from it that
> > maps Emacs script names to ISO 15924 tags, and then access that alist
> > from uni_script to get the correct script information to Harfbuzz.
> >
> > Patches implementing that are welcome.
>
> I live to write awk scripts. I'm not 100% sure what you want, but as a
> first example, the following takes
> http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> as input and outputs lines of the form "(gujr . gujarati)".
>
> The aliases are so that the RHS matches charscript.el.
>
> If this is not right, please clarify exactly what the inputs and output
> should be.

Thanks.

It turns out I didn't have this figured out completely, and your
proposal forced me to dig some more into the relevant parts of Unicode
and Emacs.  I found a few additional issues and considerations; for at
least some of them I'd like to hear the opinions of the Harfbuzz
developers.

Here are the issues:

  . Contrary to my original thoughts, I now tend to think that a
    separate char-table, say char-iso15924-tag-table, that maps
    character codepoints directly to the script tags, is a better
    solution:

      - it will allow a faster lookup, obviously
      - the subdivision of characters into scripts, as shown in
        Unicode's Scripts.txt, is slightly different from what
        char-script-table does, so a simple mapping from Emacs scripts
        to ISO 15924 script tags will not do.  For example, many
        characters Emacs puts into the 'latin' or 'symbol' scripts are
        in the Common script according to Scripts.txt, and similarly
        for the Inherited script.  I imagine this is important for
        Harfbuzz.

  . Whether to produce the character-to-script-tag mapping using the
    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or
    the canonical ISO 15924 tags from https://unicode.org/iso15924/,
    depends on whether the slight differences mentioned in
    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
    for Harfbuzz.  For example, ISO 15924 has separate tags for the
    Fraktur and Gaelic varieties of the Latin script: does this
    distinction matter for Harfbuzz?

  . Does Harfbuzz handle the issues mentioned in
    https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
    particular the use case of decomposed characters which yield a
    different script than their precomposed variants?  This use case
    is quite common when handling character compositions, so it's
    important to understand its implications before we decide on the
    implementation.

To summarize, unless the Harfbuzz guys advise differently, I'd prefer
processing Scripts.txt and PropertyValueAliases.txt into a list
similar to the one we produce in charscript.el, and then generating a
char-table from that list.

Thanks again for working on this.
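To make the processing summarized above concrete, here is a rough
sketch of it.  It is written in Python purely for illustration; the
real thing would presumably be an awk script under admin/unidata that
emits Lisp.  All function names are hypothetical, and the two sample
strings are abbreviated excerpts in the format of the UCD files, not
the full data.  Code points not listed fall back to "zzzz" (Unknown),
which is the default Scripts.txt specifies.

```python
import bisect

# Abbreviated excerpt in the format of PropertyValueAliases.txt.
# Only the "sc" (Script) rows are of interest here.
ALIASES_SAMPLE = """\
# illustrative excerpt, not the full file
gc ; Lu   ; Uppercase_Letter
sc ; Gujr ; Gujarati
sc ; Latn ; Latin
sc ; Zinh ; Inherited ; Qaai
sc ; Zyyy ; Common
"""

# Abbreviated excerpt in the format of Scripts.txt.
SCRIPTS_SAMPLE = """\
0020          ; Common # Zs
0041..005A    ; Latin # L&
0300..036F    ; Inherited # Mn
0A85..0A8D    ; Gujarati # Lo
"""


def strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def parse_script_aliases(text):
    """Map long script names ('Gujarati') to lowercase tags ('gujr')."""
    tags = {}
    for raw in text.splitlines():
        fields = [f.strip() for f in strip_comment(raw).split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            tags[fields[2]] = fields[1].lower()
    return tags


def parse_scripts(text, tags):
    """Return a sorted list of (first, last, tag) code point ranges."""
    ranges = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        cps, name = [f.strip() for f in line.split(";")]
        first, _, last = cps.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), tags[name]))
    ranges.sort()
    return ranges


def script_tag(ranges, codepoint, default="zzzz"):
    """Look up the tag for CODEPOINT by binary search over RANGES."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, codepoint) - 1
    if i >= 0 and ranges[i][1] >= codepoint:
        return ranges[i][2]
    return default


tags = parse_script_aliases(ALIASES_SAMPLE)
ranges = parse_scripts(SCRIPTS_SAMPLE, tags)
print(script_tag(ranges, 0x0A85))  # gujr
```

The sorted range list plays the role of the list we would generate
for the new char-table.  Note that U+0300 comes out as "zinh" rather
than inheriting from its base character; whether the caller or
Harfbuzz should resolve that is exactly the open question above.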