From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 06 Jan 2019 18:03:55 +0200
Message-ID: <83wonhzsb8.fsf@gnu.org>
To: Khaled Hosny, Kenichi Handa
Cc: behdad@behdad.org, far.nasiri.m@gmail.com, 33729@debbugs.gnu.org

> Date: Sat, 5 Jan 2019 23:15:14 +0200
> From: Khaled Hosny
> Cc: rgm@gnu.org, far.nasiri.m@gmail.com, behdad@behdad.org,
>     33729@debbugs.gnu.org, kaushal.modi@gmail.com
>
> > > The built-in HarfBuzz code is for getting the script for a given
> > > character, but resolving characters with Common script is left to the
> > > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > > 123, DEF. The 123 part is all Common script characters and thus
> > > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > > based on the font and the script, this can cause rendering differences).
> > > Emacs will have to resolve the script of Common characters before
> > > applying bidi algorithm and pass that down to HarfBuzz.
> >
> > I'm not sure I understand: why does HarfBuzz care that 123 was in the
> > middle of RTL text?
>
> It doesn’t. What it cares about here is the correct script.
> Because 123 are in the middle of RTL text they will be shaped
> separately, and thus hb_buffer_guess_segment_properties() will only
> see 123 and won’t be able to guess the correct script for them
> (Arabic, Hebrew, etc., whatever the script for the surrounding RTL
> text is).

That's what I was asking: why is it important for HarfBuzz to know that
123 should be shaped for the Arabic script?

> Depending on the font, the digits might be shaped differently if the
> script is, say Arabic, by e.g. applying script-specific substitutions to
> forms more suitable for a given script.

I guess this is what I'm missing, then: these script-specific
substitutions. Can you elaborate on that, or point to some place
where these substitutions are described in detail?

> > (In general, AFAIK simple characters like 123 will not even go through
> > HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> > in composition-function-table is nil. So I guess 123 here should
> > stand for some other characters, not for literal digits? IOW, I don't
> > think I understand the example very well.)
>
> This is a bug then and needs to be fixed. All text should go through
> HarfBuzz since even so-called “simple” characters often require shaping
> depending on the text and the font. If this is done for optimization,
> then it should be revised to see if shaping with HarfBuzz is actually
> significantly slower and if it is, find more proper ways to optimize it.

(Adding Handa-san to the discussion, in the hope that he could comment
on the issue.)

I think running all text through a shaper might be prohibitively
expensive, because the shaper is called through Lisp code (see
composite.el), and we decide which chunk of text to pass to the shaper
using regexp search. See the various files under lisp/language/
which set up portions of composition-function-table as appropriate for
each language that needs it.

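For illustration only (this is Python, not the actual Lisp in composite.el), the regexp-driven dispatch described above can be sketched as a small table of patterns. The table contents, the labels, and the function name are all assumptions made up for this sketch; only text matching a pattern would be handed to the shaper, and everything else, including plain ASCII digits, bypasses it.

```python
import re

# Hypothetical stand-in for composition-function-table: each entry pairs a
# label with a regexp describing text that must go through the shaper.
SHAPING_RULES = [
    ("arabic", re.compile(r"[\u0600-\u06FF]+")),
    ("gujarati", re.compile(r"[\u0A80-\u0AFF]+")),
]


def chunks_for_shaper(text):
    """Return sorted (start, end, label) spans that match some rule.

    Characters not covered by any rule (ASCII letters, digits, spaces)
    are never returned, mirroring how "simple" characters skip the shaper.
    """
    spans = []
    for label, pattern in SHAPING_RULES:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)
```

For a string made of two Arabic letters, " 123 ", and two more Arabic letters, only the two letter runs come back; the digits in between never reach the shaper, which is exactly the case under discussion.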
So I think we should identify all the cases where "simple" characters
surrounded by, or adjacent to, "non-simple" ones need to be passed to
a shaper, and add the necessary regular expressions to the data
structures in lisp/language/. Can you describe these cases, or point
me to a place where I can find the relevant info?

Thanks.
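
To make the "ABC 123 DEF" example quoted above concrete, here is a minimal Python sketch of one simple way a client could resolve Common-script characters before shaping: each one inherits the script of the nearest preceding non-Common character, falling back to the first following one. This is an assumed heuristic, not what Emacs or HarfBuzz does, and `script_of` is a crude hypothetical lookup based on Unicode character names rather than the real Unicode Script property.

```python
import unicodedata

# Crude, hypothetical script lookup based on the first word of the Unicode
# character name; a real implementation would use the Unicode Script property.
KNOWN_SCRIPTS = {"ARABIC", "HEBREW", "LATIN", "GUJARATI"}


def script_of(ch):
    name = unicodedata.name(ch, "")
    first = name.split(" ", 1)[0] if name else ""
    return first if first in KNOWN_SCRIPTS else "COMMON"


def resolve_common(text):
    """Return one script label per character, with Common ones resolved
    from their neighbours where a neighbour with a known script exists."""
    scripts = [script_of(ch) for ch in text]
    # Forward pass: inherit from the nearest preceding non-Common character.
    last = None
    for i, script in enumerate(scripts):
        if script == "COMMON":
            if last is not None:
                scripts[i] = last
        else:
            last = script
    # Backward pass: leading Common characters take the first script after them.
    following = None
    for i in range(len(scripts) - 1, -1, -1):
        if scripts[i] == "COMMON":
            if following is not None:
                scripts[i] = following
        else:
            following = scripts[i]
    return scripts
```

With this, digits and spaces sandwiched between Hebrew letters are labelled HEBREW, so a shaper called on the digit run alone could still be told the surrounding script instead of having to guess it.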