From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Entering emojis
Date: Fri, 29 Oct 2021 13:54:45 +0300
Message-ID: <83pmroqex6.fsf@gnu.org>
In-Reply-To: (message from Gregory Heytings on Fri, 29 Oct 2021 10:32:03 +0000)
To: Gregory Heytings
Cc: mattiase@acm.org, raman@google.com, schwab@linux-m68k.org,
 stefankangas@gmail.com, emacs-devel@gnu.org

> Date: Fri, 29 Oct 2021 10:32:03 +0000
> From: Gregory Heytings
> cc: mattiase@acm.org, emacs-devel@gnu.org, schwab@linux-m68k.org,
>  stefankangas@gmail.com, raman@google.com
>
> >> In this case, ISTM that the problem is not the font, but the shaping
> >> engine.  If Harfbuzz does not know how to handle the joiners and segment
> >> delimiters, it should behave as if they did not exist, and use the font
> >> ligatures (if the font does have ligatures)
> >
> > AFAICT, this is what happens here.
>
> No, because Harfbuzz displays the fictitious joiner and delimiter glyphs,
> and does not try to use the ligatures that the font provides.

Because the font and/or HarfBuzz don't support the formatting
controls.  If you have evidence that the font does support the
formatting controls, but HarfBuzz doesn't, please show it.  I think
that's not the case, because the same or similar display problems
happen with LibreOffice when using the format controls.

When a sequence of codepoints is not recognized as a composable one,
what we get as a result is either a separate glyph for each codepoint,
or maybe composed glyphs for some sub-sequence.  That is normal and
expected when a sequence is not recognized.

> >> instead of displaying the fictitious glyph at that codepoint (at the
> >> codepoint of the joiner or delimiter).
> >
> > I don't think I understand what fictitious glyph you allude to here.  The
> > joiners were displayed as thin spaces by the Emacs
> > glyphless-char-display feature, because HarfBuzz+font didn't compose the
> > sequence, and returned the separate font glyphs for each codepoint in
> > the sequence.  IOW, the composition failed, and therefore Emacs
> > displayed each of these characters as it's supposed to do.
> >
>
> A picture is worth a thousand words.  I attach four files:
>
> In 1.png you see what Harfbuzz displays with the previous HELLO entry.
> The three glyphs with a thick rectangle above and a crossed rectangle
> below are a fictitious glyph in the Aegyptus font for the codepoint
> hieroglyph vertical joiner, and the opening and closing parentheses are
> fictitious glyphs in the Aegyptus font for the codepoints hieroglyph begin
> and end segment.

Why do you call them "fictitious"?
If those are the glyphs returned by the font, then that's what the
font designers want us to display.

> In 2.png you see what I would expect Harfbuzz to do with the previous
> HELLO entry, if it knows that it cannot handle the joiner and segment
> delimiters or if it detects that the font does not provide enough
> information to handle them appropriately, and if the font has no
> ligatures: displaying the hieroglyphs one after the other.  That's what I
> would expect to see with the Noto Hieroglyph font for example.
>
> In 3.png you see what I would expect Harfbuzz to do with the previous
> HELLO entry, if it knows that it cannot handle the joiner and segment
> delimiters or if it detects that the font does not provide enough
> information to handle them appropriately, and if the font does have
> ligatures: displaying the hieroglyphs with the ligatures provided by the
> font.  That's what I would expect to see with the Aegyptus font.
>
> In 4.png you see what I would expect Harfbuzz to do with the previous
> HELLO entry, if it knows that it can handle the joiner and segment
> delimiters and if it detects that the font does provide enough information
> to handle them appropriately.

Your expectations are incorrect.  The job of producing the correct
glyphs for a sequence involves both the font and the shaping engine.
The shaping engine in general should not produce any glyphs that the
font didn't return, and AFAIU has no means to "detect that the font
doesn't provide enough information" or "doesn't have ligatures".  The
shaping engine is supposed to trust that the font knows what it's
doing.  The role of the shaping engine is to collect information about
the context of the character sequence (language, script,
directionality, etc.), and communicate that information to the font so
that the font can select the appropriate glyphs.

> > No, the joiners are supposed to tell the shaping engine and the font
> > that we want the ligatures and not separate font glyphs.
>
> Unless I misunderstand something, a text without joiners and delimiters
> would thus be displayed as 2.png, even if the underlying font provides
> ligatures with which it could be displayed as 3.png.  And ZWNJ would be
> ignored.  Which makes, as I said, the task of those who want to edit
> Egyptian texts much harder, and unnecessarily so.

It is irrelevant what you and I think about whether this makes sense.
We try to follow the Unicode Standard where we don't have a reason to
deviate from it.  If and when the fonts implement what Unicode says,
we should also do what Unicode says, regardless of our private
opinions about the convenience of writing this script.
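
To make the division of labor discussed above concrete, here is a toy
sketch in Python.  It is not HarfBuzz's algorithm and uses no real font
data: the ToyFont class, the shape function, and all glyph names are
hypothetical, invented for illustration only.  It assumes a "font" is
just a per-codepoint glyph map plus a table of sequences it can compose.
The point it shows is that when the font does not recognize a sequence,
the result is one glyph per codepoint, including whatever glyph the font
supplies for the joiner itself.

```python
# Egyptian hieroglyph format controls mentioned in this thread.
VERTICAL_JOINER = "\U00013430"
BEGIN_SEGMENT = "\U00013437"
END_SEGMENT = "\U00013438"

# Two hieroglyph codepoints used as sample text.
SIGN_1 = "\U00013000"
SIGN_2 = "\U00013001"


class ToyFont:
    """Hypothetical font model: glyph per codepoint, plus known compositions."""

    def __init__(self, cmap, compositions):
        self.cmap = cmap                   # codepoint -> glyph name
        self.compositions = compositions   # codepoint sequence -> glyph name


def shape(font, text):
    """Return glyph names for text, using only glyphs the font provides."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest sequence (2+ codepoints) the font can compose.
        for j in range(len(text), i + 1, -1):
            glyph = font.compositions.get(text[i:j])
            if glyph is not None:
                glyphs.append(glyph)
                i = j
                break
        else:
            # Composition failed: emit the font's own glyph for this codepoint.
            glyphs.append(font.cmap.get(text[i], ".notdef"))
            i += 1
    return glyphs


# Made-up glyph names; "vj" stands for the font's glyph for the joiner.
cmap = {SIGN_1: "sign1", SIGN_2: "sign2", VERTICAL_JOINER: "vj"}
text = SIGN_1 + VERTICAL_JOINER + SIGN_2

plain_font = ToyFont(cmap, {})
composing_font = ToyFont(cmap, {text: "sign1_over_sign2"})

print(shape(plain_font, text))      # one glyph per codepoint
print(shape(composing_font, text))  # a single composed glyph
```

In this model the shaper never invents a glyph and never drops the
joiner on its own initiative; what comes out depends entirely on what
the font's tables say about the sequence.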