From: tomas@tuxteam.de
Newsgroups: gmane.emacs.devel
Subject: Re: Ligature support
Date: Sat, 6 Nov 2021 10:16:25 +0100
Message-ID: <20211106091625.GB18911@tuxteam.de>
In-Reply-To: <838ry2cpl4.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

On Fri, Nov 05, 2021 at 10:30:47PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Nov 2021 20:52:45 +0100
> > From: tomas@tuxteam.de
> > Cc: emacs-devel@gnu.org
> >
> > > > it would have to know (or guess?) the language it is treating.
> > >
> > > We do pass the language to HarfBuzz when we think we know it, but the
> > > problem is Emacs itself has no good notion of the "current language".
> >
> > This is what I was pointing at.
>
> Well, don't just point to the obvious: better sit down and code some
> features that we can use to be smarter ;-)
>
> > If the text itself is multilingual, your best bet is to ask the user
>
> Asking the user during redisplay is a non-starter.

:-)

More constructively, that's what happens while typesetting text. That's
what TeX has \/ for.

We have two classes of language. First, those where ligatures are
essential (Arabic, Hangul -- I must admit that I know very little about
the latter). For those, there is no choice.

Then we have those where ligatures are rather a "decoration", an
accident of old handwriting further fashioned by the introduction of
movable type. And a decoration you sometimes downright don't want (in
TeX, last time I looked, most German writers just disabled ligatures:
the "wrong ligatures" are so much more disturbing, and the prospect of
proofreading the thing for wrong ligatures and sprinkling your source
with \/ just isn't worth it).

In short, there are languages where "asking the user" is just the only
option; that means that the feature only makes sense while typesetting
(where you /can/ ask the user) and not while rendering dynamically (the
"redisplay" case we are treating here).

The problem is compounded by TeX's legacy, which used its ligature
mechanism for things which strictly aren't ligatures: think of --- for
an em dash. It's a nice hack, and people perceive that as a ligature,
too (you can see that elsewhere in this huge thread), but it ain't.

I still think: there isn't a general solution. Me? I'd prefer to
disable all ligatures unless I'm writing Arabic.

>
> > and your second-best bet is to do some statistical heuristics, which
> > only will "work" for a longer stretch of text.
>
> That's a waste of CPU cycles: when we don't know the language, we ask
> HarfBuzz to guess, and I trust HarfBuzz that it can guess as well or
> better as we can.

I haven't looked into it, but I wonder what magic it uses, if it isn't
some variation of "maximum likelihood over n-gram statistics".

>
> > > Such a notion is problematic in a multilingual editor such as Emacs.
> > > It is something we still need to figure out, and after that implement
> > > the necessary infrastructure. What we have now is rudimentary and
> > > very insufficient.
> >
> > I think that will always be an approximation.
>
> Maybe, maybe not. I hope at least sometimes we could do better.
> There are various hints in the form of the encoding, the source of the
> text, etc. We just need to figure out which means we have for
> gleaning the language that is not obvious from the characters
> themselves (because HarfBuzz does the latter already), and provide the
> features for Lisp programs and users to use them.

Some day I'll peek into HarfBuzz's source code. Perhaps next year.

Cheers
- t