From: tomas@tuxteam.de
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Ligature support
Date: Fri, 5 Nov 2021 20:52:45 +0100
Message-ID: <20211105195245.GC24570@tuxteam.de>

On Fri, Nov 05, 2021 at 09:14:32PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Nov 2021 18:13:56 +0100
> > From:
> >
> > Thing is you sometimes want the ligature and sometimes you don't.
> > [depends on language] [...]
> > it would have to know (or guess?) the language it is treating.
>
> We do pass the language to HarfBuzz when we think we know it, but the
> problem is Emacs itself has no good notion of the "current language".

This is what I was pointing at. I don't think this is a problem which
can be solved in general. You have homographs (words that are written
the same) within one language, you have them across languages.

If the text itself is multilingual, your best bet is to ask the user
and your second-best bet is to do some statistical heuristics, which
will only "work" for a longer stretch of text.

If you press me, I think I can find two German homographs where the
one would take a ligature and the other not >:-)

> Such a notion is problematic in a multilingual editor such as Emacs.
> It is something we still need to figure out, and after that implement
> the necessary infrastructure. What we have now is rudimentary and
> very insufficient.

I think that will always be an approximation. AFAIK Mozilla has (had?)
a library for guessing a text's (human) language (this is useful for
other things: capitalisation is language-dependent too, e.g. the
Turkish dotless i). But it will always be something which fails in
edge cases, I think.

Cheers
- t
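
To make the "statistical heuristics only work for a longer stretch of
text" point concrete, here is a minimal sketch in Python. It is not
what Emacs, HarfBuzz or the Mozilla library do; the function name
guess_language, the min_hits threshold and the tiny stopword lists are
all assumptions made up for illustration. The point is only that a
guesser of this kind has to answer "don't know" for a single word.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (an assumption for this
# sketch); a real guesser would use far larger per-language statistics.
STOPWORDS = {
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "ich"},
    "en": {"the", "and", "is", "not", "a", "of", "to", "in"},
}


def guess_language(text, min_hits=3):
    """Return a language code, or None when the evidence is too thin."""
    words = text.lower().split()
    hits = Counter()
    for lang, stop in STOPWORDS.items():
        # Count how many words of the text are stopwords of this language.
        hits[lang] = sum(1 for w in words if w in stop)
    (best, best_n), (_, second_n) = hits.most_common(2)
    # Refuse to guess on too few hits or on a tie between languages.
    if best_n < min_hits or best_n == second_n:
        return None
    return best


long_de = "der Hund ist nicht in das Haus gegangen und ich auch nicht"
long_en = "the cat is not in the house and a dog is"
single_word = "Gift"  # a word in both German and English

print(guess_language(long_de))
print(guess_language(long_en))
print(guess_language(single_word))
```

A longer sentence gives the counter enough to work with; an isolated
word such as "Gift" gives it nothing, which is the case where asking
the user remains the only reliable option.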