On Fri, Nov 05, 2021 at 09:14:32PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Nov 2021 18:13:56 +0100
> > From: <tomas@tuxteam.de>
> > 
> > Thing is you sometimes want the ligature and sometimes you don't.
> > [depends on language]

[...]

> > it would have to know (or guess?) the language it is treating.
> 
> We do pass the language to HarfBuzz when we think we know it, but the
> problem is Emacs itself has no good notion of the "current language".

This is what I was pointing at. I don't think this is a problem which
can be solved in general. You have homographs (words that are written
the same) within one language, and you have them across languages.

If the text itself is multilingual, your best bet is to ask the user
and your second-best bet is to do some statistical heuristics, which
will only "work" for a longer stretch of text.
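
To illustrate why such heuristics need a longer stretch of text, here
is a minimal sketch in Python. Everything in it is an assumption made
up for illustration: the tiny word lists, the name guess_language, and
the tie-breaking rule. It is not how Emacs, HarfBuzz or any real
language-detection library works.

```python
# Toy word lists, hypothetical and far too small for real use.
# "die" and "in" are deliberately present in both lists.
COMMON_WORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "die"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "zu", "in", "es"},
}


def guess_language(text):
    """Return "en" or "de", or None when the evidence is missing or tied."""
    words = text.lower().split()
    # Score each language by how many words of the text are in its list.
    scores = {
        lang: sum(word in known for word in words)
        for lang, known in COMMON_WORDS.items()
    }
    best = max(scores.values())
    winners = [lang for lang, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # no evidence, or an ambiguous result
    return winners[0]
```

With this sketch a single word such as "die" stays undecided, while a
whole sentence such as "die Katze ist nicht in der Stadt" collects
enough hits to come out as German.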

If you press me, I think I can find two German homographs where one
would take a ligature and the other would not >:-)

> Such a notion is problematic in a multilingual editor such as Emacs.
> It is something we still need to figure out, and after that implement
> the necessary infrastructure.  What we have now is rudimentary and
> very insufficient.

I think that will always be an approximation. AFAIK Mozilla has (had?)
a library for guessing a text's (human) language (this is useful for
other things: capitalisation is language-dependent too, e.g. the Turkish
dotless i).
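
The Turkish case can be sketched in a few lines of Python. Python's
str.upper() does not take a language into account, so the language has
to be supplied from outside. The helper name upper_for and its lang
parameter are hypothetical, and only the dotted/dotless i pair is
handled here.

```python
def upper_for(text, lang):
    """Uppercase text, applying the Turkish/Azerbaijani rule for "i"."""
    if lang in ("tr", "az"):
        # Dotted i keeps its dot when capitalised: i -> U+0130.
        # Dotless U+0131 needs no special case: upper() maps it to "I".
        text = text.replace("i", "\u0130")
    return text.upper()
```

For the same input "ilik" the result differs by language: "ILIK" for
"en", but with U+0130 in place of both "I"s for "tr". Without knowing
the language there is no way to pick the right one.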

But it will always be something which fails in edge cases, I think.

Cheers
 - t