unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
@ 2015-03-23  1:06 Richard Wordingham
  2015-03-23 15:38 ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Wordingham @ 2015-03-23  1:06 UTC
  To: 20173

When a ligature of two base characters has two combining marks on the
first component but none on the second, the second combining mark is
rendered as though it applied to the second component. A good example
is the Arabic sequence لَّا (lam, shadda, fatha, alef - <U+0644, U+0651,
U+064E, U+0627>), where the shadda is rendered on the lam part of the
lam-alif ligature and the fatha on the alif part.  This problem is not
restricted to right-to-left scripts; I encountered the problem when
debugging left-to-right rendering.  Lam-alif is one of the most
reliably generated ligatures bearing marks on different components.
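
For anyone wanting to reproduce this, a minimal sketch follows.  It
assumes a graphical frame and a font whose lam-alif ligature carries
mark-to-ligature positioning; the buffer name is arbitrary and not part
of the report itself.

```elisp
;; Sketch only: insert the test sequence from this report by code point,
;; so that the logical order is unambiguous however the mail client
;; happens to display the Arabic text.
(with-current-buffer (get-buffer-create "*bug20173*") ; arbitrary name
  (erase-buffer)
  ;; lam, shadda, fatha, alef
  (insert #x0644 #x0651 #x064E #x0627)
  (pop-to-buffer (current-buffer)))
```

With the bug present, the fatha is drawn over the alif part of the
ligature instead of sitting with the shadda on the lam part.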





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-23  1:06 bug#20173: 24.4; Rendering misallocates combining marks on ligatures Richard Wordingham
@ 2015-03-23 15:38 ` Eli Zaretskii
  2015-03-23 22:41   ` Richard Wordingham
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-03-23 15:38 UTC
  To: Richard Wordingham; +Cc: 20173

> Date: Mon, 23 Mar 2015 01:06:26 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> 
> When a ligature of two base characters has two combining marks on the
> first component but none on the second, the second combining mark is
> rendered as though it applied to the second component. A good example
> is the Arabic sequence لَّا (lam, shadda, fatha, alef - <U+0644, U+0651,
> U+064E, U+0627>), where the shadda is rendered on the lam part of the
> lam-alif ligature and the fatha on the alif part.  This problem is not
> restricted to right-to-left scripts; I encountered the problem when
> debugging left-to-right rendering.  Lam-alif is one of the most
> reliably generated ligatures bearing marks on different components.

Is it possible that some rule(s) are missing from the end of
lisp/language/misc-lang.el?  Could you please take a look and see if
something needs to be fixed/added in how we set up the compositions
for Arabic?

Thanks.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-23 15:38 ` Eli Zaretskii
@ 2015-03-23 22:41   ` Richard Wordingham
  2015-03-24  3:42     ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Wordingham @ 2015-03-23 22:41 UTC
  To: Eli Zaretskii; +Cc: 20173

On Mon, 23 Mar 2015 17:38:52 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Mon, 23 Mar 2015 01:06:26 +0000
> > From: Richard Wordingham <richard.wordingham@ntlworld.com>

> Is it possible that some rule(s) are missing from the end of
> lisp/language/misc-lang.el?  Could you please take a look and see if
> something needs to be fixed/added in how we set up the compositions
> for Arabic?

There's no relevant problem there.  I demonstrated the bug to myself by
first rendering Tai Tham <NA, TONE-2, SIGN AA> and confirming that
TONE-2 rendered above the first component of the ligature NAA, formed
from <NA, SIGN AA>.  I then hacked my font so that the glyph for TONE-2
was decomposed into the glyphs for MAI KANG and TONE-2, in that order,
and observed TONE-2 being rendered on the second component of the
ligature.  I then turned to Arabic so that a custom font would not be
needed to demonstrate the bug.

As to what needs fixing in the Arabic section of misc-lang.el:

Clusters containing letters should be limited to letters and marks on
them.  Otherwise, the digits 1, 2, 3 are reversed in a variable name
like بج١٢٣د.  (I'm not sure why the problem doesn't appear with بج١٢٣.)
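
As a rough illustration of the difference, the sketch below compares
how far a whole-block character class and a letters-and-marks class
(the one proposed later in this message) match into that identifier.
The helper `bug20173-match-extent' is hypothetical, and the sketch only
exercises the regexps; it says nothing about what the shaper then does
with the matched text.

```elisp
;; Sketch only; `bug20173-match-extent' is a hypothetical helper.
(defun bug20173-match-extent (regexp string)
  "Return the end of the match of REGEXP at the start of STRING, or nil."
  (when (string-match (concat "\\`\\(?:" regexp "\\)") string)
    (match-end 0)))

(let ((whole-block "[\u0600-\u06FF]+")
      (letters-and-marks
       "[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+")
      ;; beh, jeem, Arabic-Indic digits 1 2 3, dal
      (sample "\u0628\u062C\u0661\u0662\u0663\u062F"))
  (list (bug20173-match-extent whole-block sample)
        (bug20173-match-extent letters-and-marks sample)))
;; => (6 2)
```

The whole-block class swallows the digits along with the letters, while
the narrower class stops after the two leading letters.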

(set-char-table-range
 composition-function-table
 '(#x600 . #x6FF)
 (list ["[\u0600-\u06FF]+" 0 font-shape-gstring]))

should change to something like

(set-char-table-range
 composition-function-table
 '(#x610 . #x615)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip punctuation.

(set-char-table-range
 composition-function-table
 '(#x621 . #x65F)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip digits and punctuation.

(set-char-table-range
 composition-function-table
 '(#x66E . #x6D3)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip punctuation.

(set-char-table-range
 composition-function-table
 '(#x6D5 . #x6EF)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip digits.

(set-char-table-range
 composition-function-table
 '(#x6FA . #x6FC)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip symbols.

(set-char-table-range
 composition-function-table
 '(#x6FF . #x6FF)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

There are more elegant ways of expressing this, which is just as well,
for there are also blocks Arabic Supplement (U+0750 to U+077F) and
Arabic Extended-A (U+08A0 to U+08FF).  Being an international script,
the Arabic script has a lot of letters, just like the Latin script.
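
One possible more compact form, offered only as a sketch, builds the
rule once and loops over the ranges.  The variable names are made up
for illustration, and the list covers only the ranges given above;
entries for the other blocks would have to be worked out separately.

```elisp
;; Sketch only: same ranges and same rule as the six calls above.
(let* ((letters-and-marks
        "[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+")
       ;; One shared rule: match the class, look back 0 characters,
       ;; and hand the match to `font-shape-gstring'.
       (rule (list (vector letters-and-marks 0 'font-shape-gstring))))
  (dolist (range '((#x610 . #x615)
                   (#x621 . #x65F)
                   (#x66E . #x6D3)
                   (#x6D5 . #x6EF)
                   (#x6FA . #x6FC)
                   (#x6FF . #x6FF)))
    (set-char-table-range composition-function-table range rule)))
```

Keeping the ranges in a single list makes it harder for the character
class and the table ranges to drift apart when more letters are added.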

Richard.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-23 22:41   ` Richard Wordingham
@ 2015-03-24  3:42     ` Eli Zaretskii
  2015-03-24  8:28       ` Richard Wordingham
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-03-24  3:42 UTC
  To: Richard Wordingham; +Cc: 20173

> Date: Mon, 23 Mar 2015 22:41:07 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> Cc: 20173@debbugs.gnu.org
> 
> On Mon, 23 Mar 2015 17:38:52 +0200
> Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > > Date: Mon, 23 Mar 2015 01:06:26 +0000
> > > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> 
> > Is it possible that some rule(s) are missing from the end of
> > lisp/language/misc-lang.el?  Could you please take a look and see if
> > something needs to be fixed/added in how we set up the compositions
> > for Arabic?
> 
> There's no relevant problem there.  I demonstrated the bug to myself by
> first rendering Tai Tham <NA, TONE-2, SIGN AA> and confirming that
> TONE-2 rendered above the first component of the ligature NAA, formed
> from <NA, SIGN AA>.  I then hacked my font so that the glyph for TONE-2
> was decomposed into the glyphs for MAI KANG and TONE-2, in that order,
> and observed TONE-2 being rendered on the second component of the
> ligature.  I then turned to Arabic so that a custom font would not be
> needed to demonstrate the bug.

Sorry, I'm not sure I understand you.  If the setting of composition
rules for Arabic is not the culprit, then what is?  AFAIK, there are
no rules that guide Emacs's shaping except what's in
composition-function-table.  Beyond that, the only other factor is the
font backend and how it shapes glyphs given the chunks of text Emacs
presents to it.

> As to what needs fixing in the Arabic section of misc-lang.el:

Thanks, I will look into these.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-24  3:42     ` Eli Zaretskii
@ 2015-03-24  8:28       ` Richard Wordingham
  2015-03-24 17:03         ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Wordingham @ 2015-03-24  8:28 UTC
  To: Eli Zaretskii; +Cc: 20173

On Tue, 24 Mar 2015 05:42:18 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> If the setting of composition
> rules for Arabic is not the culprit, then what is?  AFAIK, there are
> no rules that guide Emacs's shaping except what's in
> composition-function-table.  Beyond that, the only other factor is the
> font backend and how it shapes glyphs given the chunks of text Emacs
> presents to it.

The font backend on Unixy systems consists of three components - m17n
(shaping control), libotf (OTL look-up implementation) and Freetype
(glyph rendering).  The glue between them is in Emacs,
most relevantly in function ftfont_drive_otf() in ftfont.c.

My analysis of the problem, which could quite easily be wrong, is as
follows.  To control the positioning of marks for the mark2ligature
lookup, it is necessary to record in some fashion which component of
the ligature a mark applies to.  I cannot see this information being
stored.  The information should be generated and used by libotf, but
needs to be stored between callbacks of ftfont_drive_otf() by m17n.
(The initial settings are implicit in the sequence of codepoints.)
Storing this information would, so far as I can see, require a change to
ftfont_drive_otf().

I may be able to change my font to work round this bug; I can certainly
change it to hide the symptom I observed.  The solution will be to
categorise the ligature NAA <U+1A36, U+1A63> as a base glyph rather
than as a ligature glyph.

There are other places where the HarfBuzz rendering system, which aims
to be compatible with Windows, uses this information.  In particular,
marks applied to a ligature are only allowed to ligate if they apply to
the same component of a ligature, and mark2mark positioning only
applies if the two marks apply to the same component.  This logic is
described as 'the most tricky part of the OpenType specification'.
Part of the trickiness may be that it seems not to have been
published externally (possibly not even internally) by Microsoft.  The
guiding principle seems to be that one should do the right things to the
marks on a ligature of Arabic consonants.

I have become well-acquainted with this logic because the 'same
component logic' seems to be applied by HarfBuzz regardless of whether
the marks are preceded by a base glyph or a ligature glyph.  The
Windows logic seems similar, but is subtly different.  I hit problems
with the Tai Tham NAA ligature, because the marks above on its two
components do interact.  The marks below should probably also interact,
but combinations where I would expect them to have to interact seem not
to occur in natural text.

> > As to what needs fixing in the Arabic section of misc-lang.el:

> Thanks, I will look into these.

You might want to first check whether composed Arabic is
usable.  Doesn't making each word a grapheme cluster make editing
unpleasant?  It might be worth restricting the clustering to
cursively connected sequences of letters within a word.

Richard.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-24  8:28       ` Richard Wordingham
@ 2015-03-24 17:03         ` Eli Zaretskii
  2015-03-24 20:22           ` Richard Wordingham
  2015-03-27  9:04           ` Richard Wordingham
  0 siblings, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2015-03-24 17:03 UTC
  To: Richard Wordingham; +Cc: 20173

> Date: Tue, 24 Mar 2015 08:28:28 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> Cc: 20173@debbugs.gnu.org
> 
> On Tue, 24 Mar 2015 05:42:18 +0200
> Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > If the setting of composition
> > rules for Arabic is not the culprit, then what is?  AFAIK, there are
> > no rules that guide Emacs's shaping except what's in
> > composition-function-table.  Beyond that, the only other factor is the
> > font backend and how it shapes glyphs given the chunks of text Emacs
> > presents to it.
> 
> The font backend on Unixy systems consists of three components - m17n
> (shaping control), libotf (OTL look-up implementation) and Freetype
> (glyph rendering).  The glue between them is in Emacs,
> most relevantly in function ftfont_drive_otf() in ftfont.c.
> 
> My analysis of the problem, which could quite easily be wrong, is as
> follows.  To control the positioning of marks for the mark2ligature
> lookup, it is necessary to record in some fashion which component of
> the ligature a mark applies to.  I cannot see this information being
> stored.  The information should be generated and used by libotf, but
> needs to be stored between callbacks of ftfont_drive_otf() by m17n.
> (The initial settings are implicit in the sequence of codepoints.)
> Storing this information would, so far as I can see, require a change to
> ftfont_drive_otf().

So this means that on Windows this problem does not exist?

> You might want to first check whether composed Arabic is
> usable.  Doesn't making each word a grapheme cluster make editing
> unpleasant?

I don't know; I don't speak or write any of the languages that use the
Arabic script.  I expect the users that do to come up and ask for
features they miss.  We already allow deletion of single codepoints,
even when they are composed; we might as well provide similar features
for movement or whatever.  But the requests (and, perhaps, even the
code) should come from people who actually use these scripts,
otherwise it's a sure way to white elephants and other similar
creatures.

Thanks.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-24 17:03         ` Eli Zaretskii
@ 2015-03-24 20:22           ` Richard Wordingham
  2015-03-27  9:04           ` Richard Wordingham
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Wordingham @ 2015-03-24 20:22 UTC
  To: Eli Zaretskii; +Cc: 20173

On Tue, 24 Mar 2015 19:03:38 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> So this means that on Windows this problem does not exist?

Correct.  The Arabic test sequence renders properly in Emacs 24.4.1
on Windows 7.

Richard.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-24 17:03         ` Eli Zaretskii
  2015-03-24 20:22           ` Richard Wordingham
@ 2015-03-27  9:04           ` Richard Wordingham
  2015-03-27  9:54             ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Wordingham @ 2015-03-27  9:04 UTC
  To: Eli Zaretskii; +Cc: 20173

On Tue, 24 Mar 2015 19:03:38 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Tue, 24 Mar 2015 08:28:28 +0000
> > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > Cc: 20173@debbugs.gnu.org

> > You might want to first check whether composed Arabic is
> > usable.  Doesn't making each word a grapheme cluster make editing
> > unpleasant?

> I don't know; I don't speak or write any of the languages that use the
> Arabic script.  I expect the users that do to come up and ask for
> features they miss.  We already allow deletion of single codepoints,
> even when they are composed; we might as well provide similar features
> for movement or whatever.

I forgot that grapheme clustering is done in m17n, not Emacs itself.
The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
with combining marks.  It *seems* I have a problem with tpu-forward-char
and tpu-backward-char; it's as though there's an initialisation fault
which stops them stepping through the Arabic compositions at first.  It
may be an issue with the presumably underlying forward-char and
backward-char; I haven't investigated further.  I'll have to record
the exact actions provoking the problem before I formally record a bug.

Richard. 





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-27  9:04           ` Richard Wordingham
@ 2015-03-27  9:54             ` Eli Zaretskii
  2020-08-17 22:45               ` Stefan Kangas
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-03-27  9:54 UTC
  To: Richard Wordingham; +Cc: 20173

> Date: Fri, 27 Mar 2015 09:04:44 +0000
> From: Richard Wordingham <richard.wordingham@ntlworld.com>
> Cc: 20173@debbugs.gnu.org
> 
> On Tue, 24 Mar 2015 19:03:38 +0200
> Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > > Date: Tue, 24 Mar 2015 08:28:28 +0000
> > > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > > Cc: 20173@debbugs.gnu.org
> 
> > > You might want to first check whether composed Arabic is
> > > usable.  Doesn't making each word a grapheme cluster make editing
> > > unpleasant?
> 
> > I don't know; I don't speak or write any of the languages that use the
> > Arabic script.  I expect the users that do to come up and ask for
> > features they miss.  We already allow deletion of single codepoints,
> > even when they are composed; we might as well provide similar features
> > for movement or whatever.
> 
> I forgot that grapheme clustering is done in m17n, not Emacs itself.
> The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
> with combining marks.  It *seems* I have a problem with tpu-forward-char
> and tpu-backward-char; it's as though there's an initialisation fault
> which stops them stepping through the Arabic compositions at first.  It
> may be an issue with the presumably underlying forward-char and
> backward-char; I haven't investigated further.  I'll have to record
> the exact actions provoking the problem before I formally record a bug.

Please try in "emacs -Q" without activating the TPU emulation.





* bug#20173: 24.4; Rendering misallocates combining marks on ligatures
  2015-03-27  9:54             ` Eli Zaretskii
@ 2020-08-17 22:45               ` Stefan Kangas
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Kangas @ 2020-08-17 22:45 UTC
  To: Eli Zaretskii; +Cc: Richard Wordingham, 20173-done

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Fri, 27 Mar 2015 09:04:44 +0000
>> From: Richard Wordingham <richard.wordingham@ntlworld.com>
>> Cc: 20173@debbugs.gnu.org
>>
>> On Tue, 24 Mar 2015 19:03:38 +0200
>> Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> > > Date: Tue, 24 Mar 2015 08:28:28 +0000
>> > > From: Richard Wordingham <richard.wordingham@ntlworld.com>
>> > > Cc: 20173@debbugs.gnu.org
>>
>> > > You might want to first check whether composed Arabic is
>> > > usable.  Doesn't making each word a grapheme cluster make editing
>> > > unpleasant?
>>
>> > I don't know; I don't speak or write any of the languages that use the
>> > Arabic script.  I expect the users that do to come up and ask for
>> > features they miss.  We already allow deletion of single codepoints,
>> > even when they are composed; we might as well provide similar features
>> > for movement or whatever.
>>
>> I forgot that grapheme clustering is done in m17n, not Emacs itself.
>> The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
>> with combining marks.  It *seems* I have a problem with tpu-forward-char
>> and tpu-backward-char; it's as though there's an initialisation fault
>> which stops them stepping through the Arabic compositions at first.  It
>> may be an issue with the presumably underlying forward-char and
>> backward-char; I haven't investigated further.  I'll have to record
>> the exact actions provoking the problem before I formally record a bug.
>
> Please try in "emacs -Q" without activating the TPU emulation.

More information was requested, but none was given within 5 years, so
I'm closing this bug.  If this is still an issue, please reply to this
email (use "Reply to all" in your email client) and we can reopen the
bug report.

Best regards,
Stefan Kangas




