Unicode combining characters

unofficial mirror of emacs-devel@gnu.org 

* Unicode combining characters
@ 2021-05-25 15:56 Anand Tamariya
  2021-05-25 17:22 ` Stefan Monnier
  2021-05-25 17:24 ` Eli Zaretskii
  0 siblings, 2 replies; 17+ messages in thread
From: Anand Tamariya @ 2021-05-25 15:56 UTC (permalink / raw)
  To: emacs-devel


Hindi Devanagari script has a lot of Unicode combining characters, which
results in misalignment in a rectangular overlay for a constant number of
characters (screenshot):
<https://1.bp.blogspot.com/-P2ZnFePOpOo/YK0cNJ4B5II/AAAAAAAAJJs/t-MADtxUeps3S_WXZ_rFWjf9daH49sr9QCLcBGAsYHQ/s421/combining.png>
What would be a recommended way to tackle this in Emacs?


* Re: Unicode combining characters
  2021-05-25 15:56 Unicode combining characters Anand Tamariya
@ 2021-05-25 17:22 ` Stefan Monnier
  2021-05-25 17:24 ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Monnier @ 2021-05-25 17:22 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> Hindi Devanagari script has a lot of Unicode combining characters, which
> results in misalignment in a rectangular overlay for a constant number of
> characters (screenshot):
> <https://1.bp.blogspot.com/-P2ZnFePOpOo/YK0cNJ4B5II/AAAAAAAAJJs/t-MADtxUeps3S_WXZ_rFWjf9daH49sr9QCLcBGAsYHQ/s421/combining.png>
> What would be a recommended way to tackle this in Emacs?

In a GUI session, the usual answer is to use posframe, AFAIK.


        Stefan





* Re: Unicode combining characters
  2021-05-25 15:56 Unicode combining characters Anand Tamariya
  2021-05-25 17:22 ` Stefan Monnier
@ 2021-05-25 17:24 ` Eli Zaretskii
  2021-05-25 18:15   ` Clément Pit-Claudel
  2021-05-26  9:51   ` Anand Tamariya
  1 sibling, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-25 17:24 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Tue, 25 May 2021 21:26:44 +0530
> 
> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
> rectangular overlay for a constant number of characters (screenshot).
> What would be a recommended way to tackle this in Emacs?

Use align-to 'space' display spec and/or the window-text-pixel-size
function, which will account for the actual size of the text on
display.  string-width can also be used, but it only gives an
approximation, as it is oblivious of the actual size of the font
glyphs.
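
As an illustration of the first suggestion, here is a minimal sketch.
The helper names and the 120-pixel target are assumptions made for the
example, not something taken from this thread.  A single space carrying
a `(space :align-to ...)` display property stretches so that the text
after it starts at the requested position, however wide the text before
it happens to render.

```elisp
;; Hypothetical helper: the name and arguments are assumptions.
(defun my-aligned-line (text suffix position)
  "Return TEXT, a stretch space reaching POSITION, then SUFFIX.
POSITION is an `:align-to' value: a number counts canonical character
widths, while a one-element list such as (120) counts pixels."
  (concat text
          (propertize " " 'display (list 'space :align-to position))
          suffix))

;; Two first fields that may render with different widths; on display,
;; both "a" characters should start 120 pixels into the text area.
(defun my-align-demo ()
  "Insert two demo lines at point."
  (interactive)
  (insert (my-aligned-line (string 2358 2381 2352) "a" '(120)) "\n"
          (my-aligned-line (string 2358 2352) "a" '(120)) "\n"))
```

Passing a plain number such as 10 instead of (120) aligns to a column
rather than to a pixel offset.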




* Re: Unicode combining characters
  2021-05-25 17:24 ` Eli Zaretskii
@ 2021-05-25 18:15   ` Clément Pit-Claudel
  2021-05-25 18:39     ` Eli Zaretskii
  2021-05-26  9:51   ` Anand Tamariya
  1 sibling, 1 reply; 17+ messages in thread
From: Clément Pit-Claudel @ 2021-05-25 18:15 UTC (permalink / raw)
  To: emacs-devel

On 5/25/21 1:24 PM, Eli Zaretskii wrote:
>> From: Anand Tamariya <atamariya@gmail.com>
>> Date: Tue, 25 May 2021 21:26:44 +0530
>>
>> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
>> rectangular overlay for a constant number of characters (screenshot).
>> What would be a recommended way to tackle this in Emacs?
> 
> Use align-to 'space' display spec and/or the window-text-pixel-size
> function, which will account for the actual size of the text on
> display. 

Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?

(IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)




* Re: Unicode combining characters
  2021-05-25 18:15   ` Clément Pit-Claudel
@ 2021-05-25 18:39     ` Eli Zaretskii
  2021-05-25 19:30       ` Clément Pit-Claudel
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-25 18:39 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel

> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 25 May 2021 14:15:33 -0400
> 
> On 5/25/21 1:24 PM, Eli Zaretskii wrote:
> >> From: Anand Tamariya <atamariya@gmail.com>
> >> Date: Tue, 25 May 2021 21:26:44 +0530
> >>
> >> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
> >> rectangular overlay for a constant number of characters (screenshot).
> >> What would be a recommended way to tackle this in Emacs?
> > 
> > Use align-to 'space' display spec and/or the window-text-pixel-size
> > function, which will account for the actual size of the text on
> > display. 
> 
> Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?

I don't understand, but maybe you know more about the particular use case
than I do.  I just mentioned two devices that can be accurate to
1 pixel wrt the X coordinate.

> (IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)

Again, I don't think I follow.  If you have "this text", you can
calculate its width on display, and then know how many pixels of white
space you will need after "this string" replaces that text.  So,
unless I'm missing something, specifying the space width is redundant,
and actually makes a solvable problem unsolvable.

But I might be talking nonsense because I don't understand what
problem the OP wants to solve.


* Re: Unicode combining characters
  2021-05-25 18:39     ` Eli Zaretskii
@ 2021-05-25 19:30       ` Clément Pit-Claudel
  2021-05-25 19:44         ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Clément Pit-Claudel @ 2021-05-25 19:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 5/25/21 2:39 PM, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Tue, 25 May 2021 14:15:33 -0400
>>
>> On 5/25/21 1:24 PM, Eli Zaretskii wrote:
>>>> From: Anand Tamariya <atamariya@gmail.com>
>>>> Date: Tue, 25 May 2021 21:26:44 +0530
>>>>
>>>> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
>>>> rectangular overlay for a constant number of characters (screenshot).
>>>> What would be a recommended way to tackle this in Emacs?
>>>
>>> Use align-to 'space' display spec and/or the window-text-pixel-size
>>> function, which will account for the actual size of the text on
>>> display. 
>>
>> Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?
> 
> I don't understand, but maybe you know more about the particular use case
> than I do.  I just mentioned two devices that can be accurate to
> 1 pixel wrt the X coordinate.
> 
>> (IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)
> 
> Again, I don't think I follow.  If you have "this text", you can
> calculate its width on display, and then know how many pixels of white
> space you will need after "this string" replaces that text.  So,
> unless I'm missing something, specifying the space width is redundant,
> and actually makes a solvable problem unsolvable.

Based on the screenshot this is an issue with Company.  Company displays its "pop-ups" by putting a replacing 'display property on the text following the point (and on the next few lines).  So if the buffer contains

ABC XYZ DEF GHI
JKL MNO PQR STU

and the point is after XYZ, then company puts a replacing display spec from " DEF" to "STU".
To display completions "XYZ123" and "XYZ456", the replacing display spec contains "123| GHI\nJKL XYZ456| STU", so the final display is

ABC XYZ123| GHI
JKL XYZ456| STU

The OP's issue is that "123" and "456" don't have the same length.  As far as I know, there is no way to add extra space after 123 or 456 so that they reach the same X coordinate, given that they are already part of a display spec.

Clément.




* Re: Unicode combining characters
  2021-05-25 19:30       ` Clément Pit-Claudel
@ 2021-05-25 19:44         ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-25 19:44 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 25 May 2021 15:30:21 -0400
> 
> Based on the screenshot this is an issue with Company.  Company displays its "pop-ups" by putting a replacing 'display property on the text following the point (and on the next few lines).  So if the buffer contains
> 
> ABC XYZ DEF GHI
> JKL MNO PQR STU
> 
> and the point is after XYZ, then company puts a replacing display spec from " DEF" to "STU".
> To display completions "XYZ123" and "XYZ456", the replacing display spec contains "123| GHI\nJKL XYZ456| STU", so the final display is
> 
> ABC XYZ123| GHI
> JKL XYZ456| STU
> 
> The OP's issue is that "123" and "456" don't have the same length.  As far as I know, there is no way to add extra space after 123 or 456 so that they reach the same X coordinate, given that they are already part of a display spec.

First, the OP said "overlay", and overlay strings can have display
properties.
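
To illustrate that point, here is a minimal sketch of an overlay whose
after-string carries its own display property.  The helper name and its
arguments are assumptions made for the example; whether the same trick
fits inside Company's replacing display spec is a separate question
that this sketch does not settle.

```elisp
;; Hypothetical helper: the name and arguments are assumptions.
(defun my-overlay-annotation (pos text column)
  "Show TEXT starting at COLUMN after buffer position POS.
Return the new overlay.  The after-string begins with a space whose
`display' property stretches it up to COLUMN, so TEXT lands at the
same X position no matter how wide the buffer text before POS is."
  (let ((ov (make-overlay pos pos)))
    (overlay-put ov 'after-string
                 (concat (propertize " " 'display
                                     (list 'space :align-to column))
                         text))
    ov))
```

Calling `(my-overlay-annotation (point) "| GHI" 12)` should draw the
annotation at column 12 on the current line; `delete-overlay` on the
returned value removes it again.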

And second, I'd expect the current code to use string-width to compute
how much whitespace will be needed after each completion candidate,
and string-width already accounts for composed (a.k.a. "combined")
characters.  Yes, string-width provides only an approximation for the
true pixel width of the string, but that's not specific to
compositions, and the whole technique is somewhat of a kludge anyway,
for this reason and others.
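
The padding technique described above can be sketched as follows.  The
function names are assumptions made for the example and this is not
Company's actual code; it only shows how string-width would be used to
bring every candidate to a common column count, with the approximation
caveat just mentioned.

```elisp
;; Hypothetical helpers: the names are assumptions.
(defun my-pad-to-width (candidate width)
  "Pad CANDIDATE with spaces until `string-width' reports WIDTH columns.
Candidates that are already WIDTH columns or wider are returned
unchanged."
  (let ((missing (- width (string-width candidate))))
    (if (> missing 0)
        (concat candidate (make-string missing ?\s))
      candidate)))

(defun my-pad-candidates (candidates)
  "Pad each string in CANDIDATES to the widest `string-width' among them."
  (let ((width (apply #'max 0 (mapcar #'string-width candidates))))
    (mapcar (lambda (c) (my-pad-to-width c width)) candidates)))
```

Because the padding is counted in columns, the right edge lines up on
display only when every glyph is an exact multiple of the column width.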




* Re: Unicode combining characters
  2021-05-25 17:24 ` Eli Zaretskii
  2021-05-25 18:15   ` Clément Pit-Claudel
@ 2021-05-26  9:51   ` Anand Tamariya
  2021-05-26 10:04     ` Joost Kremers
  2021-05-26 12:54     ` Eli Zaretskii
  1 sibling, 2 replies; 17+ messages in thread
From: Anand Tamariya @ 2021-05-26  9:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


Thanks Eli, the align-to 'space' display spec seems helpful.

Though it's a Company-specific issue related to Unicode character
composition, here are some more details for the record, should
somebody else stumble upon the same problem.

Let's call the first character in the screenshot shr (a single glyph) and
the second one sh-r (two glyphs).
;; 2358 = U+0936 (SHA), 2381 = U+094D (VIRAMA), 2352 = U+0930 (RA)
(setq shr  (string 2358 2381 2352))
(setq sh-r (string 2358 2352))

(string-width shr)  ;; 2
(string-width sh-r) ;; 2

To create the rectangular region, we need to pad the strings with an
appropriate number of spaces. The align-to 'space' display spec seems helpful
in this case, as shown below. You will notice that the character "a" is aligned
in both cases. Now I need to figure out how to use the same within Company.

(insert (concat shr
                ;; A space stretched by its display property up to column 10.
                (propertize " " 'display '(space :align-to 10))
                "a"))

(insert (concat sh-r
                ;; Same stretch space, so "a" starts at column 10 here too.
                (propertize " " 'display '(space :align-to 10))
                "a"))


* Re: Unicode combining characters
  2021-05-26  9:51   ` Anand Tamariya
@ 2021-05-26 10:04     ` Joost Kremers
  2021-05-26 12:54     ` Eli Zaretskii
  1 sibling, 0 replies; 17+ messages in thread
From: Joost Kremers @ 2021-05-26 10:04 UTC (permalink / raw)
  To: emacs-devel

On Wed, May 26 2021, Anand Tamariya wrote:
> Thanks Eli, the align-to 'space' display spec seems helpful.
>
> Though it's a Company-specific issue related to Unicode character
> composition, here are some more details for the record, should
> somebody else stumble upon the same problem.

At the risk of posting something irrelevant: the effect shown in the screenshot
you posted also occurs if you use company in a buffer with variable-pitch-mode
(which I do in e.g., LaTeX buffers). I don't know if that's the same problem,
but if it is, a solution would be applicable beyond combining characters. 

-- 
Joost Kremers
Life has its moments


* Re: Unicode combining characters
  2021-05-26  9:51   ` Anand Tamariya
  2021-05-26 10:04     ` Joost Kremers
@ 2021-05-26 12:54     ` Eli Zaretskii
  2021-05-26 17:14       ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-26 12:54 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Wed, 26 May 2021 15:21:05 +0530
> Cc: emacs-devel@gnu.org
> 
> Let's call the first character in the screenshot shr (a single glyph) and the second one sh-r (two glyphs).
> (setq shr  (string 2358 2381 2352))
> (setq sh-r (string 2358 2352))
> 
> (string-width shr)  ;; 2
> (string-width sh-r) ;; 2

Sorry, it turns out I've misremembered: string-width doesn't account
for "automatic compositions", the ones that happen due to
composition-function-table (as opposed to "static compositions" which
happen due to the 'composition' text property).  So this case
currently cannot be handled correctly by string-width; we should fix
that.


* Re: Unicode combining characters
  2021-05-26 12:54     ` Eli Zaretskii
@ 2021-05-26 17:14       ` Eli Zaretskii
  2021-05-27  7:00         ` Anand Tamariya
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-26 17:14 UTC (permalink / raw)
  To: atamariya; +Cc: emacs-devel

> Date: Wed, 26 May 2021 15:54:43 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> > (setq shr  (string 2358 2381 2352))
> > (setq sh-r (string 2358 2352))
> > 
> > (string-width shr)  ;; 2
> > (string-width sh-r) ;; 2
> 
> Sorry, it turns out I've misremembered: string-width doesn't account
> for "automatic compositions", the ones that happen due to
> composition-function-table (as opposed to "static compositions" which
> happen due to the 'composition' text property).  So this case
> currently cannot be handled correctly by string-width; we should fix
> that.

Please try the latest master branch, I hope I fixed this now.




* Re: Unicode combining characters
  2021-05-26 17:14       ` Eli Zaretskii
@ 2021-05-27  7:00         ` Anand Tamariya
  2021-05-27  9:40           ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: Anand Tamariya @ 2021-05-27  7:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


>
> Please try the latest master branch, I hope I fixed this now.
>

The fix works for the given example. However, here's another one that
ideally should be one composed glyph (validated by moving the cursor over
the glyph) but counts as 2 in string-width.

(setq ra (string 2352 2366))

(string-width ra) ; 2

;; Glyph in a word
(setq shankar (string 2358 2306 2325 2352 2366))


* Re: Unicode combining characters
  2021-05-27  7:00         ` Anand Tamariya
@ 2021-05-27  9:40           ` Eli Zaretskii
  2021-05-27 10:34             ` Basil L. Contovounesios
  2021-05-27 13:27             ` Anand Tamariya
  0 siblings, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-27  9:40 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Thu, 27 May 2021 12:30:04 +0530
> Cc: emacs-devel@gnu.org
> 
>  Please try the latest master branch, I hope I fixed this now.
> 
> The fix works for the given example. However, here's another one that ideally should be one composed glyph
> (validated by moving the cursor over the glyph) but counts as 2 in string-width.
> 
> (setq ra (string 2352 2366))
> 
> (string-width ra) ; 2

OK, I improved this case now on master, please take a look.

However, please note that getting this right makes string-width more
dependent on the selected-frame's font used by the default face for
the characters of the string.  In particular, if that font is unable
to combine the characters that should be composed, you will now get
a width which could be different from the value on other frames with
other fonts.  Also, the new code only works in interactive sessions on
GUI frames, because we need the shaping engine (a.k.a. "font driver")
to compose characters.


* Re: Unicode combining characters
  2021-05-27  9:40           ` Eli Zaretskii
@ 2021-05-27 10:34             ` Basil L. Contovounesios
  2021-05-27 12:30               ` Eli Zaretskii
  2021-05-27 13:27             ` Anand Tamariya
  1 sibling, 1 reply; 17+ messages in thread
From: Basil L. Contovounesios @ 2021-05-27 10:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Anand Tamariya

Eli Zaretskii <eliz@gnu.org> writes:

> OK, I improved this case now on master, please take a look

I'm seeing a couple of warnings:

character.c: In function ‘lisp_string_width’:
character.c:397:16: warning: assignment to ‘int’ from ‘Lisp_Object’
  {aka ‘struct Lisp_X *’} makes integer from pointer without a cast
  [-Wint-conversion]
  397 |     font_width = AREF (font_info, 11);
      |                ^
character.c:398:19: warning: ordered comparison of pointer with integer
  zero [-Wextra]
  398 |     if (font_info <= 0)
      |                   ^~
character.c:399:18: warning: assignment to ‘int’ from ‘Lisp_Object’ {aka
  ‘struct Lisp_X *’} makes integer from pointer without a cast
  [-Wint-conversion]
  399 |       font_width = AREF (font_info, 10);

Do the font_info elements need to be untagged, and font_width rather
than font_info checked for being positive?

Thanks,

-- 
Basil

gcc (Debian 10.2.1-6) 10.2.1 20210110




* Re: Unicode combining characters
  2021-05-27 10:34             ` Basil L. Contovounesios
@ 2021-05-27 12:30               ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-27 12:30 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: emacs-devel, atamariya

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: Anand Tamariya <atamariya@gmail.com>,  emacs-devel@gnu.org
> Date: Thu, 27 May 2021 11:34:55 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > OK, I improved this case now on master, please take a look
> 
> I'm seeing a couple of warnings:

Oops! sorry, should be fixed now.




* Re: Unicode combining characters
  2021-05-27  9:40           ` Eli Zaretskii
  2021-05-27 10:34             ` Basil L. Contovounesios
@ 2021-05-27 13:27             ` Anand Tamariya
  2021-05-27 13:44               ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: Anand Tamariya @ 2021-05-27 13:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


> > The fix works for the given example. However, here's another one that
> > ideally should be one composed glyph (validated by moving the cursor
> > over the glyph) but counts as 2 in string-width.
> >
> > (setq ra (string 2352 2366))
> >
> > (string-width ra) ; 2
>
> OK, I improved this case now on master, please take a look.

Wonderful!! It works. Thanks.
Do you think (current-column) should also return a value conforming to the
display logic? e.g. if 'ra' above is the first character in the line and
point next to it, should it report 1 or 2?


* Re: Unicode combining characters
  2021-05-27 13:27             ` Anand Tamariya
@ 2021-05-27 13:44               ` Eli Zaretskii
  0 siblings, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2021-05-27 13:44 UTC (permalink / raw)
  To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Thu, 27 May 2021 18:57:58 +0530
> Cc: emacs-devel@gnu.org
> 
>  OK, I improved this case now on master, please take a look.
> 
> Wonderful!! It works. Thanks.

Thanks for testing.

> Do you think (current-column) should also return a value conforming to the display logic? e.g. if 'ra' above is
> the first character in the line and point next to it, should it report 1 or 2?

That'd be too much, IMO.  current-column is called in many places, and
it would be unexpected for it to return different values depending on
the font and the frame.  The correspondence between these two
functions is not 100% now anyway (e.g., current-column is sensitive to
auto-composition-mode, whereas string-width isn't).

Lisp programs that need 100% accuracy in these matters should call
window-text-pixel-size.
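
A minimal sketch of that approach follows.  The helper name is an
assumption made for the example, and the code temporarily shows a
scratch buffer in the selected window so that the measurement uses that
window's fonts; it is one possible way to call window-text-pixel-size,
not the only one.

```elisp
;; Hypothetical helper: the name is an assumption.
(defun my-string-pixel-width (string)
  "Return the width of STRING as rendered in the selected window.
The result is the car of `window-text-pixel-size', so it is limited
by the window's body width and, for multi-line strings, it is the
width of the widest line."
  (with-temp-buffer
    (insert string)
    (save-window-excursion
      ;; Display the temporary buffer in the selected window so that
      ;; redisplay measures it with that window's faces and fonts.
      (set-window-dedicated-p nil nil)
      (set-window-buffer nil (current-buffer))
      (car (window-text-pixel-size nil (point-min) (point-max))))))
```

Comparing `(my-string-pixel-width (string 2352 2366))` with the same
call on another candidate gives the difference that padding or an
:align-to space would have to make up.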
