* Unicode combining characters
From: Anand Tamariya @ 2021-05-25 15:56 UTC
To: emacs-devel

Hindi Devanagari script has a lot of Unicode combining characters, which
results in misalignment in a rectangular overlay for a constant number of
characters (screenshot):
<https://1.bp.blogspot.com/-P2ZnFePOpOo/YK0cNJ4B5II/AAAAAAAAJJs/t-MADtxUeps3S_WXZ_rFWjf9daH49sr9QCLcBGAsYHQ/s421/combining.png>

What would be a recommended way to tackle this in Emacs?
* Re: Unicode combining characters
From: Stefan Monnier @ 2021-05-25 17:22 UTC
To: Anand Tamariya; +Cc: emacs-devel

> Hindi Devanagari script has a lot of Unicode combining characters, which
> results in misalignment in a rectangular overlay for a constant number of
> characters (screenshot):
> <https://1.bp.blogspot.com/-P2ZnFePOpOo/YK0cNJ4B5II/AAAAAAAAJJs/t-MADtxUeps3S_WXZ_rFWjf9daH49sr9QCLcBGAsYHQ/s421/combining.png>
> What would be a recommended way to tackle this in Emacs?

In a GUI session, the usual answer is to use posframe, AFAIK.


        Stefan
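A minimal sketch of Stefan's suggestion, assuming the third-party posframe package (from GNU ELPA) is installed. posframe draws a buffer in a child frame on top of the current window, so the buffer text to the right of the popup no longer has to be padded to line up with it. The buffer name " *my-candidates*" and the function name are hypothetical; only posframe-show with its :string and :position keywords is taken from posframe's documented interface.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/show-candidates-in-posframe (candidates)
  "Show CANDIDATES (a list of strings), one per line, in a child frame at point.
Hypothetical helper: does nothing and returns nil unless the selected
frame is graphical and the posframe package can be loaded."
  (when (and (display-graphic-p)
             (require 'posframe nil t))
    ;; The popup is a separate frame drawn over the buffer, so no
    ;; replacing display property on the surrounding text is needed.
    (posframe-show " *my-candidates*"
                   :string (mapconcat #'identity candidates "\n")
                   :position (point))))
```

In a GUI frame, (my/show-candidates-in-posframe (list (string 2358 2381 2352) (string 2358 2352))) shows both of Anand's strings; (posframe-hide " *my-candidates*") hides the popup again.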
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-25 17:24 UTC
To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Tue, 25 May 2021 21:26:44 +0530
>
> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
> rectangular overlay for a constant number of characters (screenshot)
> What would be a recommended way to tackle this in Emacs?

Use the align-to 'space' display spec and/or the window-text-pixel-size
function, which will account for the actual size of the text on
display.

string-width can also be used, but it only gives an approximation, as
it is oblivious of the actual size of the font glyphs.
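A sketch of the window-text-pixel-size approach Eli describes: show the string in a temporary buffer in the selected window and ask the display engine for its width. The helper name my/string-pixel-width is hypothetical; Emacs 29 and later ship a built-in string-pixel-width based on the same idea. On a text terminal the display engine counts one character cell as one pixel, so there the result is effectively a column count.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/string-pixel-width (string)
  "Return the width of STRING on display, in pixels (hypothetical helper)."
  (if (zerop (length string))
      0
    (with-temp-buffer
      (insert string)
      ;; `window-text-pixel-size' measures text of the buffer shown in a
      ;; window, so temporarily show the temp buffer in the selected
      ;; window; `save-window-excursion' restores the previous state.
      (save-window-excursion
        (set-window-dedicated-p nil nil)
        (set-window-buffer nil (current-buffer))
        ;; The value is (WIDTH . HEIGHT); only the width matters here.
        (car (window-text-pixel-size nil (point-min) (point-max)))))))
```

On a GUI frame, comparing (my/string-pixel-width (string 2358 2381 2352)) with (my/string-pixel-width (string 2358 2352)) shows the real difference that string-width hides; the exact numbers depend on the font.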
* Re: Unicode combining characters
From: Clément Pit-Claudel @ 2021-05-25 18:15 UTC
To: emacs-devel

On 5/25/21 1:24 PM, Eli Zaretskii wrote:
>> From: Anand Tamariya <atamariya@gmail.com>
>> Date: Tue, 25 May 2021 21:26:44 +0530
>>
>> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
>> rectangular overlay for a constant number of characters (screenshot)
>> What would be a recommended way to tackle this in Emacs?
>
> Use the align-to 'space' display spec and/or the window-text-pixel-size
> function, which will account for the actual size of the text on
> display.

Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?

(IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-25 18:39 UTC
To: Clément Pit-Claudel; +Cc: emacs-devel

> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 25 May 2021 14:15:33 -0400
>
> On 5/25/21 1:24 PM, Eli Zaretskii wrote:
> >> From: Anand Tamariya <atamariya@gmail.com>
> >> Date: Tue, 25 May 2021 21:26:44 +0530
> >>
> >> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
> >> rectangular overlay for a constant number of characters (screenshot)
> >> What would be a recommended way to tackle this in Emacs?
> >
> > Use the align-to 'space' display spec and/or the window-text-pixel-size
> > function, which will account for the actual size of the text on
> > display.
>
> Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?

I don't understand, but maybe you know about the particular use case
more than I do. I just mentioned two devices that can be accurate to
1 pixel wrt the X coordinate.

> (IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)

Again, I don't think I follow. If you have "this text", you can
calculate its width on display, and then know how many pixels of white
space you will need after "this string" replaces that text. So,
unless I'm missing something, specifying the space width is redundant,
and actually makes a solvable problem unsolvable.

But I might be talking nonsense because I don't understand what
problem the OP wants to solve.
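Eli's arithmetic as a sketch: once the width of the replaced text and the target X position are both known in pixels (for example from window-text-pixel-size), the gap can be filled with a single space displayed as a specified space of exactly that many pixels. In a space spec, a width written as a one-element list (N) means N pixels rather than columns. The function name is hypothetical.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/pixel-padding (text-px target-px)
  "Return a blank string filling the gap from TEXT-PX to TARGET-PX pixels.
Hypothetical helper: the single space is displayed as a stretch of
TARGET-PX minus TEXT-PX pixels, or zero pixels if the text is already
wider than the target."
  (propertize " " 'display
              ;; (N) inside a space spec means N pixels, not columns.
              `(space :width (,(max 0 (- target-px text-px))))))
```

For example, (concat "this string" (my/pixel-padding 30 100)) yields text whose padding is drawn 70 pixels wide, assuming "this string" measures 30 pixels.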
* Re: Unicode combining characters
From: Clément Pit-Claudel @ 2021-05-25 19:30 UTC
To: Eli Zaretskii; +Cc: emacs-devel

On 5/25/21 2:39 PM, Eli Zaretskii wrote:
>> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
>> Date: Tue, 25 May 2021 14:15:33 -0400
>>
>> On 5/25/21 1:24 PM, Eli Zaretskii wrote:
>>>> From: Anand Tamariya <atamariya@gmail.com>
>>>> Date: Tue, 25 May 2021 21:26:44 +0530
>>>>
>>>> Hindi Devanagari script has a lot of Unicode combining characters, which results in misalignment in a
>>>> rectangular overlay for a constant number of characters (screenshot)
>>>> What would be a recommended way to tackle this in Emacs?
>>>
>>> Use the align-to 'space' display spec and/or the window-text-pixel-size
>>> function, which will account for the actual size of the text on
>>> display.
>>
>> Will this work? The misaligned specs are already part of a replacing display spec, so the additional align-to would be ignored, no?
>
> I don't understand, but maybe you know about the particular use case
> more than I do. I just mentioned two devices that can be accurate to
> 1 pixel wrt the X coordinate.
>
>> (IIRC, there is no way to say "replace this text by this string followed by this specified space"; it's one or the other, right?)
>
> Again, I don't think I follow. If you have "this text", you can
> calculate its width on display, and then know how many pixels of white
> space you will need after "this string" replaces that text. So,
> unless I'm missing something, specifying the space width is redundant,
> and actually makes a solvable problem unsolvable.

Based on the screenshot, this is an issue with Company. Company displays its "pop-ups" by putting a replacing 'display property on the text following the point (and on the next few lines).
So if the buffer contains

  ABC XYZ DEF GHI
  JKL MNO PQR STU

and the point is after XYZ, then company puts a replacing display spec from " DEF" to "STU". To display completions "XYZ123" and "XYZ456", the replacing display spec contains "123| GHI\nJKL XYZ456| STU", so the final display is

  ABC XYZ123| GHI
  JKL XYZ456| STU

The OP's issue is that "123" and "456" don't have the same length. As far as I know, there is no way to add extra space after 123 or 456 so that they reach the same X coordinate, given that they are already part of a display spec.

Clément.
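A simplified reconstruction of the mechanism Clément describes, using his toy buffer. This is not Company's actual code (which builds the replacement string itself and handles many more cases); it only shows that one overlay with a string-valued display property hides everything from point to the end of the region and draws the string in its place, while the buffer text stays unchanged. The function name is hypothetical.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/demo-replacing-popup ()
  "Return the overlay's display string and the untouched buffer text."
  (with-temp-buffer
    (insert "ABC XYZ DEF GHI\nJKL MNO PQR STU")
    (goto-char (point-min))
    (search-forward "XYZ")              ; point is now right after XYZ
    (let ((ov (make-overlay (point) (point-max))))
      ;; The whole region " DEF GHI\nJKL MNO PQR STU" is replaced on
      ;; display by this single string, newline included.
      (overlay-put ov 'display "123| GHI\nJKL XYZ456| STU")
      (list (overlay-get ov 'display)
            (buffer-substring-no-properties (point-min) (point-max))))))
```

Because the "|" columns live inside one display string, their X position is fixed by whatever widths the candidate text happens to have on display, which is exactly the misalignment in the screenshot.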
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-25 19:44 UTC
To: Clément Pit-Claudel; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Clément Pit-Claudel <cpitclaudel@gmail.com>
> Date: Tue, 25 May 2021 15:30:21 -0400
>
> Based on the screenshot, this is an issue with Company. Company displays its "pop-ups" by putting a replacing 'display property on the text following the point (and on the next few lines). So if the buffer contains
>
>   ABC XYZ DEF GHI
>   JKL MNO PQR STU
>
> and the point is after XYZ, then company puts a replacing display spec from " DEF" to "STU".
> To display completions "XYZ123" and "XYZ456", the replacing display spec contains "123| GHI\nJKL XYZ456| STU", so the final display is
>
>   ABC XYZ123| GHI
>   JKL XYZ456| STU
>
> The OP's issue is that "123" and "456" don't have the same length. As far as I know, there is no way to add extra space after 123 or 456 so that they reach the same X coordinate, given that they are already part of a display spec.

First, the OP said "overlay", and overlay strings can have display
properties.

And second, I'd expect the current code to use string-width to compute
how much whitespace will be needed after each completion candidate,
and string-width already accounts for composed (a.k.a. "combined")
characters. Yes, string-width provides only an approximation for the
true pixel width of the string, but that's not specific to
compositions, and the whole technique is somewhat of a kludge anyway,
for this reason and others.
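The column-based technique Eli expects the current code to use can be sketched as follows: measure every candidate with string-width and pad it with ordinary spaces up to the widest one. This is an illustration, not Company's code, and the helper name is hypothetical; its result is only as accurate as string-width, which (as this thread goes on to show) did not yet account for automatic compositions.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/pad-to-columns (candidates)
  "Pad each string in CANDIDATES with spaces to the widest `string-width'.
Hypothetical helper illustrating column-based (not pixel-based) padding."
  (let ((width (apply #'max 0 (mapcar #'string-width candidates))))
    (mapcar (lambda (candidate)
              ;; `string-width' counts columns, so the padding is right
              ;; only if that column count matches what is drawn.
              (concat candidate
                      (make-string (- width (string-width candidate)) ?\s)))
            candidates)))
```

With ASCII candidates, (my/pad-to-columns '("123" "45")) returns ("123" "45 "), and the two lines line up in a monospaced font.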
* Re: Unicode combining characters
From: Anand Tamariya @ 2021-05-26  9:51 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Thanks, Eli. The align-to 'space' display spec seems helpful.

Though it's a Company-specific issue related to Unicode character
composition, here are some more details on the issue for the record,
should somebody else stumble upon the same.

Let's call the first character in the screenshot shr (single glyph) and
the second one sh-r (two glyphs).

  (setq shr (string 2358 2381 2352))
  (setq sh-r (string 2358 2352))

  (string-width shr)  ;; 2
  (string-width sh-r) ;; 2

To create the rectangular region, we need to pad the strings with the
appropriate number of spaces. The align-to 'space' display spec seems
helpful in this case, as shown below. You will notice that the
character "a" is aligned in both cases. Now I need to figure out how to
use the same within company.

  (insert (concat shr
                  (let ((sp " "))
                    (font-lock-append-text-property
                     0 1 'display `(space . (:align-to 10)) sp)
                    sp)
                  "a"))

  (insert (concat sh-r
                  (let ((sp " "))
                    (font-lock-append-text-property
                     0 1 'display `(space . (:align-to 10)) sp)
                    sp)
                  "a"))
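Anand's snippet generalized to a list of candidates, as a sketch: propertize builds the specified space directly, and every candidate gets its own fresh padding string. :align-to 10 means column 10 in units of the frame's canonical character width, counted from the start of the window's text area, so the "|" markers should line up however wide each candidate is drawn. The function name is hypothetical.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/align-candidates (candidates column)
  "Return CANDIDATES, each followed by a \"|\" aligned at COLUMN on display.
Hypothetical helper: COLUMN is in units of the frame's canonical
character width."
  (mapcar (lambda (candidate)
            (concat candidate
                    ;; One space displayed as a stretch that ends exactly
                    ;; at COLUMN, whatever the width of CANDIDATE.
                    (propertize " " 'display `(space :align-to ,column))
                    "|"))
          candidates))
```

Usage, at the start of an empty line in a buffer: (insert (mapconcat #'identity (my/align-candidates (list (string 2358 2381 2352) (string 2358 2352)) 10) "\n")).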
* Re: Unicode combining characters
From: Joost Kremers @ 2021-05-26 10:04 UTC
To: emacs-devel

On Wed, May 26 2021, Anand Tamariya wrote:
> Thanks, Eli. The align-to 'space' display spec seems helpful.
>
> Though it's a Company-specific issue related to Unicode character
> composition, here are some more details on the issue for the record,
> should somebody else stumble upon the same.

At the risk of posting something irrelevant: the effect shown in the
screenshot you posted also occurs if you use company in a buffer with
variable-pitch-mode (which I do in, e.g., LaTeX buffers). I don't know
if that's the same problem, but if it is, a solution would be
applicable beyond combining characters.

-- 
Joost Kremers
Life has its moments
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-26 12:54 UTC
To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Wed, 26 May 2021 15:21:05 +0530
> Cc: emacs-devel@gnu.org
>
> Let's call the first character in the screenshot as shr (single glyph) and the second one as sh-r (two glyphs).
> (setq shr (string 2358 2381 2352))
> (setq sh-r (string 2358 2352))
>
> (string-width shr) ;; 2
> (string-width sh-r) ;; 2

Sorry, it turns out I've misremembered: string-width doesn't account
for "automatic compositions", the ones that happen due to
composition-function-table (as opposed to "static compositions", which
happen due to the 'composition' text property). So this case
currently cannot be handled correctly by string-width; we should fix
that.
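A small sketch contrasting the two kinds of composition Eli distinguishes. compose-string puts a 'composition text property on the string (a static composition); a plain string with the same characters carries no such property and is composed only at display time, according to the rules in composition-function-table (an automatic composition). #x0936, #x094D and #x0930 are the same code points as 2358, 2381 and 2352 in Anand's shr example. The helper name is hypothetical.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/composition-kinds ()
  "Contrast static and automatic composition of SHA + VIRAMA + RA."
  (let ((plain (string #x0936 #x094D #x0930))
        ;; `compose-string' adds a `composition' property: static.
        (static (compose-string (string #x0936 #x094D #x0930))))
    (list :static-property (and (get-text-property 0 'composition static) t)
          :plain-property (and (get-text-property 0 'composition plain) t)
          ;; Automatic compositions are driven by this char-table; a
          ;; non-nil entry means composition rules exist for SHA.
          :auto-rules (and (aref composition-function-table #x0936) t))))
```

The plain string is what Company and Anand's code pass to string-width, which is why only the automatic-composition path mattered for this bug.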
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-26 17:14 UTC
To: atamariya; +Cc: emacs-devel

> Date: Wed, 26 May 2021 15:54:43 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> > (setq shr (string 2358 2381 2352))
> > (setq sh-r (string 2358 2352))
> >
> > (string-width shr) ;; 2
> > (string-width sh-r) ;; 2
>
> Sorry, it turns out I've misremembered: string-width doesn't account
> for "automatic compositions", the ones that happen due to
> composition-function-table (as opposed to "static compositions" which
> happen due to the 'composition' text property). So this case
> currently cannot be handled correctly by string-width; we should fix
> that.

Please try the latest master branch, I hope I fixed this now.
* Re: Unicode combining characters
From: Anand Tamariya @ 2021-05-27  7:00 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>
> Please try the latest master branch, I hope I fixed this now.
>

The fix works for the given example. However, here's another one that
ideally should be one composed glyph (validated by moving the cursor
over the glyph) but counts as 2 in string-width.

  (setq ra (string 2352 2366))

  (string-width ra) ; 2

  ;; Glyph in a word
  (setq shankar (string 2358 2306 2325 2352 2366))
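To see why 'ra' counts as two columns, it helps to list its code points. A sketch using get-char-code-property (the helper name is hypothetical): RA is a letter (general category Lo) and the vowel sign AA is a spacing combining mark (Mc), so purely character-based counting gives each of them a column, even though Emacs composes them into a single glyph cluster on display.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/describe-code-points (string)
  "Return a list of (CHAR NAME CATEGORY) for each character of STRING."
  (mapcar (lambda (ch)
            (list ch
                  ;; Unicode character name, e.g. "DEVANAGARI LETTER RA".
                  (get-char-code-property ch 'name)
                  ;; General category as a symbol, e.g. Lo, Mc, Mn.
                  (get-char-code-property ch 'general-category)))
          string))
```

(my/describe-code-points (string 2352 2366)) returns ((2352 "DEVANAGARI LETTER RA" Lo) (2366 "DEVANAGARI VOWEL SIGN AA" Mc)); for shankar, the second character, 2306, is the nonspacing mark DEVANAGARI SIGN ANUSVARA (Mn).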
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-27  9:40 UTC
To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Thu, 27 May 2021 12:30:04 +0530
> Cc: emacs-devel@gnu.org
>
> > Please try the latest master branch, I hope I fixed this now.
>
> The fix works for the given example. However, here's another one that ideally should be one composed glyph
> (validated by moving the cursor over the glyph) but counts as 2 in string-width.
>
> (setq ra (string 2352 2366))
>
> (string-width ra) ; 2

OK, I improved this case now on master, please take a look.

However, please note that getting this right makes string-width more
dependent on the selected frame's font used by the default face for
the characters of the string. In particular, if that font is unable
to combine the characters that should be composed, you will now get a
width which could be different from the value on other frames with
other fonts.

Also, the new code only works in interactive sessions on GUI frames,
because we need the shaping engine (a.k.a. "font driver") to compose
characters.
* Re: Unicode combining characters
From: Basil L. Contovounesios @ 2021-05-27 10:34 UTC
To: Eli Zaretskii; +Cc: emacs-devel, Anand Tamariya

Eli Zaretskii <eliz@gnu.org> writes:

> OK, I improved this case now on master, please take a look

I'm seeing a couple of warnings:

character.c: In function ‘lisp_string_width’:
character.c:397:16: warning: assignment to ‘int’ from ‘Lisp_Object’ {aka ‘struct Lisp_X *’} makes integer from pointer without a cast [-Wint-conversion]
  397 |     font_width = AREF (font_info, 11);
      |                ^
character.c:398:19: warning: ordered comparison of pointer with integer zero [-Wextra]
  398 |     if (font_info <= 0)
      |                   ^~
character.c:399:18: warning: assignment to ‘int’ from ‘Lisp_Object’ {aka ‘struct Lisp_X *’} makes integer from pointer without a cast [-Wint-conversion]
  399 |       font_width = AREF (font_info, 10);

Do the font_info elements need to be untagged, and font_width rather
than font_info checked for being positive?

Thanks,

-- 
Basil

gcc (Debian 10.2.1-6) 10.2.1 20210110
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-27 12:30 UTC
To: Basil L. Contovounesios; +Cc: emacs-devel, atamariya

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Cc: Anand Tamariya <atamariya@gmail.com>, emacs-devel@gnu.org
> Date: Thu, 27 May 2021 11:34:55 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > OK, I improved this case now on master, please take a look
>
> I'm seeing a couple of warnings:

Oops! Sorry, should be fixed now.
* Re: Unicode combining characters
From: Anand Tamariya @ 2021-05-27 13:27 UTC
To: Eli Zaretskii; +Cc: emacs-devel

> > The fix works for the given example. However, here's another one that ideally should be one composed glyph
> > (validated by moving the cursor over the glyph) but counts as 2 in string-width.
> >
> > (setq ra (string 2352 2366))
> >
> > (string-width ra) ; 2
>
> OK, I improved this case now on master, please take a look.
>

Wonderful!! It works. Thanks.

Do you think (current-column) should also return a value conforming to
the display logic? E.g., if 'ra' above is the first character in the
line and point is next to it, should it report 1 or 2?
* Re: Unicode combining characters
From: Eli Zaretskii @ 2021-05-27 13:44 UTC
To: Anand Tamariya; +Cc: emacs-devel

> From: Anand Tamariya <atamariya@gmail.com>
> Date: Thu, 27 May 2021 18:57:58 +0530
> Cc: emacs-devel@gnu.org
>
> > OK, I improved this case now on master, please take a look.
>
> Wonderful!! It works. Thanks.

Thanks for testing.

> Do you think (current-column) should also return a value conforming to the display logic? E.g., if 'ra' above is
> the first character in the line and point is next to it, should it report 1 or 2?

That'd be too much, IMO. current-column is called in many places, and
it would be unexpected for it to return different values depending on
the font and the frame. The correspondence between these two
functions is not 100% now anyway (e.g., current-column is sensitive to
auto-composition-mode, whereas string-width isn't). Lisp programs
that need 100% accuracy in these matters should call
window-text-pixel-size.
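A sketch contrasting the column-level measurements discussed here (the helper name is hypothetical): length counts characters, current-column reports the column editing commands see, and string-width gives the column estimate that, after Eli's change, may consult the selected frame's font on GUI frames. For pixel-exact results Eli points to window-text-pixel-size, which this helper deliberately does not call.

```elisp
;; -*- lexical-binding: t; -*-

(defun my/compare-widths (string)
  "Return the character count, column and `string-width' of STRING."
  (with-temp-buffer
    (insert string)
    ;; Point is now at the end of STRING, which starts at column 0.
    (list :chars (length string)
          :current-column (current-column)
          :string-width (string-width string))))
```

For ASCII the three agree: (my/compare-widths "abc") returns (:chars 3 :current-column 3 :string-width 3). In a GUI session, (my/compare-widths (string 2352 2366)) may show :current-column and :string-width disagreeing, depending on the Emacs version and font, which is the divergence Eli describes.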