unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#56237: 29.0.50; delete-forward-char fails to delete character
@ 2022-06-26 16:07 visuweshm
  2022-06-26 16:13 ` Visuwesh
  2022-06-26 16:18 ` Eli Zaretskii
  0 siblings, 2 replies; 21+ messages in thread
From: visuweshm @ 2022-06-26 16:07 UTC (permalink / raw)
  To: 56237

delete-forward-char fails to delete if the point is between two composed
characters.  To demonstrate,

        1. emacs -Q
        2. Yank "ரு போ" to the *scratch* buffer
        3. Place the cursor on the space character and say <Delete>

Observe how delete-forward-char does nothing.

In GNU Emacs 29.0.50 (build 22, x86_64-pc-linux-gnu, X toolkit, Xaw scroll bars)
 of 2022-06-25 built on astatine
Repository revision: 376ecd5346496a4f11a3bc93814b03d7a884b841
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101003
System Description: Debian GNU/Linux 11 (bullseye)

Configured using:
 'configure --with-modules --with-sound=alsa --with-x-toolkit=lucid
 --with-json --without-xaw3d --without-gconf --without-libsystemd
 --with-x --without-cairo'

Configured features:
ACL DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG JSON
LIBOTF LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS WEBP X11 XDBE XFT
XIM XINPUT2 XPM LUCID ZLIB

Important settings:
  value of $LC_MONETARY: ta_IN.UTF-8
  value of $LC_NUMERIC: ta_IN.UTF-8
  value of $LANG: en_GB.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
cus-start cus-load ind-util quail help-mode cl-loaddefs cl-lib rmc
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq
simple cl-generic indonesian philippine cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite emoji-zwj charscript charprop case-table
epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button
loaddefs faces cus-face macroexp files window text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting x-toolkit
xinput2 x multi-tty make-network-process emacs)

Memory information:
((conses 16 110282 8844)
 (symbols 48 6549 0)
 (strings 32 40942 1993)
 (string-bytes 1 619696)
 (vectors 16 23168)
 (vector-slots 8 327896 13098)
 (floats 8 24 24)
 (intervals 56 2398 6)
 (buffers 992 11))





^ permalink raw reply	[flat|nested] 21+ messages in thread

* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:07 bug#56237: 29.0.50; delete-forward-char fails to delete character visuweshm
@ 2022-06-26 16:13 ` Visuwesh
  2022-06-26 16:18 ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Visuwesh @ 2022-06-26 16:13 UTC (permalink / raw)
  To: 56237

[Sunday June 26, 2022] visuweshm@gmail.com wrote:

> delete-forward-char fails to delete if the point is between two composed
> characters.  To demonstrate,
>
>         1. emacs -Q
>         2. Yank "ரு போ" to the *scratch* buffer
>         3. Place the cursor on the space character and say <Delete>
>
> Observe how delete-forward-char does nothing.

It seems to have trouble with back-to-back composed characters: if you
say <delete> when point is over the first composed character in ப்போ, it
deletes both characters instead of just ப்.  But it copes fine when you
have ருரூ and point is over the first composed character.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:07 bug#56237: 29.0.50; delete-forward-char fails to delete character visuweshm
  2022-06-26 16:13 ` Visuwesh
@ 2022-06-26 16:18 ` Eli Zaretskii
  2022-06-26 16:24   ` Lars Ingebrigtsen
  2022-06-26 16:25   ` Visuwesh
  1 sibling, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 16:18 UTC (permalink / raw)
  To: visuweshm; +Cc: 56237

> From: visuweshm@gmail.com
> Date: Sun, 26 Jun 2022 21:37:07 +0530
> 
> delete-forward-char fails to delete if the point is between two composed
> characters.  To demonstrate,
> 
>         1. emacs -Q
>         2. Yank "ரு போ" to the *scratch* buffer
>         3. Place the cursor on the space character and say <Delete>
> 
> Observe how delete-forward-char does nothing.

This is a feature (new with Emacs 29): delete-forward-char deletes
entire grapheme clusters.  Use C-d if you want to delete individual
codepoints inside a grapheme cluster.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:18 ` Eli Zaretskii
@ 2022-06-26 16:24   ` Lars Ingebrigtsen
  2022-06-26 16:25   ` Visuwesh
  1 sibling, 0 replies; 21+ messages in thread
From: Lars Ingebrigtsen @ 2022-06-26 16:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237, visuweshm

Eli Zaretskii <eliz@gnu.org> writes:

>> delete-forward-char fails to delete if the point is between two composed
>> characters.  To demonstrate,
>> 
>>         1. emacs -Q
>>         2. Yank "ரு போ" to the *scratch* buffer
>>         3. Place the cursor on the space character and say <Delete>
>> 
>> Observe how delete-forward-char does nothing.
>
> This is a feature (new with Emacs 29): delete-forward-char deletes
> entire grapheme clusters.  Use C-d if you want to delete individual
> codepoints inside a grapheme cluster.

Putting point at the start of the line, you can hit <del> three times to
delete all characters on the line -- so there are three grapheme clusters,
and that works fine.  The problem is if you put point on the second
cluster, <del> does nothing, and that has to be a bug.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:18 ` Eli Zaretskii
  2022-06-26 16:24   ` Lars Ingebrigtsen
@ 2022-06-26 16:25   ` Visuwesh
  2022-06-26 16:36     ` Eli Zaretskii
  2022-06-26 16:38     ` Eli Zaretskii
  1 sibling, 2 replies; 21+ messages in thread
From: Visuwesh @ 2022-06-26 16:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Sunday June 26, 2022] Eli Zaretskii wrote:

>> From: visuweshm@gmail.com
>> Date: Sun, 26 Jun 2022 21:37:07 +0530
>> 
>> delete-forward-char fails to delete if the point is between two composed
>> characters.  To demonstrate,
>> 
>>         1. emacs -Q
>>         2. Yank "ரு போ" to the *scratch* buffer
>>         3. Place the cursor on the space character and say <Delete>
>> 
>> Observe how delete-forward-char does nothing.
>
> This is a feature (new with Emacs 29): delete-forward-char deletes
> entire grapheme clusters.  Use C-d if you want to delete individual
> codepoints inside a grapheme cluster.

I'm not sure how this defeats my expectation though?  I want the SPC to
be deleted, and it does so when the buffer contains "b b" with the
point over SPC but it fails to do so when the buffer contains "ரு b"
instead.  I was under the impression that I could safely rebind C-d to
delete-forward-char.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:25   ` Visuwesh
@ 2022-06-26 16:36     ` Eli Zaretskii
  2022-06-26 16:47       ` Visuwesh
  2022-06-26 16:38     ` Eli Zaretskii
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 16:36 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Sun, 26 Jun 2022 21:55:50 +0530
> 
> [Sunday June 26, 2022] Eli Zaretskii wrote:
> 
> >> From: visuweshm@gmail.com
> >> Date: Sun, 26 Jun 2022 21:37:07 +0530
> >> 
> >> delete-forward-char fails to delete if the point is between two composed
> >> characters.  To demonstrate,
> >> 
> >>         1. emacs -Q
> >>         2. Yank "ரு போ" to the *scratch* buffer
> >>         3. Place the cursor on the space character and say <Delete>
> >> 
> >> Observe how delete-forward-char does nothing.
> >
> > This is a feature (new with Emacs 29): delete-forward-char deletes
> > entire grapheme clusters.  Use C-d if you want to delete individual
> > codepoints inside a grapheme cluster.
> 
> I'm not sure how this defeats my expectation though?  I want the SPC to
> be deleted, and it does so when the buffer contains "b b" with the
> point over SPC but it fails to do so when the buffer contains "ரு b"
> instead.  I was under the impression that I could safely rebind C-d to
> delete-forward-char.

Sorry, I misunderstood the report.

I tried to fix this now on master.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:25   ` Visuwesh
  2022-06-26 16:36     ` Eli Zaretskii
@ 2022-06-26 16:38     ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 16:38 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Sun, 26 Jun 2022 21:55:50 +0530
> 
> I was under the impression that I could safely rebind C-d to
> delete-forward-char.

The main point is not the key binding: there's nothing magic in C-d
per se.  The point is that delete-char and delete-forward-char behave
differently inside composed sequences, and that's intentional.
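
To make the difference concrete, here is a minimal sketch.  The helper
my/delete-at-start is a made-up name used only for illustration; it is
not part of Emacs.  What the last form returns depends on whether the
text is actually composed in the running session (it usually is not in
batch mode), so no result is shown for it.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my/delete-at-start (command text)
  "Call COMMAND with argument 1 at the start of a buffer holding TEXT.
Return the remaining buffer contents."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (funcall command 1)
    (buffer-string)))

;; Plain ASCII: both commands delete exactly one character.
(my/delete-at-start #'delete-char "b b")          ; => " b"
(my/delete-at-start #'delete-forward-char "b b")  ; => " b"

;; delete-char always removes a single codepoint, here U+0BB0, and
;; leaves the vowel sign U+0BC1 behind.
(my/delete-at-start #'delete-char "ரு")

;; delete-forward-char is meant to remove the whole cluster, but only
;; when the text is composed for display in this session.
(my/delete-at-start #'delete-forward-char "ரு")
```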






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:36     ` Eli Zaretskii
@ 2022-06-26 16:47       ` Visuwesh
  2022-06-26 16:57         ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-06-26 16:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Sunday June 26, 2022] Eli Zaretskii wrote:

> Sorry, I misunderstood the report.
>
> I tried to fix this now on master.

Thanks for the quick fix!  Unfortunately, delete-forward-char still
deletes two clusters instead of one in "ப்போ".  :(

I.e.,

    |ப்போ becomes |

where | denotes the point.

[Sunday June 26, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm@gmail.com>
>> Cc: 56237@debbugs.gnu.org
>> Date: Sun, 26 Jun 2022 21:55:50 +0530
>> 
>> I was under the impression that I could safely rebind C-d to
>> delete-forward-char.
>
> The main point is not the key binding: there's nothing magic in C-d
> per se.  The point is that delete-char and delete-forward-char behave
> differently inside composed sequences, and that's intentional.

Right.  For my use case, I never need to go into a composed sequence, so
I rebound C-d to delete-forward-char.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:47       ` Visuwesh
@ 2022-06-26 16:57         ` Eli Zaretskii
  2022-06-26 17:06           ` Visuwesh
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 16:57 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Sun, 26 Jun 2022 22:17:53 +0530
> 
> [Sunday June 26, 2022] Eli Zaretskii wrote:
> 
> > Sorry, I misunderstood the report.
> >
> > I tried to fix this now on master.
> 
> Thanks for the quick fix!  Unfortunately, delete-forward-char still
> deletes two clusters instead of one in "ப்போ".  :(

They aren't two clusters, they are two graphemes that are part of a
single grapheme cluster.

Invoke find-composition, and you will see that it returns a single
composition there.
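
For reference, a minimal sketch of such a call on a string.  The result
depends on the font and the shaping engine, and it is nil when the text
is not composed in the current session, so no result is shown here.
my/composition-span is a hypothetical helper name, not an Emacs
function.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my/composition-span (string)
  "Return (FROM . TO) for the composition at the start of STRING, or nil."
  (let ((cmp (find-composition 0 nil string)))
    (when cmp
      ;; The first two elements of the return value delimit the
      ;; composition found at the given position.
      (cons (nth 0 cmp) (nth 1 cmp)))))

;; One cons cell here means one composition was found at position 0.
(my/composition-span "ப்போ")
```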

> I.e.,
> 
>     |ப்போ becomes |
> 
> where | denotes the point.

This is the intended behavior.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 16:57         ` Eli Zaretskii
@ 2022-06-26 17:06           ` Visuwesh
  2022-06-26 17:26             ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-06-26 17:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Sunday June 26, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm@gmail.com>
>> Cc: 56237@debbugs.gnu.org
>> Date: Sun, 26 Jun 2022 22:17:53 +0530
>> 
>> [Sunday June 26, 2022] Eli Zaretskii wrote:
>> 
>> > Sorry, I misunderstood the report.
>> >
>> > I tried to fix this now on master.
>> 
>> Thanks for the quick fix!  Unfortunately, delete-forward-char still
>> deletes two clusters instead of one in "ப்போ".  :(
>
> They aren't two clusters, they are two graphemes that are part of a
> single grapheme cluster.
>
> Invoke find-composition, and you will see that it returns a single
> composition there.
>

If find-composition is indeed right, then the return value is very
unintuitive as a native speaker: ப் and போ are two separate characters
and combining them into a single cluster is weird...  

Am I right in thinking that a grapheme cluster is made up of characters
that can be grouped together to produce a single "letter" on screen?  If
so, the behaviour of find-composition is still confusing since I need to
say C-f twice to move over ப்போ.

>> I.e.,
>> 
>>     |ப்போ becomes |
>> 
>> where | denotes the point.
>
> This is the intended behavior.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 17:06           ` Visuwesh
@ 2022-06-26 17:26             ` Eli Zaretskii
  2022-06-26 18:01               ` Eli Zaretskii
  2022-06-27  5:31               ` Visuwesh
  0 siblings, 2 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 17:26 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Sun, 26 Jun 2022 22:36:31 +0530
> 
> > Invoke find-composition, and you will see that it returns a single
> > composition there.
> 
> If find-composition is indeed right, then the return value is very
> unintuitive as a native speaker: ப் and போ are two separate characters
> and combining them into a single cluster is weird...  

Maybe you are right, but then Someone(TM) will have to either modify
find-composition or explain how to interpret its return value
differently from what we do now.  What is now in delete-forward-char
expresses my level of knowledge in this area, which admittedly is
limited.

> Am I right in thinking that a grapheme cluster is made up of characters
> that can be grouped together to produce a single "letter" on screen?

The fact that you quote "letter" already means that we have a
terminology problem, because I don't think you will be able to define
it rigorously enough for this purpose.

I don't think we have a definition of a grapheme cluster in Emacs
terms that is always correct, given that these decisions are in many
cases delegated to the shaping engine.

> If so, the behaviour of find-composition is still confusing since I
> need to say C-f twice to move over ப்போ.

Could be.  If it confuses too much, you are free to use delete-char to
delete one codepoint at a time.  What delete-forward-char implements is a
convenience feature, so if it is sub-optimal in some rare cases,
that's not a catastrophe, I think.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 17:26             ` Eli Zaretskii
@ 2022-06-26 18:01               ` Eli Zaretskii
  2022-06-27  5:31               ` Visuwesh
  1 sibling, 0 replies; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-26 18:01 UTC (permalink / raw)
  To: visuweshm; +Cc: 56237

> Cc: 56237@debbugs.gnu.org
> Date: Sun, 26 Jun 2022 20:26:56 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > From: Visuwesh <visuweshm@gmail.com>
> > Cc: 56237@debbugs.gnu.org
> > Date: Sun, 26 Jun 2022 22:36:31 +0530
> > 
> > > Invoke find-composition, and you will see that it returns a single
> > > composition there.
> > 
> > If find-composition is indeed right, then the return value is very
> > unintuitive as a native speaker: ப் and போ are two separate characters
> > and combining them into a single cluster is weird...  
> 
> Maybe you are right, but then Someone(TM) will have to either modify
> find-composition or explain how to interpret its return value
> differently from what we do now.  What is now in delete-forward-char
> expresses my level of knowledge in this area, which admittedly is
> limited.
> [...]
> If so, the behaviour of find-composition is still confusing since I
> need to say C-f twice to move over ப்போ.

Mmm... that gave me an idea.  Let me see if I can come up with something.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-26 17:26             ` Eli Zaretskii
  2022-06-26 18:01               ` Eli Zaretskii
@ 2022-06-27  5:31               ` Visuwesh
  2022-06-27  5:47                 ` Visuwesh
  1 sibling, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-06-27  5:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Sunday June 26, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm@gmail.com>
>> Cc: 56237@debbugs.gnu.org
>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>> 
>> > Invoke find-composition, and you will see that it returns a single
>> > composition there.
>> 
>> If find-composition is indeed right, then the return value is very
>> unintuitive as a native speaker: ப் and போ are two separate characters
>> and combining them into a single cluster is weird...  
>
> Maybe you are right, but then Someone(TM) will have to either modify
> find-composition or explain how to interpret its return value
> differently from what we do now.  What is now in delete-forward-char
> expresses my level of knowledge in this area, which admittedly is
> limited.
>

Turns out that Someone™ was closer to us than I thought: describe-char.
With a bit of edebug and reading the code in composition.h (for the
LGLYPH_* macros) and defsubst's in composite.el, I think I figured out
the logic:

We need to call find-composition with a non-nil DETAIL-P argument to get
the gstring.  The gstring contains the glyphs that will be used to
construct the grapheme cluster [1].  According to composition.h, glyphs
that have the same FROM and TO indices are part of the same grapheme
cluster, so to get the length of the cluster in codepoints, we need to
count the glyphs that share the same FROM and TO indices.

Understanding all this, I came up with the following code:

    (let* ((composition (find-composition 0 nil "ப்போ" t))
           (gstring (nth 2 composition))
           (num-glyphs (lgstring-glyph-len gstring))
           (i 1)
           (from (lglyph-from (lgstring-glyph gstring 0)))
           (to (lglyph-to (lgstring-glyph gstring 0))))
      (while (and (< i num-glyphs)
                  (= from (lglyph-from (lgstring-glyph gstring i)))
                  (= to (lglyph-to (lgstring-glyph gstring i))))
        (setq i (1+ i)))
      i)

here i is the number of characters we need to delete using delete-char.

[1] For the gstring format, see composition-get-gstring.

But I think we should test this code in cases where a grapheme cluster
contains more than two codepoints since all the composed characters in
Tamil are made up of two Unicode codepoints.  I can't test it on emojis
since I don't know of an Emoji font that won't potentially crash Xft and
has enough coverage.

>> Am I right in thinking that a grapheme cluster is made up of characters
>> that can be grouped together to produce a single "letter" on screen?
>
> The fact that you quote "letter" already means that we have a
> terminology problem, because I don't think you will be able to define
> it rigorously enough for this purpose.
>
> I don't think we have a definition of a grapheme cluster in Emacs
> terms that is always correct, given that these decisions are in many
> cases delegated to the shaping engine.
>

I quoted "letter" because I was thinking of emojis.  I should have been
more explicit, sorry about that.

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Could be.  If it confuses too much, you are free to use delete-char to
> delete one codepoint at a time.  What delete-forward-char implements is a
> convenience feature, so if it is sub-optimal in some rare cases,
> that's not a catastrophe, I think.

Unfortunately, the places where the current code of delete-forward-char
fails are far too frequent for me to put up with switching between
delete-char and delete-forward-char.  ப்போ is only one example; in fact,
delete-forward-char fails whenever a cluster that contains a consonant
and a virama is followed by another Tamil character.

[Sunday June 26, 2022] Eli Zaretskii wrote:

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Mmm... that gave me an idea.  Let me see if I can come up with something.

It could be a false alarm since the clusters in Tamil are all made
up of two Unicode codepoints.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-27  5:31               ` Visuwesh
@ 2022-06-27  5:47                 ` Visuwesh
  2022-06-27 12:39                   ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-06-27  5:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Monday June 27, 2022] Visuwesh wrote:

> [Sunday June 26, 2022] Eli Zaretskii wrote:
>
>>> From: Visuwesh <visuweshm@gmail.com>
>>> Cc: 56237@debbugs.gnu.org
>>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>>> 
>>> > Invoke find-composition, and you will see that it returns a single
>>> > composition there.
>>> 
>>> If find-composition is indeed right, then the return value is very
>>> unintuitive as a native speaker: ப் and போ are two separate characters
>>> and combining them into a single cluster is weird...  
>>
>> Maybe you are right, but then Someone(TM) will have to either modify
>> find-composition or explain how to interpret its return value
>> differently from what we do now.  What is now in delete-forward-char
>> expresses my level of knowledge in this area, which admittedly is
>> limited.
>>
>
> Turns out that Someone™ was closer to us than I thought: describe-char.
> With a bit of edebug and reading the code in composition.h (for the
> LGLYPH_* macros) and defsubst's in composite.el, I think I figured out
> the logic:
>
> We need to call find-composition with a non-nil DETAIL-P argument to get
> the gstring.  The gstring contains the glyphs that will be used to
> construct the grapheme cluster [1].  According to composition.h, glyphs
> that have the same FROM and TO indices are part of the same grapheme
> cluster, so to get the length of the cluster in codepoints, we need to
> count the glyphs that share the same FROM and TO indices.
>
> Understanding all this, I came up with the following code:
>
>     (let* ((composition (find-composition 0 nil "ப்போ" t))
>            (gstring (nth 2 composition))
>            (num-glyphs (lgstring-glyph-len gstring))
>            (i 1)
>            (from (lglyph-from (lgstring-glyph gstring 0)))
>            (to (lglyph-to (lgstring-glyph gstring 0))))
>       (while (and (< i num-glyphs)
>                   (= from (lglyph-from (lgstring-glyph gstring i)))
>                   (= to (lglyph-to (lgstring-glyph gstring i))))
>         (setq i (1+ i)))
>       i)
>
> here i is the number of characters we need to delete using delete-char.
>
> [1] For the gstring format, see composition-get-gstring.
>
> But I think we should test this code in cases where a grapheme cluster
> contains more than two codepoints since all the composed characters in
> Tamil are made up of two Unicode codepoints.  I can't test it on emojis
> since I don't know of an Emoji font that won't potentially crash Xft and
> has enough coverage.
>

I got my hopes too high.  :(

This fails for the simple case of ரு (C-u C-x = also fails!) so I guess
we are back to square one.  Although ரு is composed from 0BB0 0BC1, the
gstring only has one glyph.







* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-27  5:47                 ` Visuwesh
@ 2022-06-27 12:39                   ` Eli Zaretskii
  2022-06-27 14:24                     ` Visuwesh
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-27 12:39 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Mon, 27 Jun 2022 11:17:25 +0530
> 
> >     (let* ((composition (find-composition 0 nil "ப்போ" t))
> >            (gstring (nth 2 composition))
> >            (num-glyphs (lgstring-glyph-len gstring))
> >            (i 1)
> >            (from (lglyph-from (lgstring-glyph gstring 0)))
> >            (to (lglyph-to (lgstring-glyph gstring 0))))
> >       (while (and (< i num-glyphs)
> >                   (= from (lglyph-from (lgstring-glyph gstring i)))
> >                   (= to (lglyph-to (lgstring-glyph gstring i))))
> >         (setq i (1+ i)))
> >       i)
> >
> > here i is the number of characters we need to delete using delete-char.
> >
> > [1] For the gstring format, see composition-get-gstring.
> >
> > But I think we should test this code in cases where a grapheme cluster
> > contains more than two codepoints since all the composed characters in
> > Tamil are made up of two Unicode codepoints.  I can't test it on emojis
> > since I don't know of an Emoji font that won't potentially crash Xft and
> > has enough coverage.
> >
> 
> I got my hopes too high.  :(
> 
> This fails for the simple case of ரு (C-u C-x = also fails!) so I guess
> we are back to square one.  Although ரு is composed from 0BB0 0BC1, the
> gstring only has one glyph.

Yes, composition of N characters can in general produce M glyphs,
where M can be smaller than, equal to, or greater than N.  It's a many-to-many
operation, and we cannot rely on getting the same number of glyphs as
the number of codepoints we compose.

The idea is nevertheless correct (I had the same one), it just needs
some fine-tuning.  (And "C-x =" tries to solve a different problem:
how to match each glyph with a codepoint, and that problem is in
general insoluble, so it's a small wonder that it fails.)
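
One way to use the FROM and TO fields so that the glyph count does not
matter is sketched below.  It is an illustration only, not the code
committed to master.  It assumes that FROM and TO are inclusive
character indices into the composed text, and my/first-cluster-length
is a made-up name.

```elisp
;; Hypothetical helper for illustration; not the code committed to Emacs.
(defun my/first-cluster-length (gstring)
  "Return how many codepoints the first grapheme cluster of GSTRING covers.
GSTRING is a glyph-string as returned by `find-composition' with a
non-nil DETAIL-P argument."
  (let ((glyph (lgstring-glyph gstring 0)))
    ;; Assumption: FROM and TO are inclusive character indices, so the
    ;; span does not depend on how many glyphs the font produced.
    (1+ (- (lglyph-to glyph) (lglyph-from glyph)))))

;; Usage sketch: CMP is nil when the text is not composed in this
;; session, in which case the form simply returns nil.
(let ((cmp (find-composition 0 nil "ரு" t)))
  (when (and cmp (vectorp (nth 2 cmp)))
    (my/first-cluster-length (nth 2 cmp))))
```

The resulting number is what would be passed to delete-char to remove
the whole first cluster.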

Please try the latest master, I hope delete-forward-char now behaves
better.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-27 12:39                   ` Eli Zaretskii
@ 2022-06-27 14:24                     ` Visuwesh
  2022-06-27 15:53                       ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-06-27 14:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237

[Monday June 27, 2022] Eli Zaretskii wrote:

> Yes, composition of N characters can in general produce M glyphs,
> where M can be smaller than, equal to, or greater than N.  It's a many-to-many
> operation, and we cannot rely on getting the same number of glyphs as
> the number of codepoints we compose.
>
> The idea is nevertheless correct (I had the same one), it just needs
> some fine-tuning.  (And "C-x =" tries to solve a different problem:
> how to match each glyph with a codepoint, and that problem is in
> general insoluble, so it's a small wonder that it fails.)
>

Right, thanks for the explanation.

> Please try the latest master, I hope delete-forward-char now behaves
> better.

Thanks, it is much better now!  It doesn't fail on the cases that I had
problems with so far.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-27 14:24                     ` Visuwesh
@ 2022-06-27 15:53                       ` Eli Zaretskii
  2022-07-02  7:03                         ` Visuwesh
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-06-27 15:53 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org
> Date: Mon, 27 Jun 2022 19:54:42 +0530
> 
> [Monday June 27, 2022] Eli Zaretskii wrote:
> 
> > Yes, composition of N characters can in general produce M glyphs,
> > where M can be smaller than, equal to, or greater than N.  It's a many-to-many
> > operation, and we cannot rely on getting the same number of glyphs as
> > the number of codepoints we compose.
> >
> > The idea is nevertheless correct (I had the same one), it just needs
> > some fine-tuning.  (And "C-x =" tries to solve a different problem:
> > how to match each glyph with a codepoint, and that problem is in
> > general insoluble, so it's a small wonder that it fails.)
> >
> 
> Right, thanks for the explanation.
> 
> > Please try the latest master, I hope delete-forward-char now behaves
> > better.
> 
> Thanks, it is much better now!  It doesn't fail on the cases that I had
> problems with so far.

Thanks for testing, and feel free to close whenever you feel satisfied ;-)






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-06-27 15:53                       ` Eli Zaretskii
@ 2022-07-02  7:03                         ` Visuwesh
  2022-07-16 12:50                           ` Visuwesh
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-07-02  7:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237, 56237-done

[Monday June 27, 2022] Eli Zaretskii wrote:

>> Thanks, it is much better now!  It doesn't fail on the cases that I had
>> problems with so far.
>
> Thanks for testing, and feel free to close whenever you feel satisfied ;-)

Closing issue since I haven't noticed anything funny.  Thanks again for
the quick fixes!






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-07-02  7:03                         ` Visuwesh
@ 2022-07-16 12:50                           ` Visuwesh
  2022-07-16 13:31                             ` Eli Zaretskii
  0 siblings, 1 reply; 21+ messages in thread
From: Visuwesh @ 2022-07-16 12:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237, 56237-done

reopen 56237
thanks

[Saturday July 02, 2022] Visuwesh wrote:

> [Monday June 27, 2022] Eli Zaretskii wrote:
>
>>> Thanks, it is much better now!  It doesn't fail on the cases that I had
>>> problems with so far.
>>
>> Thanks for testing, and feel free to close whenever you feel satisfied ;-)
>
> Closing issue since I haven't noticed anything funny.  Thanks again for
> the quick fixes!

This command seems to have regressed.  With HEAD at 0190dff96a, the
command works perfectly fine, but with current master it fails to
delete entire clusters: it behaves identically to delete-char.

With the point being |,

    |ரு

after <Delete>, becomes

    |ு

instead of |.
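
To make the difference concrete: "ரு" is two codepoints, U+0BB0 followed
by the combining vowel sign U+0BC1.  Below is a minimal Python sketch of
the two behaviours.  The function names are hypothetical, and the cluster
rule used here (a base character plus any following combining marks) is a
simplifying assumption for illustration; it is not how Emacs decides what
to delete, and it is not full grapheme-cluster segmentation.

```python
import unicodedata


def delete_codepoint(text: str, pos: int) -> str:
    """Delete exactly one codepoint at POS (analogous to delete-char)."""
    return text[:pos] + text[pos + 1:]


def delete_cluster(text: str, pos: int) -> str:
    """Delete the codepoint at POS plus any combining marks that follow it.

    Assumption: a "cluster" is a base character followed by codepoints
    whose Unicode general category starts with "M" (Mn, Mc, Me).
    """
    end = pos + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return text[:pos] + text[end:]


ru = "\u0bb0\u0bc1"  # ரு: TAMIL LETTER RA + TAMIL VOWEL SIGN U

# Deleting a single codepoint leaves the vowel sign behind, which is the
# regressed behaviour described above.
leftover = delete_codepoint(ru, 0)

# Deleting the whole cluster leaves nothing, which is the expected result.
expected = delete_cluster(ru, 0)
```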

Can you please take a look, Eli?






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-07-16 12:50                           ` Visuwesh
@ 2022-07-16 13:31                             ` Eli Zaretskii
  2022-07-16 13:43                               ` Visuwesh
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2022-07-16 13:31 UTC (permalink / raw)
  To: Visuwesh; +Cc: 56237-done

> From: Visuwesh <visuweshm@gmail.com>
> Cc: 56237@debbugs.gnu.org,  56237-done@debbugs.gnu.org
> Date: Sat, 16 Jul 2022 18:20:46 +0530
> 
> This command seems to have regressed.  With HEAD at 0190dff96a, the
> command works perfectly fine, but with current master it fails to
> delete entire clusters: it behaves identically to delete-char.
> 
> With the point being |,
> 
>     |ரு
> 
> after <Delete>, becomes
> 
>     |ு
> 
> instead of |.
> 
> Can you please take a look, Eli?

Thanks, now fixed.

In the future, please always open a new bug instead of reopening an old
one, because a regression is almost never due to the same problem that
was originally fixed, and our bugs should be per programming error,
not per external manifestation of that error.






* bug#56237: 29.0.50; delete-forward-char fails to delete character
  2022-07-16 13:31                             ` Eli Zaretskii
@ 2022-07-16 13:43                               ` Visuwesh
  0 siblings, 0 replies; 21+ messages in thread
From: Visuwesh @ 2022-07-16 13:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 56237-done

[Saturday July 16, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm@gmail.com>
>> Cc: 56237@debbugs.gnu.org,  56237-done@debbugs.gnu.org
>> Date: Sat, 16 Jul 2022 18:20:46 +0530
>> 
>> This command seems to have regressed.  With HEAD at 0190dff96a, the
>> command works perfectly fine, but with current master it fails to
>> delete entire clusters: it behaves identically to delete-char.
>> 
>> With the point being |,
>> 
>>     |ரு
>> 
>> after <Delete>, becomes
>> 
>>     |ு
>> 
>> instead of |.
>> 
>> Can you please take a look, Eli?
>
> Thanks, now fixed.

Thanks!  I can confirm that it is fixed.

> In the future, please always open a new bug instead of reopening an old
> one, because a regression is almost never due to the same problem that
> was originally fixed, and our bugs should be per programming error,
> not per external manifestation of that error.

OK, noted.






