unofficial mirror of bug-gnu-emacs@gnu.org 
From: Visuwesh <visuweshm@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 56237@debbugs.gnu.org
Subject: bug#56237: 29.0.50; delete-forward-char fails to delete character
Date: Mon, 27 Jun 2022 11:01:03 +0530
Message-ID: <87sfnqoep4.fsf@gmail.com>
In-Reply-To: <83tu878hen.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 26 Jun 2022 20:26:56 +0300")

[Sunday, June 26, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm@gmail.com>
>> Cc: 56237@debbugs.gnu.org
>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>> 
>> > Invoke find-composition, and you will see that it returns a single
>> > composition there.
>> 
>> If find-composition is indeed right, then the return value is very
>> unintuitive as a native speaker: ப் and போ are two separate characters
>> and combining them into a single cluster is weird...  
>
> Maybe you are right, but then Someone(TM) will have to either modify
> find-composition or explain how to interpret its return value
> differently from what we do now.  What is now in delete-forward-char
> expresses my level of knowledge in this area, which admittedly is
> limited.
>

Turns out that Someone™ was closer to us than I thought: describe-char.
With a bit of edebug and reading the code in composition.h (for the
LGLYPH_* macros) and defsubst's in composite.el, I think I figured out
the logic:

We need to call find-composition with a non-nil DETAIL-P argument to get
the gstring.  The gstring contains the glyphs that will be used to
construct the grapheme cluster [1].  According to composition.h, glyphs
that have the same FROM and TO indices belong to the same grapheme
cluster, so to find how many codepoints make up the first cluster, we
count the glyphs whose FROM and TO indices equal those of the first
glyph.

Understanding all this, I came up with the following code:

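    ;; With DETAIL-P non-nil, the third element of the return value for
    ;; an automatic composition is the gstring.
    (let* ((composition (find-composition 0 nil "ப்போ" t))
           (gstring (nth 2 composition))
           (num-glyphs (lgstring-glyph-len gstring))
           (i 1)
           ;; FROM and TO of the first glyph identify the first cluster.
           (from (lglyph-from (lgstring-glyph gstring 0)))
           (to (lglyph-to (lgstring-glyph gstring 0))))
      ;; Count the leading glyphs that share that FROM and TO.
      (while (and (< i num-glyphs)
                  (= from (lglyph-from (lgstring-glyph gstring i)))
                  (= to (lglyph-to (lgstring-glyph gstring i))))
        (setq i (1+ i)))
      i)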

Here, i is the number of characters we need to delete using delete-char.

[1] For the gstring format, see composition-get-gstring.

But I think we should test this code in cases where a grapheme cluster
contains more than two codepoints, since all the composed characters in
Tamil are made up of two Unicode codepoints.  I can't test it on emojis
since I don't know of an emoji font that has enough coverage and won't
potentially crash Xft.
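
In the meantime, the counting logic itself can be exercised without any
font by feeding it hand-built gstrings.  The following is only a sketch:
the names my/leading-cluster-glyph-count and my/mock-gstring are
hypothetical, and the FROM/TO values are invented for illustration, not
taken from a real shaper.  It relies on the layout described in
composition-get-gstring ([HEADER ID GLYPH ...], with FROM and TO as the
first two slots of each glyph).  It also says nothing about whether the
number of glyphs always matches the number of codepoints, which depends
on the font and the shaping engine.

```elisp
;; Hypothetical helper: the same loop as in the snippet above, taking
;; the gstring as an argument.  Assumes GSTRING has at least one glyph.
(defun my/leading-cluster-glyph-count (gstring)
  "Return how many leading glyphs of GSTRING share glyph 0's FROM and TO."
  (let* ((num-glyphs (lgstring-glyph-len gstring))
         (g0 (lgstring-glyph gstring 0))
         (from (lglyph-from g0))
         (to (lglyph-to g0))
         (i 1)
         glyph)
    (while (and (< i num-glyphs)
                ;; Stop early if a glyph slot is unused (nil).
                (setq glyph (lgstring-glyph gstring i))
                (= from (lglyph-from glyph))
                (= to (lglyph-to glyph)))
      (setq i (1+ i)))
    i))

;; Hypothetical mock: only the FROM and TO slots of each glyph are
;; filled in, which is all the helper above looks at.
(defun my/mock-gstring (&rest ranges)
  "Build a mock gstring from RANGES, a list of (FROM . TO) conses."
  (apply #'vector
         nil                            ; HEADER slot, unused here
         nil                            ; ID slot, unused here
         (mapcar (lambda (r) (vector (car r) (cdr r))) ranges)))

;; Two clusters of two glyphs each (invented data).
(my/leading-cluster-glyph-count
 (my/mock-gstring '(0 . 1) '(0 . 1) '(2 . 3) '(2 . 3)))   ; => 2

;; A first cluster of three glyphs, then a single glyph (invented data).
(my/leading-cluster-glyph-count
 (my/mock-gstring '(0 . 2) '(0 . 2) '(0 . 2) '(3 . 3)))   ; => 3
```

This only checks the loop; real gstrings for longer clusters would
still need to be inspected on a display with a suitable font.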

>> Am I right in thinking that a grapheme cluster is made up of characters
>> that can be grouped together to produce a single "letter" on screen?
>
> The fact that you quote "letter" already means that we have a
> terminology problem, because I don't think you will be able to define
> it rigorously enough for this purpose.
>
> I don't think we have a definition of a grapheme cluster in Emacs
> terms that is always correct, given that these decisions are in many
> cases delegated to the shaping engine.
>

I quoted "letter" because I was thinking of emojis.  I should have been
more explicit, sorry about that.

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Could be.  If it confuses too much, you are free to use delete-char to
> delete one codepoint at a time.  What delete-forward-char codes is a
> convenience feature, so if it is sub-optimal in some rare cases,
> that's not a catastrophe, I think.

Unfortunately, the places where the current code of delete-forward-char
fails are far too frequent to put up with the switch between delete-char
and delete-forward-char.  ப்போ is only a single example; in fact,
delete-forward-char fails whenever a cluster which contains a consonant
and a virama is followed by another Tamil character.
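
To collect more inputs of that shape (consonant + virama, followed by
another Tamil character), a small generator along these lines might be
handy.  The names below are hypothetical, and the code only builds the
strings; it does not exercise delete-forward-char itself, which needs a
buffer and a display with a Tamil-capable font.

```elisp
;; U+0BCD is TAMIL SIGN VIRAMA.
(defconst my/tamil-virama #x0BCD
  "The Tamil virama (pulli) codepoint.")

;; Hypothetical helper for building reproduction inputs.
(defun my/virama-test-strings (consonants following)
  "Return a list of strings of the form CONSONANT + virama + FOLLOWING.
CONSONANTS is a list of characters; FOLLOWING is a string."
  (mapcar (lambda (c) (concat (string c my/tamil-virama) following))
          consonants))

;; The first element is the example from this report; the others
;; follow the same pattern with different leading consonants.
(my/virama-test-strings '(?ப ?க ?த) "போ")
```

Each resulting string can then be inserted into a buffer to see how
find-composition and delete-forward-char behave at its start.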

[Sunday, June 26, 2022] Eli Zaretskii wrote:

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Mmm... that gave an idea.  Let me see if I can come up with something.

It could be a false alarm since the clusters in Tamil are all made
up of two Unicode codepoints.
