From: Visuwesh
To: Eli Zaretskii
Cc: 56237@debbugs.gnu.org
Newsgroups: gmane.emacs.bugs
Subject: bug#56237: 29.0.50; delete-forward-char fails to delete character
Date: Mon, 27 Jun 2022 11:01:03 +0530
Message-ID: <87sfnqoep4.fsf@gmail.com>
References: <87v8sn9zo4.fsf@gmail.com> <83zghz8kk3.fsf@gnu.org>
 <87mtdz9ysx.fsf@gmail.com> <83y1xj8jqb.fsf@gnu.org>
 <87fsjr9xs6.fsf@gmail.com> <83v8sn8ir9.fsf@gnu.org>
 <87bkuf9wx4.fsf@gmail.com> <83tu878hen.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
In-Reply-To: <83tu878hen.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 26 Jun 2022 20:26:56 +0300")

[Sunday, June 26, 2022] Eli Zaretskii wrote:

>> From: Visuwesh
>> Cc: 56237@debbugs.gnu.org
>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>>
>> > Invoke find-composition, and you will see that it returns a single
>> > composition there.
>>
>> If find-composition is indeed right, then the return value is very
>> unintuitive as a native speaker: ப் and போ are two separate characters
>> and combining them into a single cluster is weird...
>
> Maybe you are right, but then Someone(TM) will have to either modify
> find-composition or explain how to interpret its return value
> differently from what we do now. What is now in delete-forward-char
> expresses my level of knowledge in this area, which admittedly is
> limited.
>

Turns out that Someone™ was closer to us than I thought: describe-char.
With a bit of edebug and reading the code in composition.h (for the
LGLYPH_* macros) and the defsubsts in composite.el, I think I figured
out the logic:

We need to call find-composition with a non-nil DETAIL-P argument to
get the gstring. The gstring contains the glyphs that will be used to
construct the grapheme cluster [1]. According to composition.h, glyphs
that have the same FROM and TO indices are part of the same grapheme
cluster, so to get the number of codepoints in the cluster, we need to
count the glyphs that share the same FROM and TO indices.

Understanding all this, I came up with the following code:

    (let* ((composition (find-composition 0 nil "ப்போ" t))
           ;; The gstring is the third element of the return value.
           (gstring (nth 2 composition))
           (num-glyphs (lgstring-glyph-len gstring))
           (i 1)
           ;; FROM and TO indices of the first glyph.
           (from (lglyph-from (lgstring-glyph gstring 0)))
           (to (lglyph-to (lgstring-glyph gstring 0))))
      ;; Count the leading glyphs that share the first glyph's FROM and TO.
      (while (and (< i num-glyphs)
                  ;; Stop at an unused (nil) glyph slot, if there is one.
                  (lgstring-glyph gstring i)
                  (= from (lglyph-from (lgstring-glyph gstring i)))
                  (= to (lglyph-to (lgstring-glyph gstring i))))
        (setq i (1+ i)))
      i)

Here, i is the number of characters we need to delete using delete-char.

[1] For the gstring format, see composition-get-gstring.

But I think we should test this code in cases where a grapheme cluster
contains more than two codepoints, since all the composed characters in
Tamil are made up of two Unicode codepoints.
I can't test it on emojis since I don't know of an Emoji font that won't
potentially crash Xft and has enough coverage.

>> Am I right in thinking that a grapheme cluster is made up of characters
>> that can be grouped together to produce a single "letter" on screen?
>
> The fact that you quote "letter" already means that we have
> terminology problem, because I don't think you will be able to define
> it rigorously enough for this purpose.
>
> I don't think we have a definition of a grapheme cluster in Emacs
> terms that is always correct, given that these decisions are in many
> cases delegated to the shaping engine.
>

I quoted "letter" because I was thinking of emojis. I should have been
more explicit, sorry about that.

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Could be. If it confuses too much, you are free to use delete-char to
> delete one codepoint at a time. What delete-forward-char codes is a
> convenience feature, so if it is sub-optimal in some rare cases,
> that's not a catastrophe, I think.

Unfortunately, the places where the current code of delete-forward-char
fails are far too frequent for me to put up with switching between
delete-char and delete-forward-char. ப்போ is only a single example; in
fact, delete-forward-char fails whenever a cluster that contains a
consonant and a virama is followed by another Tamil character.

[Sunday, June 26, 2022] Eli Zaretskii wrote:

>> If so, the behaviour of find-composition is still confusing since I
>> need to say C-f twice to move over ப்போ.
>
> Mmm... that gave an idea. Let me see if I can come up with something.

It could be a false alarm since the clusters in Tamil are all made up
of two Unicode codepoints.