From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#70000: 29.2; Grapheme handling incorrect
Date: Mon, 25 Mar 2024 21:35:24 +0200
Message-ID: <86cyrije9v.fsf@gnu.org>
References: <878r26duar.fsf@vps.thesusis.net>
In-Reply-To: <878r26duar.fsf@vps.thesusis.net> (message from Phillip Susi on Mon, 25 Mar 2024 14:45:48 -0400)
Cc: 70000@debbugs.gnu.org
To: Phillip Susi

tags 70000 notabug
thanks

> From: Phillip Susi
> Date: Mon, 25 Mar 2024 14:45:48 -0400
>
> I had some terminal breakage the other day when browsing email with
> notmuch.  Now a ways down the rabbit hole, it seems this is because
> emacs does not correctly handle graphemes.  I found this article here:
>
> https://mitchellh.com/writing/grapheme-clusters-in-terminals
>
> If I paste that grapheme into GUI emacs, it is displayed as two separate
> characters, each two columns wide, instead of the correct way: as a
> single double wide character.

First, the above blog talks about text-mode terminals (a.k.a. "TTYs"),
so it is not relevant to a GUI Emacs session.

And second, how that particular sequence of codepoints is displayed on
GUI frames depends on how your Emacs was built.  According to the list
of features included in your report, viz.:

  Configured features:
  ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD
  MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND
  THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB

your Emacs is built without HarfBuzz, which I think explains why your
Emacs displays the above sequence as 2 separate characters.
Furthermore, the appearance depends on the fonts you have installed;
specifically, Emoji sequences need a font that has good support for
the Emoji Unicode blocks.

In my Emacs, which does use HarfBuzz, I see a single grapheme cluster.

> C-f and C-b move over the character as if
> it were one, however, backspace deletes only the second, leaving both
> the first and the zero width joiner.  If C-f and C-b treat it as one,
> then so should backspace.

That Backspace deletes a single codepoint is a feature: it allows
easier editing of composable character sequences, such as Emoji.
E.g., imagine you want to make a slight change to the Emoji by
modifying just the second of the two characters composed into a
grapheme cluster.
Emacs supports deletion of the entire grapheme cluster with the
command delete-forward-char, by default bound to the <Delete> function
key.

> Under recent versions of the foot terminal emulator, this character is
> displayed as a single, double wide character, but emacs assumes it still
> is 4 columns wide, leading to terminal breakage.

Emacs cannot know what the terminal does with these characters,
because there's no widely-accepted protocol for accessing that
information.  Different terminal emulators behave differently, and
some even have settings that modify their behavior.

> Emacs needs to not assume the width of graphemes are what wcwidth()
> reports, but instead need to query the cursor position after
> printing one to find out how wide the terminal actually displayed it
> as.

Querying the cursor position won't help in this case because it is
Emacs that moves the cursor when you type C-f, not the terminal.

I see no Emacs bug here.  Until we have standard ways of querying
text-mode terminals about their processing of composable character
sequences into grapheme clusters, there's no way for Emacs to behave
correctly with all such terminal emulators.  Sorry.
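
To make the width mismatch and the Backspace behavior discussed above
concrete, here is a minimal sketch in Python (not Emacs code).  It
rests on assumptions: the report does not spell out the codepoints, so
the sequence U+1F9D1 U+200D U+1F33E is only an assumed example of a
ZWJ Emoji sequence, and naive_width is a hypothetical stand-in for
what a wcwidth()-style lookup does, not the real function.

```python
import unicodedata

# Assumed example sequence (not taken from the report):
# U+1F9D1 ADULT, U+200D ZERO WIDTH JOINER, U+1F33E EAR OF RICE.
SEQUENCE = "\U0001F9D1\u200D\U0001F33E"


def naive_width(ch: str) -> int:
    """Hypothetical per-codepoint width, in the spirit of wcwidth()."""
    # Combining marks and format characters (such as the ZWJ) take no columns.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # Wide and fullwidth characters take two columns, everything else one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def per_codepoint_width(text: str) -> int:
    """Sum the widths codepoint by codepoint, ignoring any composition."""
    return sum(naive_width(ch) for ch in text)


def delete_last_codepoint(text: str) -> str:
    """Remove only the final codepoint, as Backspace is described to do."""
    return text[:-1]


if __name__ == "__main__":
    print("codepoints:", [f"U+{ord(ch):04X}" for ch in SEQUENCE])
    print("per-codepoint width:", per_codepoint_width(SEQUENCE))
    remaining = delete_last_codepoint(SEQUENCE)
    print("after Backspace:", [f"U+{ord(ch):04X}" for ch in remaining])
```

A program that sums per-codepoint widths arrives at 4 columns for this
sequence, while a terminal that composes it into one cluster may use
only 2; nothing in the sketch can tell which of the two a given
terminal will do.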