From: Eli Zaretskii <eliz@gnu.org>
To: Phillip Susi <phill@thesusis.net>
Cc: 70000@debbugs.gnu.org
Subject: bug#70000: 29.2; Grapheme handling incorrect
Date: Mon, 25 Mar 2024 21:35:24 +0200
Message-ID: <86cyrije9v.fsf@gnu.org>
In-Reply-To: <878r26duar.fsf@vps.thesusis.net> (message from Phillip Susi on Mon, 25 Mar 2024 14:45:48 -0400)

tags 70000 notabug
thanks

> From: Phillip Susi <phill@thesusis.net>
> Date: Mon, 25 Mar 2024 14:45:48 -0400
> 
> I had some terminal breakage the other day when browsing email with
> notmuch.  Now a ways down the rabbit hole, it seems this is because
> emacs does not correctly handle graphemes.  I found this article here:
> 
> https://mitchellh.com/writing/grapheme-clusters-in-terminals
> 
> If I paste that grapheme into GUI emacs, it is displayed as two separate
> characters, each two columns wide, instead of the correct way: as a
> single double wide character.

First, the above blog post talks about text-mode terminals (a.k.a. "TTYs"),
so it is not relevant to a GUI Emacs session.

And second, how that particular sequence of codepoints is displayed on
GUI frames depends on how your Emacs was built.  According to the list
of features included in your report, viz.:

  Configured features:
  ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD
  MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND
  THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB

your Emacs is built without HarfBuzz, which I think explains why your
Emacs displays the above sequences as 2 separate characters.
Furthermore, the appearance depends on the fonts you have installed;
specifically, Emoji sequences need a font that has good support for
the Emoji Unicode blocks.  In my Emacs, which does use HarfBuzz, I see
a single grapheme cluster.
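
The absence of HarfBuzz can be checked mechanically against the feature
list quoted above.  The following is a minimal Python sketch; the names
CONFIGURED_FEATURES and has_feature are made up for illustration.
Inside Emacs, the same list is available as the value of the variable
system-configuration-features.

```python
# Feature list copied verbatim from the bug report quoted above.
CONFIGURED_FEATURES = (
    "ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM JPEG LCMS2 LIBSYSTEMD "
    "MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND "
    "THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER XIM GTK3 ZLIB"
)


def has_feature(features: str, name: str) -> bool:
    """Return True if NAME appears as a whole token in FEATURES."""
    return name in features.split()


if __name__ == "__main__":
    # Report whether this build lists HarfBuzz among its features.
    print("HARFBUZZ present:", has_feature(CONFIGURED_FEATURES, "HARFBUZZ"))
```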

> C-f and C-b move over the character as if
> it were one, however, backspace deletes only the second, leaving both
> the first and the zero width joiner.  If C-f and C-b treat it as one,
> then so should backspace.

That Backspace deletes a single codepoint is a feature: it allows
easier editing of composable character sequences, such as Emoji.
E.g., imagine you want to make a slight change to the Emoji by
modifying just the second of the two characters composed into a
grapheme cluster.  Emacs supports deletion of the entire grapheme
cluster with the command delete-forward-char, by default bound to the
<Delete> function key.
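
To make the difference concrete, here is a small Python sketch of the
two kinds of deletion.  It assumes, for illustration only, a sequence
of the form Emoji + ZERO WIDTH JOINER + Emoji (U+1F9D1 U+200D U+1F33E);
the exact sequence from the report is not reproduced in this message.
The helper names are hypothetical, and delete_whole_cluster is handed
the cluster explicitly instead of performing real grapheme
segmentation, so this is not how Emacs implements either command.

```python
import unicodedata

# Assumed example sequence: Emoji + ZERO WIDTH JOINER + Emoji.
CLUSTER = "\U0001F9D1\u200D\U0001F33E"


def describe(text):
    """List the codepoints of TEXT with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def delete_last_codepoint(text):
    """Backspace-like deletion: drop only the final codepoint."""
    return text[:-1]


def delete_whole_cluster(text, cluster):
    """Cluster-wise deletion: drop CLUSTER from the end of TEXT if present."""
    return text[: -len(cluster)] if text.endswith(cluster) else text


if __name__ == "__main__":
    for line in describe(CLUSTER):
        print(line)
    # Codepoint-wise deletion leaves the first Emoji and the joiner behind.
    print(describe(delete_last_codepoint(CLUSTER)))
    # Cluster-wise deletion removes all three codepoints at once.
    print(repr(delete_whole_cluster("abc" + CLUSTER, CLUSTER)))
```

Deleting one codepoint leaves the first Emoji and the joiner in the
buffer, ready for a different second character to be typed.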

> Under recent versions of the foot terminal emulator, this character is
> displayed as a single, double wide character, but emacs assumes it still
> is 4 columns wide, leading to terminal breakage.

Emacs cannot know what the terminal does with these characters,
because there's no widely-accepted protocol for accessing that
information.  Different terminal emulators behave differently, and
some even have settings that modify this behavior.
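
As an illustration of where a figure like 4 columns comes from, the
Python sketch below sums per-codepoint widths in the style of
wcwidth().  It is only an approximation built on the unicodedata
module, not the width calculation Emacs actually uses, and the example
sequence (U+1F9D1 U+200D U+1F33E) is an assumption.

```python
import unicodedata

# Assumed example sequence: Emoji + ZERO WIDTH JOINER + Emoji.
CLUSTER = "\U0001F9D1\u200D\U0001F33E"


def codepoint_width(ch):
    """Approximate the column width of one codepoint, wcwidth()-style."""
    # Combining marks and format characters (such as ZWJ) take no columns.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def summed_width(text):
    """Width of TEXT when every codepoint is measured on its own."""
    return sum(codepoint_width(ch) for ch in text)


if __name__ == "__main__":
    print([codepoint_width(ch) for ch in CLUSTER])
    print(summed_width(CLUSTER))
```

A terminal that composes the sequence into one double-width glyph uses
2 columns for it, so the two sides disagree by 2 columns on every such
cluster.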

> Emacs needs to not assume the width of graphemes is what wcwidth()
> reports, but instead needs to query the cursor position after
> printing one to find out how wide the terminal actually displayed it
> as.

Querying the cursor position won't help in this case because it is
Emacs that moves the cursor when you type C-f, not the terminal.
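
For reference, the cursor-position query in question is commonly done
with the Device Status Report sequence ESC [ 6 n, to which the terminal
replies ESC [ row ; column R.  The Python sketch below only parses such
replies and computes how far the column advanced between two of them;
it performs no terminal I/O, and the function names are hypothetical.

```python
import re

# Request sent to the terminal; the reply has the form ESC [ row ; col R.
CPR_QUERY = "\x1b[6n"
_CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Parse a cursor position report into a (row, column) pair."""
    match = _CPR_REPLY.fullmatch(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def columns_advanced(before, after):
    """Columns the cursor moved between two reports on the same row."""
    (row0, col0), (row1, col1) = parse_cpr(before), parse_cpr(after)
    if row0 != row1:
        raise ValueError("cursor changed rows; the width is ambiguous")
    return col1 - col0


if __name__ == "__main__":
    # Two made-up replies, taken before and after writing some text.
    print(columns_advanced("\x1b[5;10R", "\x1b[5;12R"))
```

Such a measurement tells only where the terminal left its own cursor
after output was written to it.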

I see no Emacs bug here.  Until we have standard ways of querying
text-mode terminals about their processing of composable character
sequences into grapheme clusters, there's no way for Emacs to behave
correctly with all such terminal emulators.  Sorry.




