unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
@ 2021-09-28 14:11 Aura Kelloniemi
  2021-09-28 16:12 ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-09-28 14:11 UTC (permalink / raw)
  To: 50865

Hello,

I'm running emacs in the Linux virtual console. Showing text that contains
Unicode emoji characters with modifiers causes wrong output. The problem may
be related to the fact that Linux basically does not understand Unicode
composition, multi-column characters or anything fancy. Anyhow, here are the
instructions to reproduce this:

Run emacs -Q

In the scratch buffer on an empty line:
Type C-x 8 RET PERSON WITH FOLDED HANDS RET
🙏 appears followed by the cursor. This is what I expect. (In the TTY screen
only a black diamond is shown, but I can verify from /dev/vcsu that the
character really is an emoji.)

Then type C-x 8 RET EMOJI MODIFIER FITZPATRICK TYPE-3 RET
Now the emoji modifier appears (again as a diamond), but it is followed by a
space character.

So now the line looks like:
<PERSON WITH FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <SPACE>
_cursor_

I think that the space there should not be added. My guess is that the space
is actually not written to the terminal, but that emacs misplaces the cursor.

Now if I press backspace to delete the last typed character (the emoji
modifier), only the space disappears. If I then press backspace again, both the
emoji and the modifier disappear at the same time.

Interestingly, if I repeat the process of adding the above mentioned emoji and
the modifier characters on a line two times (starting again from an empty
line), the line looks like this:

<PERSON WITH FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <SPACE> <PERSON
WITH FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <SPACE> _cursor_

If I now run M-x redraw-display RET, the line looks like this:

<PERSON WITH FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <PERSON WITH
FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <SPACE> <SPACE> <SPACE>
_cursor_

So for some reason there are now three spaces before the cursor.

I tried this on Debian version of Emacs 26.1, and the results were similar,
but not exactly identical.

Why do I care? If I use a Linux console, and it cannot display emoji, why does
this matter? Because if there is any other text on the same line, it often
gets very garbled, especially if Emacs decides to only update the line
partially.

I cannot stop other people from using emojis nowadays, and that's why I'd like
Emacs to tolerate them.

I will gladly provide more detail. I'm also interested in any (dirty) hacks
that could be used to work around this issue, as it disturbs my emacs usage
all the time (I use Telegram from within Emacs).

If I run emacs within GNU screen (that itself runs in a Linux VT), this
problem does not seem to appear. There are other issues with some Unicode
characters in screen, but I haven't yet found a clear way to reproduce these
issues.

Below is the data produced by M-x report-emacs-bug RET






In GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.17.4)
 of 2021-09-26 built on solaria
Repository revision: 43ae8c828d853382bbc2a27b9e14b9fff6ba18b6
Repository branch: makepkg
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --with-native-compilation --with-x-toolkit=gtk3
 --with-xft --with-wide-int --with-modules --with-gameuser=:games
 --with-sound=alsa --with-cairo --with-harfbuzz
 --enable-link-time-optimization 'CFLAGS=-march=native -mtune=native -O2 -pipe
 -fno-plt -fuse-ld=gold -flto -fuse-ld=gold -flto'
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now
 CPPFLAGS=-D_FORTIFY_SOURCE=2'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XPM GTK3 ZLIB

Important settings:
  value of $LANG: fi_FI.UTF-8
  locale-coding-system: utf-8-unix

Major mode: ELisp/d

Minor modes in effect:
  gpm-mouse-mode: t
  leaf-key-override-global-mode: t
  shell-dirtrack-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  icomplete-mode: t
  desktop-save-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: linux
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t

Load-path shadows:
/home/aura/.config/emacs/elpa/magit-20210925.1143/magit-section-pkg hides /home/aura/.config/emacs/elpa/magit-section-20210829.1849/magit-section-pkg
/home/aura/.config/emacs/elpa/transient-20210919.1006/transient hides /usr/share/emacs/28.0.50/lisp/transient

Features:
(shadow sort mail-extr emacsbug sendmail cursor-sensor t-mouse term/linux
desktop+ haskell-mode haskell-cabal haskell-utils haskell-font-lock
haskell-indentation haskell-string haskell-sort-imports haskell-lexeme
haskell-align-imports haskell-complete-module haskell-ghc-support flymake-proc
flymake warnings dabbrev haskell-customize python tramp-sh autorevert
filenotify conf-mode cargo cargo-process markdown-mode color racer f s dash
company-oddmuse company-keywords company-etags etags fileloop xref project
company-gtags company-dabbrev-code company-dabbrev company-files company-clang
company-capf company-cmake company-semantic company-template company-bbdb
vc-git vc-dispatcher rust-utils rust-mode rx rust-rustfmt rust-playpen
rust-compile compile rust-cargo yaml-mode org-element avl-tree generator
ol-eww eww xdg url-queue mm-url ol-rmail ol-mhe ol-irc ol-info ol-gnus
nnselect gnus-search eieio-opt cl-extra help-mode speedbar ezimage dframe
gnus-art mm-uu mml2015 gnus-sum shr kinsoku svg dom gnus-group gnus-undo
gnus-start gnus-dbus dbus xml gnus-cloud nnimap nnmail mail-source utf7 netrc
nnoo gnus-spec gnus-int gnus-range gnus-win gnus nnheader ol-docview doc-view
jka-compr image-mode exif ol-bibtex bibtex ol-bbdb ol-w3m org ob ob-tangle
ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src ob-comint
org-pcomplete org-list org-faces org-entities noutline outline org-version
ob-emacs-lisp ob-core ob-eval org-table ol org-keys org-compat advice org-macs
org-loaddefs find-func recentf tree-widget notmuch notmuch-tree notmuch-jump
notmuch-hello notmuch-show notmuch-print notmuch-crypto notmuch-mua
notmuch-message notmuch-draft notmuch-maildir-fcc notmuch-address
notmuch-company notmuch-parser notmuch-wash diff-mode easy-mmode coolj
notmuch-query goto-addr thingatpt icalendar diary-lib diary-loaddefs cal-menu
calendar cal-loaddefs notmuch-tag crm notmuch-lib notmuch-version
notmuch-compat hl-line message rmc puny dired dired-loaddefs rfc822 mml
mailabbrev gmm-utils mailheader mm-view mml-smime mml-sec epa derived epg
rfc6068 epg-config gnus-util rmail rmail-loaddefs mail-utils
text-property-search smime dig mm-decode mm-bodies mm-encode mail-parse
rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr server leaf-keywords
leaf finder-inf package browse-url url url-proxy url-privacy url-expand
url-methods url-history url-cookie url-domsuf url-util mailcap url-handlers
url-parse url-vars cus-edit pp wid-edit tramp tramp-loaddefs trampver
tramp-integration files-x tramp-compat shell pcomplete comint ansi-color ring
parse-time iso8601 time-date ls-lisp format-spec auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map savehist
minibuf-eldef keypad ido seq byte-opt gv bytecomp byte-compile cconv icomplete
company edmacro kmacro pcase subr-x desktop frameset cl-loaddefs cl-lib
cus-load info iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice button loaddefs
faces cus-face macroexp files window text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable backquote
threads dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting cairo move-toolbar gtk x-toolkit x multi-tty
make-network-process native-compile emacs)

Memory information:
((conses 16 541107 17966)
 (symbols 48 48459 2)
 (strings 32 220875 3286)
 (string-bytes 1 6213483)
 (vectors 16 57360)
 (vector-slots 8 1387450 51457)
 (floats 8 484 304)
 (intervals 56 910 2)
 (buffers 992 39))

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 14:11 bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display Aura Kelloniemi
@ 2021-09-28 16:12 ` Eli Zaretskii
  2021-09-28 16:54   ` Aura Kelloniemi
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-28 16:12 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Date: Tue, 28 Sep 2021 17:11:22 +0300
> 
> I'm running emacs in the Linux virtual console. Showing text that contains
> Unicode emoji characters with modifiers causes wrong output. The problem may
> be related to the fact that Linux basically does not understand Unicode
> composition, multi-column characters or anything fancy. Anyhow, here are the
> instructions to reproduce this:
> 
> Run emacs -Q
> 
> In the scratch buffer on an empty line:
> Type C-x 8 RET PERSON WITH FOLDED HANDS RET
> 🙏 appears followed by the cursor. This is what I expect. (In the TTY screen
> only a black diamond is shown, but I can verify from /dev/vcsu that the
> character really is an emoji.)
> 
> Then type C-x 8 RET EMOJI MODIFIER FITZPATRICK TYPE-3 RET
> Now the emoji modifier appears (again as a diamond), but it is followed by a
> space character.
> 
> So now the line looks like:
> <PERSON WITH FOLDED HANDS> <EMOJI MODIFIER FITZPATRICK TYPE-3> <SPACE>
> _cursor_

If your terminal doesn't understand character composition, the best
solution for you is to turn off auto-composition-mode when using Emacs
on that terminal.  Please try that and tell if that resolves the issue
for you: type "M-x auto-composition-mode RET" and make sure Emacs says
that the mode is disabled.

Thanks.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 16:12 ` Eli Zaretskii
@ 2021-09-28 16:54   ` Aura Kelloniemi
  2021-09-28 17:21     ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-09-28 16:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

Hi,

On 2021-09-28 at 19:12 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > If your terminal doesn't understand character composition, the best
 > solution for you is to turn off auto-composition-mode when using Emacs
 > on that terminal.  Please try that and tell if that resolves the issue
 > for you: type "M-x auto-composition-mode RET" and make sure Emacs says
 > that the mode is disabled.

Unfortunately it does not seem to do anything. The problem stays exactly as I
described.

Running again with emacs -Q, I tried both turning off the mode locally and
globally. I got the message that the mode was disabled.

For some reason, when I re-enable auto-composition-mode, the output appears
correct (no extra spaces at the end of line) for a while, but if I start to
add characters to the line containing the emojis, the display gets garbled
again.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 16:54   ` Aura Kelloniemi
@ 2021-09-28 17:21     ` Eli Zaretskii
  2021-09-28 17:41       ` Aura Kelloniemi
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-28 17:21 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Tue, 28 Sep 2021 19:54:29 +0300
> 
>  > If your terminal doesn't understand character composition, the best
>  > solution for you is to turn off auto-composition-mode when using Emacs
>  > on that terminal.  Please try that and tell if that resolves the issue
>  > for you: type "M-x auto-composition-mode RET" and make sure Emacs says
>  > that the mode is disabled.
> 
> Unfortunately it does not seem to do anything. The problem stays exactly as I
> described.

What is your terminal's encoding? What does the following show in the
echo-area?

  M-: (terminal-coding-system) RET

> For some reason, when I re-enable auto-composition-mode, the output appears
> correct (no extra spaces at the end of line) for a while, but if I start to
> add characters to the line containing the emojis, the display gets garbled
> again.

Does typing the below solve the problem?

  M-: (set-char-table-range char-width-table '(#x1f600 . #x1f64f) 1) RET

(You will need to redraw the display after evaluating it, e.g. with
"C-l" or "M-x redraw-display RET".)






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 17:21     ` Eli Zaretskii
@ 2021-09-28 17:41       ` Aura Kelloniemi
  2021-09-28 18:35         ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-09-28 17:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

Hi,

On 2021-09-28 at 20:21 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > What is your terminal's encoding? What does the following show in the
 > echo-area?

utf-8-unix

And the Linux console is properly in Unicode mode. (Which means Unicode as it
was understood in 2002 or so.)

 > Does typing the below solve the problem?

 >   M-: (set-char-table-range char-width-table '(#x1f600 . #x1f64f) 1) RET

No, not alone, but when I tried

(set-char-table-range char-width-table '(#x1f300 . #x1f64f) 1)

the problem is gone. Using the same trick to force some other characters (e.g.
TIMER CLOCK) to width 1, I was able to get rid of the other problem that I
mentioned, but could not track.
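
For anyone who wants to keep this workaround across sessions, a minimal
sketch follows. It assumes the console reports its type as "linux" through
tty-type, and the function name my/linux-console-narrow-emoji is made up for
illustration. Note that char-width-table is global, so the change also affects
every other frame of the same Emacs session.

```elisp
;; Sketch only: force U+1F300..U+1F64F to one column on the Linux console.
;; `my/linux-console-narrow-emoji' is a hypothetical name, not part of Emacs.
(defun my/linux-console-narrow-emoji ()
  "Treat U+1F300..U+1F64F as single-column when the terminal type is \"linux\"."
  (when (equal (tty-type) "linux")
    ;; Same range that made the problem go away in the message above.
    (set-char-table-range char-width-table '(#x1f300 . #x1f64f) 1)))

;; `tty-setup-hook' runs after a text terminal has been initialized.
(add-hook 'tty-setup-hook #'my/linux-console-narrow-emoji)
```

Frames that are already displayed would still need "C-l" or
"M-x redraw-display RET" after the table changes.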

Now the question is: should this be added to the linux terminal setup in
Emacs, and for what character range. If I am right, all Unicode code points
above 0x20 print just one character to the terminal, except if they don't
print anything. Probably there is no other documentation than the kernel
source – or at least the documentation is outdated. Linux console is kind of
deprecated, but it is still sort of maintained as there is no alternative
being developed.

Is there a way to get a list of code points that Emacs thinks have a width of
something else than 1?

Thank you for helping me!

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 17:41       ` Aura Kelloniemi
@ 2021-09-28 18:35         ` Eli Zaretskii
  2021-09-28 19:20           ` Aura Kelloniemi
  2021-09-28 20:32           ` Aura Kelloniemi
  0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-28 18:35 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Tue, 28 Sep 2021 20:41:36 +0300
> 
> utf-8-unix
> 
> And the Linux console is properly in Unicode mode. (Which means Unicode as it
> was understood in 2002 or so.)

Which could be too long ago?  Are you saying that the Linux terminal
doesn't understand Unicode beyond the year 2002?  That could explain a
lot.

>  > Does typing the below solve the problem?
> 
>  >   M-: (set-char-table-range char-width-table '(#x1f600 . #x1f64f) 1) RET
> 
> No, not alone, but when I tried
> 
> (set-char-table-range char-width-table '(#x1f300 . #x1f64f) 1)
> 
> the problem is gone. Using the same trick to force some other characters (e.g.
> TIMER CLOCK) to width 1, I was able to get rid of the other problem that I
> mentioned, but could not track.

So it sounds like your terminal cannot handle double-width
characters.  Those "space characters" you see are padding glyphs
output by Emacs when it displays a double-width character.  On my
terminal emulator, the results are satisfactory, and I see no
artifacts.  Are you sure the spaces you saw aren't just visual
surprises, and otherwise don't present any real problems?  If they do
present real problems, can you describe them in more detail, including
the exact sequence of characters you typed for that?

> Now the question is: should this be added to the linux terminal setup in
> Emacs, and for what character range.

No.  That is just a kludgey workaround for some problem I don't yet
understand well enough.

> If I am right, all Unicode code points above 0x20 print just one
> character to the terminal, except if they don't print anything.

That should not be that way.  Some characters are double-width, and
should take up 2 columns on display.

> Probably there is no other documentation than the kernel source – or
> at least the documentation is outdated. Linux console is kind of
> deprecated, but it is still sort of maintained as there is no
> alternative being developed.

Perhaps you should take this up with the developers, then.  But I'd
like to understand better what display problems you saw originally,
because all I read there now is that you saw those extra spaces.

> Is there a way to get a list of code points that Emacs thinks have a width of
> something else than 1?

You can use map-char-table to display all characters that have width
of 2 columns.  Or you can look in lisp/international/characters.el,
around line 1250, where double-width characters are listed.
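
The map-char-table suggestion could be sketched roughly as below. The helper
name my/char-width-ranges is hypothetical. The key passed to the callback is
either a single character or a cons (FROM . TO); the sketch copies the cons
before storing it, on the assumption that Emacs may reuse that cell between
calls.

```elisp
;; Sketch only: collect the entries of `char-width-table' that have a given
;; width.  `my/char-width-ranges' is a hypothetical helper name.
(defun my/char-width-ranges (width)
  "Return a list of characters or (FROM . TO) ranges whose width is WIDTH."
  (let (ranges)
    (map-char-table
     (lambda (key value)
       (when (eql value width)
         ;; Copy range conses so later callback calls cannot alter them.
         (push (if (consp key) (cons (car key) (cdr key)) key)
               ranges)))
     char-width-table)
    (nreverse ranges)))

;; Example: show every double-width entry in the echo area / *Messages*.
(message "%S" (my/char-width-ranges 2))
```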






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 18:35         ` Eli Zaretskii
@ 2021-09-28 19:20           ` Aura Kelloniemi
  2021-09-28 20:32           ` Aura Kelloniemi
  1 sibling, 0 replies; 29+ messages in thread
From: Aura Kelloniemi @ 2021-09-28 19:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

On 2021-09-28 at 21:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > > From: Aura Kelloniemi <kaura.dev@sange.fi>
 > > Cc: 50865@debbugs.gnu.org
 > > Date: Tue, 28 Sep 2021 20:41:36 +0300
 > > 
 > > utf-8-unix
 > > 
 > > And the Linux console is properly in Unicode mode. (Which means Unicode as it
 > > was understood in 2002 or so.)

 > Which could be too long ago?  Are you saying that the Linux terminal
 > doesn't understand Unicode beyond the year 2002?  That could explain a
 > lot.

More or less. The code point range has been extended to allow for code points
outside of the 16-bit range. Otherwise I haven't seen much development. The
console font can contain at most 512 glyphs, which is a huge limitation.

 > So it sounds like your terminal cannot handle double-width
 > characters.

Yes, that is correct. Sorry, I probably should have said this earlier. The
Linux console does not support extended-width characters.

This is not a problem just with Emacs, but every single program/library that
supports double-width characters, e.g. readline.

 > Those "space characters" you see are padding glyphs
 > output by Emacs when it displays a double-width character.  On my
 > terminal emulator, the results are satisfactory, and I see no
 > artifacts.  Are you sure the spaces you saw aren't just visual
 > surprises, and otherwise don't present any real problems?  If they do
 > present real problems, can you describe them in more detail, including
 > the exact sequence of characters you typed for that?

Cursor movement gets messed up. If I type:

<PERSON WITH FOLDED HANDS> <DIGIT ONE>

the display is correct:

<PERSON WITH FOLDED HANDS> <DIGIT ONE> _cursor_

If I now do C-l, the display looks like:

<PERSON WITH FOLDED HANDS> <DIGIT ONE> <SPACE> _cursor_

If I type BackSpace to delete the digit 1, only the space disappears, so the
display looks like:

<PERSON WITH FOLDED HANDS> <DIGIT ONE> _cursor_

But really the buffer contains only the emoji.

If I now press BackSpace again, both the emoji and the digit disappear, and
the line becomes empty.

If I use arrow keys to move around in the buffer, the cursor is moved by two
columns every time the point moves over a double-width character. But because
the terminal does not show the double-width character as double-wide, the
cursor placement is off by the number of double-width characters on the left
side of the point.
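
The offset described above can be estimated with a small sketch. It assumes
that the console draws exactly one cell per character (which ignores the few
characters that print nothing), while Emacs counts columns according to
char-width-table. The function name my/console-cursor-drift is hypothetical.

```elisp
;; Sketch only: estimate how far the cursor lands to the right of the text
;; on a terminal that draws every character in a single cell.
;; `my/console-cursor-drift' is a hypothetical helper name.
(defun my/console-cursor-drift (text)
  "Return Emacs's column count for TEXT minus its number of characters."
  (- (string-width text) (length text)))

;; PERSON WITH FOLDED HANDS followed by DIGIT ONE, as in the example above.
(message "drift: %d" (my/console-cursor-drift (string #x1f64f ?1)))
```

Each double-width character left of point adds one column of drift under
these assumptions, which matches the single extra space seen after "C-l".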

 > > Probably there is no other documentation than the kernel source – or
 > > at least the documentation is outdated. Linux console is kind of
 > > deprecated, but it is still sort of maintained as there is no
 > > alternative being developed.

 > Perhaps you should take this up with the developers, then.  But I'd
 > like to understand better what display problems you saw originally,
 > because all I read there now is that you saw those extra spaces.

I could, but many have done this already. The situation is such that it would
be easier to rewrite the whole console driver from scratch than to try to
extend it with more features. There does not seem to be interest for doing it.

Mostly people use Linux console for emergency maintenance, or then they are
blind and cannot use the graphical desktop, because the accessibility
technology does not support this. I happen to use Linux VT's for both of these
reasons.

I took a look at Emacs's term/linux.el.gz. It sets auto-composition-mode to
"linux". I don't know what this special value does.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 18:35         ` Eli Zaretskii
  2021-09-28 19:20           ` Aura Kelloniemi
@ 2021-09-28 20:32           ` Aura Kelloniemi
  2021-09-29 13:00             ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-09-28 20:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

On 2021-09-28 at 21:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > That should not be that way.  Some characters are double-width, and
 > should take up 2 columns on display.

I noticed, that Linux console does not understand most of the zero-width
characters either. It happily prints most of the code points in the list of
zero-width characters. Of course they are printed just as diamonds, because
Linux cannot store enough glyphs in its 512-glyph font space, but anyway it
prints a diamond for such characters as <COMBINING GRAVE ACCENT>.

The character range \u200B-\u200F seems to be an exception here. When I try to
print one of these characters on a Linux VT, it really prints nothing.

When I insert zero-width characters in Emacs, the diamonds representing the
characters are printed interspersed by the padding spaces added by emacs. The
cursor is left behind the extending line of characters as I type, because
Emacs thinks, that the zero-width characters really do not print anything,
even though they do.

I believe that the one viable solution is to make char-width-table a terminal
local variable, so that there can be a simplified version for terminals that
don't understand Unicode correctly.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-28 20:32           ` Aura Kelloniemi
@ 2021-09-29 13:00             ` Eli Zaretskii
  2021-10-01 13:23               ` Aura Kelloniemi
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-09-29 13:00 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Tue, 28 Sep 2021 23:32:53 +0300
> 
> On 2021-09-28 at 21:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
>  > That should not be that way.  Some characters are double-width, and
>  > should take up 2 columns on display.
> 
> I noticed, that Linux console does not understand most of the zero-width
> characters either.

It doesn't need to: Emacs displays those characters on a TTY as
spaces.

> It happily prints most of the code points in the list of
> zero-width characters. Of course they are printed just as diamonds, because
> Linux cannot store enough glyphs in its 512-glyph font space, but anyway it
> prints a diamond for such characters as <COMBINING GRAVE ACCENT>.

COMBINING GRAVE ACCENT (or any other combining codepoint) is not a
good example of zero-width characters.  Try "C-x 8 RET 200c RET"
instead.  Or FEFF or 1D173 or E007f or 1BCA0.

> The character range \u200B-\u200F seems to be an exception here. When I try to
> print one of these characters on a Linux VT, it really prints nothing.

That's not exception, that's the rule, actually, for true zero-width
characters, not for accents.  Accents exist to combine with preceding
base character, and what you seem to describe means the Linux console
is unable to do even Latin accents?

> When I insert zero-width characters in Emacs, the diamonds representing the
> characters are printed interspersed by the padding spaces added by emacs. The
> cursor is left behind the extending line of characters as I type, because
> Emacs thinks, that the zero-width characters really do not print anything,
> even though they do.

Is this with or without auto-composition-mode?

> I believe that the one viable solution is to make char-width-table a terminal
> local variable, so that there can be a simplified version for terminals that
> don't understand Unicode correctly.

That would affect much more than display, because Emacs consults that
table for other purposes.  We need something limited to display alone.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-09-29 13:00             ` Eli Zaretskii
@ 2021-10-01 13:23               ` Aura Kelloniemi
  2021-10-01 13:35                 ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-10-01 13:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

On 2021-09-29 at 16:00 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > > From: Aura Kelloniemi <kaura.dev@sange.fi>
 > > Cc: 50865@debbugs.gnu.org
 > > Date: Tue, 28 Sep 2021 23:32:53 +0300
 > > 
 > > On 2021-09-28 at 21:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > >  > That should not be that way.  Some characters are double-width, and
 > >  > should take up 2 columns on display.
 > > 
 > > I noticed, that Linux console does not understand most of the zero-width
 > > characters either.

 > It doesn't need to: Emacs displays those characters on a TTY as
 > spaces.

Can this be configured – i.e. can I change the space to something else to ease
debugging?

 > > It happily prints most of the code points in the list of
 > > zero-width characters. Of course they are printed just as diamonds, because
 > > Linux cannot store enough glyphs in its 512-glyph font space, but anyway it
 > > prints a diamond for such characters as <COMBINING GRAVE ACCENT>.

 > COMBINING GRAVE ACCENT (or any other combining codepoint) is not a
 > good example of zero-width characters.

On modern terminal emulators this certainly holds, but Linux is not a modern
terminal emulator and does not support combining characters. It just prints a
diamond for all codepoints which don't have an associated glyph in the font
(or the kernel knows them to be zero-wide, and this information is out of
date).

 > Try "C-x 8 RET 200c RET"
 > instead.

 > Or FEFF
 > or 1D173 or E007f or 1BCA0.

They print just a single space within emacs. If I print them with echo,
they print a diamond.

 > > The character range \u200B-\u200F seems to be an exception here. When I try to
 > > print one of these characters on a Linux VT, it really prints nothing.

 > That's not exception, that's the rule, actually, for true zero-width
 > characters, not for accents.  Accents exist to combine with preceding
 > base character, and what you seem to describe means the Linux console
 > is unable to do even Latin accents?

Here is a sample Bash session for demonstration:
$ echo $'i\u300'
i◈
$ echo $'\uEC'
ì

 > > When I insert zero-width characters in Emacs, the diamonds representing the
 > > characters are printed interspersed by the padding spaces added by emacs. The
 > > cursor is left behind the extending line of characters as I type, because
 > > Emacs thinks, that the zero-width characters really do not print anything,
 > > even though they do.

 > Is this with or without auto-composition-mode?

Ok, this was with auto-composition-mode set to t. And it only happens with
combining characters. Other zero-wide characters print the single space, as
should be.

If I set auto-composition-mode to nil, then Emacs does not print anything (not
even the space) when I insert a combining character. If I then move the point
over the invisible combining character, the point moves, but the screen cursor
does not. This is a very confusing behaviour.

Non-combining zero-wide characters print the space (as you said), and there
are no cursor movement issues.

When running in the Linux console emacs's term/linux.el sets
auto-composition-mode to a special value of "linux". I don't know what this
means.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 13:23               ` Aura Kelloniemi
@ 2021-10-01 13:35                 ` Eli Zaretskii
  2021-10-01 14:33                   ` Aura Kelloniemi
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-01 13:35 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Fri, 01 Oct 2021 16:23:01 +0300
> 
>  > > I noticed, that Linux console does not understand most of the zero-width
>  > > characters either.
> 
>  > It doesn't need to: Emacs displays those characters on a TTY as
>  > spaces.
> 
> Can this be configured – i.e. can I change the space to something else to ease
> debugging?

Yes, see glyphless-char-display-control.
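
As a rough illustration of that option, the sketch below switches the
format-control group (characters such as U+200C) to the hex-code display
method, so that they show up as their code point instead of a blank. It only
touches that one group and leaves the rest of the alist alone. Going through
customize-set-variable is deliberate here, so that the option's setter can
update the display table; a plain setq is assumed not to be enough.

```elisp
;; Sketch only: show format-control characters as their hex code to make
;; them visible while debugging.
(let ((control (copy-alist glyphless-char-display-control)))
  ;; Change just the `format-control' group; other groups keep their method.
  (setf (alist-get 'format-control control) 'hex-code)
  (customize-set-variable 'glyphless-char-display-control control))
```

After evaluating it, "C-x 8 RET 200c RET" is a quick way to check what is
displayed.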

>  > That's not exception, that's the rule, actually, for true zero-width
>  > characters, not for accents.  Accents exist to combine with preceding
>  > base character, and what you seem to describe means the Linux console
>  > is unable to do even Latin accents?
> 
> Here is a sample Bash session for demonstration:
> $ echo $'i\u300'
> i◈
> $ echo $'\uEC'
> ì

Ouch!  What a terrible misfeature!

> If I set auto-composition-mode to nil, then Emacs does not print anything (not
> even the space) when I insert a combining character. If I then move the point
> over the invisible combining character, the point moves, but the screen cursor
> does not. This is a very confusing behaviour.

I think it's expected for accents.

> When running in the Linux console emacs's term/linux.el sets
> auto-composition-mode to a special value of "linux". I don't know what this
> means.

That is explained in the doc string of auto-composition-mode.

Does term/linux.el get loaded when you run Emacs on that terminal?






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 13:35                 ` Eli Zaretskii
@ 2021-10-01 14:33                   ` Aura Kelloniemi
  2021-10-01 15:41                     ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-10-01 14:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

On 2021-10-01 at 16:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > > From: Aura Kelloniemi <kaura.dev@sange.fi>
 > > Cc: 50865@debbugs.gnu.org
 > > Date: Fri, 01 Oct 2021 16:23:01 +0300
[--]
 > > Here is a sample Bash session for demonstration:
 > > $ echo $'i\u300'
 > > i◈
 > > $ echo $'\uEC'
 > > ì

 > Ouch!  What a terrible misfeature!

Yes, but of course unintentional.

Variation selectors work in a similar way. <HEAVY BLACK HEART> <VARIATION
SELECTOR-16> prints two diamonds which is two 1-column wide characters.
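
For anyone who wants to repeat these tests on another terminal, here is a
small sketch that emits the same sequences as raw UTF-8 bytes, so the result
does not depend on the shell's own \u handling or locale. The helper name
show_seq is made up for this illustration, and what actually appears on
screen depends entirely on the terminal.

```shell
#!/bin/sh
# Sketch only: print a label and a character sequence given as \0ooo octal
# UTF-8 bytes. show_seq is a hypothetical helper name, not part of any tool.
show_seq() {
    # $1: label, $2: octal-escaped UTF-8 bytes, expanded by printf's %b
    printf '%s: %b\n' "$1" "$2"
}

# Base letter followed by U+0300 COMBINING GRAVE ACCENT (CC 80).
show_seq 'i + U+0300' 'i\0314\0200'
# The precomposed form U+00EC (C3 AC).
show_seq 'U+00EC' '\0303\0254'
# U+2764 HEAVY BLACK HEART (E2 9D A4) + U+FE0F VARIATION SELECTOR-16 (EF B8 8F).
show_seq 'U+2764 + U+FE0F' '\0342\0235\0244\0357\0270\0217'
```

Counting the columns each line occupies after the label shows whether the
terminal combines the sequence or draws every code point separately.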

There has been a recent change in emacs after which it no longer prints the
variation selectors. This is probably related to the addition of
glyphless-char-display-control.

Anyway, with Linux it is safest to think of the console as an old text-mode VGA
display which has been extended to support more code points.

 > Does term/linux.el get loaded when you run Emacs on that terminal?

Yes. GNU/Linux distros are configured to set TERM=linux at latest in the
console getty processes.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 14:33                   ` Aura Kelloniemi
@ 2021-10-01 15:41                     ` Eli Zaretskii
  2021-10-01 16:02                       ` Aura Kelloniemi
  2021-10-02  8:11                       ` Lars Ingebrigtsen
  0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-01 15:41 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Fri, 01 Oct 2021 17:33:19 +0300
> 
>  > Does term/linux.el get loaded when you run Emacs on that terminal?
> 
> Yes. GNU/Linux distros are configured to set TERM=linux at latest in the
> console getty processes.

Then by default auto-composition-mode should be disabled on that
console.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 15:41                     ` Eli Zaretskii
@ 2021-10-01 16:02                       ` Aura Kelloniemi
  2021-10-01 17:48                         ` Eli Zaretskii
  2021-10-02  8:11                       ` Lars Ingebrigtsen
  1 sibling, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-10-01 16:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

On 2021-10-01 at 18:41 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
Hi,

 > Then by default auto-composition-mode should be disabled on that
 > console.

Yes, sure. What about Linux's lack of support for 2-column wide characters?

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 16:02                       ` Aura Kelloniemi
@ 2021-10-01 17:48                         ` Eli Zaretskii
  2021-10-02 10:58                           ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-01 17:48 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Fri, 01 Oct 2021 19:02:53 +0300
> 
> On 2021-10-01 at 18:41 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
> Hi,
> 
>  > Then by default auto-composition-mode should be disabled on that
>  > console.
> 
> Yes, sure. What about Linux's lack of support for 2-column wide characters?

Are you sure they don't? what do the developers say about that?






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 15:41                     ` Eli Zaretskii
  2021-10-01 16:02                       ` Aura Kelloniemi
@ 2021-10-02  8:11                       ` Lars Ingebrigtsen
  2021-10-02  8:54                         ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-02  8:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865, Aura Kelloniemi

Eli Zaretskii <eliz@gnu.org> writes:

>> Yes. GNU/Linux distros are configured to set TERM=linux at latest in the
>> console getty processes.
>
> Then by default auto-composition-mode should be disabled on that
> console.

I thought this was the case in Emacs 28?

term/linux.el has

  ;; Compositions confuse cursor movement.
  (setq-default auto-composition-mode "linux")

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-02  8:11                       ` Lars Ingebrigtsen
@ 2021-10-02  8:54                         ` Eli Zaretskii
  0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-02  8:54 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50865, kaura.dev

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Aura Kelloniemi <kaura.dev@sange.fi>,  50865@debbugs.gnu.org
> Date: Sat, 02 Oct 2021 10:11:52 +0200
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Yes. GNU/Linux distros are configured to set TERM=linux at latest in the
> >> console getty processes.
> >
> > Then by default auto-composition-mode should be disabled on that
> > console.
> 
> I thought this was the case in Emacs 28?
> 
> term/linux.el has
> 
>   ;; Compositions confuse cursor movement.
>   (setq-default auto-composition-mode "linux")

Yes, that's what I said.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-01 17:48                         ` Eli Zaretskii
@ 2021-10-02 10:58                           ` Eli Zaretskii
  2021-10-02 11:21                             ` Andreas Schwab
  2021-10-04 12:25                             ` Aura Kelloniemi
  0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-02 10:58 UTC (permalink / raw)
  To: kaura.dev; +Cc: 50865

> Resent-From: Eli Zaretskii <eliz@gnu.org>
> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
> Resent-CC: bug-gnu-emacs@gnu.org
> Resent-Sender: help-debbugs@gnu.org
> Date: Fri, 01 Oct 2021 20:48:32 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 50865@debbugs.gnu.org
> 
> > From: Aura Kelloniemi <kaura.dev@sange.fi>
> > Cc: 50865@debbugs.gnu.org
> > Date: Fri, 01 Oct 2021 19:02:53 +0300
> > 
> > On 2021-10-01 at 18:41 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
> > Hi,
> > 
> >  > Then by default auto-composition-mode should be disabled on that
> >  > console.
> > 
> > Yes, sure. What about Linux's lack of support for 2-column wide characters?
> 
> Are you sure they don't? what do the developers say about that?

If indeed the Linux console doesn't support double-width characters,
or at least enough of them to cause trouble with Emacs display, my
suggestion would be to use this setting:

  M-x set-terminal-coding-system RET latin-1 RET

This will display characters outside the Latin-1 range as \uNNNN or
\U0nnnnn (depending on the codepoint), with an underline attribute to
make it easier to tell where the character's code ends and the
following text begins (in case it begins with a digit).  This should
allow you to read the rest of the text without messing up the display.
I don't really see a better solution for such problematic terminals.
Emacs relies on the terminal to display characters correctly, using 2
columns (with padding by empty space) when the character is
double-width.  If the terminal doesn't live up to these expectations,
the display will become garbled.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-02 10:58                           ` Eli Zaretskii
@ 2021-10-02 11:21                             ` Andreas Schwab
  2021-10-02 11:56                               ` Eli Zaretskii
  2021-10-04 12:25                             ` Aura Kelloniemi
  1 sibling, 1 reply; 29+ messages in thread
From: Andreas Schwab @ 2021-10-02 11:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865, kaura.dev

On Oct 02 2021, Eli Zaretskii wrote:

> If indeed the Linux console doesn't support double-width characters,
> or at least enough of them to cause trouble with Emacs display, my
> suggestion would be to use this setting:
>
>   M-x set-terminal-coding-system RET latin-1 RET

How can that work?  The terminal's encoding is utf-8, not latin-1.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-02 11:21                             ` Andreas Schwab
@ 2021-10-02 11:56                               ` Eli Zaretskii
  0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-02 11:56 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 50865, kaura.dev

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: kaura.dev@sange.fi,  50865@debbugs.gnu.org
> Date: Sat, 02 Oct 2021 13:21:21 +0200
> 
> On Oct 02 2021, Eli Zaretskii wrote:
> 
> > If indeed the Linux console doesn't support double-width characters,
> > or at least enough of them to cause trouble with Emacs display, my
> > suggestion would be to use this setting:
> >
> >   M-x set-terminal-coding-system RET latin-1 RET
> 
> How can that work?  The terminal's encoding is utf-8, not latin-1.

Then perhaps us-ascii is the best we can do in that case.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-02 10:58                           ` Eli Zaretskii
  2021-10-02 11:21                             ` Andreas Schwab
@ 2021-10-04 12:25                             ` Aura Kelloniemi
  2021-10-04 13:15                               ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-10-04 12:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

Hi,

On 2021-10-02 at 13:58 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > > Are you sure they don't? what do the developers say about that?

I am actually a bit confused about the fact that Linux console doesn't seem to
be well known on this list. I am not blaming, just wondering. I would think it
would be very easy for all GNU/Linux users to reproduce this bug any time.

Anyhow, here I provide a proof that Linux really does not understand
two-column characters.

This is again a Bash session in a bare Linux console:

$ echo $'ab\U0001F64Fxy\rabc'
abcxy

This prints letters a and b followed by a wide emoji, followed by letters x
and y. Then it moves the cursor back to the beginning of line with \r and
writes letters a, b and c. These should overwrite the first two letters and the
first half of the emoji. This leaves the letters x and y intact.

But as you see, the letter c here overwrites the whole emoji. If the emoji
really was wide, then the output would be

$ echo $'ab\U0001F64Fxy\rabc'
abc xy

Here the space represents the right half of the broken emoji. This latter
example is run in a VTE-based terminal that supports Unicode properly.
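
The same probe can be wrapped in a small function so that other characters
can be tried as well. This is only a sketch: probe_width is a made-up helper
name, and the character is passed as \0ooo octal UTF-8 bytes so that the
result does not depend on the shell's \U support.

```shell
#!/bin/sh
# Sketch only: emit the overwrite probe for an arbitrary character.
# probe_width is a hypothetical helper name.
probe_width() {
    # Print "ab", the character in $1, then "xy"; return to column 0 with \r
    # and write "abc" over the start of the line.
    printf 'ab%bxy\rabc\n' "$1"
}

# U+1F64F PERSON WITH FOLDED HANDS is F0 9F 99 8F in UTF-8.
probe_width '\0360\0237\0231\0217'
```

As above, a result of "abcxy" means the terminal gave the character one
column, and "abc xy" means it gave the character two.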

 > If indeed the Linux console doesn't support double-width characters,
 > or at least enough of them to cause trouble with Emacs display, my
 > suggestion would be to use this setting:

 >   M-x set-terminal-coding-system RET latin-1 RET

As Andreas pointed out, this would not work. Using only ASCII would be a
horrible regression. My native language uses many letters outside the ASCII
range. Nowadays even programming becomes difficult without Unicode. This is
not a feasible solution.

 > This will display characters outside the Latin-1 range as \uNNNN or
 > \U0nnnnn (depending on the codepoint), with an underline attribute to
 > make it easier to tell where the character's code ends and the
 > following text begins (in case it begins with a digit).

Linux console does not support the underline attribute. See man 4
console_codes. It talks about simulating the attributes.

 > This should allow you to read the rest of the text without messing up the
 > display. I don't really see a better solution for such problematic
 > terminals.

The solution of modifying char-width-table at least worked very well for me.
Of course I am interested in the things that will break, if I use it, but
most likely those will be smaller annoyances than a garbled display.

I can document this hack on emacs wiki, if nothing else can be done.

 > Emacs relies on the terminal to display characters correctly, using 2
 > columns (with padding by empty space) when the character is
 > double-width.  If the terminal doesn't live up to these expectations,
 > the display will become garbled.

Couldn't emacs add a padding space after every two-column character? This
would fix the alignment/garbling issues altogether. This setting could be
controlled by a terminal-local variable and it could be automatically set for
terminals that don't support multi-column characters.

Emacs already kind of adds a padding space if I type characters one at a time
(because it repositions the cursor after every command), but this does not
happen if the text is sent to the terminal in a batch (e.g. when drawing the
contents of a buffer, or when doing a redraw).

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-04 12:25                             ` Aura Kelloniemi
@ 2021-10-04 13:15                               ` Eli Zaretskii
  2021-10-04 16:35                                 ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-04 13:15 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Mon, 04 Oct 2021 15:25:23 +0300
> 
> The solution of modifying char-width-table at least worked very well for me.
> Of course I am interested in the things that will break, if I use it, but
> most likely those will be smaller annoyances than a garbled display.
> 
> I can document this hack on emacs wiki, if nothing else can be done.

I don't recommend documenting such a "solution", because
char-width-table affects more than just the display of wide
characters, it also affects Lisp programs that use string-width and
similar functions.

>  > Emacs relies on the terminal to display characters correctly, using 2
>  > columns (with padding by empty space) when the character is
>  > double-width.  If the terminal doesn't live up to these expectations,
>  > the display will become garbled.
> 
> Couldn't emacs add a padding space after every two-column character?

It could, but for which characters and under what conditions?  Who can
produce a full comprehensive description of the problems related to
character width that are inherent in the Linux console?

Anyway, patches to cater to the Linux console will be welcome, if
someone can come up with a method to DTRT.  The problem is that the
changes will need to be on the C level, where we currently simply use
fwrite to output the (UTF-8) encoded text to the device.  Padding
would mean we'd need to write it character by character, or introduce
some logic that looks for the problematic characters and outputs them
specially.

> Emacs already kind of adds a padding space if I type characters one at a time
> (because it repositions the cursor after every command)

No, Emacs doesn't add any padding when it writes to the terminal, at
least AFAICS.  It simply relies on the terminal to produce a 2-column
glyph for a wide character, and positions the cursor accordingly.
Positioning the cursor doesn't in general write a space to the device,
it just outputs a terminfo sequence for cursor addressing.
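
To illustrate that last point, the sketch below prints the bytes of a cursor
addressing sequence. It hard-codes the ECMA-48 CUP form (ESC [ row ; col H)
described in console_codes(4), which is an assumption made for the example;
Emacs takes the sequence from terminfo instead of hard-coding it. The helper
name cup_seq is made up.

```shell
#!/bin/sh
# Sketch only: cup_seq ROW COL prints the CUP escape sequence (1-based).
# cup_seq is a hypothetical helper name.
cup_seq() {
    printf '\033[%d;%dH' "$1" "$2"
}

# Dump the bytes instead of sending them to the terminal.
cup_seq 3 7 | od -An -c
```

The dump contains the escape sequence and no space characters, so any cell
that the terminal did not fill itself keeps whatever was there before.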






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-04 13:15                               ` Eli Zaretskii
@ 2021-10-04 16:35                                 ` Eli Zaretskii
  2021-10-04 16:51                                   ` Aura Kelloniemi
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-04 16:35 UTC (permalink / raw)
  To: kaura.dev; +Cc: 50865

> Date: Mon, 04 Oct 2021 16:15:45 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 50865@debbugs.gnu.org
> 
> > From: Aura Kelloniemi <kaura.dev@sange.fi>
> > Cc: 50865@debbugs.gnu.org
> > Date: Mon, 04 Oct 2021 15:25:23 +0300
> > 
> > The solution of modifying char-width-table at least worked very well for me.
> > Of course I am interested in the things that will break, if I use it, but
> > most likely those will be smaller annoyances than a garbled display.
> > 
> > I can document this hack on emacs wiki, if nothing else can be done.
> 
> I don't recommend documenting such a "solution", because
> char-width-table affects more than just the display of wide
> characters, it also affects Lisp programs that use string-width and
> similar functions.

Here's a potentially better solution, which uses the display-table
feature built into Emacs to display problematic characters as some
other characters:

  (or standard-display-table
      (setq standard-display-table (make-display-table)))
  (aset standard-display-table
	#x1f64f (vector (make-glyph-code #xFFFD 'escape-glyph)))

This sets Emacs to display the U+01F64F PERSON WITH FOLDED HANDS
character as a diamond with a special face.  If the diamond also
causes trouble, try replacing it with some ASCII character, like '?'.

If this gives good results, you can do the same for any other
problematic character.  The disadvantage is that they all will look
the same on display, and the only way of knowing what is the real
codepoint in the buffer is to go to the character and type "C-u C-x =".







* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-04 16:35                                 ` Eli Zaretskii
@ 2021-10-04 16:51                                   ` Aura Kelloniemi
  2021-10-04 17:06                                     ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Aura Kelloniemi @ 2021-10-04 16:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865

Hello,

On 2021-10-04 at 19:35 +0300, Eli Zaretskii <eliz@gnu.org> wrote:
 > Here's a potentially better solution, which uses the display-table
 > feature built into Emacs to display problematic characters as some
 > other characters:

Thank you very much.

I looked at the Linux kernel sources, and found from drivers/tty/vt/vt.c a
table of characters which will be automatically padded with space when writing
the glyph to the console. It recognizes two-column characters from Unicode
5.0. I will try to get a patch into Linux itself which would extend support for
automatic padding to include current Unicode multi-column characters.

Whether this succeeds or fails, I'll report it here. This might take some time
though, I'm afraid. It probably would fix this alignment issue for good.

-- 
Aura






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-04 16:51                                   ` Aura Kelloniemi
@ 2021-10-04 17:06                                     ` Eli Zaretskii
  2022-09-02 12:08                                       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-10-04 17:06 UTC (permalink / raw)
  To: Aura Kelloniemi; +Cc: 50865

> From: Aura Kelloniemi <kaura.dev@sange.fi>
> Cc: 50865@debbugs.gnu.org
> Date: Mon, 04 Oct 2021 19:51:58 +0300
> 
> I looked at the Linux kernel sources, and found from drivers/tty/vt/vt.c a
> table of characters which will be automatically padded with space when writing
> the glyph to the console. It recognizes two-column characters from Unicode
> 5.0.

And does Emacs work correctly as long as you restrict yourself only to
those double-width characters? or are there still problems?






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2021-10-04 17:06                                     ` Eli Zaretskii
@ 2022-09-02 12:08                                       ` Lars Ingebrigtsen
  2022-09-02 12:31                                         ` Gregory Heytings
  2022-09-02 12:59                                         ` Eli Zaretskii
  0 siblings, 2 replies; 29+ messages in thread
From: Lars Ingebrigtsen @ 2022-09-02 12:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865, Aura Kelloniemi

Eli Zaretskii <eliz@gnu.org> writes:

>> I looked at the Linux kernel sources, and found from drivers/tty/vt/vt.c a
>> table of characters which will be automatically padded with space when writing
>> the glyph to the console. It recognizes two-column characters from Unicode
>> 5.0.
>
> And does Emacs work correctly as long as you restrict yourself only to
> those double-width characters? or are there still problems?

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

I only lightly skimmed this thread, but there's been some work done in
this area (displaying non-displayable characters on the console) over the
last week, and I wonder whether that's fixed this issue, too?






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2022-09-02 12:08                                       ` Lars Ingebrigtsen
@ 2022-09-02 12:31                                         ` Gregory Heytings
  2022-09-02 12:59                                         ` Eli Zaretskii
  1 sibling, 0 replies; 29+ messages in thread
From: Gregory Heytings @ 2022-09-02 12:31 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50865, Eli Zaretskii, Aura Kelloniemi


>>> I looked at the Linux kernel sources, and found from 
>>> drivers/tty/vt/vt.c a table of characters which will be automatically 
>>> padded with space when writing the glyph to the console. It recognizes 
>>> two-column characters from Unicode 5.0.
>>
>> And does Emacs work correctly as long as you restrict yourself only to 
>> those double-width characters? or are there still problems?
>
> I only lightly skimmed this thread, but there's been some work done in 
> this area (displaying non-displayable characters on the console) over the
> last week, and I wonder whether that's fixed this issue, too?
>

It seems to have improved indeed, but I'm not sure it's really fixed.

Note that the display is garbled on terminals under X too, with the recipe 
given by the OP (I tried xterm and rxvt).

That being said, emacs running in fbterm seems to give better results than 
what the OP saw: after C-x 8 RET PERSON WITH FOLDED HANDS RET you see a 
single diamond character followed by a space, which seems correct given 
that the Emoji character has a width = 2.  (In a terminal under X you'd 
see a two character wide empty box at that point.)

After C-x 8 RET EMOJI MODIFIER FITZPATRICK TYPE-3 RET the space that 
followed the diamond on display is replaced by another diamond character. 
(In a terminal under X you'd see two two character wide empty boxes at 
that point, which is worse.)

Aura, can you try either the current emacs-28 or the current master, and 
tell us if you think the issue is fixed?






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2022-09-02 12:08                                       ` Lars Ingebrigtsen
  2022-09-02 12:31                                         ` Gregory Heytings
@ 2022-09-02 12:59                                         ` Eli Zaretskii
  2022-09-02 13:19                                           ` Gregory Heytings
  1 sibling, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2022-09-02 12:59 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 50865, kaura.dev

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Aura Kelloniemi <kaura.dev@sange.fi>,  50865@debbugs.gnu.org
> Date: Fri, 02 Sep 2022 14:08:42 +0200
> 
> I only lightly skimmed this thread, but there's been some work done in
> this area (displaying non-displayable characters on the console) over the
> last week, and I wonder whether that's fixed this issue, too?

Not really, no.  What was improved was the fallback display via the
extra-slot of glyphless-char-display.  Nothing was done to somehow
make the Linux console correctly display characters from the latest
versions of Unicode, which it evidently doesn't support well.

I think the best solution for the Linux console's problems, short of
using fbterm or something similar, is to set up the standard display
table to show unsupported characters as U+FFFD replacements, perhaps
augmented by latin1-display-ucs-per-lynx.  Unfortunately, this
requires customization by users, since each one of them wants to see
certain characters in legible form, and doesn't care about the others.

Bottom line: the Linux console is simply unsuitable for showing
multi-lingual text, let alone Emoji sequences, as users expect
nowadays.






* bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display
  2022-09-02 12:59                                         ` Eli Zaretskii
@ 2022-09-02 13:19                                           ` Gregory Heytings
  0 siblings, 0 replies; 29+ messages in thread
From: Gregory Heytings @ 2022-09-02 13:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 50865, Lars Ingebrigtsen, kaura.dev


>> I only lightly skimmed this thread, but there's been some work done in 
>> this area (displaying non-displayable characters on the console) over
>> the last week, and I wonder whether that's fixed this issue, too?
>
> Not really, no.
>

Note that the OP said: "I'm also interested in any (dirty) hacks that 
could be used to work around this issue."  So in that respect at least the 
situation has improved, there is now a documented way to work around the 
issue: using fbterm instead of the raw Linux console.

>
> Bottom line: the Linux console is simply unsuitable for showing 
> multi-lingual text, let alone Emoji sequences, as users expect nowadays.
>

That's correct.  In fact it's not only "the Linux console", that's also 
true for most terminal emulators.







Thread overview: 29+ messages
2021-09-28 14:11 bug#50865: 28.0.50; Emoji with emoji modifier in Linux console garbles emacs display Aura Kelloniemi
2021-09-28 16:12 ` Eli Zaretskii
2021-09-28 16:54   ` Aura Kelloniemi
2021-09-28 17:21     ` Eli Zaretskii
2021-09-28 17:41       ` Aura Kelloniemi
2021-09-28 18:35         ` Eli Zaretskii
2021-09-28 19:20           ` Aura Kelloniemi
2021-09-28 20:32           ` Aura Kelloniemi
2021-09-29 13:00             ` Eli Zaretskii
2021-10-01 13:23               ` Aura Kelloniemi
2021-10-01 13:35                 ` Eli Zaretskii
2021-10-01 14:33                   ` Aura Kelloniemi
2021-10-01 15:41                     ` Eli Zaretskii
2021-10-01 16:02                       ` Aura Kelloniemi
2021-10-01 17:48                         ` Eli Zaretskii
2021-10-02 10:58                           ` Eli Zaretskii
2021-10-02 11:21                             ` Andreas Schwab
2021-10-02 11:56                               ` Eli Zaretskii
2021-10-04 12:25                             ` Aura Kelloniemi
2021-10-04 13:15                               ` Eli Zaretskii
2021-10-04 16:35                                 ` Eli Zaretskii
2021-10-04 16:51                                   ` Aura Kelloniemi
2021-10-04 17:06                                     ` Eli Zaretskii
2022-09-02 12:08                                       ` Lars Ingebrigtsen
2022-09-02 12:31                                         ` Gregory Heytings
2022-09-02 12:59                                         ` Eli Zaretskii
2022-09-02 13:19                                           ` Gregory Heytings
2021-10-02  8:11                       ` Lars Ingebrigtsen
2021-10-02  8:54                         ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
