From: Eli Zaretskii <eliz@gnu.org>
To: Augusto Stoffel <arstoffel@gmail.com>
Cc: 61726@debbugs.gnu.org, joaotavora@gmail.com
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 14:54:50 +0200
Message-ID: <831qmgr17p.fsf@gnu.org>
In-Reply-To: <875ybsfvtj.fsf@gmail.com> (message from Augusto Stoffel on Thu, 23 Feb 2023 12:46:48 +0100)

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 61726@debbugs.gnu.org, joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 12:46:48 +0100
>
> >> +(defun eglot--current-column-utf-8 ()
> >> + "Calculate current column, counting bytes."
> >> + (- (position-bytes (point)) (position-bytes (line-beginning-position))))
> >
> > This is subtly incorrect: position-bytes doesn't count UTF-8 bytes, it
> > counts the bytes in the internal representation Emacs uses for buffer
> > and string text. The differences are minor and subtle, but not
> > negligible.
>
> Right, if the buffer contains a char outside of the Unicode range, we
> lose.
>
> But just to confirm: position-bytes and byte-to-position are always with
> respect to Emacs's internal extended UTF-8 representation and have
> nothing to do with the buffer file encoding, right?

Yes. See bufferpos-to-filepos to get an idea of what hoops we need to
jump through to get it right, even just with UTF-8.

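To make the distinction concrete, here is a minimal sketch of what a column counted in true UTF-8 bytes means: encode the text before point as UTF-8 and count the resulting bytes. It is written in Python purely for illustration and is not Eglot code; `utf8_column` is a hypothetical name. As noted above, a count based on Emacs's internal representation can differ from this for characters outside the Unicode range.

```python
def utf8_column(line: str, char_index: int) -> int:
    """Return the number of UTF-8 bytes that precede char_index in line.

    char_index is an offset in code points (Python's native str index).
    """
    # Encode only the prefix before the position, then count its bytes.
    return len(line[:char_index].encode("utf-8"))


# x (1 byte), U+21D2 RIGHTWARDS DOUBLE ARROW (3 bytes),
# U+1F603 SMILING FACE WITH OPEN MOUTH (4 bytes), then y.
sample = "x\u21d2\U0001F603y"
print(utf8_column(sample, 3))  # byte offset of "y" within the line
```

The three characters before `y` are 3 code points but 8 UTF-8 bytes, which is why a byte-based column and a character-based column cannot be used interchangeably.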
> > What does this stuff do with double-width or zero-width characters?
> > Emacs takes character-width into consideration when it counts columns,
> > but it is unclear to me what LSP servers do in those cases.
> > Likewise with characters that are composed on display.
>
> `eglot-move-to-column' is supposed to count Unicode codepoints, so
> e.g. x, ⇒ and 😃 all contribute 1 unit.

But if the resulting column is then used in move-to-column etc., it
might go to the wrong column, because in Emacs each column is not
necessarily a single codepoint. The simplest example is a TAB
character, but there are more examples, some of which are quite
complicated (see below).

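The TAB case can be sketched as follows. This is a rough Python illustration, not how Emacs computes columns: `display_column` is a hypothetical helper, the tab width of 8 is an assumption, and the sketch deliberately ignores wide characters, zero-width characters and composition.

```python
def display_column(line: str, char_index: int, tab_width: int = 8) -> int:
    """Approximate the screen column of char_index, expanding TABs only."""
    # str.expandtabs pads each TAB up to the next multiple of tab_width,
    # so the length of the expanded prefix is the column reached.
    return len(line[:char_index].expandtabs(tab_width))


line = "\tx"
print(display_column(line, 1))  # screen column of "x"
print(len(line[:1]))            # code points before "x"
```

Here `x` is preceded by a single code point, yet it sits at screen column 8, so a code-point count fed to a column-based function lands in the wrong place.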
> On the other hand, the Emoji
> 🧛♀️ contributes 4 units. This is independent of screen display.

Not in Emacs.

> By the way, I don't understand your claim about column counting. If I
> move point over 🧛♀️, the mode line column count increments by 3 units,
> which seems to make no sense: this Emoji is 4 codepoints long and
> occupies 1 screen column. What's the logic here?

If that is what you see, it could be a bug. Does current-column agree
with what you see in the mode line?

In general, characters (codepoints) that are composed on display into
a single glyph or "grapheme cluster" are supposed to be counted as a
single column. Try typing this in "emacs -Q":

  a C-x 8 RET COMBINING ACUTE ACCENT RET

If your default font is capable enough, you will see a single glyph of
'a' with acute accent (á), and it will count as 1 column, although
there are 2 codepoints in the buffer. And "M-: (move-to-column 1) RET"
will move past both codepoints. Now imagine that we get such sequences
from the LSP server -- what will Eglot do in terms of column counting?
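
For reference, the sequences discussed in this message measure as follows in the three units that the LSP positionEncoding capability can negotiate (UTF-8 bytes, UTF-16 code units, and code points for UTF-32). This is a minimal Python sketch for illustration only; `position_units` is a hypothetical name, and the emoji is assumed to be the 4-codepoint sequence U+1F9DB U+200D U+2640 U+FE0F.

```python
def position_units(text: str) -> dict:
    """Count the length of text in code points, UTF-16 units and UTF-8 bytes."""
    return {
        "codepoints": len(text),
        # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no BOM.
        "utf-16": len(text.encode("utf-16-le")) // 2,
        "utf-8": len(text.encode("utf-8")),
    }


accented_a = "a\u0301"                     # a + COMBINING ACUTE ACCENT
vampire = "\U0001F9DB\u200D\u2640\uFE0F"   # assumed emoji ZWJ sequence

print(position_units(accented_a))
print(position_units(vampire))
```

Neither sequence measures 1 in any of these units, whereas Emacs is supposed to count each as a single column once it is composed on display.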