bug#61726: [PATCH] Eglot: Support positionEncoding capability

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
@ 2023-02-23  8:05 Augusto Stoffel
  2023-02-23 10:39 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23  8:05 UTC (permalink / raw)
  To: 61726; +Cc: João Távora

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

Tags: patch

There is a new LSP capability allowing the server and client to agree on
a way to count character offsets.  What do you think fo the attached
patch?

It expresses Eglot's preferences as counting character offsets, then
byte offsets, then the UTF-16 nonsense, in that order.

I would also suggest preparing the stage to eventually make
`eglot-current-column-function' and `eglot-move-to-column-function'
obsolete.  For that, I suggest renaming

- eglot-current-column -> eglot--current-column-utf-32
- eglot-lsp-abiding-column -> eglot--current-columns-utf-16
- eglot-move-to-column -> eglot--move-to-columns-utf-32
- eglot-move-to-lsp-abiding-column -> eglot--move-to-columns-utf-16

and then making the old names obsolete aliases of the new names.

(I have tested the utf-32 case superficially, and not at all the utf-8.
So this is a RFC.)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-lisp-progmodes-eglot.el-Support-positionEncoding-cap.patch --]
[-- Type: text/patch, Size: 6246 bytes --]

From a86601f4e80dfbf21a84230433c431375e3012aa Mon Sep 17 00:00:00 2001
From: Augusto Stoffel <arstoffel@gmail.com>
Date: Thu, 23 Feb 2023 08:55:58 +0100
Subject: [PATCH] * lisp/progmodes/eglot.el: Support positionEncoding
 capability

---
 lisp/progmodes/eglot.el | 65 +++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index b569c03e8c2..0268fbf63a5 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -816,6 +816,9 @@ eglot-client-capabilities
                                        `(:valueSet
                                          [,@(mapcar
                                              #'car eglot--tag-faces)])))
+            :general
+            (list
+             :positionEncodings ["utf-32" "utf-8" "utf-16"])
             :experimental eglot--{})))
 
 (cl-defgeneric eglot-workspace-folders (server)
@@ -1439,20 +1442,26 @@ eglot--warn
   (let ((warning-minimum-level :error))
     (display-warning 'eglot (apply #'format format args) :warning)))
 
-(defun eglot-current-column () (- (point) (line-beginning-position)))
+(defun eglot-current-column ()
+  "Calculate current column, counting Unicode codepoints."
+  (- (point) (line-beginning-position)))
+
+(defun eglot--current-column-utf-8 ()
+  "Calculate current column, counting bytes."
+  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
 
-(defvar eglot-current-column-function #'eglot-lsp-abiding-column
+(defvar eglot-current-column-function nil
   "Function to calculate the current column.
 
 This is the inverse operation of
 `eglot-move-to-column-function' (which see).  It is a function of
 no arguments returning a column number.  For buffers managed by
-fully LSP-compliant servers, this should be set to
-`eglot-lsp-abiding-column' (the default), and
-`eglot-current-column' for all others.")
+fully LSP-compliant servers, this should be nil.  For others, it
+can be set to `eglot-current-colum' or
+`eglot--current-column-utf-8'.")
 
 (defun eglot-lsp-abiding-column (&optional lbp)
-  "Calculate current COLUMN as defined by the LSP spec.
+  "Calculate current column, counting UTF-16 code units as in the original LSP spec.
 LBP defaults to `line-beginning-position'."
   (/ (- (length (encode-coding-region (or lbp (line-beginning-position))
                                       ;; Fix github#860
@@ -1462,13 +1471,19 @@ eglot-lsp-abiding-column
 
 (defun eglot--pos-to-lsp-position (&optional pos)
   "Convert point POS to LSP position."
-  (eglot--widening
-   ;; LSP line is zero-origin; emacs is one-origin.
-   (list :line (1- (line-number-at-pos pos t))
-         :character (progn (when pos (goto-char pos))
-                           (funcall eglot-current-column-function)))))
-
-(defvar eglot-move-to-column-function #'eglot-move-to-lsp-abiding-column
+  (let ((columnfn (or eglot-current-column-function
+                      (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                                        :positionEncoding)
+                        ("utf-32" #'eglot-current-column)
+                        ("utf-8" #'eglot--current-column-utf-8)
+                        (_ #'eglot-lsp-abiding-column)))))
+    (eglot--widening
+     ;; LSP line is zero-origin; emacs is one-origin.
+     (list :line (1- (line-number-at-pos pos t))
+           :character (progn (when pos (goto-char pos))
+                             (funcall columnfn))))))
+
+(defvar eglot-move-to-column-function nil
   "Function to move to a column reported by the LSP server.
 
 According to the standard, LSP column/character offsets are based
@@ -1478,11 +1493,11 @@ eglot-move-to-column-function
 `c'. However, many servers don't follow the spec this closely.
 
 For buffers managed by fully LSP-compliant servers, this should
-be set to `eglot-move-to-lsp-abiding-column' (the default), and
-`eglot-move-to-column' for all others.")
+be letft nil.  For others, it can be set to
+`eglot-move-to-column' or `eglot--move-to-column-utf-8'.")
 
 (defun eglot-move-to-column (column)
-  "Move to COLUMN without closely following the LSP spec."
+  "Move to COLUMN, counting Unicode codepoints."
   ;; We cannot use `move-to-column' here, because it moves to *visual*
   ;; columns, which can be different from LSP columns in case of
   ;; `whitespace-mode', `prettify-symbols-mode', etc.  (github#296,
@@ -1490,8 +1505,14 @@ eglot-move-to-column
   (goto-char (min (+ (line-beginning-position) column)
                   (line-end-position))))
 
+(defun eglot--move-to-column-utf-8 (column)
+  "Move to COLUMN, regarded as a byte offset."
+  (goto-char (min (byte-to-position
+                   (+ (position-bytes (line-beginning-position)) column))
+                  (line-end-position))))
+
 (defun eglot-move-to-lsp-abiding-column (column)
-  "Move to COLUMN abiding by the LSP spec."
+  "Move to COLUMN, counting UTF-16 code units as in the original LSP spec."
   (save-restriction
     (cl-loop
      with lbp = (line-beginning-position)
@@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
       (forward-line (min most-positive-fixnum
                          (plist-get pos-plist :line)))
       (unless (eobp) ;; if line was excessive leave point at eob
-        (let ((tab-width 1)
+        (let ((movefn (or eglot-move-to-column-function
+                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                                            :positionEncoding)
+                            ("utf-32" #'eglot-move-to-column)
+                            ("utf-8" #'eglot--move-to-column-utf-8)
+                            (_ #'eglot-move-to-lsp-abiding-column))))
+              (tab-width 1)
               (col (plist-get pos-plist :character)))
           (unless (wholenump col)
             (eglot--warn
              "Caution: LSP server sent invalid character position %s. Using 0 instead."
              col)
             (setq col 0))
-          (funcall eglot-move-to-column-function col)))
+          (funcall movefn col)))
       (if marker (copy-marker (point-marker)) (point)))))
 
 (defconst eglot--uri-path-allowed-chars
-- 
2.39.2


[-- Attachment #3: Type: text/plain, Size: 95 bytes --]


PS: João, since you may have had trouble receiving some emails, did you
notice bug#58141?

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23  8:05 bug#61726: [PATCH] Eglot: Support positionEncoding capability Augusto Stoffel
@ 2023-02-23 10:39 ` Eli Zaretskii
  2023-02-23 11:32   ` João Távora
  2023-02-23 11:46   ` Augusto Stoffel
  2023-02-23 11:37 ` João Távora
  2023-02-23 17:01 ` Felician Nemeth
  2 siblings, 2 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-23 10:39 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> Cc: João Távora <joaotavora@gmail.com>
> From: Augusto Stoffel <arstoffel@gmail.com>
> Date: Thu, 23 Feb 2023 09:05:35 +0100
> 
> There is a new LSP capability allowing the server and client to agree on
> a way to count character offsets.  What do you think fo the attached
> patch?
> 
> It expresses Eglot's preferences as counting character offsets, then
> byte offsets, then the UTF-16 nonsense, in that order.
> 
> I would also suggest preparing the stage to eventually make
> `eglot-current-column-function' and `eglot-move-to-column-function'
> obsolete.  For that, I suggest renaming
> 
> - eglot-current-column -> eglot--current-column-utf-32
> - eglot-lsp-abiding-column -> eglot--current-columns-utf-16
> - eglot-move-to-column -> eglot--move-to-columns-utf-32
> - eglot-move-to-lsp-abiding-column -> eglot--move-to-columns-utf-16
> 
> and then making the old names obsolete aliases of the new names.

Please tell more about this, as I don't think I have a clear enough
idea of the issues and the implications for Emacs.

> +(defun eglot--current-column-utf-8 ()
> +  "Calculate current column, counting bytes."
> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))

This is subtly incorrect: position-bytes doesn't cound UTF-8 bytes, it
counts the bytes in the internal representation Emacs uses for buffer
and string text.  The differences are minor and subtle, but not
negligible.

>  (defun eglot-move-to-column (column)
> -  "Move to COLUMN without closely following the LSP spec."
> +  "Move to COLUMN, counting Unicode codepoints."
>    ;; We cannot use `move-to-column' here, because it moves to *visual*
>    ;; columns, which can be different from LSP columns in case of
>    ;; `whitespace-mode', `prettify-symbols-mode', etc.  (github#296,
> @@ -1490,8 +1505,14 @@ eglot-move-to-column
>    (goto-char (min (+ (line-beginning-position) column)
>                    (line-end-position))))
>  
> +(defun eglot--move-to-column-utf-8 (column)
> +  "Move to COLUMN, regarded as a byte offset."
> +  (goto-char (min (byte-to-position
> +                   (+ (position-bytes (line-beginning-position)) column))
> +                  (line-end-position))))
> +
>  (defun eglot-move-to-lsp-abiding-column (column)
> -  "Move to COLUMN abiding by the LSP spec."
> +  "Move to COLUMN, counting UTF-16 code units as in the original LSP spec."
>    (save-restriction
>      (cl-loop
>       with lbp = (line-beginning-position)
> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>        (forward-line (min most-positive-fixnum
>                           (plist-get pos-plist :line)))
>        (unless (eobp) ;; if line was excessive leave point at eob
> -        (let ((tab-width 1)
> +        (let ((movefn (or eglot-move-to-column-function
> +                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
> +                                            :positionEncoding)
> +                            ("utf-32" #'eglot-move-to-column)
> +                            ("utf-8" #'eglot--move-to-column-utf-8)
> +                            (_ #'eglot-move-to-lsp-abiding-column))))
> +              (tab-width 1)
>                (col (plist-get pos-plist :character)))
>            (unless (wholenump col)
>              (eglot--warn
>               "Caution: LSP server sent invalid character position %s. Using 0 instead."
>               col)
>              (setq col 0))
> -          (funcall eglot-move-to-column-function col)))
> +          (funcall movefn col)))
>        (if marker (copy-marker (point-marker)) (point)))))

What does this stuff do with double-width or zero-width characters?
Emacs takes character-width into consideration when it counts columns,
but it is unclear to me what do LSP servers do in those cases.
Likewise with characters that are composed on display.

So I think this mess needs to be carefully and elaborately discussed
before we decide how to implement it correctly.

Thanks.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 10:39 ` Eli Zaretskii
@ 2023-02-23 11:32   ` João Távora
  2023-02-23 12:04     ` Augusto Stoffel
  2023-02-23 11:46   ` Augusto Stoffel
  1 sibling, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-23 11:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Thu, Feb 23, 2023 at 10:38 AM Eli Zaretskii <eliz@gnu.org> wrote:

> So I think this mess needs to be carefully and elaborately discussed
> before we decide how to implement it correctly.

I agree.  Just thinking about encodings makes my head hurt,
so I'll probably skip this one and leave Eli or some
other expert to this. It was hard enough to implement the
"lsp abiding" case a few years ago.  But, at least, and as
far as I know, most if not all servers correctly implement it
now and so do we.

So it's nice that this new capability popped up. But, as far
as I understand, the only benefit of leveraging it is for better
efficiency.  Right?  Or are we risking incompatibility with
some servers until we implement support for it? Please confirm
this, Augusto.

Anyway, some kind of renaming like the one Augusto proposed
could be a first step in re-understanding the problem. The
name "lsp abiding" is very bad, as I don't remember what I meant
by it anymore.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23  8:05 bug#61726: [PATCH] Eglot: Support positionEncoding capability Augusto Stoffel
  2023-02-23 10:39 ` Eli Zaretskii
@ 2023-02-23 11:37 ` João Távora
  2023-02-23 17:01 ` Felician Nemeth
  2 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-23 11:37 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726

On Thu, Feb 23, 2023 at 8:06 AM Augusto Stoffel <arstoffel@gmail.com> wrote:

> PS: João, since you may have had trouble receiving some emails, did you
> notice bug#58141?

I did, but it's face-related bug and I usually have not much interest
in those, since anyone customize faces easily.  I'll reply.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 10:39 ` Eli Zaretskii
  2023-02-23 11:32   ` João Távora
@ 2023-02-23 11:46   ` Augusto Stoffel
  2023-02-23 12:54     ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 11:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Thu, 23 Feb 2023 at 12:39, Eli Zaretskii wrote:

>> I would also suggest preparing the stage to eventually make
>> `eglot-current-column-function' and `eglot-move-to-column-function'
>> obsolete.  For that, I suggest renaming
> Please tell more about this, as I don't think I have a clear enough
> idea of the issues and the implications for Emacs.

These vars were meant to make nonconformant servers (regarding the way
they count character offsets) work with Eglot.  A new addition to the
LSP spec allows the server can announce how it counts character offsets,
so ther should be no reason for servers to be nonconformant hence no
reasons for a workaround variable.

>> +(defun eglot--current-column-utf-8 ()
>> +  "Calculate current column, counting bytes."
>> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
>
> This is subtly incorrect: position-bytes doesn't cound UTF-8 bytes, it
> counts the bytes in the internal representation Emacs uses for buffer
> and string text.  The differences are minor and subtle, but not
> negligible.

Right, if the buffer contains a char outside of the Unicode range, we
lose.

But just to confirm: position-bytes and byte-to-position are always with
respect to Emacs's internal extended UTF-8 representation and have
nothing to do with the buffer file enconding, right?

> What does this stuff do with double-width or zero-width characters?
> Emacs takes character-width into consideration when it counts columns,
> but it is unclear to me what do LSP servers do in those cases.
> Likewise with characters that are composed on display.

`eglot-move-to-column' is supposed so count Unicode codepoints, so
e.g. x, ⇒ and 😃 all contribute 1 unit.  One the other hand, the Emoji
🧛‍♀️ contributes 4 units. This is independent of with screen display.

By the way, I don't undertand your claim about column counting.  If I
move point over 🧛‍♀️, the mode line column count increments by 3 units,
which seems to make no sense: this Emoji is 4 codepoints longs and
occupies 1 screen column.  What's the logic here?

> So I think this mess needs to be carefully and elaborately discussed
> before we decide how to implement it correctly.

Sure.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 11:32   ` João Távora
@ 2023-02-23 12:04     ` Augusto Stoffel
  2023-02-23 12:24       ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 12:04 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Thu, 23 Feb 2023 at 11:32, João Távora wrote:

> So it's nice that this new capability popped up. But, as far as I
> understand, the only benefit of leveraging it is for better
> efficiency.  Right? Or are we risking incompatibility with some
> servers until we implement support for it? Please confirm this,
> Augusto.

There is no real risk in not implementing this.

I don't know how many servers out there are nonconforming, because, in
practice, the problem is very rare and basically will only appear if the
user is operating on a line containing Emoji or uncommon math
characters.  So there may well be a lot of nonconforming server out
there but we don't see the consequence of that very often.

Digestif (which is not very important) is nonconforming because I didn't
want to implement such a dumb spec.

IIUC correctly, clangd was the group that pushed for this new LSP spec,
but being a such a big project they surely support the official way of
counting.

Anyway, more than efficiency, this to me is an aesthetic question.  The
UTF-16 way of counting of the original LSP spec is a totally misguided
idea and should be avoided.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 12:04     ` Augusto Stoffel
@ 2023-02-23 12:24       ` João Távora
  0 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-23 12:24 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Thu, Feb 23, 2023 at 12:04 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Thu, 23 Feb 2023 at 11:32, João Távora wrote:

> IIUC correctly, clangd was the group that pushed for this new LSP spec,
> but being a such a big project they surely support the official way of
> counting.

Yep, they do. This was a bug report from our end and they fixed it
some time ago, IIRC.

> Anyway, more than efficiency, this to me is an aesthetic question.  The
> UTF-16 way of counting of the original LSP spec is a totally misguided
> idea and should be avoided.

I agree that it is poor design.  But if it works and isn't particularly
slow, then I don't think it's a priority to change things in Eglot,
mostly because that adds complexity for relatively little value.
That it isn't "priority" doesn't mean I would oppose it if you work
things out with an encoding expert like Eli and give it suitable
testing.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 11:46   ` Augusto Stoffel
@ 2023-02-23 12:54     ` Eli Zaretskii
  2023-02-23 13:31       ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-23 12:54 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 61726@debbugs.gnu.org,  joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 12:46:48 +0100
> 
> >> +(defun eglot--current-column-utf-8 ()
> >> +  "Calculate current column, counting bytes."
> >> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
> >
> > This is subtly incorrect: position-bytes doesn't cound UTF-8 bytes, it
> > counts the bytes in the internal representation Emacs uses for buffer
> > and string text.  The differences are minor and subtle, but not
> > negligible.
> 
> Right, if the buffer contains a char outside of the Unicode range, we
> lose.
> 
> But just to confirm: position-bytes and byte-to-position are always with
> respect to Emacs's internal extended UTF-8 representation and have
> nothing to do with the buffer file enconding, right?

Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
jump through to get it right, even just with UTF-8.

> > What does this stuff do with double-width or zero-width characters?
> > Emacs takes character-width into consideration when it counts columns,
> > but it is unclear to me what do LSP servers do in those cases.
> > Likewise with characters that are composed on display.
> 
> `eglot-move-to-column' is supposed so count Unicode codepoints, so
> e.g. x, ⇒ and 😃 all contribute 1 unit.

But if the resulting column is then used in move-to-column etc., it
might go to the wrong column, because in Emacs each column is not
necessarily a single codepoint.  The simplest example is a TAB
character, but there are more examples, some of which are quite
complicated (see below).

> One the other hand, the Emoji
> 🧛‍♀️ contributes 4 units. This is independent of with screen display.

Not in Emacs.

> By the way, I don't undertand your claim about column counting.  If I
> move point over 🧛‍♀️, the mode line column count increments by 3 units,
> which seems to make no sense: this Emoji is 4 codepoints longs and
> occupies 1 screen column.  What's the logic here?

If that is what you see, it could be a bug.  Does current-column agree
with what you see in the mode line?

In general, characters (codepoints) that are composed on display into
a single glyph or "grapheme cluster" are supposed to be counted as a
single column.  Try typing this in "emacs -Q"

  a C-x 8 RET COMBINING ACUTE ACCENT RET

If your default font is capable enough, you will see a single glyph of
'a' with acute accent (á), and it will count as 1 column, although
there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
will move past both codepoints.  Now imagine that we get such sequences
from the LSP server -- what will Eglot do in terms of column counting?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 12:54     ` Eli Zaretskii
@ 2023-02-23 13:31       ` Augusto Stoffel
  2023-02-23 15:04         ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 13:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Thu, 23 Feb 2023 at 14:54, Eli Zaretskii wrote:

>> But just to confirm: position-bytes and byte-to-position are always with
>> respect to Emacs's internal extended UTF-8 representation and have
>> nothing to do with the buffer file enconding, right?
>
> Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
> jump through to get it right, even just with UTF-8.

Okay, then we're on the same page.  Just to emphasize, the buffer file
is totally irrelevant for Eglot's purposes.  The only thing that matters
is the representation of the buffer text when it's serialized as an
UTF-8-encoded string inside a JSON object.

>> > What does this stuff do with double-width or zero-width characters?
>> > Emacs takes character-width into consideration when it counts columns,
>> > but it is unclear to me what do LSP servers do in those cases.
>> > Likewise with characters that are composed on display.
>> 
>> `eglot-move-to-column' is supposed so count Unicode codepoints, so
>> e.g. x, ⇒ and 😃 all contribute 1 unit.
>
> But if the resulting column is then used in move-to-column etc., it
> might go to the wrong column, because in Emacs each column is not
> necessarily a single codepoint.  The simplest example is a TAB
> character, but there are more examples, some of which are quite
> complicated (see below).

There's only one function that uses `move-to-column'.  It's very old and
I didn't touch it.

>> One the other hand, the Emoji
>> 🧛‍♀️ contributes 4 units. This is independent of with screen display.
>
> Not in Emacs.

Sorry, I don't understand what you mean.  Emas has no say as to how
Emoji are represented as sequences of codepoints.  The female vampire
Emoji is 4 codepoints, if I'm counting it right.

Of course I undestand taht the Emoji occupies 1 column in my screen.

>> By the way, I don't undertand your claim about column counting.  If I
>> move point over 🧛‍♀️, the mode line column count increments by 3 units,
>> which seems to make no sense: this Emoji is 4 codepoints longs and
>> occupies 1 screen column.  What's the logic here?
>
> If that is what you see, it could be a bug.  Does current-column agree
> with what you see in the mode line?

Yes.

> In general, characters (codepoints) that are composed on display into
> a single glyph or "grapheme cluster" are supposed to be counted as a
> single column.  Try typing this in "emacs -Q"
>
>   a C-x 8 RET COMBINING ACUTE ACCENT RET
>
> If your default font is capable enough, you will see a single glyph of
> 'a' with acute accent (á), and it will count as 1 column, although
> there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
> will move past both codepoints.  Now imagine that we get such sequences
> from the LSP server -- what will Eglot do in terms of column counting?

Right, I undestand the Unicode business (thanks for the pointers in any
case).

If you look carefully at the Eglot code, you will see that
`move-to-column' only appears in the code pertaining the “UTF-16 way of
counting offsets”, which

1. is old and I didn't touch in this patch,
2. seems to work correctly, despite looking suspicious, and
3. will not be used anymore when both Eglot and the LSP server supports
   the positionEncodings capabitily.

I hope this motivates you to add this feature 🙂.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 13:31       ` Augusto Stoffel
@ 2023-02-23 15:04         ` Eli Zaretskii
  2023-02-23 18:52           ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-23 15:04 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 61726@debbugs.gnu.org,  joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 14:31:52 +0100
> 
> On Thu, 23 Feb 2023 at 14:54, Eli Zaretskii wrote:
> 
> >> But just to confirm: position-bytes and byte-to-position are always with
> >> respect to Emacs's internal extended UTF-8 representation and have
> >> nothing to do with the buffer file enconding, right?
> >
> > Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
> > jump through to get it right, even just with UTF-8.
> 
> Okay, then we're on the same page.  Just to emphasize, the buffer file
> is totally irrelevant for Eglot's purposes.  The only thing that matters
> is the representation of the buffer text when it's serialized as an
> UTF-8-encoded string inside a JSON object.

The buffer's file is not important for the issue at hand, only its
encoding is.  And in the case of Eglot, the encoding is still there,
even though there's no file involved.  So the code in
bufferpos-to-filepos is still very relevant, as it shows what has to
be done for such conversions.

> >> `eglot-move-to-column' is supposed so count Unicode codepoints, so
> >> e.g. x, ⇒ and 😃 all contribute 1 unit.
> >
> > But if the resulting column is then used in move-to-column etc., it
> > might go to the wrong column, because in Emacs each column is not
> > necessarily a single codepoint.  The simplest example is a TAB
> > character, but there are more examples, some of which are quite
> > complicated (see below).
> 
> There's only one function that uses `move-to-column'.  It's very old and
> I didn't touch it.

Then why does Eglot want to know the column at all?

> >> One the other hand, the Emoji
> >> 🧛‍♀️ contributes 4 units. This is independent of with screen display.
> >
> > Not in Emacs.
> 
> Sorry, I don't understand what you mean.  Emas has no say as to how
> Emoji are represented as sequences of codepoints.  The female vampire
> Emoji is 4 codepoints, if I'm counting it right.

What I meant is that the number of columns a given sequence of
codepoints will take on display is not equal to the number of
codepoints in the sequence.  This is so for Emoji sequences as well.

> > If that is what you see, it could be a bug.  Does current-column agree
> > with what you see in the mode line?
> 
> Yes.

Then at least it's not a grave bug.  current-column and friends
doesn't support all the quirks of our display code which can change
how many columns some sequence of codepoints can take on display.  It
does support quite a few of them, though.

> If you look carefully at the Eglot code, you will see that
> `move-to-column' only appears in the code pertaining the “UTF-16 way of
> counting offsets”, which
> 
> 1. is old and I didn't touch in this patch,
> 2. seems to work correctly, despite looking suspicious, and
> 3. will not be used anymore when both Eglot and the LSP server supports
>    the positionEncodings capabitily.

Positions do not necessarily transform to columns easily.  So,
depending on how Eglot uses this information, we may or may not have
problems.

In general, reporting coordinates in columns between programs is
problematic.  We see this in many cases, starting with spellers.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23  8:05 bug#61726: [PATCH] Eglot: Support positionEncoding capability Augusto Stoffel
  2023-02-23 10:39 ` Eli Zaretskii
  2023-02-23 11:37 ` João Távora
@ 2023-02-23 17:01 ` Felician Nemeth
  2023-02-23 17:11   ` João Távora
  2023-02-23 18:42   ` Augusto Stoffel
  2 siblings, 2 replies; 82+ messages in thread
From: Felician Nemeth @ 2023-02-23 17:01 UTC (permalink / raw)
  To: 61726; +Cc: Augusto Stoffel, João Távora

Augusto Stoffel <arstoffel@gmail.com> writes:

> Tags: patch
>
> There is a new LSP capability allowing the server and client to agree on
> a way to count character offsets.  What do you think fo the attached
> patch?

It closes https://github.com/joaotavora/eglot/pull/916, hooray!

I think the patch has a small bug.  With it, Eglot always negotiate the
encoding with the server, but when the user sets
eglot-current-column-function or eglot-move-to-column-function, the
result of the negotiation is ignored, which might confuse the server.

(In the long run, it might make sense to let the list of the offered
encodings configurable.)

Thanks.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 17:01 ` Felician Nemeth
@ 2023-02-23 17:11   ` João Távora
  2023-02-23 18:42   ` Augusto Stoffel
  1 sibling, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-23 17:11 UTC (permalink / raw)
  To: Felician Nemeth; +Cc: 61726, Augusto Stoffel

On Thu, Feb 23, 2023 at 5:01 PM Felician Nemeth
<felician.nemeth@gmail.com> wrote:
>
> Augusto Stoffel <arstoffel@gmail.com> writes:
>
> > Tags: patch
> >
> > There is a new LSP capability allowing the server and client to agree on
> > a way to count character offsets.  What do you think fo the attached
> > patch?
>
> It closes https://github.com/joaotavora/eglot/pull/916, hooray!

I had completely forgotten about your patch!  Sorry!  Well, re-reading
it, it doesn't seem as complex as Augusto's.  Is it also as functional?
Do you want to abandon it in favor of Augusto's?

And can we get a simple performance measurement before&after, maybe
for both patches?

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 17:01 ` Felician Nemeth
  2023-02-23 17:11   ` João Távora
@ 2023-02-23 18:42   ` Augusto Stoffel
  2023-02-27 10:11     ` Felician Nemeth
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 18:42 UTC (permalink / raw)
  To: Felician Nemeth; +Cc: 61726, Eli Zaretskii, João Távora

On Thu, 23 Feb 2023 at 18:01, Felician Nemeth wrote:

> Augusto Stoffel <arstoffel@gmail.com> writes:
>
>> Tags: patch
>>
>> There is a new LSP capability allowing the server and client to agree on
>> a way to count character offsets.  What do you think fo the attached
>> patch?
>
> It closes https://github.com/joaotavora/eglot/pull/916, hooray!
>
> I think the patch has a small bug.  With it, Eglot always negotiate the
> encoding with the server, but when the user sets
> eglot-current-column-function or eglot-move-to-column-function, the
> result of the negotiation is ignored, which might confuse the server.

This is intentional, so let's see if you agree with my rationale.

Currently, the point of eglot-{current-,move-to-}column-function is to
override the behavior of nonconforming servers.  There's no reason to
touch it if your server conforms 100% to the spec.  (Note that if you
update your nonconforming server and it now became conforming, then all
of a sudden you setup is guaranteed to be broken; you need to unset
those variables!)

With my patch, the purpose of these vars continues to be to override the
wrong behavior of a defective server.  So of course it should ignore the
result of the negotiation.

Moreover, I would like to mark eglot-{current-,move-to-}column-function
obsolete (preferably right now), so it can be removed entirely in a few
years. By then, there should be no excuse for nonconforming servers,
since they can use the positionEncoding protocol to stay away of the
UTF-16 nonsense.  Of course, if that proves a wrong assumption, we'll
just keep those vars around for longer.

What do you think?

> (In the long run, it might make sense to let the list of the offered
> encodings configurable.)

I don't think I understand why.  We would already be offering all 3
positionEncoding options.  The server can pick whatever they like and
we'll deal with it.  Everything should "just work", no need for
configuration.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 15:04         ` Eli Zaretskii
@ 2023-02-23 18:52           ` Augusto Stoffel
  2023-02-23 19:20             ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 18:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Thu, 23 Feb 2023 at 17:04, Eli Zaretskii wrote:

> Then why does Eglot want to know the column at all?

In the Eglot context, "column" simply means "offset since the beginning
of the line", where "offset" can be counted in one of three diferent
ways: codepoints, bytes of the UTF-8 representation, or bytes of the
UTF-16 representation divided by two (a.k.a. code units).

(Number of cells occupied in the terminal is _not_ one of the ways to
count "offset", thankfully!)

I don't think "column" is a particularly confusing name here, but if you
want to rename this, now might be a good opportunity.

>> If you look carefully at the Eglot code, you will see that
>> `move-to-column' only appears in the code pertaining the “UTF-16 way of
>> counting offsets”, which
>> 
>> 1. is old and I didn't touch in this patch,
>> 2. seems to work correctly, despite looking suspicious, and
>> 3. will not be used anymore when both Eglot and the LSP server supports
>>    the positionEncodings capabitily.
>
> Positions do not necessarily transform to columns easily.  So,
> depending on how Eglot uses this information, we may or may not have
> problems.
>
> In general, reporting coordinates in columns between programs is
> problematic.  We see this in many cases, starting with spellers.

Yes, I agree this is problematic -- as soon as you leave the wonderful
world of UTF-8.  The point of this patch is to stay in that wonderful
world as long as the server agrees to it, which I hope will become
commonplace.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 18:52           ` Augusto Stoffel
@ 2023-02-23 19:20             ` Eli Zaretskii
  2023-02-23 19:28               ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-23 19:20 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: 61726@debbugs.gnu.org,  joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 19:52:41 +0100
> 
> > In general, reporting coordinates in columns between programs is
> > problematic.  We see this in many cases, starting with spellers.
> 
> Yes, I agree this is problematic -- as soon as you leave the wonderful
> world of UTF-8.  The point of this patch is to stay in that wonderful
> world as long as the server agrees to it, which I hope will become
> commonplace.

Actually, the really "wonderful" world would be if the offsets were
reported in character codepoints, because buffer positions are both
the easiest for us to count and are unequivocally defined.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 19:20             ` Eli Zaretskii
@ 2023-02-23 19:28               ` João Távora
  2023-02-23 19:52                 ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-23 19:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Thu, Feb 23, 2023 at 7:20 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Augusto Stoffel <arstoffel@gmail.com>
> > Cc: 61726@debbugs.gnu.org,  joaotavora@gmail.com
> > Date: Thu, 23 Feb 2023 19:52:41 +0100
> >
> > > In general, reporting coordinates in columns between programs is
> > > problematic.  We see this in many cases, starting with spellers.
> >
> > Yes, I agree this is problematic -- as soon as you leave the wonderful
> > world of UTF-8.  The point of this patch is to stay in that wonderful
> > world as long as the server agrees to it, which I hope will become
> > commonplace.
>
> Actually, the really "wonderful" world would be if the offsets were
> reported in character codepoints, because buffer positions are both
> the easiest for us to count and are unequivocally defined.

I quite agree.  Is this "count in codepoints" capability defined
in LSP at all?  If so, I'd rather we add support for it over other
slightly less imperfect methods.  If not, we should lobby in LSP
forums for its addition to the LSP spec.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 19:28               ` João Távora
@ 2023-02-23 19:52                 ` Augusto Stoffel
  2023-02-24  6:43                   ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-23 19:52 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Thu, 23 Feb 2023 at 19:28, João Távora wrote:

>> > > In general, reporting coordinates in columns between programs is
>> > > problematic.  We see this in many cases, starting with spellers.
>> >
>> > Yes, I agree this is problematic -- as soon as you leave the wonderful
>> > world of UTF-8.  The point of this patch is to stay in that wonderful
>> > world as long as the server agrees to it, which I hope will become
>> > commonplace.
>>
>> Actually, the really "wonderful" world would be if the offsets were
>> reported in character codepoints, because buffer positions are both
>> the easiest for us to count and are unequivocally defined.
>
> I quite agree.

Yes, everybody agrees!

> Is this "count in codepoints" capability defined in LSP at all?

That's _exactly_ what the patch implements 🥳.

And then it also provides a second option, which is to use "bytes of the
UTF-8 representation" as the unit of measurement.  And then only as the
third option it falls back to the "LSP abiding" nonsense.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 19:52                 ` Augusto Stoffel
@ 2023-02-24  6:43                   ` Eli Zaretskii
  2023-02-24  7:18                     ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24  6:43 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  61726@debbugs.gnu.org
> Date: Thu, 23 Feb 2023 20:52:06 +0100
> 
> On Thu, 23 Feb 2023 at 19:28, João Távora wrote:
> 
> >> Actually, the really "wonderful" world would be if the offsets were
> >> reported in character codepoints, because buffer positions are both
> >> the easiest for us to count and are unequivocally defined.
> >
> > I quite agree.
> 
> Yes, everybody agrees!
> 
> > Is this "count in codepoints" capability defined in LSP at all?
> 
> That's _exactly_ what the patch implements 🥳.

It does? then please humor me by walking me through the code and the
patch to show how that would work after applying the patch.

Thanks.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24  6:43                   ` Eli Zaretskii
@ 2023-02-24  7:18                     ` Augusto Stoffel
  2023-02-24  8:38                       ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24  7:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 08:43, Eli Zaretskii wrote:

> It does? then please humor me by walking me through the code and the
> patch to show how that would work after applying the patch.

From a86601f4e80dfbf21a84230433c431375e3012aa Mon Sep 17 00:00:00 2001
From: Augusto Stoffel <arstoffel@gmail.com>
Date: Thu, 23 Feb 2023 08:55:58 +0100
Subject: [PATCH] * lisp/progmodes/eglot.el: Support positionEncoding
 capability

---
 lisp/progmodes/eglot.el | 65 +++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index b569c03e8c2..0268fbf63a5 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -816,6 +816,9 @@ eglot-client-capabilities
                                        `(:valueSet
                                          [,@(mapcar
                                              #'car eglot--tag-faces)])))
+            :general
+            (list
+             :positionEncodings ["utf-32" "utf-8" "utf-16"])
             :experimental eglot--{})))

We announce our position encoding capabilities, in this order of
preference:

A. counting characters a.k.a. Unicode codepoints
B. counting bytes of the UTF-8 representation
C. counting bytes of the UTF-16 representation divided by two (which is
   currently the default).

 (cl-defgeneric eglot-workspace-folders (server)
@@ -1439,20 +1442,26 @@ eglot--warn
   (let ((warning-minimum-level :error))
     (display-warning 'eglot (apply #'format format args) :warning)))
 
-(defun eglot-current-column () (- (point) (line-beginning-position)))
+(defun eglot-current-column ()
+  "Calculate current column, counting Unicode codepoints."
+  (- (point) (line-beginning-position)))

I added a docstring.

+(defun eglot--current-column-utf-8 ()
+  "Calculate current column, counting bytes."
+  (- (position-bytes (point)) (position-bytes (line-beginning-position))))

I defined a new function to support the style B. of counting offsets.

-(defvar eglot-current-column-function #'eglot-lsp-abiding-column
+(defvar eglot-current-column-function nil
   "Function to calculate the current column.

I changed the default so this variable can be eventually made obsolete.
Note that it is a workaround variable introduced for the sole purpose of
making nonconforming servers work with Eglot.  But this problem should
slowly vanish with the introduction of the :positionEncoding capability.
Hence my suggestion to obsolete this workaround variable.

 This is the inverse operation of
 `eglot-move-to-column-function' (which see).  It is a function of
 no arguments returning a column number.  For buffers managed by
-fully LSP-compliant servers, this should be set to
-`eglot-lsp-abiding-column' (the default), and
-`eglot-current-column' for all others.")
+fully LSP-compliant servers, this should be nil.  For others, it
+can be set to `eglot-current-colum' or
+`eglot--current-column-utf-8'.")
 
 (defun eglot-lsp-abiding-column (&optional lbp)
-  "Calculate current COLUMN as defined by the LSP spec.
+  "Calculate current column, counting UTF-16 code units as in the original LSP spec.
 LBP defaults to `line-beginning-position'."
   (/ (- (length (encode-coding-region (or lbp (line-beginning-position))
                                       ;; Fix github#860

The LSP spec now describes 3 ways of counting offsets, hence the
documenation clarification.

@@ -1462,13 +1471,19 @@ eglot-lsp-abiding-column
 
 (defun eglot--pos-to-lsp-position (&optional pos)
   "Convert point POS to LSP position."
-  (eglot--widening
-   ;; LSP line is zero-origin; emacs is one-origin.
-   (list :line (1- (line-number-at-pos pos t))
-         :character (progn (when pos (goto-char pos))
-                           (funcall eglot-current-column-function)))))
-
+  (let ((columnfn (or eglot-current-column-function
+                      (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                                        :positionEncoding)
+                        ("utf-32" #'eglot-current-column)
+                        ("utf-8" #'eglot--current-column-utf-8)
+                        (_ #'eglot-lsp-abiding-column)))))
+    (eglot--widening
+     ;; LSP line is zero-origin; emacs is one-origin.
+     (list :line (1- (line-number-at-pos pos t))
+           :character (progn (when pos (goto-char pos))
+                             (funcall columnfn))))))
+

This is the heart of the patch.

A “good” server will provide :positionEncoding "utf-32", and we'll keep
the workaround variable `eglot-current-column-function' at its new default
value of nil.  So columnfn, which is called in the last line of the
chunck, will be bound to `eglot-current-column'.

A “bad” server will provide provide :positionEncoding "utf-16" or nil,
and then we will call `eglot-lsp-abiding-column' near the end.

An “inbetween” server will provide :positionEncoding "utf-8" and we will
call the newly added `eglot--current-column-utf-8' near the end.

If the user sets `eglot-current-column-function' to work around an
issue, nothing changes in relation to the original version.

-(defvar eglot-move-to-column-function #'eglot-move-to-lsp-abiding-column
+(defvar eglot-move-to-column-function nil
   "Function to move to a column reported by the LSP server.
 
I changed the default so this variable can be eventually made obsolete.
Note that it is a workaround variable introduced for the sole purpose of
making nonconforming servers work with Eglot.  But this problem should
slowly vanish with the introduction of the :positionEncoding capability.
Hence my suggestion to obsolete this workaround variable.

@@ -1478,11 +1493,11 @@ eglot-move-to-column-function
 `c'. However, many servers don't follow the spec this closely.
 
 For buffers managed by fully LSP-compliant servers, this should
-be set to `eglot-move-to-lsp-abiding-column' (the default), and
-`eglot-move-to-column' for all others.")
+be letft nil.  For others, it can be set to
+`eglot-move-to-column' or `eglot--move-to-column-utf-8'.")
 
 (defun eglot-move-to-column (column)
-  "Move to COLUMN without closely following the LSP spec."
+  "Move to COLUMN, counting Unicode codepoints."
   ;; We cannot use `move-to-column' here, because it moves to *visual*
   ;; columns, which can be different from LSP columns in case of
   ;; `whitespace-mode', `prettify-symbols-mode', etc.  (github#296,

The LSP spec now describes 3 ways of counting offsets, hence the
documenation clarification.

@@ -1490,8 +1505,14 @@ eglot-move-to-column
   (goto-char (min (+ (line-beginning-position) column)
                   (line-end-position))))
 
+(defun eglot--move-to-column-utf-8 (column)
+  "Move to COLUMN, regarded as a byte offset."
+  (goto-char (min (byte-to-position
+                   (+ (position-bytes (line-beginning-position)) column))
+                  (line-end-position))))
+

I defined a new function to support the style B. of counting offsets.

 (defun eglot-move-to-lsp-abiding-column (column)
-  "Move to COLUMN abiding by the LSP spec."
+  "Move to COLUMN, counting UTF-16 code units as in the original LSP spec."
   (save-restriction
     (cl-loop
      with lbp = (line-beginning-position)
@@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
       (forward-line (min most-positive-fixnum
                          (plist-get pos-plist :line)))
       (unless (eobp) ;; if line was excessive leave point at eob
-        (let ((tab-width 1)
+        (let ((movefn (or eglot-move-to-column-function
+                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                                            :positionEncoding)
+                            ("utf-32" #'eglot-move-to-column)
+                            ("utf-8" #'eglot--move-to-column-utf-8)
+                            (_ #'eglot-move-to-lsp-abiding-column))))
+              (tab-width 1)
               (col (plist-get pos-plist :character)))
           (unless (wholenump col)
             (eglot--warn
              "Caution: LSP server sent invalid character position %s. Using 0 instead."
              col)
             (setq col 0))
-          (funcall eglot-move-to-column-function col)))
+          (funcall movefn col)))
       (if marker (copy-marker (point-marker)) (point)))))

This is the second heart of the patch.

A “good” server will provide :positionEncoding "utf-32", and we'll keep
the workaround variable `eglot-move-to-column-function' at its new default
value of nil.  So columnfn, which is called in the last line of the
chunck, will be bound to `eglot-move-to-column'.

A “bad” server will provide provide :positionEncoding "utf-16" or nil,
and then we will call `eglot-lsp-abiding-column' near the end.

An “inbetween” server will provide :positionEncoding "utf-8" and we will
call the newly added `eglot--move-to-column-utf-8' near the end.

If the user sets `eglot-move-to-column-function' to work around an
issue, nothing changes in relation to the original version.



I hope this helps clarifying things.





^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24  7:18                     ` Augusto Stoffel
@ 2023-02-24  8:38                       ` Eli Zaretskii
  2023-02-24  9:15                         ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24  8:38 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 08:18:30 +0100
> 
> On Fri, 24 Feb 2023 at 08:43, Eli Zaretskii wrote:
> 
> > It does? then please humor me by walking me through the code and the
> > patch to show how that would work after applying the patch.
> 
> +            :general
> +            (list
> +             :positionEncodings ["utf-32" "utf-8" "utf-16"])
>              :experimental eglot--{})))

Is "UTF-32" an LSP thing and terminology?  Because I'd prefer a
different name if we can.  At least for our internal nomenclature,
let's use "codepoint" or "character" instead.

> -(defun eglot-current-column () (- (point) (line-beginning-position)))
> +(defun eglot-current-column ()
> +  "Calculate current column, counting Unicode codepoints."
> +  (- (point) (line-beginning-position)))

Can we please take this opportunity to get rid of the confusing
"column" terminology?  As became evident from this discussion, we are
not talking columns here, we are talking offsets in characters from
BOL.  So something like "pos" or "linepos" or "line-offset" should be
better.

João, are you okay with such a sweeping change in all of eglot.el?

> +(defun eglot--current-column-utf-8 ()
> +  "Calculate current column, counting bytes."
> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))

As discussed, position-bytes is incorrect.  You should instead do
something like

  (length (encode-coding-string
           (buffer-substring-no-properties (point)
                                           (line-beginning-position))
           'utf-8-unix t))

Also, for 100% reliable results, we should bind
inhibit-field-text-motion to t when calling line-beginning-position.

> +(defun eglot--move-to-column-utf-8 (column)
> +  "Move to COLUMN, regarded as a byte offset."
> +  (goto-char (min (byte-to-position
> +                   (+ (position-bytes (line-beginning-position)) column))
> +                  (line-end-position))))

Likewise here.

> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>        (forward-line (min most-positive-fixnum
>                           (plist-get pos-plist :line)))
>        (unless (eobp) ;; if line was excessive leave point at eob
> -        (let ((tab-width 1)
> +        (let ((movefn (or eglot-move-to-column-function
> +                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
> +                                            :positionEncoding)
> +                            ("utf-32" #'eglot-move-to-column)
> +                            ("utf-8" #'eglot--move-to-column-utf-8)
> +                            (_ #'eglot-move-to-lsp-abiding-column))))
> +              (tab-width 1)
                  ^^^^^^^^^^^
This last part shouldn't be necessary: we should move by characters,
not by columns.  Why is it necessary?

> I hope this helps clarifying things.

Yes, thank you very much.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24  8:38                       ` Eli Zaretskii
@ 2023-02-24  9:15                         ` Augusto Stoffel
  2023-02-24 10:20                           ` João Távora
  2023-02-24 11:27                           ` Eli Zaretskii
  0 siblings, 2 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24  9:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 10:38, Eli Zaretskii wrote:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
>> Date: Fri, 24 Feb 2023 08:18:30 +0100
>> 
>> On Fri, 24 Feb 2023 at 08:43, Eli Zaretskii wrote:
>> 
>> > It does? then please humor me by walking me through the code and the
>> > patch to show how that would work after applying the patch.
>> 
>> +            :general
>> +            (list
>> +             :positionEncodings ["utf-32" "utf-8" "utf-16"])
>>              :experimental eglot--{})))
>
> Is "UTF-32" an LSP thing and terminology?  Because I'd prefer a
> different name if we can.  At least for our internal nomenclature,
> let's use "codepoint" or "character" instead.

Yes, this is how the LSP spec refers to the 3 offset counting methods.

>> -(defun eglot-current-column () (- (point) (line-beginning-position)))
>> +(defun eglot-current-column ()
>> +  "Calculate current column, counting Unicode codepoints."
>> +  (- (point) (line-beginning-position)))
>
> Can we please take this opportunity to get rid of the confusing
> "column" terminology?  As became evident from this discussion, we are
> not talking columns here, we are talking offsets in characters from
> BOL.  So something like "pos" or "linepos" or "line-offset" should be
> better.
>
> João, are you okay with such a sweeping change in all of eglot.el?

I like linepos, if João is fine with not making the absolute minimal
amount of changes to the code.

>> +(defun eglot--current-column-utf-8 ()
>> +  "Calculate current column, counting bytes."
>> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
>
> As discussed, position-bytes is incorrect.  You should instead do
> something like
>
>   (length (encode-coding-string
>            (buffer-substring-no-properties (point)
>                                            (line-beginning-position))
>            'utf-8-unix t))

But it is incorrect only if the buffer contains characters outside of
the Unicode range, right?  If that happens, we already lost, because a
few steps later we will serialize the buffer text as JSON to send it to
the server:

    (progn
     (insert ?x (max-char) ?y)
     (json-serialize (buffer-substring-no-properties (pos-bol)
     (pos-eol))))

    ⇒ Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p " (json-serialize (buffer-substring-no-properties (...")

> Also, for 100% reliable results, we should bind
> inhibit-field-text-motion to t when calling line-beginning-position.

We should rather be using pos-bol, no?  But how do we keep compatibility
with older Emacsen?

>> +(defun eglot--move-to-column-utf-8 (column)
>> +  "Move to COLUMN, regarded as a byte offset."
>> +  (goto-char (min (byte-to-position
>> +                   (+ (position-bytes (line-beginning-position)) column))
>> +                  (line-end-position))))
>
> Likewise here.
>
>> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>>        (forward-line (min most-positive-fixnum
>>                           (plist-get pos-plist :line)))
>>        (unless (eobp) ;; if line was excessive leave point at eob
>> -        (let ((tab-width 1)
>> +        (let ((movefn (or eglot-move-to-column-function
>> +                          (pcase (plist-get (eglot--capabilities (eglot-current-server))
>> +                                            :positionEncoding)
>> +                            ("utf-32" #'eglot-move-to-column)
>> +                            ("utf-8" #'eglot--move-to-column-utf-8)
>> +                            (_ #'eglot-move-to-lsp-abiding-column))))
>> +              (tab-width 1)
>                   ^^^^^^^^^^^
> This last part shouldn't be necessary: we should move by characters,
> not by columns.  Why is it necessary?

Maybe João can clarify, but I'm pretty sure this is there to support the
UTF-16 way of counting offsets, so this ideally should move to
eglot-move-to-lsp-abiding-column.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24  9:15                         ` Augusto Stoffel
@ 2023-02-24 10:20                           ` João Távora
  2023-02-24 11:01                             ` Augusto Stoffel
  2023-02-24 11:27                           ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 10:20 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 9:15 AM Augusto Stoffel <arstoffel@gmail.com> wrote:
> >                   ^^^^^^^^^^^
> > This last part shouldn't be necessary: we should move by characters,
> > not by columns.  Why is it necessary?
>
> Maybe João can clarify, but I'm pretty sure this is there to support the
> UTF-16 way of counting offsets, so this ideally should move to
> eglot-move-to-lsp-abiding-column.

I have to be brutally honest here: I don't like this patch.  I
appreciate the effort, I really do, and thank for guiding us its
the motions, but there are two main things I really don't like
about it, and 1 that I'm on the fence about.

The first thing I don't like is likely the reason that Eli is
confused here.  The late binding of column-counting strategies
is confusing.  I wrote these functions so that each one has
a separate well-defined, readable-inasmuch-as-possible,
vc-region-history-traceable, performant column-counting
strategy. The "lsp-abiding" naming might be off, I admit, but
only since LSP started supporting more than one strategy.

The second thing I don't like is also due to the late-binding idea.
This is a hotspot in Eglot, some of these functions are called
many many times, for each LSP server interaction depending
on how many document positions are exchanged (and they can
be a lot).  I do remember benchmarking strategies at the time
and seeing a perceptible difference.  Plus, this late-binding is
really useless as a server will guaranteedly _not_ change its
column-counting standard during the LSP session.

The third thing that I'm not crazy with but I don't mind is
the necessity to support the "utf-8" strategy.  If "utf-16"
is mandatory, and we already support "utf-32" anyway, why should
we be adding this additional complexity.  But, if it can be
hidden behind a new pair of functions and Eli accepts it,
I'm OK with it.

Finally, here's a patch that doesn't use late-binding, doesn't
introduce new strategies and supports "utf-32" and "utf-16"
today.  As you can see, the patch is nearly trivial.

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index eea8be6d1aa..ae8afa69651 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -807,6 +807,7 @@ eglot-client-capabilities
              :rangeFormatting    `(:dynamicRegistration :json-false)
              :rename             `(:dynamicRegistration :json-false)
              :inlayHint          `(:dynamicRegistration :json-false)
+             :general            `(:positionEncodings ["utf-32" "utf-16"])
              :publishDiagnostics (list :relatedInformation :json-false
                                        ;; TODO: We can support
:codeDescription after
                                        ;; adding an appropriate UI to
@@ -1789,6 +1790,9 @@ eglot--managed-mode
       (add-hook 'eldoc-documentation-functions #'eglot-signature-eldoc-function
                 nil t)
       (eldoc-mode 1))
+    (when (eq (eglot--server-capable :positionEncoding) "utf-16")
+      (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-column)
+      (eglot--setq-saving eglot-current-column-function
#'eglot-current-column))
     (cl-pushnew (current-buffer) (eglot--managed-buffers
(eglot-current-server))))
    (t
     (remove-hook 'after-change-functions 'eglot--after-change t)

As I said, enhancing this patch with a new pair of "current/move-to"
functions that add in utf-8 support is acceptable.

João

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 10:20                           ` João Távora
@ 2023-02-24 11:01                             ` Augusto Stoffel
  2023-02-24 11:18                               ` João Távora
  2023-02-24 11:38                               ` Eli Zaretskii
  0 siblings, 2 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 11:01 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Fri, 24 Feb 2023 at 10:20, João Távora wrote:

> The second thing I don't like is also due to the late-binding idea.
> This is a hotspot in Eglot, some of these functions are called
> many many times, for each LSP server interaction depending
> on how many document positions are exchanged (and they can
> be a lot).  I do remember benchmarking strategies at the time
> and seeing a perceptible difference.  Plus, this late-binding is
> really useless as a server will guaranteedly _not_ change its
> column-counting standard during the LSP session.

`eglot-lsp-abiding-column' allocates a new string!  I doubt that looking
up a few plists makes much of a difference compared to that.  But when
using the better offset counting styles, I think it might indeed make a
difference.  OTOH it might as well be premature optimization.

> Finally, here's a patch that doesn't use late-binding, doesn't
> introduce new strategies and supports "utf-32" and "utf-16"
> today.  As you can see, the patch is nearly trivial.

I'm fine with that way of doing things, but please respond to my concern
from the other message: do you really want to store a server capability
in a buffer-local variable?  What about your plans to support multiple
servers?

I suggest you to guard against future headaches.  We can store the
offset functions in two slots of the server class if you don't like to
traverse the capabilities plist each time.

> diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
> index eea8be6d1aa..ae8afa69651 100644
> --- a/lisp/progmodes/eglot.el
> +++ b/lisp/progmodes/eglot.el
> @@ -807,6 +807,7 @@ eglot-client-capabilities
>               :rangeFormatting    `(:dynamicRegistration :json-false)
>               :rename             `(:dynamicRegistration :json-false)
>               :inlayHint          `(:dynamicRegistration :json-false)
> +             :general            `(:positionEncodings ["utf-32" "utf-16"])
>               :publishDiagnostics (list :relatedInformation :json-false
>                                         ;; TODO: We can support
> :codeDescription after
>                                         ;; adding an appropriate UI to
> @@ -1789,6 +1790,9 @@ eglot--managed-mode
>        (add-hook 'eldoc-documentation-functions #'eglot-signature-eldoc-function
>                  nil t)
>        (eldoc-mode 1))
> +    (when (eq (eglot--server-capable :positionEncoding) "utf-16")
> +      (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-column)
> +      (eglot--setq-saving eglot-current-column-function
> #'eglot-current-column))
>      (cl-pushnew (current-buffer) (eglot--managed-buffers
> (eglot-current-server))))
>     (t
>      (remove-hook 'after-change-functions 'eglot--after-change t)
>
> As I said, enhancing this patch with a new pair of "current/move-to"
> functions that add in utf-8 support is acceptable.
>
> João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:01                             ` Augusto Stoffel
@ 2023-02-24 11:18                               ` João Távora
  2023-02-24 11:47                                 ` Augusto Stoffel
  2023-02-24 11:38                               ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 11:18 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 11:01 AM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 at 10:20, João Távora wrote:
>
> > The second thing I don't like is also due to the late-binding idea.
> > This is a hotspot in Eglot, some of these functions are called
> > many many times, for each LSP server interaction depending
> > on how many document positions are exchanged (and they can
> > be a lot).  I do remember benchmarking strategies at the time
> > and seeing a perceptible difference.  Plus, this late-binding is
> > really useless as a server will guaranteedly _not_ change its
> > column-counting standard during the LSP session.
>
> `eglot-lsp-abiding-column' allocates a new string!  I doubt that looking
> up a few plists makes much of a difference compared to that.  But when
> using the better offset counting styles, I think it might indeed make a
> difference.  OTOH it might as well be premature optimization.

And this is why we measure.  You can find some scripts for doing that
and some discussion in:

https://github.com/joaotavora/eglot/pull/125

The 2018 issue where this all started.

> > Finally, here's a patch that doesn't use late-binding, doesn't
> > introduce new strategies and supports "utf-32" and "utf-16"
> > today.  As you can see, the patch is nearly trivial.
>
> I'm fine with that way of doing things, but please respond to my concern
> from the other message: do you really want to store a server capability
> in a buffer-local variable?  What about your plans to support multiple
> servers?

The capability is stored in the server object and reflects in the buffer-local
variable which will be restored when the session ends.  I don't see
any problem with that.

My idea of supporting multiple servers is to create a proxy program
that invokes multiple processes and negotiates a common set
of capabilities, so I don't see it as a problem either.  If server A supports
utf-32 and utf-16 and server B only supports utf-16, the multiplexing
server C only declares support for utf-16.

> I suggest you to guard against future headaches.  We can store the
> offset functions in two slots of the server class if you don't like to
> traverse the capabilities plist each time.

No, this is exactly the type of complexity that I strive to avoid in Eglot,
especially when the added value is small.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24  9:15                         ` Augusto Stoffel
  2023-02-24 10:20                           ` João Távora
@ 2023-02-24 11:27                           ` Eli Zaretskii
  2023-02-24 11:43                             ` João Távora
  2023-02-24 12:01                             ` Augusto Stoffel
  1 sibling, 2 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 11:27 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 10:15:48 +0100
> 
> >> -(defun eglot-current-column () (- (point) (line-beginning-position)))
> >> +(defun eglot-current-column ()
> >> +  "Calculate current column, counting Unicode codepoints."
> >> +  (- (point) (line-beginning-position)))
> >
> > Can we please take this opportunity to get rid of the confusing
> > "column" terminology?  As became evident from this discussion, we are
> > not talking columns here, we are talking offsets in characters from
> > BOL.  So something like "pos" or "linepos" or "line-offset" should be
> > better.
> >
> > João, are you okay with such a sweeping change in all of eglot.el?
> 
> I like linepos, if João is fine with not making the absolute minimal
> amount of changes to the code.

João?

> > As discussed, position-bytes is incorrect.  You should instead do
> > something like
> >
> >   (length (encode-coding-string
> >            (buffer-substring-no-properties (point)
> >                                            (line-beginning-position))
> >            'utf-8-unix t))
> 
> But it is incorrect only if the buffer contains characters outside of
> the Unicode range, right?  If that happens, we already lost, because a
> few steps later we will serialize the buffer text as JSON to send it to
> the server:

Why should one part of the code depend on what another part does?  In
my book, each part should do its job, and do it right.

> > Also, for 100% reliable results, we should bind
> > inhibit-field-text-motion to t when calling line-beginning-position.
> 
> We should rather be using pos-bol, no?  But how do we keep compatibility
> with older Emacsen?

Exactly.  pos-bol is Emacs 29 and later, whereas Eglot is available
from ELPA for older versions of Emacs.

> >> +              (tab-width 1)
> >                   ^^^^^^^^^^^
> > This last part shouldn't be necessary: we should move by characters,
> > not by columns.  Why is it necessary?
> 
> Maybe João can clarify, but I'm pretty sure this is there to support the
> UTF-16 way of counting offsets, so this ideally should move to
> eglot-move-to-lsp-abiding-column.

Then perhaps the UTF-16 way of counting offsets should be changed as
well.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:01                             ` Augusto Stoffel
  2023-02-24 11:18                               ` João Távora
@ 2023-02-24 11:38                               ` Eli Zaretskii
  2023-02-24 11:55                                 ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 11:38 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 12:01:48 +0100
> 
> On Fri, 24 Feb 2023 at 10:20, João Távora wrote:
> 
> > The second thing I don't like is also due to the late-binding idea.
> > This is a hotspot in Eglot, some of these functions are called
> > many many times, for each LSP server interaction depending
> > on how many document positions are exchanged (and they can
> > be a lot).  I do remember benchmarking strategies at the time
> > and seeing a perceptible difference.  Plus, this late-binding is
> > really useless as a server will guaranteedly _not_ change its
> > column-counting standard during the LSP session.
> 
> `eglot-lsp-abiding-column' allocates a new string!

If that is a concern, eglot.el could use a private temporary buffer
into which the encoded text is inserted, eliminating the need to call
'length'.  The impact of that in performance should be measured, of
course, to make sure it doesn't make code slower; it will definitely
improve the GC pressure aspect.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:27                           ` Eli Zaretskii
@ 2023-02-24 11:43                             ` João Távora
  2023-02-24 11:57                               ` Eli Zaretskii
  2023-02-24 12:01                             ` Augusto Stoffel
  1 sibling, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 11:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Fri, Feb 24, 2023 at 11:27 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Augusto Stoffel <arstoffel@gmail.com>
> > Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> > Date: Fri, 24 Feb 2023 10:15:48 +0100
> >
> > >> -(defun eglot-current-column () (- (point) (line-beginning-position)))
> > >> +(defun eglot-current-column ()
> > >> +  "Calculate current column, counting Unicode codepoints."
> > >> +  (- (point) (line-beginning-position)))
> > >
> > > Can we please take this opportunity to get rid of the confusing
> > > "column" terminology?  As became evident from this discussion, we are
> > > not talking columns here, we are talking offsets in characters from
> > > BOL.  So something like "pos" or "linepos" or "line-offset" should be
> > > better.
> > >
> > > João, are you okay with such a sweeping change in all of eglot.el?
> >
> > I like linepos, if João is fine with not making the absolute minimal
> > amount of changes to the code.
>
> João?

eglot-current-column is a user-visible function.  would need
obsoletion aliases.  Are you sure it isn't better just to add some
clarifying comments?  I fear for my ability to recall details about this
code with such a sweeping rename, accuracy-improving as it may be.
So it's a balancing act, your call.

>
> > >> +              (tab-width 1)
> > >                   ^^^^^^^^^^^
> > > This last part shouldn't be necessary: we should move by characters,
> > > not by columns.  Why is it necessary?
> >
> > Maybe João can clarify, but I'm pretty sure this is there to support the
> > UTF-16 way of counting offsets, so this ideally should move to
> > eglot-move-to-lsp-abiding-column.
>
> Then perhaps the UTF-16 way of counting offsets should be changed as
> well.

I've vc-region-history'ed it to:

commit 2cf7905887f2137869f44c3383a55636e38b4b81
Author: Michal Krzywkowski <k.michal@zoho.com>
Date:   Mon Nov 19 21:22:14 2018 +0100

    Treat tab characters as 1 column wide in position conversion functions

    Fixes https://github.com/joaotavora/eglot/issues/158.

    * eglot.el (eglot--pos-to-lsp-position): Call
      eglot-current-column-function with tab-width bound to 1.
    (eglot--lsp-position-to-point): Call eglot-move-to-column-function
    with tab-width bound to 1.

Following the link, I read this there:

  "This is because move-to-column and current-column count each tab
character as
   tab-width chars."

And, as far as I remember, at the time we were indeed using move-to-column and
current-column.  But now we aren't anymore.

So maybe, just maybe, this can be removed.  And the full test suite must
run afterwards.  And then probably more tests should be added.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:18                               ` João Távora
@ 2023-02-24 11:47                                 ` Augusto Stoffel
  2023-02-24 12:05                                   ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 11:47 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Fri, 24 Feb 2023 at 11:18, João Távora wrote:

> The capability is stored in the server object and reflects in the buffer-local
> variable which will be restored when the session ends.  I don't see
> any problem with that.

No problem, let's use your approach then.

>> I suggest you to guard against future headaches.  We can store the
>> offset functions in two slots of the server class if you don't like to
>> traverse the capabilities plist each time.
>
> No, this is exactly the type of complexity that I strive to avoid in Eglot,
> especially when the added value is small.

You see how “complexity” can be a subjective perception...  To me,
entangling things is a hallmark of complexity, and you are entangling
server information with buffer information here.  OTOH, adding a slot
that is set in 1 place and read in 1 place doesn't feel like complexity
at all to me.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:38                               ` Eli Zaretskii
@ 2023-02-24 11:55                                 ` João Távora
  0 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-24 11:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Fri, Feb 24, 2023 at 11:38 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > `eglot-lsp-abiding-column' allocates a new string!
>
> If that is a concern, eglot.el could use a private temporary buffer
> into which the encoded text is inserted, eliminating the need to call
> 'length'.  The impact of that in performance should be measured, of
> course, to make sure it doesn't make code slower; it will definitely
> improve the GC pressure aspect.

Indeed, maybe something like this (untested):

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index e20d209332d..78e0f9c1f0c 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -1454,11 +1454,15 @@ eglot-current-column-function
 (defun eglot-lsp-abiding-column (&optional lbp)
   "Calculate current COLUMN as defined by the LSP spec.
 LBP defaults to `line-beginning-position'."
-  (/ (- (length (encode-coding-region (or lbp (line-beginning-position))
-                                      ;; Fix github#860
-                                      (min (point) (point-max)) 'utf-16 t))
-        2)
-     2))
+  (let ((measure (with-current-buffer (get-buffer-create "
*eglot-utf16-measure*")
+                   (erase-buffer)
+                   (current-buffer))))
+    (/ (- (encode-coding-region (or lbp (line-beginning-position))
+                                ;; Fix github#860
+                                (min (point) (point-max)) 'utf-16
+                                measure)
+          2)
+       2)))

 (defun eglot--pos-to-lsp-position (&optional pos)
   "Convert point POS to LSP position."





^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:43                             ` João Távora
@ 2023-02-24 11:57                               ` Eli Zaretskii
  2023-02-24 12:09                                 ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 11:57 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 24 Feb 2023 11:43:18 +0000
> Cc: Augusto Stoffel <arstoffel@gmail.com>, 61726@debbugs.gnu.org
> 
> > > I like linepos, if João is fine with not making the absolute minimal
> > > amount of changes to the code.
> >
> > João?
> 
> eglot-current-column is a user-visible function.  would need
> obsoletion aliases.

Yes.  But that is not a problem, from my POV.

> Are you sure it isn't better just to add some clarifying comments?

I'm okay with that as well, but the clarifications should be in doc
strings of public functions as well, if we go that way.

> > > >> +              (tab-width 1)
> > > >                   ^^^^^^^^^^^
> > > > This last part shouldn't be necessary: we should move by characters,
> > > > not by columns.  Why is it necessary?
> > >
> > > Maybe João can clarify, but I'm pretty sure this is there to support the
> > > UTF-16 way of counting offsets, so this ideally should move to
> > > eglot-move-to-lsp-abiding-column.
> >
> > Then perhaps the UTF-16 way of counting offsets should be changed as
> > well.
> 
> I've vc-region-history'ed it to:
> 
> commit 2cf7905887f2137869f44c3383a55636e38b4b81
> Author: Michal Krzywkowski <k.michal@zoho.com>
> Date:   Mon Nov 19 21:22:14 2018 +0100
> 
>     Treat tab characters as 1 column wide in position conversion functions
> 
>     Fixes https://github.com/joaotavora/eglot/issues/158.
> 
>     * eglot.el (eglot--pos-to-lsp-position): Call
>       eglot-current-column-function with tab-width bound to 1.
>     (eglot--lsp-position-to-point): Call eglot-move-to-column-function
>     with tab-width bound to 1.
> 
> Following the link, I read this there:
> 
>   "This is because move-to-column and current-column count each tab
> character as
>    tab-width chars."
> 
> And, as far as I remember, at the time we were indeed using move-to-column and
> current-column.  But now we aren't anymore.
> 
> So maybe, just maybe, this can be removed.  And the full test suite must
> run afterwards.  And then probably more tests should be added.

How about removing it on master?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:27                           ` Eli Zaretskii
  2023-02-24 11:43                             ` João Távora
@ 2023-02-24 12:01                             ` Augusto Stoffel
  2023-02-24 12:16                               ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 12:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 13:27, Eli Zaretskii wrote:

>> > As discussed, position-bytes is incorrect.  You should instead do
>> > something like
>> >
>> >   (length (encode-coding-string
>> >            (buffer-substring-no-properties (point)
>> >                                            (line-beginning-position))
>> >            'utf-8-unix t))
>> 
>> But it is incorrect only if the buffer contains characters outside of
>> the Unicode range, right?  If that happens, we already lost, because a
>> few steps later we will serialize the buffer text as JSON to send it to
>> the server:
>
> Why should one part of the code depend on what another part does?  In
> my book, each part should do its job, and do it right.

Arguably both implementations are wrong, and the correct one should
produce an error if the buffer substring cannot be converted to valid
UTF-8.

Between two “wrong” but perfectly functional implementations, I'd choose
the more efficient one, because efficiency actually matters in this
case.  So which one is more efficient?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:47                                 ` Augusto Stoffel
@ 2023-02-24 12:05                                   ` João Távora
  2023-02-24 12:14                                     ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 12:05 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 11:47 AM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 at 11:18, João Távora wrote:
>
> > The capability is stored in the server object and reflects in the buffer-local
> > variable which will be restored when the session ends.  I don't see
> > any problem with that.
>
> No problem, let's use your approach then.
>
> >> I suggest you to guard against future headaches.  We can store the
> >> offset functions in two slots of the server class if you don't like to
> >> traverse the capabilities plist each time.
> >
> > No, this is exactly the type of complexity that I strive to avoid in Eglot,
> > especially when the added value is small.
>
> You see how “complexity” can be a subjective perception...  To me,
> entangling things is a hallmark of complexity, and you are entangling
> server information with buffer information here.  OTOH, adding a slot
> that is set in 1 place and read in 1 place doesn't feel like complexity
> at all to me.

Very true, it's subjective.  But both of our approaches are using
multiple levels of removal from the source of truth.  The source of
truth is in the server, then it's cached in eglot--capabilities.  Then
I propose another level, cache it a buffer-local value, and you
also propose another level, cache it an additional slot of
the server.

And don't forget that the "server" is _also_ hiding behind an
indirection, the eglot--current-server function and buffer
variable.

What IMO makes your solution more complex is that the new alternate
place of caching will not cause eglot-move-to-column-function and
eglot-current-column-function to be deleted.  We can't delete, even
if we wanted to, because of backward compatibility.  If you could,
I would agree that our two solutions are of similar complexity.  But
that's not the reality.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 11:57                               ` Eli Zaretskii
@ 2023-02-24 12:09                                 ` João Távora
  2023-02-24 12:18                                   ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 12:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, arstoffel

On Fri, Feb 24, 2023 at 11:57 AM Eli Zaretskii <eliz@gnu.org> wrote:

> > eglot-current-column is a user-visible function.  would need
> > obsoletion aliases.
>
> Yes.  But that is not a problem, from my POV.

True.

> > Are you sure it isn't better just to add some clarifying comments?
>
> I'm okay with that as well, but the clarifications should be in doc
> strings of public functions as well, if we go that way.

Yes, makes sense.  I'll let you decide what is cleaner: sweeping
renames or docstrings.  And thanks in advance.

> > And, as far as I remember, at the time we were indeed using move-to-column and
> > current-column.  But now we aren't anymore.
> >
> > So maybe, just maybe, this can be removed.  And the full test suite must
> > run afterwards.  And then probably more tests should be added.
>
> How about removing it on master?

Doesn't make much of a difference because ELPA :core package.
If the change is harmful, that harm will be visible "soon".
I think we can just add some tests to eglot-tests.el and
test the problematic case collected from the old bug tracker,
if it's not being tested already.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:05                                   ` João Távora
@ 2023-02-24 12:14                                     ` Augusto Stoffel
  0 siblings, 0 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 12:14 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Fri, 24 Feb 2023 at 12:05, João Távora wrote:

> What IMO makes your solution more complex is that the new alternate
> place of caching will not cause eglot-move-to-column-function and
> eglot-current-column-function to be deleted.  We can't delete, even
> if we wanted to, because of backward compatibility.  If you could,
> I would agree that our two solutions are of similar complexity.  But
> that's not the reality.

You might have missed it in this long thread, but my proposal was to
obsolete eglot-move-to-column-function and eglot-current-column-function
now and remove them in after several Emacs releases.

The rationale is that these vars were introduced to work around
nonconforming servers.  With the new LSP capability, there should be no
excuse for nonconforming servers to exist.  They should all adapt in the
medium term.

So yes, I have no problem admitting my approach looks uglier today, but
it's clear to me that it would lead to a cleaner result in 2030.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:01                             ` Augusto Stoffel
@ 2023-02-24 12:16                               ` Eli Zaretskii
  2023-02-24 12:35                                 ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 12:16 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 13:01:16 +0100
> 
> On Fri, 24 Feb 2023 at 13:27, Eli Zaretskii wrote:
> 
> >> > As discussed, position-bytes is incorrect.  You should instead do
> >> > something like
> >> >
> >> >   (length (encode-coding-string
> >> >            (buffer-substring-no-properties (point)
> >> >                                            (line-beginning-position))
> >> >            'utf-8-unix t))
> >> 
> >> But it is incorrect only if the buffer contains characters outside of
> >> the Unicode range, right?  If that happens, we already lost, because a
> >> few steps later we will serialize the buffer text as JSON to send it to
> >> the server:
> >
> > Why should one part of the code depend on what another part does?  In
> > my book, each part should do its job, and do it right.
> 
> Arguably both implementations are wrong, and the correct one should
> produce an error if the buffer substring cannot be converted to valid
> UTF-8.

You assume that the characters that aren't encodable in UTF-8 somehow
invalidate the results produced by the LSP?  But that is not
necessarily true, it depends on the context.  IOW, this is not the
problem eglot.el should solve, and I'm not sure that signaling an
error is the correct reaction to this situation.  It is basically the
problem of the user and/or the major mode.  Eglot should do its best
to cope, and leave the rest to the user.

> Between two “wrong” but perfectly functional implementations, I'd choose
> the more efficient one, because efficiency actually matters in this
> case.  So which one is more efficient?

I don't know, and I don't think efficiency is the main concern here.
The main concern, from my POV, is exposing the internal representation
of buffer text to the outer world.  What if we decide to change the
internal representation at some future date?  It already happened,
twice, in Emacs history; it can happen again, even though its
unlikely.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:09                                 ` João Távora
@ 2023-02-24 12:18                                   ` Eli Zaretskii
  2023-02-24 12:31                                     ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 12:18 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 24 Feb 2023 12:09:14 +0000
> Cc: arstoffel@gmail.com, 61726@debbugs.gnu.org
> 
> On Fri, Feb 24, 2023 at 11:57 AM Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > > eglot-current-column is a user-visible function.  would need
> > > obsoletion aliases.
> >
> > Yes.  But that is not a problem, from my POV.
> 
> True.
> 
> > > Are you sure it isn't better just to add some clarifying comments?
> >
> > I'm okay with that as well, but the clarifications should be in doc
> > strings of public functions as well, if we go that way.
> 
> Yes, makes sense.  I'll let you decide what is cleaner: sweeping
> renames or docstrings.  And thanks in advance.

I'm not doing the job, so I think this is up to Augusto.

> > > So maybe, just maybe, this can be removed.  And the full test suite must
> > > run afterwards.  And then probably more tests should be added.
> >
> > How about removing it on master?
> 
> Doesn't make much of a difference because ELPA :core package.
> If the change is harmful, that harm will be visible "soon".
> I think we can just add some tests to eglot-tests.el and
> test the problematic case collected from the old bug tracker,
> if it's not being tested already.

Fine by me.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:18                                   ` Eli Zaretskii
@ 2023-02-24 12:31                                     ` Augusto Stoffel
  0 siblings, 0 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 12:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, João Távora

On Fri, 24 Feb 2023 at 14:18, Eli Zaretskii wrote:

>> Yes, makes sense.  I'll let you decide what is cleaner: sweeping
>> renames or docstrings.  And thanks in advance.
>
> I'm not doing the job, so I think this is up to Augusto.

So I'll provide the new feature using the old-style names, and then we
can see about names and docstrings afterwards.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:16                               ` Eli Zaretskii
@ 2023-02-24 12:35                                 ` Augusto Stoffel
  2023-02-24 12:55                                   ` João Távora
  2023-02-24 13:34                                   ` Eli Zaretskii
  0 siblings, 2 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 12:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 14:16, Eli Zaretskii wrote:

> You assume that the characters that aren't encodable in UTF-8 somehow
> invalidate the results produced by the LSP?  But that is not
> necessarily true, it depends on the context.  IOW, this is not the
> problem eglot.el should solve, and I'm not sure that signaling an
> error is the correct reaction to this situation.  It is basically the
> problem of the user and/or the major mode.  Eglot should do its best
> to cope, and leave the rest to the user.

You can't even send or receive a message through the JSONRPC channel if
it's not valid UTF-8, and `json-serialize' rightfully emits an error.
So there's nothing Eglot can do to cope.  It also should not, IMO,
because we don't know how the server will respond, and we must trust the
server when it tell us to do destructive operations like adding or
deleting text.

> I don't know, and I don't think efficiency is the main concern here.

Okay, but João expressed his concerned with efficiency here.

> The main concern, from my POV, is exposing the internal representation
> of buffer text to the outer world.  What if we decide to change the
> internal representation at some future date?  It already happened,
> twice, in Emacs history; it can happen again, even though its
> unlikely.

Here you have a good point.  LSP got into this mess in the first place
because of exposure of JavaScript internals.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:35                                 ` Augusto Stoffel
@ 2023-02-24 12:55                                   ` João Távora
  2023-02-24 13:34                                   ` Eli Zaretskii
  1 sibling, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-24 12:55 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 12:35 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 at 14:16, Eli Zaretskii wrote:
>
> > You assume that the characters that aren't encodable in UTF-8 somehow
> > invalidate the results produced by the LSP?  But that is not
> > necessarily true, it depends on the context.  IOW, this is not the
> > problem eglot.el should solve, and I'm not sure that signaling an
> > error is the correct reaction to this situation.  It is basically the
> > problem of the user and/or the major mode.  Eglot should do its best
> > to cope, and leave the rest to the user.
>
> You can't even send or receive a message through the JSONRPC channel if
> it's not valid UTF-8, and `json-serialize' rightfully emits an error.
> So there's nothing Eglot can do to cope.  It also should not, IMO,
> because we don't know how the server will respond, and we must trust the
> server when it tell us to do destructive operations like adding or
> deleting text.
>
> > I don't know, and I don't think efficiency is the main concern here.
>
> Okay, but João expressed his concerned with efficiency here.

It's _a_ concern.  And so is clarity.  I don't want to have
unneeded late-binding.  These functions are called very
often and the continuous advance of LSP with more features
like say, server-calculated fontification will make us
rely even more heavily on them, almost like we rely on a
fast goto-char.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 12:35                                 ` Augusto Stoffel
  2023-02-24 12:55                                   ` João Távora
@ 2023-02-24 13:34                                   ` Eli Zaretskii
  2023-02-24 13:45                                     ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 13:34 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 13:35:37 +0100
> 
> On Fri, 24 Feb 2023 at 14:16, Eli Zaretskii wrote:
> 
> > You assume that the characters that aren't encodable in UTF-8 somehow
> > invalidate the results produced by the LSP?  But that is not
> > necessarily true, it depends on the context.  IOW, this is not the
> > problem eglot.el should solve, and I'm not sure that signaling an
> > error is the correct reaction to this situation.  It is basically the
> > problem of the user and/or the major mode.  Eglot should do its best
> > to cope, and leave the rest to the user.
> 
> You can't even send or receive a message through the JSONRPC channel if
> it's not valid UTF-8

That's because we _decided_ to behave like that.  A decision that is
not carved in stone.

> and `json-serialize' rightfully emits an error.

There's no "rightfully" here.  It's our decision to signal an error in
this case.  Substituting some innocent character for the unencodable
ones would be an entirely legitimate alternative.

> So there's nothing Eglot can do to cope.

I'm talking about Emacs as a whole.  Eglot shouldn't second-guess what
we might decide some day, definitely not in areas that are not
specific to Eglot, such as JSON.

> It also should not, IMO, because we don't know how the server will
> respond, and we must trust the server when it tell us to do
> destructive operations like adding or deleting text.

See above: replacing problematic characters would also solve this
problem.  Some other applications out there actually behave like that.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 13:34                                   ` Eli Zaretskii
@ 2023-02-24 13:45                                     ` João Távora
  2023-02-24 13:51                                       ` Eli Zaretskii
  2023-02-24 14:54                                       ` Augusto Stoffel
  0 siblings, 2 replies; 82+ messages in thread
From: João Távora @ 2023-02-24 13:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Fri, Feb 24, 2023 at 1:34 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Augusto Stoffel <arstoffel@gmail.com>
> > Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> > Date: Fri, 24 Feb 2023 13:35:37 +0100
> >
> > On Fri, 24 Feb 2023 at 14:16, Eli Zaretskii wrote:
> >
> > > You assume that the characters that aren't encodable in UTF-8 somehow
> > > invalidate the results produced by the LSP?  But that is not
> > > necessarily true, it depends on the context.  IOW, this is not the
> > > problem eglot.el should solve, and I'm not sure that signaling an
> > > error is the correct reaction to this situation.  It is basically the
> > > problem of the user and/or the major mode.  Eglot should do its best
> > > to cope, and leave the rest to the user.
> >
> > You can't even send or receive a message through the JSONRPC channel if
> > it's not valid UTF-8

> That's because we _decided_ to behave like that.  A decision that is
> not carved in stone.

At least or LSP, it seems it must be UTF-8:

  https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#contentPart

"... utf-8, which is the only encoding supported right now. If a
server or client
receives a header with a different encoding than utf-8 it should respond
with an error."

No problem with making jsonrpc.el support more encodings, but for LSP
it must be UTF-8.

I don't see how this is relevant to the code-point counting problem here,
though.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 13:45                                     ` João Távora
@ 2023-02-24 13:51                                       ` Eli Zaretskii
  2023-02-24 14:45                                         ` Augusto Stoffel
  2023-02-24 14:54                                       ` Augusto Stoffel
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 13:51 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 24 Feb 2023 13:45:23 +0000
> Cc: Augusto Stoffel <arstoffel@gmail.com>, 61726@debbugs.gnu.org
> 
> On Fri, Feb 24, 2023 at 1:34 PM Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > > You can't even send or receive a message through the JSONRPC channel if
> > > it's not valid UTF-8
> 
> > That's because we _decided_ to behave like that.  A decision that is
> > not carved in stone.
> 
> At least or LSP, it seems it must be UTF-8:

This is a misunderstanding: I didn't mean to say that we should send
invalid UTF-8 sequences.  I meant something else.  Quote from the rest
of my message:

> > and `json-serialize' rightfully emits an error.
> 
> There's no "rightfully" here.  It's our decision to signal an error in
> this case.  Substituting some innocent character for the unencodable
> ones would be an entirely legitimate alternative.
> [...]
> See above: replacing problematic characters would also solve this
> problem.  Some other applications out there actually behave like that.

That's the alternative: replace the unencodable character with some
replacement.

> I don't see how this is relevant to the code-point counting problem here,
> though.

It started when I said we cannot count bytes in the internal
representation of text when we need to produce UTF-8 bytes.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 13:51                                       ` Eli Zaretskii
@ 2023-02-24 14:45                                         ` Augusto Stoffel
  2023-02-24 15:19                                           ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 14:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, João Távora

On Fri, 24 Feb 2023 at 15:51, Eli Zaretskii wrote:

>> > > You can't even send or receive a message through the JSONRPC channel if
>> > > it's not valid UTF-8
>> 
>> > That's because we _decided_ to behave like that.  A decision that is
>> > not carved in stone.
>> 
>> At least or LSP, it seems it must be UTF-8:
>
> This is a misunderstanding: I didn't mean to say that we should send
> invalid UTF-8 sequences.  I meant something else.  Quote from the rest
> of my message:
>
>> > and `json-serialize' rightfully emits an error.
>> 
>> There's no "rightfully" here.  It's our decision to signal an error in
>> this case.

In fact, our decision was to follow the JSON specification, which says
UTF-8 is the only allowed exchange encoding:
https://www.rfc-editor.org/rfc/rfc8259#section-8.1

>> Substituting some innocent character for the unencodable
>> ones would be an entirely legitimate alternative.

So actually the answer here is no.  You can save arbitrary bytes in a
file in your laptop and call it data.json.  But it you pass some data to
someone else and promise it's in JSON format, then it _must_ be UTF-8
encoded.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 13:45                                     ` João Távora
  2023-02-24 13:51                                       ` Eli Zaretskii
@ 2023-02-24 14:54                                       ` Augusto Stoffel
  2023-02-24 15:23                                         ` Eli Zaretskii
  2023-02-24 16:34                                         ` João Távora
  1 sibling, 2 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 14:54 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Fri, 24 Feb 2023 at 13:45, João Távora wrote:

> I don't see how this is relevant to the code-point counting problem here,
> though.

The relevance is in how the offset-counting function should proceed when
the buffer is not UTF-8 encodable:

- My patch has a garbage-in garbage-out behavior.

- Eli made a suggestion he deems more correct, but is more expensive.
  Since JSON must be UTF-8 encoded for data exchange purposes, I don't
  see the benefit, and in fact I consider this option just a different
  choice of garbage-in garbage-out behavior.

- A third option is to make the function throw an error.  However, if
  the buffer is not UTF-8 encodable, we already get an error from
  json-serialize, because it doesn't let you generate invalid JSON:

    (progn
     (insert ?" (max-char) ?")
     (json-serialize (buffer-substring-no-properties (pos-bol)
     (pos-eol))))

     ⇒ Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p "     \"\377\"")





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 14:45                                         ` Augusto Stoffel
@ 2023-02-24 15:19                                           ` Eli Zaretskii
  2023-02-24 15:52                                             ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 15:19 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: João Távora <joaotavora@gmail.com>,
>   61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 15:45:30 +0100
> 
> On Fri, 24 Feb 2023 at 15:51, Eli Zaretskii wrote:
> 
> > This is a misunderstanding: I didn't mean to say that we should send
> > invalid UTF-8 sequences.  I meant something else.  Quote from the rest
> > of my message:
> >
> >> > and `json-serialize' rightfully emits an error.
> >> 
> >> There's no "rightfully" here.  It's our decision to signal an error in
> >> this case.
> 
> In fact, our decision was to follow the JSON specification, which says
> UTF-8 is the only allowed exchange encoding:
> https://www.rfc-editor.org/rfc/rfc8259#section-8.1
> 
> >> Substituting some innocent character for the unencodable
> >> ones would be an entirely legitimate alternative.
> 
> So actually the answer here is no.  You can save arbitrary bytes in a
> file in your laptop and call it data.json.  But it you pass some data to
> someone else and promise it's in JSON format, then it _must_ be UTF-8
> encoded.

I'm bewildered by my apparent inability to explain what I mean.  So
let me try with an example.  Suppose the buffer text in question is

   abcde\201xyz

where \201 is a raw byte.  I'm saying that, instead of signaling an
error, we could send to the server the string

   abcde xyz

where the \201 byte was replaced by the SPC character.  The latter
string is, of course, perfectly correct UTF-8 sequence, and so doesn't
violate any specs.

The SPC character as a replacement is, of course, just one example.
We could instead use '?' or U+FFFD REPLACEMENT CHARACTER, or anything
else, and all of those replacements can be encoded in UTF-8 without
any problems.

Did I make myself clear now?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 14:54                                       ` Augusto Stoffel
@ 2023-02-24 15:23                                         ` Eli Zaretskii
  2023-02-24 15:56                                           ` Augusto Stoffel
  2023-02-24 16:34                                         ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 15:23 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 15:54:46 +0100
> 
> On Fri, 24 Feb 2023 at 13:45, João Távora wrote:
> 
> > I don't see how this is relevant to the code-point counting problem here,
> > though.
> 
> The relevance is in how the offset-counting function should proceed when
> the buffer is not UTF-8 encodable:
> 
> - My patch has a garbage-in garbage-out behavior.
> 
> - Eli made a suggestion he deems more correct, but is more expensive.

It is not more expensive.  The underlying functions we call are
already capable of producing replacements instead of characters that
cannot be encoded in UTF-8.  We just don't use those capabilities in
json.c because some of us had strong feelings about signaling an error
in these situations.

>   Since JSON must be UTF-8 encoded for data exchange purposes, I don't
>   see the benefit, and in fact I consider this option just a different
>   choice of garbage-in garbage-out behavior.

The benefit could be that Emacs is more lenient in face of such
situations, and doesn't signal errors from very low-level code which
has no idea about the context.

> - A third option is to make the function throw an error.  However, if
>   the buffer is not UTF-8 encodable, we already get an error from
>   json-serialize, because it doesn't let you generate invalid JSON:
> 
>     (progn
>      (insert ?" (max-char) ?")
>      (json-serialize (buffer-substring-no-properties (pos-bol)
>      (pos-eol))))
> 
>      ⇒ Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p "     \"\377\"")

What if json-serialize stops signaling an error at some point, and
instead uses the replacement mechanism I mentioned above?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 15:19                                           ` Eli Zaretskii
@ 2023-02-24 15:52                                             ` Augusto Stoffel
  2023-02-24 16:01                                               ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 15:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 17:19, Eli Zaretskii wrote:

> I'm bewildered by my apparent inability to explain what I mean.  So
> let me try with an example.  Suppose the buffer text in question is
>
>    abcde\201xyz
>
> where \201 is a raw byte.  I'm saying that, instead of signaling an
> error, we could send to the server the string
>
>    abcde xyz
>
> where the \201 byte was replaced by the SPC character.  The latter
> string is, of course, perfectly correct UTF-8 sequence, and so doesn't
> violate any specs.
>
> The SPC character as a replacement is, of course, just one example.
> We could instead use '?' or U+FFFD REPLACEMENT CHARACTER, or anything
> else, and all of those replacements can be encoded in UTF-8 without
> any problems.
>
> Did I make myself clear now?

You made yourself clear the first time.  What I don't understand is, why
do you think this is a good idea, because in my view it clearly isn't.

So suppose we lie about the buffer content and say it's "abcde xyz".
Then the server sends a diagnostic saying "found unexpected space
character at column 6".  What sense does it make to the user?

Even worse, imagine we then request instructions to reformat the buffer.
Suppose that the replacement "abcde xyz" -> "abcde\nxyz" is meaningful
in our language but the replacement "abcde\201xyz" -> "abcde\nxyz" is
dangerous.  Do we want to get into this kind of trouble?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 15:23                                         ` Eli Zaretskii
@ 2023-02-24 15:56                                           ` Augusto Stoffel
  2023-02-24 17:02                                             ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 15:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 17:23, Eli Zaretskii wrote:

> It is not more expensive.  The underlying functions we call are
> already capable of producing replacements instead of characters that
> cannot be encoded in UTF-8.  We just don't use those capabilities in
> json.c because some of us had strong feelings about signaling an error
> in these situations.

I didn't track down byte-to-position and position-bytes in the C code.
So you are saying they're as expensive as encode-coding-string?  If (and
only if) this is the case, then I take back my point :-).

> What if json-serialize stops signaling an error at some point, and
> instead uses the replacement mechanism I mentioned above?

In the other message I explained why I think this is exactly what we
should _not_ do.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 15:52                                             ` Augusto Stoffel
@ 2023-02-24 16:01                                               ` Eli Zaretskii
  2023-02-24 16:39                                                 ` Augusto Stoffel
  2023-02-24 18:08                                                 ` Augusto Stoffel
  0 siblings, 2 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 16:01 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 16:52:27 +0100
> 
> >    abcde xyz
> >
> > where the \201 byte was replaced by the SPC character.  The latter
> > string is, of course, perfectly correct UTF-8 sequence, and so doesn't
> > violate any specs.
> >
> > The SPC character as a replacement is, of course, just one example.
> > We could instead use '?' or U+FFFD REPLACEMENT CHARACTER, or anything
> > else, and all of those replacements can be encoded in UTF-8 without
> > any problems.
> >
> > Did I make myself clear now?
> 
> You made yourself clear the first time.  What I don't understand is, why
> do you think this is a good idea, because in my view it clearly isn't.
> 
> So suppose we lie about the buffer content and say it's "abcde xyz".
> Then the server sends a diagnostic saying "found unexpected space
> character at column 6".  What sense does it make to the user?

How can that happen?  Raw bytes can be in comments and in strings, and
basically nowhere else in a program.  How would the server decide that
a space is not valid in these contexts?

> Even worse, imagine we then request instructions to reformat the buffer.
> Suppose that the replacement "abcde xyz" -> "abcde\nxyz" is meaningful
> in our language but the replacement "abcde\201xyz" -> "abcde\nxyz" is
> dangerous.  Do we want to get into this kind of trouble?

"Dangerous" in what way?

And your objections/fears seem to circle around the special traits of
the SPC character, so what if instead of SPC we use U+FFFD?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 14:54                                       ` Augusto Stoffel
  2023-02-24 15:23                                         ` Eli Zaretskii
@ 2023-02-24 16:34                                         ` João Távora
  2023-02-24 17:06                                           ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-24 16:34 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 2:54 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Fri, 24 Feb 2023 at 13:45, João Távora wrote:
>
> > I don't see how this is relevant to the code-point counting problem here,
> > though.
>
> The relevance is in how the offset-counting function should proceed when
> the buffer is not UTF-8 encodable.

OK but this is just for the forthcoming utf-8 pair of functions
eglot--move-to-linepos-utf8 and eglot--current-linepos-utf8, right?

It doesn't affect the current utf16 and utf32 (move by codepoint)
pairs of functions, does it?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 16:01                                               ` Eli Zaretskii
@ 2023-02-24 16:39                                                 ` Augusto Stoffel
  2023-02-24 17:07                                                   ` Eli Zaretskii
  2023-02-24 18:08                                                 ` Augusto Stoffel
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 16:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Fri, 24 Feb 2023 at 18:01, Eli Zaretskii wrote:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
>> Date: Fri, 24 Feb 2023 16:52:27 +0100
>> 
>> >    abcde xyz
>> >
>> > where the \201 byte was replaced by the SPC character.  The latter
>> > string is, of course, perfectly correct UTF-8 sequence, and so doesn't
>> > violate any specs.
>> >
>> > The SPC character as a replacement is, of course, just one example.
>> > We could instead use '?' or U+FFFD REPLACEMENT CHARACTER, or anything
>> > else, and all of those replacements can be encoded in UTF-8 without
>> > any problems.
>> >
>> > Did I make myself clear now?
>> 
>> You made yourself clear the first time.  What I don't understand is, why
>> do you think this is a good idea, because in my view it clearly isn't.
>> 
>> So suppose we lie about the buffer content and say it's "abcde xyz".
>> Then the server sends a diagnostic saying "found unexpected space
>> character at column 6".  What sense does it make to the user?
>
> How can that happen?  Raw bytes can be in comments and in strings, and
> basically nowhere else in a program.  How would the server decide that
> a space is not valid in these contexts?
>
>> Even worse, imagine we then request instructions to reformat the buffer.
>> Suppose that the replacement "abcde xyz" -> "abcde\nxyz" is meaningful
>> in our language but the replacement "abcde\201xyz" -> "abcde\nxyz" is
>> dangerous.  Do we want to get into this kind of trouble?
>
> "Dangerous" in what way?

If someone thinks this is a good and useful feature, of course they
could work on it.  I think the current behavior (json-serialize error)
is better because it sticks to the letter of the specs.

My point of view is very straightforward: you can have language servers;
you can have arbitrary bytes in your files and buffers; but you can't
always have both.  But I'm sure I've already communicated this, sorry
for the repetition.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 15:56                                           ` Augusto Stoffel
@ 2023-02-24 17:02                                             ` Eli Zaretskii
  0 siblings, 0 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 17:02 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 16:56:57 +0100
> 
> On Fri, 24 Feb 2023 at 17:23, Eli Zaretskii wrote:
> 
> > It is not more expensive.  The underlying functions we call are
> > already capable of producing replacements instead of characters that
> > cannot be encoded in UTF-8.  We just don't use those capabilities in
> > json.c because some of us had strong feelings about signaling an error
> > in these situations.
> 
> I didn't track down byte-to-position and position-bytes in the C code.
> So you are saying they're as expensive as encode-coding-string?  If (and
> only if) this is the case, then I take back my point :-).

I didn't mean those functions, I meant the functions used by json.c to
encode strings.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 16:34                                         ` João Távora
@ 2023-02-24 17:06                                           ` Eli Zaretskii
  0 siblings, 0 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 17:06 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 24 Feb 2023 16:34:34 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 61726@debbugs.gnu.org
> 
> On Fri, Feb 24, 2023 at 2:54 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
> >
> > On Fri, 24 Feb 2023 at 13:45, João Távora wrote:
> >
> > > I don't see how this is relevant to the code-point counting problem here,
> > > though.
> >
> > The relevance is in how the offset-counting function should proceed when
> > the buffer is not UTF-8 encodable.
> 
> OK but this is just for the forthcoming utf-8 pair of functions
> eglot--move-to-linepos-utf8 and eglot--current-linepos-utf8, right?

Yes.

> It doesn't affect the current utf16 and utf32 (move by codepoint)
> pairs of functions, does it?

No, it doesn't.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 16:39                                                 ` Augusto Stoffel
@ 2023-02-24 17:07                                                   ` Eli Zaretskii
  0 siblings, 0 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-24 17:07 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 17:39:11 +0100
> 
> If someone thinks this is a good and useful feature, of course they
> could work on it.  I think the current behavior (json-serialize error)
> is better because it sticks to the letter of the specs.

The spec doesn't tell us to signal an error, it tells us not to send
invalid UTF-8.  Which under my proposal we will still adhere to.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 16:01                                               ` Eli Zaretskii
  2023-02-24 16:39                                                 ` Augusto Stoffel
@ 2023-02-24 18:08                                                 ` Augusto Stoffel
  2023-02-24 18:55                                                   ` João Távora
  2023-02-25 10:57                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-24 18:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

[-- Attachment #1: Type: text/plain, Size: 154 bytes --]

I've attached a minimal functioning version of the patch.  I would
prefer to merge this and do all possible refinements discussed here at a
later stage.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Eglot-support-positionEncoding-LSP-capability.patch --]
[-- Type: text/x-patch, Size: 2992 bytes --]

From 9b6794a729ac27b02472d68f849066c3a10297d4 Mon Sep 17 00:00:00 2001
From: Augusto Stoffel <arstoffel@gmail.com>
Date: Thu, 23 Feb 2023 08:55:58 +0100
Subject: [PATCH] Eglot: support positionEncoding LSP capability

* lisp/progmodes/eglot.el(eglot-client-capabilities):  Announce the
new capability.
(eglot-bytewise-column, eglot-move-to-bytewise-column): New functions.
(eglot--managed-mode): Set 'eglot-current-column-function' and
'eglot-move-to-bytewise-column' appropriately.
---
 lisp/progmodes/eglot.el | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index b569c03e8c2..9a4d5733d0b 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -816,6 +816,9 @@ eglot-client-capabilities
                                        `(:valueSet
                                          [,@(mapcar
                                              #'car eglot--tag-faces)])))
+            :general
+            (list
+             :positionEncodings ["utf-32" "utf-8" "utf-16"])
             :experimental eglot--{})))
 
 (cl-defgeneric eglot-workspace-folders (server)
@@ -1441,6 +1444,10 @@ eglot--warn
 
 (defun eglot-current-column () (- (point) (line-beginning-position)))
 
+(defun eglot-bytewise-column ()
+  "Calculate current column using the LSP `utf-8' criterion."
+  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
+
 (defvar eglot-current-column-function #'eglot-lsp-abiding-column
   "Function to calculate the current column.
 
@@ -1505,6 +1512,12 @@ eglot-move-to-lsp-abiding-column
             (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
           (end-of-buffer (cl-return eob-err))))))
 
+(defun eglot-move-to-bytewise-column (column)
+  "Move to COLUMN as computed using the LSP `utf-8' criterion."
+  (goto-char (min (byte-to-position
+                   (+ (position-bytes (line-beginning-position)) column))
+                  (line-end-position))))
+
 (defun eglot--lsp-position-to-point (pos-plist &optional marker)
   "Convert LSP position POS-PLIST to Emacs point.
 If optional MARKER, return a marker instead"
@@ -1758,6 +1771,14 @@ eglot--managed-mode
   :init-value nil :lighter nil :keymap eglot-mode-map
   (cond
    (eglot--managed-mode
+    (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                      :positionEncoding)
+      ("utf-32"
+       (eglot--setq-saving eglot-current-column-function #'eglot-current-column)
+       (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-column))
+      ("utf-8"
+       (eglot--setq-saving eglot-current-column-function #'eglot-bytewise-column)
+       (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-bytewise-column)))
     (add-hook 'after-change-functions 'eglot--after-change nil t)
     (add-hook 'before-change-functions 'eglot--before-change nil t)
     (add-hook 'kill-buffer-hook #'eglot--managed-mode-off nil t)
-- 
2.39.2


[-- Attachment #3: Type: text/plain, Size: 887 bytes --]


For the 2 new functions I've followed current naming scheme, replacing
"lsp-abiding" by "bytewise".  We can then get rid of all the suboptimal
nomenclature (lsp-abiding, column, bytewise) in one go at a later stage.

I also didn't attempt to make 'eglot-move-to-bytewise-column' and
'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
throws an error.  I think it's clear that Eli and I have different
definitions "correctness" in this context, but Eli's proposal requires
substantial further work, including recovering from possible errors
raising from 'json-serialize'.

Lastly, I used 'line-beginning-position' naively in the 2 new function
because there are at least 4 other pre-existing naive uses in eglot.el.
This refinement deserves a patch of its own, probably using something on
the lines of

  (defalias eglot--pos-bol (if (fboundp 'pos-bol) 'pos-bol ...))

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 18:08                                                 ` Augusto Stoffel
@ 2023-02-24 18:55                                                   ` João Távora
  2023-02-25 10:58                                                     ` Eli Zaretskii
  2023-03-05 10:26                                                     ` Augusto Stoffel
  2023-02-25 10:57                                                   ` Eli Zaretskii
  1 sibling, 2 replies; 82+ messages in thread
From: João Távora @ 2023-02-24 18:55 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Fri, Feb 24, 2023 at 6:08 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> I've attached a minimal functioning version of the patch.  I would
> prefer to merge this and do all possible refinements discussed here at a
> later stage.

This is OK with me.  I think the current patch is OK, modulo some very
minor whitespace and formatting options that I can fix  myself.

> For the 2 new functions I've followed current naming scheme, replacing
> "lsp-abiding" by "bytewise".  We can then get rid of all the suboptimal
> nomenclature (lsp-abiding, column, bytewise) in one go at a later stage.

OK, but not too late though, as these are user-visible-functions, and
I'd like to tag an Eglot version soon.

> I also didn't attempt to make 'eglot-move-to-bytewise-column' and
> 'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
> throws an error.  I think it's clear that Eli and I have different
> definitions "correctness" in this context, but Eli's proposal requires
> substantial further work, including recovering from possible errors
> raising from 'json-serialize'.

If Eli is OK with this provisional UTF-8 impl, then so am I.  How
complicated do you estimate Eli proposal to be in comparison?

> Lastly, I used 'line-beginning-position' naively in the 2 new function
> because there are at least 4 other pre-existing naive uses in eglot.el.
> This refinement deserves a patch of its own, probably using something on
> the lines of
>
>   (defalias eglot--pos-bol (if (fboundp 'pos-bol) 'pos-bol ...))

This is also a good idea.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 18:08                                                 ` Augusto Stoffel
  2023-02-24 18:55                                                   ` João Távora
@ 2023-02-25 10:57                                                   ` Eli Zaretskii
  2023-02-25 11:29                                                     ` Augusto Stoffel
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-25 10:57 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Fri, 24 Feb 2023 19:08:14 +0100
> 
> I also didn't attempt to make 'eglot-move-to-bytewise-column' and
> 'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
> throws an error.  I think it's clear that Eli and I have different
> definitions "correctness" in this context, but Eli's proposal requires
> substantial further work, including recovering from possible errors
> raising from 'json-serialize'.

No, it doesn't.  All I'm asking for is to use encode-coding-string or
encode-coding-region to measure the actual number of bytes in a UTF-8
sequence that corresponds to a region of text.  Nothing else is
required or requested.

Can you please humor me and implement eglot-bytewise-column like that?





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 18:55                                                   ` João Távora
@ 2023-02-25 10:58                                                     ` Eli Zaretskii
  2023-03-05 10:26                                                     ` Augusto Stoffel
  1 sibling, 0 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-25 10:58 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Fri, 24 Feb 2023 18:55:18 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, 61726@debbugs.gnu.org
> 
> > I also didn't attempt to make 'eglot-move-to-bytewise-column' and
> > 'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
> > throws an error.  I think it's clear that Eli and I have different
> > definitions "correctness" in this context, but Eli's proposal requires
> > substantial further work, including recovering from possible errors
> > raising from 'json-serialize'.
> 
> If Eli is OK with this provisional UTF-8 impl, then so am I.

I'm not.  I asked for a simple modification.

> How complicated do you estimate Eli proposal to be in comparison?

It isn't complicated at all.  I've even shown a possible
implementation in one of my earlier messages.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 10:57                                                   ` Eli Zaretskii
@ 2023-02-25 11:29                                                     ` Augusto Stoffel
  2023-02-25 13:47                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-25 11:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Sat, 25 Feb 2023 at 12:57, Eli Zaretskii wrote:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
>> Date: Fri, 24 Feb 2023 19:08:14 +0100
>> 
>> I also didn't attempt to make 'eglot-move-to-bytewise-column' and
>> 'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
>> throws an error.  I think it's clear that Eli and I have different
>> definitions "correctness" in this context, but Eli's proposal requires
>> substantial further work, including recovering from possible errors
>> raising from 'json-serialize'.
>
> No, it doesn't.  All I'm asking for is to use encode-coding-string or
> encode-coding-region to measure the actual number of bytes in a UTF-8
> sequence that corresponds to a region of text.  Nothing else is
> required or requested.
>
> Can you please humor me and implement eglot-bytewise-column like that?

I would be glad to do that, but unfortunately I'd have to ask your
advice as to how to make the corresponding adaptation of
eglot-move-to-bytewise-column.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 11:29                                                     ` Augusto Stoffel
@ 2023-02-25 13:47                                                       ` Eli Zaretskii
  2023-02-25 14:14                                                         ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-25 13:47 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Sat, 25 Feb 2023 12:29:02 +0100
> 
> On Sat, 25 Feb 2023 at 12:57, Eli Zaretskii wrote:
> 
> >> From: Augusto Stoffel <arstoffel@gmail.com>
> >> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> >> Date: Fri, 24 Feb 2023 19:08:14 +0100
> >> 
> >> I also didn't attempt to make 'eglot-move-to-bytewise-column' and
> >> 'eglot-bytewise-column' "correct" in the cases where 'json-serialize'
> >> throws an error.  I think it's clear that Eli and I have different
> >> definitions "correctness" in this context, but Eli's proposal requires
> >> substantial further work, including recovering from possible errors
> >> raising from 'json-serialize'.
> >
> > No, it doesn't.  All I'm asking for is to use encode-coding-string or
> > encode-coding-region to measure the actual number of bytes in a UTF-8
> > sequence that corresponds to a region of text.  Nothing else is
> > required or requested.
> >
> > Can you please humor me and implement eglot-bytewise-column like that?
> 
> I would be glad to do that, but unfortunately I'd have to ask your
> advice as to how to make the corresponding adaptation of
> eglot-move-to-bytewise-column.

Instead of this:

  (defun eglot-bytewise-column ()
    "Calculate current column using the LSP `utf-8' criterion."
    (- (position-bytes (point)) (position-bytes (line-beginning-position))))

use this:

  (defun eglot-bytewise-column ()
    "Calculate current column using the LSP `utf-8' criterion."
    (length (encode-coding-region (line-beginning-position) (point)
             'utf-8-unix t)))





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 13:47                                                       ` Eli Zaretskii
@ 2023-02-25 14:14                                                         ` Augusto Stoffel
  2023-02-25 16:26                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-25 14:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

On Sat, 25 Feb 2023 at 15:47, Eli Zaretskii wrote:

>> > Can you please humor me and implement eglot-bytewise-column like that?
>> 
>> I would be glad to do that, but unfortunately I'd have to ask your
>> advice as to how to make the corresponding adaptation of
>> eglot-move-to-bytewise-column.
         ^^^^^^^

> Instead of this:
>
>   (defun eglot-bytewise-column ()
>     "Calculate current column using the LSP `utf-8' criterion."
>     (- (position-bytes (point)) (position-bytes (line-beginning-position))))
>
> use this:
>
>   (defun eglot-bytewise-column ()
>     "Calculate current column using the LSP `utf-8' criterion."
>     (length (encode-coding-region (line-beginning-position) (point)
>              'utf-8-unix t)))





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 14:14                                                         ` Augusto Stoffel
@ 2023-02-25 16:26                                                           ` Eli Zaretskii
  2023-02-25 18:10                                                             ` Augusto Stoffel
  2023-02-25 22:13                                                             ` João Távora
  0 siblings, 2 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-25 16:26 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> Date: Sat, 25 Feb 2023 15:14:06 +0100
> 
> On Sat, 25 Feb 2023 at 15:47, Eli Zaretskii wrote:
> 
> >> > Can you please humor me and implement eglot-bytewise-column like that?
> >> 
> >> I would be glad to do that, but unfortunately I'd have to ask your
> >> advice as to how to make the corresponding adaptation of
> >> eglot-move-to-bytewise-column.
>          ^^^^^^^

Sorry.  Here:

  (defun eglot-move-to-bytewise-column (column)
    "Move to COLUMN as computed using the LSP `utf-8' criterion."
    (let* ((bol (line-beginning-position))
	   (goal-byte (+ (position-bytes bol) column))
	   (eol (line-end-position)))
      (goto-char bol)
      (while (and (< (position-bytes (point)) goal-byte)
		  (< (point) eol))
	(if (>= (char-after) #x3fff80)  ; raw bytes take 2 bytes in the buffer
	    (setq goal-byte (1+ goal-byte)))
	(forward-char 1))))





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 16:26                                                           ` Eli Zaretskii
@ 2023-02-25 18:10                                                             ` Augusto Stoffel
  2023-02-25 22:15                                                               ` João Távora
  2023-02-25 22:13                                                             ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-25 18:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, joaotavora

[-- Attachment #1: Type: text/plain, Size: 123 bytes --]

Okay, I've attached a new patch with your suggested implementation,
which to the extent I'm able to test works correctly.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Eglot-support-positionEncoding-LSP-capability.patch --]
[-- Type: text/x-patch, Size: 3244 bytes --]

From 0a2f143cae00db4488a56a5f9e27f0e3589a08ef Mon Sep 17 00:00:00 2001
From: Augusto Stoffel <arstoffel@gmail.com>
Date: Thu, 23 Feb 2023 08:55:58 +0100
Subject: [PATCH] Eglot: support positionEncoding LSP capability

* lisp/progmodes/eglot.el(eglot-client-capabilities):  Announce the
new capability.
(eglot-bytewise-column, eglot-move-to-bytewise-column): New functions.
(eglot--managed-mode): Set 'eglot-current-column-function' and
'eglot-move-to-bytewise-column' appropriately.
---
 lisp/progmodes/eglot.el | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index b569c03e8c2..6b38d39c7b7 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -816,6 +816,9 @@ eglot-client-capabilities
                                        `(:valueSet
                                          [,@(mapcar
                                              #'car eglot--tag-faces)])))
+            :general
+            (list
+             :positionEncodings ["utf-32" "utf-8" "utf-16"])
             :experimental eglot--{})))
 
 (cl-defgeneric eglot-workspace-folders (server)
@@ -1441,6 +1444,11 @@ eglot--warn
 
 (defun eglot-current-column () (- (point) (line-beginning-position)))
 
+(defun eglot-bytewise-column ()
+  "Calculate current column using the LSP `utf-8' criterion."
+  (length (encode-coding-region (line-beginning-position) (point)
+                                'utf-8-unix t)))
+
 (defvar eglot-current-column-function #'eglot-lsp-abiding-column
   "Function to calculate the current column.
 
@@ -1505,6 +1513,18 @@ eglot-move-to-lsp-abiding-column
             (forward-char (/ (if (> diff 0) (1+ diff) (1- diff)) 2))
           (end-of-buffer (cl-return eob-err))))))
 
+(defun eglot-move-to-bytewise-column (column)
+  "Move to COLUMN as computed using the LSP `utf-8' criterion."
+  (let* ((bol (line-beginning-position))
+	 (goal-byte (+ (position-bytes bol) column))
+	 (eol (line-end-position)))
+    (goto-char bol)
+    (while (and (< (position-bytes (point)) goal-byte)
+		(< (point) eol))
+      (if (>= (char-after) #x3fff80)  ; raw bytes take 2 bytes in the buffer
+	  (setq goal-byte (1+ goal-byte)))
+      (forward-char 1))))
+
 (defun eglot--lsp-position-to-point (pos-plist &optional marker)
   "Convert LSP position POS-PLIST to Emacs point.
 If optional MARKER, return a marker instead"
@@ -1758,6 +1778,14 @@ eglot--managed-mode
   :init-value nil :lighter nil :keymap eglot-mode-map
   (cond
    (eglot--managed-mode
+    (pcase (plist-get (eglot--capabilities (eglot-current-server))
+                      :positionEncoding)
+      ("utf-32"
+       (eglot--setq-saving eglot-current-column-function #'eglot-current-column)
+       (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-column))
+      ("utf-8"
+       (eglot--setq-saving eglot-current-column-function #'eglot-bytewise-column)
+       (eglot--setq-saving eglot-move-to-column-function #'eglot-move-to-bytewise-column)))
     (add-hook 'after-change-functions 'eglot--after-change nil t)
     (add-hook 'before-change-functions 'eglot--before-change nil t)
     (add-hook 'kill-buffer-hook #'eglot--managed-mode-off nil t)
-- 
2.39.2


[-- Attachment #3: Type: text/plain, Size: 475 bytes --]


I still maintain that we are doing a lot of extra work (LOC and
CPU-wise) just to guard against the impossible.  On the other hand there
are probably better places to look for optimization opportunities.  For
instance, I noticed that when completions arrive form the server, we
call eglot-current-column once for each candidate, although most of
those will never be used.

Also, in a few years every serious server should support the
codepoint-based counting method anyway.

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 16:26                                                           ` Eli Zaretskii
  2023-02-25 18:10                                                             ` Augusto Stoffel
@ 2023-02-25 22:13                                                             ` João Távora
  2023-02-25 22:34                                                               ` Augusto Stoffel
  2023-02-26  5:31                                                               ` Eli Zaretskii
  1 sibling, 2 replies; 82+ messages in thread
From: João Távora @ 2023-02-25 22:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Augusto Stoffel <arstoffel@gmail.com>
>> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
>> Date: Sat, 25 Feb 2023 15:14:06 +0100
>> 
>> On Sat, 25 Feb 2023 at 15:47, Eli Zaretskii wrote:
>> 
>> >> > Can you please humor me and implement eglot-bytewise-column like that?
>> >> 
>> >> I would be glad to do that, but unfortunately I'd have to ask your
>> >> advice as to how to make the corresponding adaptation of
>> >> eglot-move-to-bytewise-column.
>>          ^^^^^^^
>
> Sorry.  Here:
>
>   (defun eglot-move-to-bytewise-column (column)
>     "Move to COLUMN as computed using the LSP `utf-8' criterion."
>     (let* ((bol (line-beginning-position))
> 	   (goal-byte (+ (position-bytes bol) column))
> 	   (eol (line-end-position)))
>       (goto-char bol)
>       (while (and (< (position-bytes (point)) goal-byte)
> 		  (< (point) eol))
> 	(if (>= (char-after) #x3fff80)  ; raw bytes take 2 bytes in the buffer
> 	    (setq goal-byte (1+ goal-byte)))
> 	(forward-char 1))))

In eglot-move-to-lsp-abiding-column (the utf-16 sibling of this
function) we use a binary search instead of a linear search.  I remember
measuring a visible improvement.  I'm not sure the conditions are
exactly the same with this one.  Could/should we do the same here?

João






^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 18:10                                                             ` Augusto Stoffel
@ 2023-02-25 22:15                                                               ` João Távora
  0 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-25 22:15 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

Augusto Stoffel <arstoffel@gmail.com> writes:

> Also, in a few years every serious server should support the
> codepoint-based counting method anyway.

Exactly why I wonder why we're going through all this bother to support
an intermediately flawed alternative.  So don't state this too often ;-)

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 22:13                                                             ` João Távora
@ 2023-02-25 22:34                                                               ` Augusto Stoffel
  2023-02-25 23:16                                                                 ` João Távora
  2023-02-26  5:31                                                               ` Eli Zaretskii
  1 sibling, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-25 22:34 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

On Sat, 25 Feb 2023 at 22:13, João Távora wrote:

> In eglot-move-to-lsp-abiding-column (the utf-16 sibling of this
> function) we use a binary search instead of a linear search.  I remember
> measuring a visible improvement.  I'm not sure the conditions are
> exactly the same with this one.  Could/should we do the same here?

Funnily, in the quick tests I made, Eli's function seems twice as fast
as "lsp-abiding".

Specifically, I ran the following (for each of the 3 move-to functions)
on a buffer where the first line is about 100 characters long:

(benchmark-run-compiled 10000
    (progn
      (goto-char (point-min))
      (let ((len (- (pos-eol) (point))))
        (dotimes (i len)
          (eglot-move-to-column (mod i len))))))





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 22:34                                                               ` Augusto Stoffel
@ 2023-02-25 23:16                                                                 ` João Távora
  2023-02-25 23:57                                                                   ` Augusto Stoffel
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-25 23:16 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii

On Sat, Feb 25, 2023 at 10:34 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>
> On Sat, 25 Feb 2023 at 22:13, João Távora wrote:
>
> > In eglot-move-to-lsp-abiding-column (the utf-16 sibling of this
> > function) we use a binary search instead of a linear search.  I remember
> > measuring a visible improvement.  I'm not sure the conditions are
> > exactly the same with this one.  Could/should we do the same here?
>
> Funnily, in the quick tests I made, Eli's function seems twice as fast
> as "lsp-abiding".

That's understandable as lsp-abiding has to measure the length in
bytes of a number of codepoints every time it does one of
the iterations of the binary search.  Position-bytes doesn't quite work
there (I can't remember why). It's quite faster than it's linear search
cousin.

So comparing those two doesn't make sense.  I meant comparing a
linear version of the utf-8 function to a binary-searching version of
the same utf-8 function.  If that even makes sense here: I'm not sure it
does because I haven't followed the details of the problem.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 23:16                                                                 ` João Távora
@ 2023-02-25 23:57                                                                   ` Augusto Stoffel
  2023-02-26  6:03                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: Augusto Stoffel @ 2023-02-25 23:57 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, Eli Zaretskii

> That's understandable as lsp-abiding has to measure the length in
> bytes of a number of codepoints every time it does one of
> the iterations of the binary search.  Position-bytes doesn't quite work
> there (I can't remember why). It's quite faster than it's linear search
> cousin.
>
> So comparing those two doesn't make sense.

The linear cousin of "lsp-abiding" is very similar to the
function that Eli wrote:

(defun eglot-move-to-lsp-abiding-column-linearly (column)
  "Move to COLUMN as computed using the LSP `utf-16' criterion."
  (let* ((bol (line-beginning-position))
	 (goal-char (+ bol column))
	 (eol (line-end-position)))
    (goto-char bol)
    (while (and (< (point) goal-char)
		(< (point) eol))
      (if (<= #x010000 (char-after) #x10ffff)
	  (setq goal-char (1- goal-char)))
      (forward-char 1))))

It would be very hard to believe they have different performance
characteristics.  In fact, in some hastily done tests, I get the
following relative running times:

eglot-move-to-colum: 1.0
eglot-move-to-lsp-abiding-column-linearly: 8.4
eglot-move-to-bytewise-column: 8.0
eglot-move-to-lsp-abiding-column: 14.4





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 22:13                                                             ` João Távora
  2023-02-25 22:34                                                               ` Augusto Stoffel
@ 2023-02-26  5:31                                                               ` Eli Zaretskii
  2023-02-26 10:38                                                                 ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26  5:31 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Cc: Augusto Stoffel <arstoffel@gmail.com>,  61726@debbugs.gnu.org
> Date: Sat, 25 Feb 2023 22:13:29 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Augusto Stoffel <arstoffel@gmail.com>
> >> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> >> Date: Sat, 25 Feb 2023 15:14:06 +0100
> >> 
> >> On Sat, 25 Feb 2023 at 15:47, Eli Zaretskii wrote:
> >> 
> >> >> > Can you please humor me and implement eglot-bytewise-column like that?
> >> >> 
> >> >> I would be glad to do that, but unfortunately I'd have to ask your
> >> >> advice as to how to make the corresponding adaptation of
> >> >> eglot-move-to-bytewise-column.
> >>          ^^^^^^^
> >
> > Sorry.  Here:
> >
> >   (defun eglot-move-to-bytewise-column (column)
> >     "Move to COLUMN as computed using the LSP `utf-8' criterion."
> >     (let* ((bol (line-beginning-position))
> > 	   (goal-byte (+ (position-bytes bol) column))
> > 	   (eol (line-end-position)))
> >       (goto-char bol)
> >       (while (and (< (position-bytes (point)) goal-byte)
> > 		  (< (point) eol))
> > 	(if (>= (char-after) #x3fff80)  ; raw bytes take 2 bytes in the buffer
> > 	    (setq goal-byte (1+ goal-byte)))
> > 	(forward-char 1))))
> 
> In eglot-move-to-lsp-abiding-column (the utf-16 sibling of this
> function) we use a binary search instead of a linear search.  I remember
> measuring a visible improvement.  I'm not sure the conditions are
> exactly the same with this one.  Could/should we do the same here?

Fine by me, but optimizing a method that is not yet used sounds a bit
premature, no?  I won't object, though.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-25 23:57                                                                   ` Augusto Stoffel
@ 2023-02-26  6:03                                                                     ` Eli Zaretskii
  2023-02-26 10:33                                                                       ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26  6:03 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, joaotavora

> From: Augusto Stoffel <arstoffel@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  61726@debbugs.gnu.org
> Date: Sun, 26 Feb 2023 00:57:42 +0100
> 
> The linear cousin of "lsp-abiding" is very similar to the
> function that Eli wrote:
> 
> (defun eglot-move-to-lsp-abiding-column-linearly (column)
>   "Move to COLUMN as computed using the LSP `utf-16' criterion."
>   (let* ((bol (line-beginning-position))
> 	 (goal-char (+ bol column))
> 	 (eol (line-end-position)))
>     (goto-char bol)
>     (while (and (< (point) goal-char)
> 		(< (point) eol))
>       (if (<= #x010000 (char-after) #x10ffff)
> 	  (setq goal-char (1- goal-char)))
>       (forward-char 1))))
> 
> It would be very hard to believe they have different performance
> characteristics.  In fact, in some hastily done tests, I get the
> following relative running times:
> 
> eglot-move-to-colum: 1.0
> eglot-move-to-lsp-abiding-column-linearly: 8.4
> eglot-move-to-bytewise-column: 8.0
> eglot-move-to-lsp-abiding-column: 14.4

The current version of eglot-move-to-lsp-abiding-column calls
encode-coding-region in the loop, which I guess is the main reason for
its being slower.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26  6:03                                                                     ` Eli Zaretskii
@ 2023-02-26 10:33                                                                       ` João Távora
  2023-02-26 13:13                                                                         ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-26 10:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Sun, Feb 26, 2023 at 6:03 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Augusto Stoffel <arstoffel@gmail.com>
> > Cc: Eli Zaretskii <eliz@gnu.org>,  61726@debbugs.gnu.org
> > Date: Sun, 26 Feb 2023 00:57:42 +0100
> >
> > The linear cousin of "lsp-abiding" is very similar to the
> > function that Eli wrote:
> >
> > (defun eglot-move-to-lsp-abiding-column-linearly (column)
> >   "Move to COLUMN as computed using the LSP `utf-16' criterion."
> >   (let* ((bol (line-beginning-position))
> >        (goal-char (+ bol column))
> >        (eol (line-end-position)))
> >     (goto-char bol)
> >     (while (and (< (point) goal-char)
> >               (< (point) eol))
> >       (if (<= #x010000 (char-after) #x10ffff)
> >         (setq goal-char (1- goal-char)))
> >       (forward-char 1))))
> >
> > It would be very hard to believe they have different performance
> > characteristics.  In fact, in some hastily done tests, I get the
> > following relative running times:
> >
> > eglot-move-to-colum: 1.0
> > eglot-move-to-lsp-abiding-column-linearly: 8.4
> > eglot-move-to-bytewise-column: 8.0
> > eglot-move-to-lsp-abiding-column: 14.4
>
> The current version of eglot-move-to-lsp-abiding-column calls
> encode-coding-region in the loop, which I guess is the main reason for
> its being slower.

Yup, here's a surprise.  We don't need encode-coding-region
after all and the binary search I implemented back in
https://github.com/joaotavora/eglot/pull/125
is mostly useless.  In that issue, noone thought of counting
characters by examining the code point values and counting
code units.

So I pushed this much faster, simpler version to emacs-29.

I credited Eli in the commit, as we wrote the code. Hope
that's OK.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26  5:31                                                               ` Eli Zaretskii
@ 2023-02-26 10:38                                                                 ` João Távora
  0 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-26 10:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, arstoffel

On Sun, Feb 26, 2023 at 5:31 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: João Távora <joaotavora@gmail.com>
> > Cc: Augusto Stoffel <arstoffel@gmail.com>,  61726@debbugs.gnu.org
> > Date: Sat, 25 Feb 2023 22:13:29 +0000
> >
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> > >> From: Augusto Stoffel <arstoffel@gmail.com>
> > >> Cc: joaotavora@gmail.com,  61726@debbugs.gnu.org
> > >> Date: Sat, 25 Feb 2023 15:14:06 +0100
> > >>
> > >> On Sat, 25 Feb 2023 at 15:47, Eli Zaretskii wrote:
> > >>
> > >> >> > Can you please humor me and implement eglot-bytewise-column like that?
> > >> >>
> > >> >> I would be glad to do that, but unfortunately I'd have to ask your
> > >> >> advice as to how to make the corresponding adaptation of
> > >> >> eglot-move-to-bytewise-column.
> > >>          ^^^^^^^
> > >
> > > Sorry.  Here:
> > >
> > >   (defun eglot-move-to-bytewise-column (column)
> > >     "Move to COLUMN as computed using the LSP `utf-8' criterion."
> > >     (let* ((bol (line-beginning-position))
> > >        (goal-byte (+ (position-bytes bol) column))
> > >        (eol (line-end-position)))
> > >       (goto-char bol)
> > >       (while (and (< (position-bytes (point)) goal-byte)
> > >               (< (point) eol))
> > >     (if (>= (char-after) #x3fff80)  ; raw bytes take 2 bytes in the buffer
> > >         (setq goal-byte (1+ goal-byte)))
> > >     (forward-char 1))))
> >
> > In eglot-move-to-lsp-abiding-column (the utf-16 sibling of this
> > function) we use a binary search instead of a linear search.  I remember
> > measuring a visible improvement.  I'm not sure the conditions are
> > exactly the same with this one.  Could/should we do the same here?
>
> Fine by me, but optimizing a method that is not yet used sounds a bit
> premature, no?  I won't object, though.

There's nothing to optimize there, there's no benefit to binary search.
I think this type of function, which has some utf-8/16 knowledge
directly in them, is the fastest option, much faster than using
encode-coding-region like I was doing before.

João





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 10:33                                                                       ` João Távora
@ 2023-02-26 13:13                                                                         ` João Távora
  2023-02-26 13:16                                                                           ` Eli Zaretskii
  2023-02-26 13:25                                                                           ` Eli Zaretskii
  0 siblings, 2 replies; 82+ messages in thread
From: João Távora @ 2023-02-26 13:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, Augusto Stoffel

On Sun, Feb 26, 2023 at 10:33 AM João Távora <joaotavora@gmail.com> wrote:

> So I pushed this much faster, simpler version to emacs-29.
>
> I credited Eli in the commit, as we wrote the code. Hope
> that's OK.

Sorry, I meant to write "he wrote the code".  But actually Augusto
also wrote some of it.  So fortunately I also added a "Co-authored-by"
note.

Anyway, I _also_ did some more stuff:

* Pushed the latest patch by Augusto

* Pushed a renaming/redocumenting obsoletion-establishing patch.  Now
Eglot talks of "linepos" instead of "column".

* Tested this against actual LSP servers supporting and not supporting
the positionEncodings capabilities (clangd, rust-analyzer, pylsp).

Have a look and let me know if I missed something important.

I think the only thing missing to close this bug is that `eglot--lbpos` alias
that Augusto proposed.

All in all, I'm happy to wrap up this long discussion where I re-learned
some coding system stuff.  I'm also quite happy to have doubled the
performance of one of Eglot's hotspots thanks to a somewhat accidental
discovery from both of you, so thank you very much.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 13:13                                                                         ` João Távora
@ 2023-02-26 13:16                                                                           ` Eli Zaretskii
  2023-02-26 13:25                                                                           ` Eli Zaretskii
  1 sibling, 0 replies; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26 13:16 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Sun, 26 Feb 2023 13:13:16 +0000
> Cc: Augusto Stoffel <arstoffel@gmail.com>, 61726@debbugs.gnu.org
> 
> Anyway, I _also_ did some more stuff:
> 
> * Pushed the latest patch by Augusto
> 
> * Pushed a renaming/redocumenting obsoletion-establishing patch.  Now
> Eglot talks of "linepos" instead of "column".
> 
> * Tested this against actual LSP servers supporting and not supporting
> the positionEncodings capabilities (clangd, rust-analyzer, pylsp).

Great, thanks.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 13:13                                                                         ` João Távora
  2023-02-26 13:16                                                                           ` Eli Zaretskii
@ 2023-02-26 13:25                                                                           ` Eli Zaretskii
  2023-02-26 14:17                                                                             ` João Távora
  1 sibling, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26 13:25 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Date: Sun, 26 Feb 2023 13:13:16 +0000
> Cc: Augusto Stoffel <arstoffel@gmail.com>, 61726@debbugs.gnu.org
> 
> * Pushed a renaming/redocumenting obsoletion-establishing patch.  Now
> Eglot talks of "linepos" instead of "column".

I've corrected a few doc strings to use a more widely-familiar
terminology.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 13:25                                                                           ` Eli Zaretskii
@ 2023-02-26 14:17                                                                             ` João Távora
  2023-02-26 14:50                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-26 14:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, arstoffel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: João Távora <joaotavora@gmail.com>
>> Date: Sun, 26 Feb 2023 13:13:16 +0000
>> Cc: Augusto Stoffel <arstoffel@gmail.com>, 61726@debbugs.gnu.org
>> 
>> * Pushed a renaming/redocumenting obsoletion-establishing patch.  Now
>> Eglot talks of "linepos" instead of "column".
>
> I've corrected a few doc strings to use a more widely-familiar
> terminology.

Thanks, but at one of the corrections is mistaken, so I've fixed it like
this.

     (defvar eglot-current-linepos-function #'eglot-utf-16-linepos
    -  "Function calculating number of UTF-16 code units from line beginning.
    +  "Function calculating number of code units from line beginning.

     This is the inverse operation of
     `eglot-move-to-linepos-function' (which see).	It is a function of

This variable is set by default to a function that indeed does UTF-16
calculations.  But it can and will be set to one that does UTF-32 or
UTF-8 codepoint calculations, depending on the server.

I also have a few nits.  In your patch:

     (defun eglot-utf-32-linepos ()
    -  "Calculate number of code units to line beginning using UTF-32."
    +  "Calculate number of Unicode codepoints from line beginning."
       (- (point) (line-beginning-position)))

This is strictly true, because in UTF-32 it's one code point per code
unit.  But the protocol of the variable
'eglot-current-linepos-function', where this function is to be plugged,
a code-unit counting function is demanded.  I would predict that
encoding-challenged programmers (like this package's maintainer in about
2 week's time) might not know about that equivalence and will find it
slightly confusing to plug this function into that variable.

But as I said this is just a nit.

João

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 14:17                                                                             ` João Távora
@ 2023-02-26 14:50                                                                               ` Eli Zaretskii
  2023-02-26 15:15                                                                                 ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26 14:50 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Cc: arstoffel@gmail.com,  61726@debbugs.gnu.org
> Date: Sun, 26 Feb 2023 14:17:39 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I've corrected a few doc strings to use a more widely-familiar
> > terminology.
> 
> Thanks, but at one of the corrections is mistaken, so I've fixed it like
> this.
> 
>      (defvar eglot-current-linepos-function #'eglot-utf-16-linepos
>     -  "Function calculating number of UTF-16 code units from line beginning.
>     +  "Function calculating number of code units from line beginning.
>      
>      This is the inverse operation of
>      `eglot-move-to-linepos-function' (which see).	It is a function of
> 
> This variable is set by default to a function that indeed does UTF-16
> calculations.  But it can and will be set to one that does UTF-32 or
> UTF-8 codepoint calculations, depending on the server.
> 
> I also have a few nits.  In your patch:
> 
>      (defun eglot-utf-32-linepos ()
>     -  "Calculate number of code units to line beginning using UTF-32."
>     +  "Calculate number of Unicode codepoints from line beginning."
>        (- (point) (line-beginning-position)))
> 
> This is strictly true, because in UTF-32 it's one code point per code
> unit.  But the protocol of the variable
> 'eglot-current-linepos-function', where this function is to be plugged,
> a code-unit counting function is demanded.  I would predict that
> encoding-challenged programmers (like this package's maintainer in about
> 2 week's time) might not know about that equivalence and will find it
> slightly confusing to plug this function into that variable.
> 
> But as I said this is just a nit.

I made one more fix.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 14:50                                                                               ` Eli Zaretskii
@ 2023-02-26 15:15                                                                                 ` João Távora
  2023-02-26 15:37                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 82+ messages in thread
From: João Távora @ 2023-02-26 15:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 61726, arstoffel

Eli Zaretskii <eliz@gnu.org> writes:


>> But as I said this is just a nit.
>
> I made one more fix.

FTR, I think it's way worse now. The function variable has no clear
protocol: it's very odd to state its return type as number of code units
_or_ bytes _or_ code points.  A good docstring for such a variable notes
that, for any buffer position, a plugged-in function must return the
same _type_ but may return a different _value_ to match the
unit-counting strategy being used by the LSP server.

I haven't reverted to avoid a commit war, and because I'm a bit weary of
this bug.  I leave a patch for consideration and for the record, feel
free to ignore it.

João

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index dd84f545ed4..12fb72503aa 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -1453,11 +1453,20 @@ eglot--warn
 (defvar eglot-current-linepos-function #'eglot-utf-16-linepos
   "Function calculating position relative to line beginning.
 
-This is the inverse of `eglot-move-to-linepos-function' (which see).
-It is a function of no arguments returning the number of code units
-or bytes or codepoints corresponding to the current position of point,
-relative to line beginning, as expected by the function that is the
-value of `eglot-move-to-linepos-function'.")
+This is the inverse operation of
+`eglot-move-to-linepos-function' (which see).  The value is a
+nullary function that examines current `point' and computes the
+number of code units relative to line beginning in such a way
+that it matches how the LSP server also counts code units.  For
+example, if the LSP server is using UTF-16 encoding, and point is
+on the \"b\" of a line containing just \"aXbc\" where \"X\" is a
+funny looking character in the UTF-16 \"supplementary plane\",
+this function has to return 3 code points instead of 2.  If the
+server is using UTF-32 encoding, this function should return 2
+code points.
+
+Since LSP 3.17 server and client may agree on an encoding and
+Eglot will set this variable automatically.")
 
 (defun eglot-utf-8-linepos ()
   "Calculate number of UTF-8 bytes from line beginning."
@@ -1496,11 +1505,11 @@ eglot-move-to-linepos-function
   "Function to move to a position within a line reported by the LSP server.
 
 Per the LSP spec, character offsets in LSP Position objects count
-UTF-16 code units, not actual code points.  So when LSP says
-position 3 of a line containing just \"aXbc\", where X is a funny
-looking character in the UTF-16 \"supplementary plane\", it
-actually means `b', not `c'.  The default value
-`eglot-move-to-utf-16-linepos' accounts for this.
+code units, not actual code points.  The default is to use UTF-16
+encoding, so when LSP says position 3 of a line containing just
+\"aXbc\", where \"X\" is a funny looking character in the UTF-16
+\"supplementary plane\", it actually means \"b\", not \"c\".  The
+default value `eglot-move-to-utf-16-linepos' accounts for this.
 
 This variable can also be set to `eglot-move-to-utf-8-linepos' or
 `eglot-move-to-utf-32-linepos' for servers not closely following






^ permalink raw reply related	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 15:15                                                                                 ` João Távora
@ 2023-02-26 15:37                                                                                   ` Eli Zaretskii
  2023-02-27 11:15                                                                                     ` João Távora
  0 siblings, 1 reply; 82+ messages in thread
From: Eli Zaretskii @ 2023-02-26 15:37 UTC (permalink / raw)
  To: João Távora; +Cc: 61726, arstoffel

> From: João Távora <joaotavora@gmail.com>
> Cc: arstoffel@gmail.com,  61726@debbugs.gnu.org
> Date: Sun, 26 Feb 2023 15:15:57 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> 
> >> But as I said this is just a nit.
> >
> > I made one more fix.
> 
> FTR, I think it's way worse now. The function variable has no clear
> protocol: it's very odd to state its return type as number of code units
> _or_ bytes _or_ code points.  A good docstring for such a variable notes
> that, for any buffer position, a plugged-in function must return the
> same _type_ but may return a different _value_ to match the
> unit-counting strategy being used by the LSP server.

It cannot return the same value,. it must return a value in the same
units as its opposite.  The doc string says:

  This is the inverse of `eglot-move-to-linepos-function' (which see).
  It is a function of no arguments returning the number of code units
  or bytes or codepoints corresponding to the current position of point,
  relative to line beginning, as expected by the function that is the
  value of `eglot-move-to-linepos-function'.")

Note the last sentence.

> I haven't reverted to avoid a commit war, and because I'm a bit weary of
> this bug.  I leave a patch for consideration and for the record, feel
> free to ignore it.

It's your package, so feel free to make any changes you like.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-23 18:42   ` Augusto Stoffel
@ 2023-02-27 10:11     ` Felician Nemeth
  0 siblings, 0 replies; 82+ messages in thread
From: Felician Nemeth @ 2023-02-27 10:11 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: 61726, Eli Zaretskii, João Távora

Augusto Stoffel <arstoffel@gmail.com> writes:

> This is intentional, so let's see if you agree with my rationale.

You have mostly convinced me.  I also see the patch developed further in
the mean time.

>> (In the long run, it might make sense to let the list of the offered
>> encodings configurable.)
>
> I don't think I understand why.  We would already be offering all 3
> positionEncoding options.  The server can pick whatever they like and
> we'll deal with it.  Everything should "just work", no need for
> configuration.

When the eglot-{current-,move-to-}column-functions will be removed, the
users won't be able to adjust the behavior of Eglot to cope with
defective servers.  However, this adds complexity for the sake of an
unlikely hypothetical case.  So now I agree with you that this is
unnecessary.  Thanks.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-26 15:37                                                                                   ` Eli Zaretskii
@ 2023-02-27 11:15                                                                                     ` João Távora
  0 siblings, 0 replies; 82+ messages in thread
From: João Távora @ 2023-02-27 11:15 UTC (permalink / raw)
  To: Eli Zaretskii, 61726-done; +Cc: arstoffel

On Sun, Feb 26, 2023 at 3:37 PM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: João Távora <joaotavora@gmail.com>
> > Cc: arstoffel@gmail.com,  61726@debbugs.gnu.org
> > Date: Sun, 26 Feb 2023 15:15:57 +0000
> >
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> >
> > >> But as I said this is just a nit.
> > >
> > > I made one more fix.
> >
> > FTR, I think it's way worse now. The function variable has no clear
> > protocol: it's very odd to state its return type as number of code units
> > _or_ bytes _or_ code points.  A good docstring for such a variable notes
> > that, for any buffer position, a plugged-in function must return the
> > same _type_ but may return a different _value_ to match the
> > unit-counting strategy being used by the LSP server.
>
> It cannot return the same value,. it must return a value in the same
> units as its opposite.

I wrote "return the same _type_ but may return a different _value_".

The doc string says:
>
>   This is the inverse of `eglot-move-to-linepos-function' (which see).
>   It is a function of no arguments returning the number of code units
>   or bytes or codepoints corresponding to the current position of point,
>   relative to line beginning, as expected by the function that is the
>   value of `eglot-move-to-linepos-function'.")
>
> Note the last sentence.

Yes, I understand the intention.  That's a good detail, to state that
when that other function is fed the return value of this one, position
should remain unaltered.  I just did that.

> It's your package, so feel free to make any changes you like.

Done, and closing. We can deal with the `line-beginning-position` thing
afterwards.  It's only tangentially related to this encoding-supporting feature.





^ permalink raw reply	[flat|nested] 82+ messages in thread

* bug#61726: [PATCH] Eglot: Support positionEncoding capability
  2023-02-24 18:55                                                   ` João Távora
  2023-02-25 10:58                                                     ` Eli Zaretskii
@ 2023-03-05 10:26                                                     ` Augusto Stoffel
  1 sibling, 0 replies; 82+ messages in thread
From: Augusto Stoffel @ 2023-03-05 10:26 UTC (permalink / raw)
  To: João Távora; +Cc: Daniel Mendler, Eli Zaretskii, 61726

On Fri, 24 Feb 2023 at 18:55, João Távora wrote:

> On Fri, Feb 24, 2023 at 6:08 PM Augusto Stoffel <arstoffel@gmail.com> wrote:
>> Lastly, I used 'line-beginning-position' naively in the 2 new function
>> because there are at least 4 other pre-existing naive uses in eglot.el.
>> This refinement deserves a patch of its own, probably using something on
>> the lines of
>>
>>   (defalias eglot--pos-bol (if (fboundp 'pos-bol) 'pos-bol ...))
>
> This is also a good idea.

Actually, it might be better to use the compat library.  I'm copying
Daniel to ask if that is possible/recommended for "core" packages.  I
imagine adding compat to the Package-Requires and calling

  (require 'compat nil 'no-error)

would work?





^ permalink raw reply	[flat|nested] 82+ messages in thread

end of thread, other threads:[~2023-03-05 10:26 UTC | newest]

Thread overview: 82+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-23  8:05 bug#61726: [PATCH] Eglot: Support positionEncoding capability Augusto Stoffel
2023-02-23 10:39 ` Eli Zaretskii
2023-02-23 11:32   ` João Távora
2023-02-23 12:04     ` Augusto Stoffel
2023-02-23 12:24       ` João Távora
2023-02-23 11:46   ` Augusto Stoffel
2023-02-23 12:54     ` Eli Zaretskii
2023-02-23 13:31       ` Augusto Stoffel
2023-02-23 15:04         ` Eli Zaretskii
2023-02-23 18:52           ` Augusto Stoffel
2023-02-23 19:20             ` Eli Zaretskii
2023-02-23 19:28               ` João Távora
2023-02-23 19:52                 ` Augusto Stoffel
2023-02-24  6:43                   ` Eli Zaretskii
2023-02-24  7:18                     ` Augusto Stoffel
2023-02-24  8:38                       ` Eli Zaretskii
2023-02-24  9:15                         ` Augusto Stoffel
2023-02-24 10:20                           ` João Távora
2023-02-24 11:01                             ` Augusto Stoffel
2023-02-24 11:18                               ` João Távora
2023-02-24 11:47                                 ` Augusto Stoffel
2023-02-24 12:05                                   ` João Távora
2023-02-24 12:14                                     ` Augusto Stoffel
2023-02-24 11:38                               ` Eli Zaretskii
2023-02-24 11:55                                 ` João Távora
2023-02-24 11:27                           ` Eli Zaretskii
2023-02-24 11:43                             ` João Távora
2023-02-24 11:57                               ` Eli Zaretskii
2023-02-24 12:09                                 ` João Távora
2023-02-24 12:18                                   ` Eli Zaretskii
2023-02-24 12:31                                     ` Augusto Stoffel
2023-02-24 12:01                             ` Augusto Stoffel
2023-02-24 12:16                               ` Eli Zaretskii
2023-02-24 12:35                                 ` Augusto Stoffel
2023-02-24 12:55                                   ` João Távora
2023-02-24 13:34                                   ` Eli Zaretskii
2023-02-24 13:45                                     ` João Távora
2023-02-24 13:51                                       ` Eli Zaretskii
2023-02-24 14:45                                         ` Augusto Stoffel
2023-02-24 15:19                                           ` Eli Zaretskii
2023-02-24 15:52                                             ` Augusto Stoffel
2023-02-24 16:01                                               ` Eli Zaretskii
2023-02-24 16:39                                                 ` Augusto Stoffel
2023-02-24 17:07                                                   ` Eli Zaretskii
2023-02-24 18:08                                                 ` Augusto Stoffel
2023-02-24 18:55                                                   ` João Távora
2023-02-25 10:58                                                     ` Eli Zaretskii
2023-03-05 10:26                                                     ` Augusto Stoffel
2023-02-25 10:57                                                   ` Eli Zaretskii
2023-02-25 11:29                                                     ` Augusto Stoffel
2023-02-25 13:47                                                       ` Eli Zaretskii
2023-02-25 14:14                                                         ` Augusto Stoffel
2023-02-25 16:26                                                           ` Eli Zaretskii
2023-02-25 18:10                                                             ` Augusto Stoffel
2023-02-25 22:15                                                               ` João Távora
2023-02-25 22:13                                                             ` João Távora
2023-02-25 22:34                                                               ` Augusto Stoffel
2023-02-25 23:16                                                                 ` João Távora
2023-02-25 23:57                                                                   ` Augusto Stoffel
2023-02-26  6:03                                                                     ` Eli Zaretskii
2023-02-26 10:33                                                                       ` João Távora
2023-02-26 13:13                                                                         ` João Távora
2023-02-26 13:16                                                                           ` Eli Zaretskii
2023-02-26 13:25                                                                           ` Eli Zaretskii
2023-02-26 14:17                                                                             ` João Távora
2023-02-26 14:50                                                                               ` Eli Zaretskii
2023-02-26 15:15                                                                                 ` João Távora
2023-02-26 15:37                                                                                   ` Eli Zaretskii
2023-02-27 11:15                                                                                     ` João Távora
2023-02-26  5:31                                                               ` Eli Zaretskii
2023-02-26 10:38                                                                 ` João Távora
2023-02-24 14:54                                       ` Augusto Stoffel
2023-02-24 15:23                                         ` Eli Zaretskii
2023-02-24 15:56                                           ` Augusto Stoffel
2023-02-24 17:02                                             ` Eli Zaretskii
2023-02-24 16:34                                         ` João Távora
2023-02-24 17:06                                           ` Eli Zaretskii
2023-02-23 11:37 ` João Távora
2023-02-23 17:01 ` Felician Nemeth
2023-02-23 17:11   ` João Távora
2023-02-23 18:42   ` Augusto Stoffel
2023-02-27 10:11     ` Felician Nemeth

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).