From: Augusto Stoffel
Newsgroups: gmane.emacs.bugs
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 14:31:52 +0100
Message-ID: <87wn48ecdz.fsf@gmail.com>
References: <87a614g628.fsf@gmail.com> <83cz60r7hu.fsf@gnu.org> <875ybsfvtj.fsf@gmail.com> <831qmgr17p.fsf@gnu.org>
In-Reply-To: <831qmgr17p.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 23 Feb 2023 14:54:50 +0200")
To: Eli Zaretskii
Cc: 61726@debbugs.gnu.org, joaotavora@gmail.com

On Thu, 23 Feb 2023 at 14:54, Eli Zaretskii wrote:

>> But just to confirm: position-bytes and byte-to-position are always with
>> respect to Emacs's internal extended UTF-8 representation and have
>> nothing to do with the buffer file encoding, right?
>
> Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
> jump through to get it right, even just with UTF-8.

Okay, then we're on the same page.

Just to emphasize, the buffer file is totally irrelevant for Eglot's
purposes.  The only thing that matters is the representation of the
buffer text when it's serialized as a UTF-8-encoded string inside a
JSON object.

>> > What does this stuff do with double-width or zero-width characters?
>> > Emacs takes character-width into consideration when it counts columns,
>> > but it is unclear to me what do LSP servers do in those cases.
>> > Likewise with characters that are composed on display.
>>
>> `eglot-move-to-column' is supposed to count Unicode codepoints, so
>> e.g. x, ⇒ and 😃 all contribute 1 unit.
>
> But if the resulting column is then used in move-to-column etc., it
> might go to the wrong column, because in Emacs each column is not
> necessarily a single codepoint.  The simplest example is a TAB
> character, but there are more examples, some of which are quite
> complicated (see below).

There's only one function that uses `move-to-column'.  It's very old
and I didn't touch it.

>> On the other hand, the Emoji 🧛‍♀️ contributes 4 units.  This is
>> independent of screen display.
>
> Not in Emacs.

Sorry, I don't understand what you mean.  Emacs has no say as to how
Emoji are represented as sequences of codepoints.  The female vampire
Emoji is 4 codepoints, if I'm counting it right.

Of course I understand that the Emoji occupies 1 column on my screen.

>> By the way, I don't understand your claim about column counting.  If I
>> move point over 🧛‍♀️, the mode line column count increments by 3 units,
>> which seems to make no sense: this Emoji is 4 codepoints long and
>> occupies 1 screen column.  What's the logic here?
>
> If that is what you see, it could be a bug.  Does current-column agree
> with what you see in the mode line?

Yes.

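For illustration only (this is not Eglot code), the unit counts
discussed above can be checked with a short Python sketch.  The helper
name `unit_counts' is hypothetical; the sketch assumes that a "unit" is
a byte for UTF-8, a 16-bit code unit for UTF-16, and a codepoint for
UTF-32.

```python
def unit_counts(text):
    """Return the length of TEXT in UTF-8, UTF-16 and UTF-32 units."""
    return {
        "utf-8": len(text.encode("utf-8")),            # bytes
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(text),                           # codepoints
    }


samples = {
    "x": "x",
    "rightwards double arrow": "\u21d2",
    "smiley": "\U0001F603",
    # U+1F9DB VAMPIRE, U+200D ZWJ, U+2640 FEMALE SIGN, U+FE0F VS-16
    "female vampire": "\U0001F9DB\u200D\u2640\uFE0F",
}

for name, text in samples.items():
    print(name, unit_counts(text))
```

None of these counts depends on how many screen columns the text
occupies, which is a separate question.
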
> In general, characters (codepoints) that are composed on display into
> a single glyph or "grapheme cluster" are supposed to be counted as a
> single column.  Try typing this in "emacs -Q"
>
>   a C-x 8 RET COMBINING ACUTE ACCENT RET
>
> If your default font is capable enough, you will see a single glyph of
> 'a' with acute accent (á), and it will count as 1 column, although
> there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
> will move past both codepoints.  Now imagine that we get such sequences
> from the LSP server -- what will Eglot do in terms of column counting?

Right, I understand the Unicode business (thanks for the pointers in
any case).  If you look carefully at the Eglot code, you will see that
`move-to-column' only appears in the code pertaining to the “UTF-16 way
of counting offsets”, which

  1. is old and I didn't touch in this patch,
  2. seems to work correctly, despite looking suspicious, and
  3. will not be used anymore when both Eglot and the LSP server
     support the positionEncodings capability.

I hope this motivates you to add this feature 🙂.
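
As a rough illustration of why the negotiated encoding matters, here is
a minimal Python sketch (not Eglot's implementation) that turns a
server-supplied column into a codepoint index.  The helper
`column_to_index' is hypothetical, and the sketch assumes the three
encoding kinds defined by LSP 3.17 (utf-8, utf-16, utf-32).  Screen
columns and display composition play no role in it.

```python
# Width of a single codepoint in each encoding's units.
WIDTHS = {
    "utf-8": lambda ch: len(ch.encode("utf-8")),
    "utf-16": lambda ch: len(ch.encode("utf-16-le")) // 2,
    "utf-32": lambda ch: 1,
}


def column_to_index(line, column, encoding="utf-16"):
    """Return the codepoint index in LINE reached after COLUMN units.

    A column that falls inside a multi-unit character resolves to the
    index just past that character; a column beyond the end of the line
    is clamped to the line length.
    """
    width = WIDTHS[encoding]  # raises KeyError for an unknown encoding
    units = 0
    for index, ch in enumerate(line):
        if units >= column:
            return index
        units += width(ch)
    return len(line)


# 'a' + COMBINING ACUTE ACCENT, a space, a smiley, then '!'
line = "a\u0301 \U0001F603!"
for encoding, column in (("utf-8", 8), ("utf-16", 5), ("utf-32", 4)):
    print(encoding, column, column_to_index(line, column, encoding))
```

All three calls point at the same character, the '!' at codepoint
index 4, even though the column the server would send differs in each
encoding.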