From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 14:54:50 +0200
Message-ID: <831qmgr17p.fsf@gnu.org>
References: <87a614g628.fsf@gmail.com> <83cz60r7hu.fsf@gnu.org> <875ybsfvtj.fsf@gmail.com>
In-Reply-To: <875ybsfvtj.fsf@gmail.com> (message from Augusto Stoffel on Thu, 23 Feb 2023 12:46:48 +0100)
To: Augusto Stoffel
Cc: 61726@debbugs.gnu.org, joaotavora@gmail.com

> From: Augusto Stoffel
> Cc: 61726@debbugs.gnu.org, joaotavora@gmail.com
> Date: Thu, 23 Feb 2023 12:46:48 +0100
>
> >> +(defun eglot--current-column-utf-8 ()
> >> +  "Calculate current column, counting bytes."
> >> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
> >
> > This is subtly incorrect: position-bytes doesn't count UTF-8 bytes, it
> > counts the bytes in the internal representation Emacs uses for buffer
> > and string text.  The differences are minor and subtle, but not
> > negligible.
>
> Right, if the buffer contains a char outside of the Unicode range, we
> lose.
>
> But just to confirm: position-bytes and byte-to-position are always with
> respect to Emacs's internal extended UTF-8 representation and have
> nothing to do with the buffer file encoding, right?

Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
jump through to get it right, even just with UTF-8.

> > What does this stuff do with double-width or zero-width characters?
> > Emacs takes character-width into consideration when it counts columns,
> > but it is unclear to me what LSP servers do in those cases.
> > Likewise with characters that are composed on display.
>
> `eglot-move-to-column' is supposed to count Unicode codepoints, so
> e.g. x, ⇒ and 😃 all contribute 1 unit.

But if the resulting column is then used in move-to-column etc., it
might go to the wrong column, because in Emacs each column is not
necessarily a single codepoint.
The simplest example is a TAB character, but there are more examples,
some of which are quite complicated (see below).

> On the other hand, the Emoji
> 🧛‍♀️ contributes 4 units.  This is independent of the screen display.

Not in Emacs.

> By the way, I don't understand your claim about column counting.  If I
> move point over 🧛‍♀️, the mode line column count increments by 3 units,
> which seems to make no sense: this Emoji is 4 codepoints long and
> occupies 1 screen column.  What's the logic here?

If that is what you see, it could be a bug.  Does current-column agree
with what you see in the mode line?

In general, characters (codepoints) that are composed on display into
a single glyph or "grapheme cluster" are supposed to be counted as a
single column.  Try typing this in "emacs -Q"

  a C-x 8 RET COMBINING ACUTE ACCENT RET

If your default font is capable enough, you will see a single glyph of
'a' with acute accent (á), and it will count as 1 column, although
there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
will move past both codepoints.

Now imagine that we get such sequences from the LSP server -- what
will Eglot do in terms of column counting?
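
To make the units discussed above concrete, here is a minimal sketch
that measures the text between the beginning of the line and point in
codepoints, in UTF-8 bytes, and in UTF-16 code units.  The `my-linepos-*'
names are hypothetical and are not part of Eglot or of the patch under
review.  The sketch assumes that explicitly encoding the text with
encode-coding-region is an acceptable way to count bytes, instead of
relying on position-bytes, and that the `utf-16le' coding system writes
no byte order mark.  None of these functions say anything about display
columns, which is the separate problem raised above.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helpers for illustration only; not Eglot's code.

(defun my-linepos-codepoints ()
  "Return the number of characters between line start and point.
Each buffer character counts as 1, regardless of its display width."
  (- (point) (line-beginning-position)))

(defun my-linepos-utf-8 ()
  "Return the number of UTF-8 bytes between line start and point.
The text is encoded explicitly, so the result does not depend on the
internal buffer representation that `position-bytes' measures."
  ;; With DESTINATION t the encoded text is returned as a unibyte
  ;; string and the buffer is left unchanged.
  (length (encode-coding-region (line-beginning-position) (point)
                                'utf-8-unix t)))

(defun my-linepos-utf-16 ()
  "Return the number of UTF-16 code units between line start and point.
Assumes `utf-16le' emits no byte order mark."
  (/ (length (encode-coding-region (line-beginning-position) (point)
                                   'utf-16le t))
     2))

;; Usage: "a" + COMBINING ACUTE ACCENT, U+21D2, U+1F603.
(with-temp-buffer
  (insert "a\u0301\u21d2\U0001f603")
  (list (my-linepos-codepoints)   ; characters in the buffer
        (my-linepos-utf-8)        ; bytes after encoding as UTF-8
        (my-linepos-utf-16)       ; UTF-16 code units
        (current-column)))        ; display columns, as Emacs counts them
```

The first three values depend only on the text, while the last one
depends on character widths and on what gets composed on display, so it
can differ from all of them.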