From: Eli Zaretskii
To: Stephen Leake
Cc: emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Re: Reliable after-change-functions (via: Using incremental parsing in Emacs)
Date: Fri, 03 Apr 2020 21:46:00 +0300
Message-ID: <83zhbsv253.fsf@gnu.org>
In-Reply-To: <86zhbse8xy.fsf@stephe-leake.org> (message from Stephen Leake on Fri, 03 Apr 2020 10:11:05 -0800)
References: <83369o1khx.fsf@gnu.org> <83imijz68s.fsf@gnu.org> <831rp7ypam.fsf@gnu.org> <86wo6yhj4d.fsf@stephe-leake.org> <83o8sax803.fsf@gnu.org> <86pncpffmk.fsf@stephe-leake.org> <83y2rdvwm1.fsf@gnu.org> <86zhbse8xy.fsf@stephe-leake.org>

> From: Stephen Leake
> Date: Fri, 03 Apr 2020 10:11:05 -0800
>
> > Emacs represents buffer text as a superset of UTF-8, with the
> > violations of strict UTF-8 being very rare in buffers that hold
> > program sources.  The function we can provide that lets tree-sitter
> > access buffer text can cope with those violations,
>
> Ok. "cope with those violations" = "do some encoding".

If we use "encoding" terminology for this, it will be confusing and
will cause misunderstandings.  "Conversion" is better, IMO.

Some sequences may need to be converted when feeding them to
tree-sitter.  But I think tree-sitter should be able to cope with this
itself.  It is unreasonable to expect strict UTF-8 from all
applications.  Maybe I'm dreaming, but ISTR there is (or was) an issue
on their issue tracker about this.

> We cannot avoid encoding (either in the read function provided to
> tree-sitter, or in the tree-sitter lexer), but the encoding may be very
> simple and efficient.

Once again, please reserve "encoding" to the likes of
encode-coding-region or code_convert_string, to avoid confusion.
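
To make the "conversion" discussed above concrete, here is a minimal sketch in C of what converting non-strict sequences might look like before text is handed to a parser. Everything in it is an assumption for illustration: the name convert_for_parser is hypothetical, it is not Emacs or tree-sitter code, and the choice to overwrite each offending byte with '?' (so that byte offsets stay unchanged) is just one possible policy. It also assumes that the violations of strict UTF-8 take the form of overlong 2-byte sequences with a 0xC0/0xC1 lead byte, lone surrogates, code points above U+10FFFF, and 5-byte sequences.

```c
#include <stddef.h>

/* Length implied by a lead byte; 5-byte sequences (lead 0xF8) are an
   assumption about the UTF-8 superset, not part of strict UTF-8.  */
static size_t sequence_length(unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  if (lead == 0xF8) return 5;
  return 1;                     /* stray continuation or unknown byte */
}

/* Return 1 if the LEN bytes at P form a strict UTF-8 sequence.  */
static int is_strict_utf8(const unsigned char *p, size_t len, size_t avail)
{
  if (len > avail) return 0;    /* truncated at the end of the chunk */
  if (len == 1) return p[0] < 0x80;
  for (size_t i = 1; i < len; i++)
    if ((p[i] & 0xC0) != 0x80) return 0;
  switch (len) {
  case 2:
    return p[0] >= 0xC2;        /* 0xC0/0xC1 leads are overlong */
  case 3:
    if (p[0] == 0xE0 && p[1] < 0xA0) return 0;   /* overlong */
    if (p[0] == 0xED && p[1] >= 0xA0) return 0;  /* surrogate */
    return 1;
  case 4:
    if (p[0] == 0xF0 && p[1] < 0x90) return 0;   /* overlong */
    if (p[0] > 0xF4 || (p[0] == 0xF4 && p[1] >= 0x90))
      return 0;                                  /* above U+10FFFF */
    return 1;
  default:
    return 0;                   /* 5-byte sequences are never strict */
  }
}

/* Copy N bytes from SRC to DST, overwriting every byte of a non-strict
   sequence with '?'.  Returns the number of bytes overwritten.  */
size_t convert_for_parser(const unsigned char *src, size_t n,
                          unsigned char *dst)
{
  size_t replaced = 0, i = 0;
  while (i < n) {
    size_t len = sequence_length(src[i]);
    size_t avail = n - i;
    if (is_strict_utf8(src + i, len, avail)) {
      for (size_t k = 0; k < len; k++) dst[i + k] = src[i + k];
      i += len;
    } else {
      /* Overwrite the lead byte plus any continuation bytes that
         actually follow it, and nothing beyond that.  */
      size_t run = 1;
      while (run < len && run < avail && (src[i + run] & 0xC0) == 0x80)
        run++;
      for (size_t k = 0; k < run; k++) dst[i + k] = '?';
      replaced += run;
      i += run;
    }
  }
  return replaced;
}
```

A read callback supplied to the parser could run each chunk through such a function before returning it. Since the violations are expected to be very rare in program sources, the common path is a plain copy.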