From: Stephen Leake
Newsgroups: gmane.emacs.devel
To: emacs-devel
Subject: Re: access to raw buffer text from module
Date: Thu, 05 Dec 2019 17:01:41 -0800
Message-ID: <86zhg6mfm2.fsf_-_@stephe-leake.org>
References: <87eexlb1d4.fsf@randomsample>
In-Reply-To: (Stefan Monnier's message of "Thu, 05 Dec 2019 09:11:11 -0500")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (windows-nt)

Stefan Monnier writes:

>> A related but different question. Would it be possible to get access to
>> the raw buffer data from dynamic modules? (That is, pointer to the start,
>> length and gap information.)
>
> You might like to talk with Stephen Leake.
> IIUC he wrote a dynamic module which parses the buffer. AFAICT he
> didn't use such a "raw" access, so it'd be interesting to hear about
> his experience.

No, I sent the buffer content as a string. I was hoping to avoid that
copy, but other things turned out to be way slower (creating _lots_ of
text properties), so I went back to a separate process, and made that
faster (doing more stuff in the process, so fewer text properties are
needed).

>> I'm only interested in read-only access, and I'd be happy to patch it
>> in myself if it's deemed generally acceptable.
>
> It would tend to expose internal data subject to change (and offer the
> ability to change this data in a way that can break some invariants), so
> it's definitely not in the style of the current module interface.
>
> But we may be able to provide a slightly less "raw" access that doesn't
> suffer in the same way. So details about your particular needs would be
> helpful to try and figure out what we can do (i.e. tell us the problems
> you face when using `char-after` or `buffer-substring`, which would be
> the main ways I can think of to access the buffer's content with the
> current module API).

In my case, I wanted raw speed when lexing the source text. The lexer
I'm using can handle utf-8, when given a start address and byte length.
Allowing for a gap would mean checking for that at each byte, which
might slow things down as much as copying.

But lexing is a _very_ small portion of the total parse time, so it's
really not worth worrying about the copy either; even sending the text
to a separate process does not take a noticeable amount of time.

If I convert to LSP style (https://langserver.org/), then the full text
is sent once, and only edits are sent after that, making the copy issue
irrelevant.

-- 
-- Stephe
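
The gap-check versus copy trade-off discussed in this message can be
sketched in C roughly as follows. This is only an illustration under
stated assumptions: the struct layout and the names gap_buffer,
gap_byte_at and gap_copy_out are hypothetical, and they are not Emacs
internals or part of the module API. It shows reading through a gap
with one comparison per byte, and the alternative of copying once so a
lexer can be handed a start address and byte length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical gap buffer: NOT Emacs's real layout, just the general
   shape (text before the gap, an unused hole, text after it).  */
struct gap_buffer {
  const unsigned char *data; /* start of storage, gap included */
  size_t gap_start;          /* byte offset where the gap begins */
  size_t gap_size;           /* number of unused bytes in the gap */
  size_t text_len;           /* bytes of real text, gap excluded */
};

/* Option 1: read in place.  Every access pays for one comparison
   to decide whether POS lies before or after the gap.  */
static unsigned char
gap_byte_at (const struct gap_buffer *b, size_t pos)
{
  assert (pos < b->text_len);
  return b->data[pos < b->gap_start ? pos : pos + b->gap_size];
}

/* Option 2: copy once into contiguous storage, so the lexer gets a
   plain start address and byte length.  OUT must have room for
   b->text_len bytes.  Returns the number of bytes copied.  */
static size_t
gap_copy_out (const struct gap_buffer *b, unsigned char *out)
{
  size_t tail = b->text_len - b->gap_start;
  memcpy (out, b->data, b->gap_start);
  memcpy (out + b->gap_start,
          b->data + b->gap_start + b->gap_size, tail);
  return b->text_len;
}
```

With the second form the lexer never sees the gap, at the cost of one
pass over the text; which form is faster would have to be measured.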