From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Re: Bug #25608 and the comment-cache branch
Date: Thu, 23 Feb 2017 16:23:39 +0200
Message-ID: <9f4ac18b-6453-bee8-83d8-f452392adbb9@yandex.ru>
References: <20170202202418.GA2505@acm> <83lgtouxpf.fsf@gnu.org>
 <20170202215154.GB2505@acm> <83h94bvhzw.fsf@gnu.org>
 <20170203172952.GC2250@acm>
 <0a40d539-b7bc-2655-5429-6280022106ee@yandex.ru>
 <20170204102410.GA2047@acm>
 <8f9e68fc-4314-625d-b4bf-796c71c91798@yandex.ru>
 <20170206192423.GB3568@acm>
 <4f0fabf3-be9c-7492-379b-59dc93e72b4f@yandex.ru>
 <20170207192119.GA2490@acm>
 <424e6409-029c-d15d-421c-4fb90594329c@yandex.ru>
 <195629e9-11d6-2fb6-4c9d-39c8a244e2ec@yandex.ru>
Cc: emacs-devel@gnu.org
To: Stefan Monnier

On 22.02.2017 05:53, Stefan Monnier wrote:

>> I see, thanks. And I think that means that, ideally, it would work without
>> the caller having to adjust the syntax visibility bounds, or the like, as
>> long as the syntax table is correct and the beginning (or the end) of the
>> currently navigated comment is within view.
>
> Right, but not reliably so: very often we need to parse backward not
> just until the matching starter but until the previous closer (to make
> sure the starter we saw was not itself within an earlier comment), and
> in other cases the mix of comment markers and string markers make it
> impossible to guess if we were really inside a comment, so we end up
> falling back on the forward-parse code.

Naturally, we'd need to save more information to be able to do that.
E.g. propertize the end of a complex comment with the position of its
beginning. Since the first time we go through a buffer is in the forward
direction, getting that info would be inexpensive.

> Actually, it's very straightforward: the forward parse already gives us
> the beginning of the surrounding element, so we just re-do the forward
> parse from that spot. It's just a matter of wrapping the code inside
> a loop.

You're likely a better judge of that.
It does sound a bit convoluted to me (and having to deal with different
kinds of comments adds its own complexity), but nothing that a handful
of tests wouldn't keep straight.

> No. "Previously", we typically scan the line backward and stop as soon
> as we hit the previous \n (which tells us that no comment can start
> earlier than that if it finishes with a \n).
>
> In a few cases, we do fallback on the forward parse code, in which case
> indeed we'll take longer, but those are normally rare (which is why this
> comment-cache and my syntax-ppss-patch haven't been installed yet: they
> improve the performance of a case that's somewhat infrequent).

I see, thanks.

>> Perhaps we could use the "generic comment bounds" syntax-table property to
>> delineate such difficult comments. If that idea sounds similar to
>> comment-cache, that is no accident.
>
> Maybe. Obviously, my syntax-ppss hammer makes me think that such
> alternate solutions aren't needed: syntax-ppss solves this case without
> having to try and come out with a clever way to detect which comments
> are tricky nor how to mark them.

The alternative tweak I had in mind would be applied somewhere around
syntax-propertize. So it would be a matter of trading off one bit of
complexity for another, still staying within the framework of
syntax-ppss.

> Not completely sure either. I've had vague ideas of adding some kind of
> hook to syntax-tables, i.e. add a new kind of syntax element which ends
> up calling an Elisp function of your choice so you can make it "do the
> right thing" for the particular construct.

I think just having paired syntactic elements would suffice. Or just
propertizing the whole subregion with one text property span. Whichever
would be easier to process.

Not sure about using the syntax-table property for this. In some weird
cases there won't be a space or a newline to put these syntax-table
values on.
And a newline staying a newline might be syntactically important for
the primary major mode somewhere.

Another thing to consider is that we would probably want to fontify the
contents of all subregions normally, even when they are inside comments
belonging to the outer mode. So the primitives used in
font-lock-fontify-syntactically-region would need to be able to stop at
those boundaries instead of automatically skipping over them.

> So when scanning (forward or backward), if we bump into an element with
> that syntax (typically applied as a syntax-table text-property), we call
> the function which will know how to jump over the sub-region or will
> signal an "end of sub-region" error.

Just having those hooks won't be enough: we still don't have enough info
on how to syntax-propertize the subregion contents, for instance. So I'm
not sure what the flexibility of using functions here would buy us.
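
To make the earlier suggestion a bit more concrete (recording, during the
forward pass, where each comment began, so that backward motion can look
it up instead of guessing), here is a toy sketch. It is Python rather
than Elisp, and it is not how Emacs, syntax-ppss or comment-cache
actually work: the function names, the dict standing in for a text
property, and the mini-syntax (only /* */ comments and "..." strings) are
all assumptions made for the illustration.

```python
def scan_comments(text):
    """Forward pass over TEXT (toy syntax: /* */ comments, "..." strings).

    Returns a dict mapping the offset just past each comment's closer to
    the offset of its opener. The dict is a stand-in for a text property
    placed on the comment end; it is hypothetical, not an Emacs API.
    """
    ends = {}
    i, n = 0, len(text)
    while i < n:
        if text.startswith("/*", i):
            close = text.find("*/", i + 2)
            # An unterminated comment is treated as running to end of text.
            end = n if close == -1 else close + 2
            ends[end] = i
            i = end
        elif text[i] == '"':
            # Skip the string so comment markers inside it are ignored.
            close = text.find('"', i + 1)
            i = n if close == -1 else close + 1
        else:
            i += 1
    return ends


def backward_comment(ends, pos):
    """Return the start of the comment ending at POS, or None.

    No backward parsing happens: the answer is whatever the forward
    pass recorded.
    """
    return ends.get(pos)
```

The point of the sketch is only that a "*/" sitting inside a string never
gets an entry, so a backward lookup there returns None without having to
re-parse from an earlier safe position. What it leaves out is the hard
part in practice: keeping the recorded positions valid after edits.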