From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: font-lock-syntactic-keywords obsolet?
Date: Sun, 10 Jul 2016 22:11:51 +0000
Message-ID: <20160710221151.GA3551@acm.fritz.box>
References: <20160620181218.GC2192@acm.fritz.box>
 <20160620200830.GE2192@acm.fritz.box>
 <18697155-06d3-2191-6a6b-3ea58e8d17cb@yandex.ru>
 <20160621144047.GB3177@acm.fritz.box>
 <20160623163021.GA4946@acm.fritz.box>
 <7762a6a6-9554-945d-cc5a-4a14157eaeb0@yandex.ru>
 <20160630095215.GB3082@acm.fritz.box>
 <9b26e260-337c-36ea-5d85-6e955fa36c3a@yandex.ru>
In-Reply-To: <9b26e260-337c-36ea-5d85-6e955fa36c3a@yandex.ru>
To: Dmitry Gutov
Cc: Noam Postavsky, emacs-devel@gnu.org

Hello, Dmitry.

On Sun, Jul 10, 2016 at 05:01:55AM +0300, Dmitry Gutov wrote:
> Hi Alan,
>
> Sorry for the late response.

No problems!

> On 06/30/2016 12:52 PM, Alan Mackenzie wrote:
>
> >> So now the raw strings are properly using limits? Does that mean there
> >> is a limit on the length of a raw string that CC Mode supports? (Testing
> >> indicates so).
>
> > There isn't any limit on the length of a raw string that I know about,
> > nor should there be. If you've got a test which shows there is such a
> > limit, please tell me about it!
>
> Hmm, maybe not a limit, but long raw strings still aren't getting
> handled right.

Yes.

> Example 1:
>
> - Apply the attached patch to xdisp.c, which should make most of the
> code belong within the raw string literal.
> - Visit this file. Switch to c++-mode.
> - See the literal highlighted as expected.
> Press `M->', to get to the end of the buffer (that happens rather
> slowly, esp. considering we're inside a string, and font-lock can get
> this information quickly).
> The literal ends at )foo".
> - Modify the trailing "foo" piece: delete it, or replace with "bar", etc
> => the literal still ends at the same line.
>
> I have to go back to the opener and fiddle with the delimiter there, for
> it to finally notice that something is wrong.
>
> If the raw string is small, on the other hand, I don't see this problem.
>
> Example 2:
>
> - Visit this file. Switch to c++-mode.
> - See the literal highlighted as expected.
> - M->.
> Kill the closing delimiter and paste it a few lines below the opening
> delimiter. See the new positions of the raw string recognized (or not,
> I'm getting different results). But if they are recognized...
> - Call `undo' a few times, until the closing delimiter is back at its
> original position. The literal is broken again.

There was some code in the mix designed to stop too much expansion of a
region when there were humongous macros. (A few years back somebody had
complained about the speed in processing a ~5,000 line macro.)
Unfortunately, this code got caught up in raw string processing.

I hope the following patch fixes it. I know that the processing is
currently slow in such a large raw string. It is probably possible to
optimise this. Whether it is worthwhile is the question.

diff -r 2fcfc6e054b3 cc-mode.el
--- a/cc-mode.el	Sun Jul 03 17:54:20 2016 +0000
+++ b/cc-mode.el	Sun Jul 10 21:53:29 2016 +0000
@@ -906,14 +906,16 @@
   ;; before change function.
   (goto-char c-new-BEG)
   (c-beginning-of-macro)
-  (setq c-new-BEG (point))
+  (when (< (point) c-new-BEG)
+    (setq c-new-BEG (max (point) (c-determine-limit 500 c-new-BEG))))

   (goto-char c-new-END)
   (when (c-beginning-of-macro)
     (c-end-of-macro)
     (or (eobp) (forward-char)))        ; Over the terminating NL which may be marked
                                        ; with a c-cpp-delimiter category property
-  (setq c-new-END (point)))
+  (when (> (point) c-new-END)
+    (setq c-new-END (min (point) (c-determine-+ve-limit 500 c-new-END)))))

 (defun c-depropertize-new-text (beg end old-len)
   ;; Remove from the new text in (BEG END) any and all text properties which
@@ -941,15 +943,17 @@
   ;; Point is undefined on both entry and exit to this function. The buffer
   ;; will have been widened on entry.
   ;;
+  ;; c-new-BEG has already been extended in `c-extend-region-for-CPP' so we
+  ;; don't need to repeat the exercise here.
+  ;;
   ;; This function is in the C/C++/ObjC value of `c-before-font-lock-functions'.
   (goto-char endd)
-  (if (c-beginning-of-macro)
-      (c-end-of-macro))
-  (setq c-new-END (max endd c-new-END (point)))
-  ;; Determine the region, (c-new-BEG c-new-END), which will get font
-  ;; locked. This restricts the region should there be long macros.
-  (setq c-new-BEG (max c-new-BEG (c-determine-limit 500 begg))
-        c-new-END (min c-new-END (c-determine-+ve-limit 500 endd))))
+  (when (c-beginning-of-macro)
+    (c-end-of-macro)
+    ;; Determine the region, (c-new-BEG c-new-END), which will get font
+    ;; locked. This restricts the region should there be long macros.
+    (setq c-new-END (min (max c-new-END (point))
+                         (c-determine-+ve-limit 500 c-new-END)))))

 (defun c-neutralize-CPP-line (beg end)
   ;; BEG and END bound a region, typically a preprocessor line. Put a

> > The "limit" in my previous post was a bound supplied as an argument to
> > c-font-lock-declarators, which does what it says.
>
> I'm confused. If, as we discussed before, syntax properties are applied
> in before/after-functions, why does c-font-lock-declarations need to be
> concerned with scanning for raw string bounds?

The raw string bounds have nothing to do with c-font-lock-declarators.
It's just that that function takes a bound which was hardly ever needed,
since the function stops when it reaches something which signals the
end of a sequence of declarators (for example, a semicolon). Hence the
fact that the bound given was wrong didn't get noticed.

However, with raw strings in play, when (point-max) was the bound, the
function actually ended up fruitlessly scanning to (point-max) rather
than to the `limit' it should have been scanning to.

The limit to c-font-lock-declarators is now `(min limit (point-max))'.
That way, when the buffer is narrowed to less than `limit', there won't
be an out-of-bounds error, and when there are unterminated raw strings,
there won't be useless scanning past `limit' either.

I'm not sure if the above will help much, but I hope it does.

-- 
Alan Mackenzie (Nuremberg, Germany).
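For anyone following the patch: in its first hunk, c-new-BEG is moved back to the start of an enclosing macro only when that start lies before it, and then by no more than roughly 500 characters (via `c-determine-limit'); c-new-END is treated symmetrically with `c-determine-+ve-limit'. The following is a minimal, hypothetical sketch of that clamping, with buffer positions modelled as plain integers so it runs without a buffer. All `sketch-' functions are invented for illustration. `sketch-determine-limit' is a naive stand-in for `c-determine-limit', which (as I understand it) counts only characters outside literals, so real positions will differ.

```elisp
;;; Hypothetical sketch, not CC Mode code.  Positions are plain integers.

(defun sketch-determine-limit (how-far start)
  "Return the position HOW-FAR chars before START, but not before 1.
A naive stand-in for `c-determine-limit', which also skips literals."
  (max 1 (- start how-far)))

(defun sketch-determine-+ve-limit (how-far start point-max)
  "Return the position HOW-FAR chars after START, but not after POINT-MAX."
  (min point-max (+ start how-far)))

(defun sketch-extend-beg (new-beg macro-beg how-far)
  "Move NEW-BEG back towards MACRO-BEG by at most about HOW-FAR chars.
If MACRO-BEG is not before NEW-BEG (e.g. NEW-BEG is not inside a
macro), return NEW-BEG unchanged."
  (if (< macro-beg new-beg)
      (max macro-beg (sketch-determine-limit how-far new-beg))
    new-beg))

(defun sketch-extend-end (new-end macro-end how-far point-max)
  "Move NEW-END forward towards MACRO-END by at most about HOW-FAR chars.
If MACRO-END is not after NEW-END, return NEW-END unchanged."
  (if (> macro-end new-end)
      (min macro-end (sketch-determine-+ve-limit how-far new-end point-max))
    new-end))

;; A huge macro no longer drags the region all the way back to its start:
(sketch-extend-beg 200000 1000 500)   ; => 199500
;; A short macro is still covered in full:
(sketch-extend-beg 1200 1000 500)     ; => 1000
```

Note that in the patched code the cap is measured from the current c-new-BEG/c-new-END and applies only to the macro extension. In the old second hunk the caps were computed from begg/endd and applied to the whole region, which could presumably also clip back an extension made for other reasons, such as a raw string.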