From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.devel
Subject: Re: font-lock-syntactic-keywords obsolet?
Date: Sun, 10 Jul 2016 22:11:51 +0000
Message-ID: <20160710221151.GA3551@acm.fritz.box>
References: <20160620181218.GC2192@acm.fritz.box>
	<d53960fe-2805-0b77-7ddf-6f44d3a2bb55@yandex.ru>
	<20160620200830.GE2192@acm.fritz.box>
	<18697155-06d3-2191-6a6b-3ea58e8d17cb@yandex.ru>
	<20160621144047.GB3177@acm.fritz.box>
	<f807a7e9-ee41-0ea9-0a24-d1ee7bb85d7d@yandex.ru>
	<20160623163021.GA4946@acm.fritz.box>
	<7762a6a6-9554-945d-cc5a-4a14157eaeb0@yandex.ru>
	<20160630095215.GB3082@acm.fritz.box>
	<9b26e260-337c-36ea-5d85-6e955fa36c3a@yandex.ru>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1468188724 24952 80.91.229.3 (10 Jul 2016 22:12:04 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 10 Jul 2016 22:12:04 +0000 (UTC)
Cc: Noam Postavsky <npostavs@users.sourceforge.net>, emacs-devel@gnu.org
To: Dmitry Gutov <dgutov@yandex.ru>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Mon Jul 11 00:11:55 2016
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1bMMx9-0006Ym-5j
	for ged-emacs-devel@m.gmane.org; Mon, 11 Jul 2016 00:11:55 +0200
Original-Received: from localhost ([::1]:57005 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1bMMx4-0001ED-UK
	for ged-emacs-devel@m.gmane.org; Sun, 10 Jul 2016 18:11:50 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:34709)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <acm@muc.de>)
	id 1bMMwy-0001E6-0g
	for emacs-devel@gnu.org; Sun, 10 Jul 2016 18:11:45 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <acm@muc.de>) id 1bMMwu-0008G4-Oj
	for emacs-devel@gnu.org; Sun, 10 Jul 2016 18:11:43 -0400
Original-Received: from mail.muc.de ([193.149.48.3]:23994)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <acm@muc.de>)
	id 1bMMwu-0008Fb-DD
	for emacs-devel@gnu.org; Sun, 10 Jul 2016 18:11:40 -0400
Original-Received: (qmail 67724 invoked by uid 3782); 10 Jul 2016 22:11:37 -0000
Original-Received: from acm.muc.de (p548C72A6.dip0.t-ipconnect.de [84.140.114.166]) by
	colin.muc.de (tmda-ofmipd) with ESMTP;
	Mon, 11 Jul 2016 00:11:35 +0200
Original-Received: (qmail 6019 invoked by uid 1000); 10 Jul 2016 22:11:51 -0000
Content-Disposition: inline
In-Reply-To: <9b26e260-337c-36ea-5d85-6e955fa36c3a@yandex.ru>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Delivery-Agent: TMDA/1.1.12 (Macallan)
X-Primary-Address: acm@muc.de
X-detected-operating-system: by eggs.gnu.org: FreeBSD 9.x
X-Received-From: 193.149.48.3
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:205525
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/205525>

Hello, Dmitry.

On Sun, Jul 10, 2016 at 05:01:55AM +0300, Dmitry Gutov wrote:
> Hi Alan,

> Sorry for the late response.

No problems!

> On 06/30/2016 12:52 PM, Alan Mackenzie wrote:

> >> So now the raw strings are properly using limits? Does that mean there
> >> is a limit on the length of a raw string that CC Mode supports? (Testing
> >> indicates so).

> > There isn't any limit on the length of a raw string that I know about,
> > nor should there be.  If you've got a test which shows there is such a
> > limit, please tell me about it!

> Hmm, maybe not a limit, but long raw strings still aren't getting 
> handled right.

Yes.

> Example 1:

> - Apply the attached patch to xdisp.c, which should make most of the 
> code belong within the raw string literal.
> - Visit this file. Switch to c++-mode.
> - See the literal highlighted as expected.

> Press `M->', to get to the end of the buffer (that happens rather 
> slowly, esp. considering we're inside a string, and font-lock can get 
> this information quickly).

> The literal ends at )foo".

> - Modify the trailing "foo" piece: delete it, or replace with "bar", etc 
> => the literal still ends at the same line.

> I have to go back to the opener and fiddle with the delimiter there, for 
> it to finally notice that something is wrong.

> If the raw string is small, on the other hand, I don't see this problem.

> Example 2:

> - Visit this file. Switch to c++-mode.
> - See the literal highlighted as expected.
> - M->.

> Kill the closing delimiter and paste it a few lines below the opening 
> delimiter. See the new positions of the raw string recognized (or not, 
> I'm getting different results). But if they are recognized...

> - Call `undo' a few times, until the closing delimiter is back at its 
> original position. The literal is broken again.

CC Mode has some code designed to stop a region being expanded too far
when there are humongous macros.  (A few years back somebody had
complained about the speed of processing a ~5,000 line macro.)
Unfortunately, this limiting code got caught up in raw string
processing as well.  I hope the following patch fixes it: it applies
the limit only where the region has actually been extended over a
macro.  I know that processing is currently slow in such a large raw
string.  It is probably possible to optimise this.  Whether it is
worthwhile is the question.


diff -r 2fcfc6e054b3 cc-mode.el
--- a/cc-mode.el	Sun Jul 03 17:54:20 2016 +0000
+++ b/cc-mode.el	Sun Jul 10 21:53:29 2016 +0000
@@ -906,14 +906,16 @@
   ;; before change function.
   (goto-char c-new-BEG)
   (c-beginning-of-macro)
-  (setq c-new-BEG (point))
+  (when (< (point) c-new-BEG)
+    (setq c-new-BEG (max (point) (c-determine-limit 500 c-new-BEG))))
 
   (goto-char c-new-END)
   (when (c-beginning-of-macro)
     (c-end-of-macro)
     (or (eobp) (forward-char)))	 ; Over the terminating NL which may be marked
 				 ; with a c-cpp-delimiter category property
-  (setq c-new-END (point)))
+  (when (> (point) c-new-END)
+    (setq c-new-END (min (point) (c-determine-+ve-limit 500 c-new-END)))))
 
 (defun c-depropertize-new-text (beg end old-len)
   ;; Remove from the new text in (BEG END) any and all text properties which
@@ -941,15 +943,17 @@
   ;; Point is undefined on both entry and exit to this function.  The buffer
   ;; will have been widened on entry.
   ;;
+  ;; c-new-BEG has already been extended in `c-extend-region-for-CPP' so we
+  ;; don't need to repeat the exercise here.
+  ;;
   ;; This function is in the C/C++/ObjC value of `c-before-font-lock-functions'.
   (goto-char endd)
-  (if (c-beginning-of-macro)
-      (c-end-of-macro))
-  (setq c-new-END (max endd c-new-END (point)))
-  ;; Determine the region, (c-new-BEG c-new-END), which will get font
-  ;; locked.  This restricts the region should there be long macros.
-  (setq c-new-BEG (max c-new-BEG (c-determine-limit 500 begg))
-	c-new-END (min c-new-END (c-determine-+ve-limit 500 endd))))
+  (when (c-beginning-of-macro)
+    (c-end-of-macro)
+    ;; Determine the region, (c-new-BEG c-new-END), which will get font
+    ;; locked.  This restricts the region should there be long macros.
+    (setq c-new-END (min (max c-new-END (point))
+			 (c-determine-+ve-limit 500 c-new-END)))))
 
 (defun c-neutralize-CPP-line (beg end)
   ;; BEG and END bound a region, typically a preprocessor line.  Put a


> > The "limit" in my previous post was a bound supplied as an argument to
> > c-font-lock-declarators, which does what it says.

> I'm confused. If, as we discussed before, syntax properties are applied 
> in before/after-functions, why does c-font-lock-declarations need to be 
> concerned with scanning for raw string bounds?

The raw string bounds have nothing to do with c-font-lock-declarators.
It's just that that function takes a bound which was hardly ever needed,
since the function stops when it reaches something which signals the
end of a sequence of declarators (for example, a semicolon).  Hence
nobody noticed that the bound being passed was wrong.  However, with
raw strings in the game, when (point-max) was the bound, the function
actually ended up fruitlessly scanning to (point-max) rather than to the
`limit' it should have been scanning to.

The limit to c-font-lock-declarators is now `(min limit (point-max))'.
That way, when the buffer is narrowed to less than `limit', there won't
be an out-of-bounds error, and when there are unterminated raw strings,
there won't be useless scanning past `limit' either.
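
To illustrate the clamping with a toy example (a sketch only, not the
actual CC Mode code; `my-clamped-bound' is a made-up name and the buffer
contents and positions are arbitrary):

```elisp
;; Hypothetical helper for illustration only; not a CC Mode function.
(defun my-clamped-bound (limit)
  "Return LIMIT, or `point-max' if LIMIT lies beyond the accessible region."
  (min limit (point-max)))

(with-temp-buffer
  (insert "int a, b, c;\nint d;\n")
  ;; Narrow so that LIMIT (18) lies outside the accessible region.
  (narrow-to-region 1 10)
  (let ((limit 18))
    ;; Using LIMIT directly here would signal `args-out-of-range';
    ;; the clamped bound stays inside the narrowed region.
    (buffer-substring (point-min) (my-clamped-bound limit))))
```

When the buffer isn't narrowed below `limit', the helper just returns
`limit' unchanged, so the scan never goes further than it was asked to.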

I'm not sure if the above will help much, but I hope it does.

-- 
Alan Mackenzie (Nuremberg, Germany).