From: Alan Mackenzie
Newsgroups: gmane.emacs.devel
Subject: Re: Somwhat subtle issues with raw string literals in C++
Date: Wed, 29 Jun 2016 13:57:25 +0000
Message-ID: <20160629135725.GA5327@acm.fritz.box>
To: Philipp Stephani
Cc: Emacs developers

Hello again, Philipp,

On Tue, Jun 28, 2016 at 03:33:30PM +0000, Philipp Stephani wrote:
> Hi,
> there still seem to be some subtle issues with detection of raw string
> literals. Unfortunately they are hard to reproduce. One example that fails
> (for me) with a recent master build is:
> cd /tmp
> wget
> https://raw.githubusercontent.com/google/protobuf/ef7894e2dc6d287419e42a4fdc52cdfedd386d16/conformance/conformance_test.cc
> /path/to/emacs -Q +686 conformance_test.cc
> Around that line the fontification of the raw string literals is wrong
> (quote characters are treated as string terminators), in other parts of the
> file the fontification is correct. This typically happens with files that
> contain many large raw string literals that contain quote characters.

Again, thanks for the report.

The problem was a mishandling of a cache, with the result that the raw
string handling code mistakenly believed it was within a string when it
wasn't, at a critical point in the file.

Would you please try out the following patch and confirm that it fixes
the problem, or let me know what's still not working properly. Thanks in
advance!

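The role of the cache limit can be modelled outside of Emacs. What follows is a minimal sketch in Python, not the CC Mode code itself: the class `PosCache` and its methods are hypothetical names invented for illustration. It assumes that the only thing that matters is that every change to the syntax at some buffer position lowers the cache's validity limit to at most that position, which is the job of `c-truncate-semi-nonlit-pos-cache` in the patch.

```python
class PosCache:
    """Toy model of a position cache guarded by a validity limit.

    Hypothetical illustration only; the names do not come from CC Mode.
    """

    def __init__(self):
        self.entries = []  # cached positions, highest first
        self.limit = 1     # entries above this position count as stale

    def record(self, pos):
        """Cache POS and extend the validity limit to cover it."""
        self.entries.append(pos)
        self.entries.sort(reverse=True)
        self.limit = max(self.limit, pos)

    def truncate(self, pos):
        """Lower the limit to POS if it is currently higher than POS."""
        self.limit = min(self.limit, pos)

    def trim(self):
        """Drop entries above the limit and return the survivors."""
        while self.entries and self.entries[0] > self.limit:
            self.entries.pop(0)
        return list(self.entries)


cache = PosCache()
for pos in (100, 200, 300):
    cache.record(pos)

# A syntax change at position 250 invalidates everything cached above it.
cache.truncate(250)
print(cache.trim())   # prints [200, 100]

# Truncating to a higher position never raises the limit again.
cache.truncate(400)
print(cache.limit)    # prints 250
```

In this model, leaving out the `truncate` call after a change keeps the entry at 300 alive, which corresponds to the kind of stale state that made the raw string code believe it was inside a string.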
diff -r 68a956fb5f55 cc-engine.el
--- a/cc-engine.el Mon Jun 27 10:18:54 2016 +0000
+++ b/cc-engine.el Wed Jun 29 13:43:27 2016 +0000
@@ -2292,6 +2292,12 @@
 ;; is reduced by buffer changes, and increased by invocations of
 ;; `c-parse-ps-state-below'.
 
+(defsubst c-truncate-semi-nonlit-pos-cache (pos)
+  ;; Truncate the upper bound of the cache `c-state-semi-nonlit-pos-cache' to
+  ;; POS, if it is higher than that position.
+  (setq c-state-semi-nonlit-pos-cache-limit
+        (min c-state-semi-nonlit-pos-cache-limit pos)))
+
 (defun c-state-semi-pp-to-literal (here &optional not-in-delimiter)
   ;; Do a parse-partial-sexp from a position in the buffer before HERE which
   ;; isn't in a literal, and return information about HERE, either:
@@ -2495,7 +2501,7 @@
     (let ((c c-state-semi-nonlit-pos-cache)
           elt state pos npos high-elt)
       ;; Trim the cache to take account of buffer changes.
-      (while (and c (> (c-ps-state-cache-pos (c-ps-state-cache-pos (car c)))
+      (while (and c (> (c-ps-state-cache-pos (car c))
                        c-state-semi-nonlit-pos-cache-limit))
         (setq c (cdr c)))
       (setq c-state-semi-nonlit-pos-cache c)
@@ -3488,8 +3494,7 @@
   ;; HERE.
   (if (<= here c-state-nonlit-pos-cache-limit)
       (setq c-state-nonlit-pos-cache-limit (1- here)))
-  (if (<= here c-state-semi-nonlit-pos-cache-limit)
-      (setq c-state-semi-nonlit-pos-cache-limit (1- here)))
+  (c-truncate-semi-nonlit-pos-cache here)
 
   ;; `c-state-cache':
   ;; Case 1: if `here' is in a literal containing point-min, everything
@@ -6082,19 +6085,32 @@
     (cond
      ((null open-paren-prop)
       ;; A terminated raw string
-      (if (search-forward (concat ")" id "\"") nil t)
-          (c-clear-char-property-with-value
-           (1+ open-paren) (match-beginning 0) 'syntax-table '(1))))
+      (when (search-forward (concat ")" id "\"") nil t)
+        (let* ((closing-paren (match-beginning 0))
+               (first-punctuation
+                (save-match-data
+                  (goto-char (1+ open-paren))
+                  (and (c-search-forward-char-property 'syntax-table '(1)
+                                                       closing-paren)
+                       (1- (point)))))
+               )
+          (when first-punctuation
+            (c-clear-char-property-with-value
+             first-punctuation (match-beginning 0) 'syntax-table '(1))
+            (c-truncate-semi-nonlit-pos-cache first-punctuation)
+            ))))
      ((or (and (equal open-paren-prop '(15)) (null bound))
           (equal open-paren-prop '(1)))
       ;; An unterminated raw string either not in a macro, or in a macro with
       ;; the open parenthesis right up against the end of macro
       (c-clear-char-property open-quote 'syntax-table)
+      (c-truncate-semi-nonlit-pos-cache open-quote)
       (c-clear-char-property open-paren 'syntax-table))
      (t
       ;; An unterminated string in a macro, with at least one char after the
       ;; open paren
       (c-clear-char-property open-quote 'syntax-table)
+      (c-truncate-semi-nonlit-pos-cache open-quote)
       (c-clear-char-property open-paren 'syntax-table)
       (let ((after-string-fence-pos
              (save-excursion
@@ -6194,9 +6210,11 @@
           (while (progn (skip-syntax-forward "^\"" end-string)
                         (< (point) end-string))
             (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
+            (c-truncate-semi-nonlit-pos-cache (point))
             (forward-char))
           (goto-char after-quote))
         (c-put-char-property open-quote 'syntax-table '(1)) ; punctuation
+        (c-truncate-semi-nonlit-pos-cache open-quote)
         (c-put-char-property open-paren 'syntax-table '(15)) ; generic string
         (when bound
           ;; In a CPP construct, we try to apply a generic-string `syntax-table'
@@ -6223,9 +6241,12 @@
                          "\\(\\\\\n\\)*\\=")) ; 11
                (1+ open-paren) t))
           (if (match-beginning 10)
-              (c-put-char-property (match-beginning 10) 'syntax-table '(15))
+              (progn
+                (c-put-char-property (match-beginning 10) 'syntax-table '(15))
+                (c-truncate-semi-nonlit-pos-cache (match-beginning 10)))
             (c-put-char-property (match-beginning 5) 'syntax-table '(1))
-            (c-put-char-property (1+ (match-beginning 5)) 'syntax-table '(15)))
+            (c-put-char-property (1+ (match-beginning 5)) 'syntax-table '(15))
+            (c-truncate-semi-nonlit-pos-cache (1+ (match-beginning 5))))
           (c-put-char-property open-paren 'syntax-table '(1)))
         (goto-char bound))))
 

-- 
Alan Mackenzie (Nuremberg, Germany).