unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Alan Mackenzie <acm@muc.de>
To: Ivan Andrus <darthandrus@gmail.com>
Cc: 15212@debbugs.gnu.org
Subject: bug#15212: 24.3.50; c++-mode doesn't support raw string literals
Date: Sun, 29 May 2016 21:36:16 +0000	[thread overview]
Message-ID: <20160529213616.GC3367@acm.fritz.box> (raw)
In-Reply-To: <20160528144045.GB2950@acm.fritz.box>

[-- Attachment #1: Type: text/plain, Size: 12669 bytes --]

Hi, Ivan.

I've now got a patch, which I'd be grateful if you could try out, both
to see if there are any bugs, and also to get your general impression.
I think there are one or two bugs left in the code, and it needs tidying
up quite a lot.  So this won't be the final version.

The patch will work only on the savannah master branch - sorry, but it
depends on the fix to the "infrastructure" bug which I committed to
master only this morning (timezone +0200).

I'm also attaching a small test file which might interest you.

On Sat, May 28, 2016 at 02:40:45PM +0000, Alan Mackenzie wrote:
> Thanks for the suggestion!  I've actually had an almost working solution
> myself for just over a week.  Then I got confused with a bug in some CC
> Mode "infrastructure" code.  Such is life!

> The way I am fontifying these is thus:
> (i) For a correctly terminated raw string, everything between the ( and )
> inclusive gets string face, everything else just the default face:

>             R"foo(bar)foo"
>                  ^^^^^
>           font-lock-string-face.

> (ii) For a construct with a raw string opener, not correctly terminated,
> I am putting warning face on the entire raw string opener, leaving the
> rest of the string with string face, e.g.:

>             R"baz(bar)foo"
>             ^^^^^^
>      font-lock-warning-face
>                   ^^^^^^^^
> 	      font-lock-string-face

> Of course, that is subject to change if it doesn't work very well.

> CC Mode doesn't actually use syntax-ppss and syntax-propertize-function,
> since they don't allow enough control.  In particular, on a buffer
> change, they erase all syntax-table text properties between point and end
> of buffer which is wasteful; it is never necessary to erase these beyond
> the next end of statement, and they are quite expensive to apply.

> Anyhow, we should be able to have this implemented and the bug closed
> pretty soon.

The patch:



diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
index 4d6a120..460c773 100644
--- a/lisp/progmodes/cc-engine.el
+++ b/lisp/progmodes/cc-engine.el
@@ -2293,7 +2293,8 @@ c-state-pp-to-literal
   ;;     (STATE TYPE (BEG . END))     if TO is in a literal; or
   ;;     (STATE)                      otherwise,
   ;; where STATE is the parsing state at TO, TYPE is the type of the literal
-  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal.
+  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal,
+  ;; including the delimiters.
   ;;
   ;; Unless NOT-IN-DELIMITER is non-nil, when TO is inside a two-character
   ;; comment opener, this is recognized as being in a comment literal.
@@ -5777,6 +5778,139 @@ c-restore-<>-properties
 				       'c-decl-arg-start)))))))
       (or (c-forward-<>-arglist nil)
 	  (forward-char)))))
+
+\f
+;; Routines to handle C++ raw strings.
+(defun c-raw-string-pos ()
+  ;; Get POINT's relationship to any containing raw string.
+  ;; If point isn't in a raw string, return nil.
+  ;; Otherwise, return the following list:
+  ;;
+  ;;   (POS B\" B\( E\) E\")
+  ;;
+  ;; , where POS is the symbol `open-delim' if point is in the opening
+  ;; delimiter, the symbol `close-delim' if it's in the closing delimiter, and
+  ;; nil if it's in the string body.  B\", B\(, E\), E\" are the positions of
+  ;; the opening and closing quotes and parentheses of a correctly terminated
+  ;; raw string.  (N.B.: E\) and E\" are NOT on the "outside" of these
+  ;; characters.)  If the raw string is not terminated, E\) and E\" are set to
+  ;; nil.
+  ;;
+  ;; Note: this routine is dependant upon the correct syntax-table text
+  ;; properties being set.
+  (let* ((safe (c-state-semi-safe-place (point)))
+	 (state (c-state-pp-to-literal safe (point)))
+	 open-quote-pos open-paren-pos close-paren-pos close-quote-pos id)
+    (save-excursion
+      (when
+	  (and
+	   (cond
+	    ((null (cadr state))
+	     (or (eq (char-after) ?\")
+		 (search-backward "\"" (max (- (point) 17) (point-min)) t)))
+	    ((and (eq (cadr state) 'string)
+		  (goto-char (car (nth 2 state)))
+		  (or (eq (char-after) ?\")
+		      (search-backward "\"" (max (- (point) 17) (point-min)) t))
+		  (not (bobp)))))
+	   (eq (char-before) ?R)
+	   (looking-at "\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("))
+	(setq open-quote-pos (point)
+	      open-paren-pos (match-end 1)
+	      id (match-string-no-properties 1))
+	(goto-char (1+ open-paren-pos))
+	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
+		   (search-forward-regexp (concat ")" id "\"") nil t))
+	  (setq close-paren-pos (match-beginning 0)
+		close-quote-pos (1- (point))))))
+    (and open-quote-pos
+	 (list
+	  (cond
+	   ((<= (point) open-paren-pos)
+	    'open-delim)
+	   ((and close-paren-pos
+		 (> (point) close-paren-pos))
+	    'close-delim)
+	   (t nil))
+	  open-quote-pos open-paren-pos close-paren-pos close-quote-pos))))
+
+(defun c-clear-raw-string-syntax-table-properties (raw)
+  (if (nth 2 raw)
+      ;; Clear out punctuation syntax-table text props from the string body.
+      (c-clear-char-property-with-value
+       (cadr raw) (nth 2 raw) 'syntax-table '(1))
+    ;; unclosed raw string.
+    (c-clear-char-property (car raw) 'syntax-table)
+    (c-clear-char-property (cadr raw) 'syntax-table))
+  (setq c-new-BEG (min c-new-BEG (car raw)))
+  (setq c-new-END (max c-new-END (1+ (cadr raw)))))
+
+(defun c-before-change-check-c++-raw-strings (beg end)
+  ;; This functions clears syntax-table text properties from C++ raw strings
+  ;; which are being chnaged, or are associated with a change.
+  (c-save-buffer-state
+      ((beg-rs (save-excursion (goto-char c-new-BEG) (c-raw-string-pos)))
+       (end-rs (save-excursion (goto-char c-new-END) (c-raw-string-pos)))
+       )
+    (cond
+     ;; Neither BEG nor END are in raw strings.
+     ((and (null beg-rs) (null end-rs)))
+     ;; BEG is in the opening delimiter or "body" of an unterminated string.
+     ((and beg-rs (null (nth 3 beg-rs)))
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs)))
+     ;; BEG and END are both in the body of the same raw string.
+     ((and (equal (cdr beg-rs) (cdr end-rs))
+	   (null (car beg-rs)) (null (car end-rs))))
+     ;; BEG and END are in the same raw string, (at least) one of them in a
+     ;; delimiter.
+     ((equal (cdr beg-rs) (cdr end-rs))
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs)))
+     ;; BEG is in some raw string, END isn't in it.
+     (beg-rs
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs))
+      (when end-rs
+	(c-clear-raw-string-syntax-table-properties (cdr end-rs))))
+     ;; BEG isn't in a raw string, END is.
+     (end-rs
+      (c-clear-raw-string-syntax-table-properties (cdr end-rs))))))
+
+(defun c-temp-before-change (beg end)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-before-change-check-c++-raw-strings beg end))
+
+(defun c-after-change-mark-raw-strings (beg end old-len)
+  ;; Put any needed text properties on raw strings.  This function is called
+  ;; as an after-change function.
+  (save-excursion
+    (c-save-buffer-state ()
+      (goto-char c-new-BEG)
+      (while (and (< (point) c-new-END)
+		  (c-syntactic-re-search-forward
+		   "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" c-new-END t))
+	(let ((id (match-string-no-properties 1))
+	      (open-quote (1+ (match-beginning 0)))
+	      (open-paren (match-end 1))
+	      )
+	  (if (search-forward-regexp (concat ")" id "\"") nil t)
+	      (let ((end-string (match-beginning 0))
+		    (after-quote (match-end 0))
+		    )
+		(goto-char open-paren)
+		(while (progn (skip-syntax-forward "^\"" end-string)
+			      (< (point) end-string))
+		  (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
+		  (forward-char))
+		(goto-char after-quote))
+	    (c-put-char-property open-quote 'syntax-table '(1)) ; punctuation
+	    (c-put-char-property open-paren 'syntax-table '(15))))) ; generic string
+
+      )))
+
+(defun c-temp-after-change (beg end old-len)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-after-change-mark-raw-strings beg end old-len))
 \f
 ;; Handling of small scale constructs like types and names.
 
diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el
index 4e83d6d..fd8065a 100644
--- a/lisp/progmodes/cc-fonts.el
+++ b/lisp/progmodes/cc-fonts.el
@@ -723,6 +723,10 @@ c-font-lock-invalid-string
 	(concat ".\\(" c-string-limit-regexp "\\)")
 	'((c-font-lock-invalid-string)))
 
+      ;; Fontify C++ raw strings.
+      ,@(when (c-major-mode-is 'c++-mode)
+	  '(c-font-lock-c++-raw-strings))
+
       ;; Fontify keyword constants.
       ,@(when (c-lang-const c-constant-kwds)
 	  (let ((re (c-make-keywords-re nil (c-lang-const c-constant-kwds))))
@@ -1571,6 +1575,34 @@ c-font-lock-enclosing-decls
 	    (c-forward-syntactic-ws)
 	    (c-font-lock-declarators limit t in-typedef)))))))
 
+(defun c-font-lock-c++-raw-strings (limit)
+  ;; Fontify C++ raw strings.
+  ;;
+  ;; This function will be called from font-lock for a region bounded by POINT
+  ;; and LIMIT, as though it were to identify a keyword for
+  ;; font-lock-keyword-face.  It always returns NIL to inhibit this and
+  ;; prevent a repeat invocation.  See elisp/lispref page "Search-based
+  ;; Fontification".
+  (while (search-forward-regexp
+	  "R\\(\"\\)\\([^ ()\\\n\r\t]\\{,16\\}\\)(" limit t)
+    (when ;; (eq (c-get-char-property (1- (point)) 'face)
+	;;     'font-lock-string-face)
+	(or (and (eobp)
+		 (eq (c-get-char-property (1- (point)) 'face)
+		     'font-lock-warning-face))
+	    (eq (c-get-char-property (point) 'face) 'font-lock-string-face))
+      (if (c-get-char-property (1- (point)) 'syntax-table)
+	  (c-put-font-lock-face (match-beginning 0) (match-end 0)
+				'font-lock-warning-face)
+	(c-put-font-lock-face (match-beginning 1) (match-end 2)
+			      'default)
+	(when (search-forward-regexp
+	       (concat ")\\(" (match-string-no-properties 2) "\\)\"")
+	       limit t)
+	  (c-put-font-lock-face (match-beginning 1) (point)
+				'default)))))
+  nil)
+
 (c-lang-defconst c-simple-decl-matchers
   "Simple font lock matchers for types and declarations.  These are used
 on level 2 only and so aren't combined with `c-complex-decl-matchers'."
diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
index 6f4d1f1..8ba0c5c 100644
--- a/lisp/progmodes/cc-langs.el
+++ b/lisp/progmodes/cc-langs.el
@@ -474,9 +474,12 @@ c-populate-syntax-table
   ;; The value here may be a list of functions or a single function.
   t nil
   c++ '(c-extend-region-for-CPP
+	c-depropertize-region
+	c-before-change-check-c++-raw-strings
 	c-before-change-check-<>-operators
 	c-invalidate-macro-cache)
   (c objc) '(c-extend-region-for-CPP
+	     c-depropertize-region
 	     c-invalidate-macro-cache)
   ;; java 'c-before-change-check-<>-operators
   awk 'c-awk-record-region-clear-NL)
@@ -509,7 +512,8 @@ c-populate-syntax-table
   (c objc) '(c-extend-font-lock-region-for-macros
 	     c-neutralize-syntax-in-and-mark-CPP
 	     c-change-expand-fl-region)
-  c++ '(c-extend-font-lock-region-for-macros
+  c++ '(c-after-change-mark-raw-strings
+	c-extend-font-lock-region-for-macros
 	c-neutralize-syntax-in-and-mark-CPP
 	c-restore-<>-properties
 	c-change-expand-fl-region)
diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el
index 9ab0480..53322cf 100644
--- a/lisp/progmodes/cc-mode.el
+++ b/lisp/progmodes/cc-mode.el
@@ -877,6 +877,16 @@ c-called-from-text-property-change-p
   (memq (cadr (backtrace-frame 3))
 	'(put-text-property remove-list-of-text-properties)))
 
+(defun c-depropertize-region (beg end)
+  ;; Remove the punctuation syntax-table text property from the region
+  ;; (c-new-BEG c-new-END).
+  ;;
+  ;; This function is in the C/C++/ObjC values of
+  ;; `c-get-state-before-change-functions' and is called exclusively as a
+  ;; before change function.
+  (c-clear-char-property-with-value
+   c-new-BEG c-new-END 'syntax-table '(1)))
+
 (defun c-extend-region-for-CPP (beg end)
   ;; Adjust `c-new-BEG', `c-new-END' respectively to the beginning and end of
   ;; any preprocessor construct they may be in. 
@@ -969,7 +979,7 @@ c-neutralize-syntax-in-and-mark-CPP
   ;; This function might make hidden buffer changes.
   (c-save-buffer-state (limits )
     ;; Clear 'syntax-table properties "punctuation":
-    (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
+    ;; (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
 
     ;; CPP "comment" markers:
     (if (eval-when-compile (memq 'category-properties c-emacs-features));Emacs.


-- 
Alan Mackenzie (Nuremberg, Germany).


[-- Attachment #2: raw-string.cc --]
[-- Type: text/x-c, Size: 373 bytes --]

char foo [] = R"(foo)";
char bar [] = R"bar(bar)bar";
char empty [] = R"()";
char quote1 [] = R"34(")34";
char quote2 [] = R"0x22(foo"bar)0x22";
char sixteen [] = R"0123456789abcdef(sixteen)0123456789abcdef";
char seventeen [] = R"0123456789abcdefg(seventeen)0123456789abcdefg";
char multi_line [] = R"(First line.
Second line.
Third line.)";
char baz [] = R"baz(baz)foo";

  reply	other threads:[~2016-05-29 21:36 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-29 21:00 bug#15212: 24.3.50; c++-mode doesn't support raw string literals Ivan Andrus
2016-03-30  3:14 ` Ivan Andrus
2016-04-03 18:36   ` Alan Mackenzie
2016-05-24 17:12     ` Ivan Andrus
2016-05-28 14:40       ` Alan Mackenzie
2016-05-29 21:36         ` Alan Mackenzie [this message]
2016-05-31 14:22           ` Ivan Andrus
2016-05-31 21:32             ` Alan Mackenzie
2016-05-31 23:52               ` Michael Welsh Duggan
2016-06-02 16:36                 ` Alan Mackenzie
2016-05-31 22:21             ` Alan Mackenzie
2016-06-01  5:21               ` Ivan Andrus
2016-06-02 16:07                 ` Alan Mackenzie
     [not found]                 ` <20160602160741.GC4067@acm.fritz.box>
2016-06-06 16:32                   ` Alan Mackenzie
     [not found]                   ` <20160606163203.GA19322@acm.fritz.box>
2016-06-07 22:06                     ` Michael Welsh Duggan
2016-06-07 22:21                       ` Alan Mackenzie
2016-06-09  1:38                     ` Ivan Andrus
2016-06-09 15:04                       ` Alan Mackenzie
2016-06-09 15:06   ` Alan Mackenzie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160529213616.GC3367@acm.fritz.box \
    --to=acm@muc.de \
    --cc=15212@debbugs.gnu.org \
    --cc=darthandrus@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).