unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
@ 2013-08-29 21:00 Ivan Andrus
  2016-03-30  3:14 ` Ivan Andrus
  0 siblings, 1 reply; 19+ messages in thread
From: Ivan Andrus @ 2013-08-29 21:00 UTC
  To: 15212

C++11 allows fancy new raw string literals [1], but these strings aren't
supported in c++-mode (e.g. fontification and movement by sexps).
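A short illustration of the syntax follows; the variable names are made up for the example. A raw string literal has an optional encoding prefix (L, u8, u or U), then R"delim(body)delim". The delimiter may be empty and is at most 16 characters long. Backslashes and double quotes inside the body are taken literally. A mode that scans the literal as an ordinary "..." string therefore ends it in the wrong place.

```cpp
#include <string>

// An ordinary literal needs escapes for backslashes and double quotes.
const std::string escaped = "C:\\dir\\file \"quoted\"";

// The same text as a raw literal with an empty delimiter: no escapes at all.
const std::string raw = R"(C:\dir\file "quoted")";

// A custom delimiter ("xy") lets the body contain )" without ending the
// literal. Only )xy" terminates it, so the value is: a )" b
const std::string tricky = R"xy(a )" b)xy";
```

The last literal is the case that trips up naive scanning. An editor that ends the string at the first " after R( sees the rest of the line, and potentially the rest of the buffer, as unbalanced code.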

In my experience such raw strings are fairly rare, and they are no doubt
difficult to support.  But I thought I would report this since I
didn't see a bug for it in debbugs.

-Ivan

[1] http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals

In GNU Emacs 24.3.50.5 (i386-apple-darwin12.4.0, NS apple-appkit-1187.39)
of 2013-08-23 on ivanandres-MacBookPro
Bzr revision: 113987 eggert@cs.ucla.edu-20130824022334-kloqiv3hqimcrnmg
Windowing system distributor `Apple', version 10.3.1187
Configured using:
`configure --with-ns --with-xml2'

Important settings:
  locale-coding-system: nil
  default enable-multibyte-characters: t

Major mode: Org

Minor modes in effect:
  jabber-activity-mode: t
  fold-mode-active: t
  semantic-minor-modes-format: ((:eval (if (or semantic-highlight-edits-mode semantic-show-unmatched-syntax-mode semantic-idle-scheduler-mode)  S)))
  diff-auto-refine-mode: t
  reveal-mode: t
  TeX-PDF-mode: t
  which-function-mode: t
  show-paren-mode: t
  global-semantic-stickyfunc-mode: t
  msb-mode: t
  minibuffer-depth-indicate-mode: t
  global-hl-line-mode: t
  delete-selection-mode: t
  auto-image-file-mode: t
  auto-insert-mode: t
  yas-global-mode: t
  yas-minor-mode: t
  shell-dirtrack-mode: t
  ido-everywhere: t
  global-visible-mark-mode: t
  visible-mark-mode: t
  gvol-mode: t
  recentf-mode: t
  desktop-save-mode: t
  drag-stuff-global-mode: t
  command-frequency-autosave-mode: t
  command-frequency-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  auto-fill-function: org-auto-fill-function
  transient-mark-mode: t

Recent input:
<return> <C-tab> <C-tab> <C-tab> <C-tab> <down> <return>
<help-echo> <down-mouse-2> <mouse-1> <help-echo> <down-mouse-2>
<mouse-1> <help-echo> <down-mouse-1> <mouse-1> q C-s
C-s C-s C-a C-s C-s C-s C-s C-s C-s <return> <return>
<down> <return> q C-s C-s C-s C-s C-a <return> <down>
<return> <down> <return> <down> <return> <up> <return>
<down> <down> <return> <down> <return> M-w C-s C-s
C-s C-x 1 C-s C-s C-s C-s C-s C-s C-a C-s c o n s t
C-s C-s C-s C-s C-s C-s q C-a q w i s e SPC <C-backspace>
g r e a t SPC e d i t i n g SPC e x p e r i e n c e
. <return> TAB C-o C-o C-o C-o C-e <C-backspace> <C-backspace>
<C-backspace> o v e r a l l SPC e x e <backspace> c
e l l e n t SPC c c - m o d e , , s C-j C-SPC C-l C-d
C-j ' C-SPC , , C-l C-t ! C-x C-s TAB TAB TAB TAB TAB
C-a A l t h o u g h SPC t h i s SPC i s SPC f a i r
l y SPC u n o <backspace> c o m m o n , SPC M-l M-q
C-k C-d C-\ C-k C-e C-u C-u C-u C-u C-u <C-backspace>
C-o . C-M-k SPC M-c M-q TAB TAB C-d C-\ C-k C-k C-o
SPC s i n <C-backspace> C-SPC SPC s i n c e SPC o t
h e r w i s e <C-backspace> C-SPC <C-backspace> C-SPC
C-x C-s C-c C-c y e s <return> <C-tab> <C-tab> M-w
y C-x C-s C-x r j T C-x r j t C-s c + + - C-s C-u C-k
<tab> <tab> <C-S-backspace> <C-S-backspace> C-x C-s
TAB M-x <right> <right> <right> <return>

Recent messages:
Error during redisplay: (eval (let ((glob-str (mapconcat (function eval) global-mode-string "")) (global-string (mapconcat (function eval) global-mode-string " "))) (when (> (length glob-str) 0) (concat (propertize " " (quote display) (\` ((space :align-to (- right-fringe (\, (+ (length global-string) 5)))))) (quote help-echo) "mouse-1: Select (drag to resize)
mouse-2: Make current window occupy the whole frame
mouse-3: Remove current window from display") (propertize " " (quote display) arrow-right-3) (propertize global-string (quote face) (quote mode-line-buffer-id)) #(" " 0 1 (help-echo "mouse-1: Select (drag to resize)
mouse-2: Make current window occupy the whole frame
mouse-3: Remove current window from display" face mode-line-buffer-id)) (propertize " " (quote display) arrow-right-4))))) signaled (void-function t)
Error during redisplay: (eval (let ((glob-str (mapconcat (function eval) global-mode-string "")) (global-string (mapconcat (function eval) global-mode-string " "))) (when (> (length glob-str) 0) (concat (propertize " " (quote display) (\` ((space :align-to (- right-fringe (\, (+ (length global-string) 5)))))) (quote help-echo) "mouse-1: Select (drag to resize)
mouse-2: Make current window occupy the whole frame
mouse-3: Remove current window from display") (propertize " " (quote display) arrow-right-3) (propertize global-string (quote face) (quote mode-line-buffer-id)) #(" " 0 1 (help-echo "mouse-1: Select (drag to resize)
mouse-2: Make current window occupy the whole frame
mouse-3: Remove current window from display" face mode-line-buffer-id)) (propertize " " (quote display) arrow-right-4))))) signaled (void-function t)

Load-path shadows:
/Users/ivanandres/.emacs.d/elpa/magit-20130828.1540/.dir-locals hides /Users/ivanandres/vcs/sage-mode/emacs/.dir-locals
~/vcs/emacs-clang-complete-async/auto-complete-clang-async hides /Users/ivanandres/.emacs.d/elpa/auto-complete-clang-async-20130526.2314/auto-complete-clang-async
/Users/ivanandres/.emacs.d/elpa/confluence-20130814.735/confluence-edit hides /Users/ivanandres/.emacs.d/elpa/confluence-edit-20130804.2241/confluence-edit
/Users/ivanandres/.emacs.d/elpa/magit-20130828.1540/.dir-locals hides /Users/ivanandres/.emacs.d/elpa/highlight-parentheses-20130523.1752/.dir-locals
/Users/ivanandres/.emacs.d/elpa/magit-20130828.1540/.dir-locals hides /Users/ivanandres/.emacs.d/elpa/highlight-symbol-20130628.1552/.dir-locals
/Users/ivanandres/.emacs.d/elpa/php+-mode-20121129.1452/string-utils hides /Users/ivanandres/.emacs.d/elpa/string-utils-20121108.1917/string-utils
/Users/ivanandres/.emacs.d/elpa/jira-20091012.2123/jira hides ~/.emacs.d/local/jira
/Users/ivanandres/.emacs.d/elpa/magit-20130828.1540/.dir-locals hides /Users/ivanandres/vcs/emacs/local/nextstep/Emacs.app/Contents/Resources/lisp/gnus/.dir-locals

Features:
(mailalias mailclient flow-fill smiley gnus-cite qp gnus-async
gnus-bcklg gnus-agent gnus-srvr gnus-score score-mode nnvirtual nntp
gnus-ml gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig
nndoc gnus-cache gnus-sum nnoo gnus-group gnus-undo nnmail mail-source
gnus-start gnus-spec gnus-int gnus-range gnus-win mm-archive debbugs-gnu
debbugs soap-client url-queue shr-color color eww mm-url gnus gnus-ems
nnheader shr hi-lock shadow emacsbug message rfc822 mml mml-sec
mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
compare-w pcase jabber-libnotify jabber-awesome jabber-osd jabber-wmii
jabber-xmessage jabber-festival jabber-sawfish jabber-ratpoison
jabber-tmux jabber-screen jabber-socks5 jabber-ft-server
jabber-ft-client jabber-truncate jabber-time jabber-vcard-avatars
jabber-chatstates jabber-events jabber-vcard jabber-activity
jabber-watch jabber-modeline jabber-ahc-presence jabber-version
jabber-browse jabber-search jabber-roster jabber-ourversion
jabber-avatar jabber-autoaway jabber-register jabber-presence dbus
jabber-ping jabber-si-server jabber-ft-common jabber-si-client
jabber-si-common jabber-feature-neg jabber-private jabber-ahc
jabber-muc-nick-completion hippie-exp jabber-muc
jabber-muc-nick-coloring hexrgb jabber-newdisco jabber-widget
jabber-disco jabber-iq jabber-chat jabber-menu jabber-history
jabber-chatbuffer jabber-alert jabber-core jabber-console sgml-mode
jabber-keymap jabber-sasl sasl sasl-anonymous sasl-login sasl-plain fsm
jabber-conn srv dns gnutls jabber-logon jabber-xml jabber-util
ecb-symboldef ecb-analyse ecb-compatibility ecb-winman-support
ecb-autogen ecb-tod ecb-cycle ecb-eshell ecb-help ecb-jde ecb-upgrade
ecb-file-browser ecb-method-browser ecb-semantic-wrapper ecb-semantic
ecb-speedbar ecb-layout ecb-create-layout ecb-compilation
ecb-common-browser ecb-navigate ecb-cedet-wrapper semantic/analyze
semantic/scope semantic/analyze/fcn ecb-mode-line ecb-face tree-buffer
ecb-util silentcomp find-lisp magit-cherry magit-bisect magit-key-mode
magit iswitchb esh-var esh-io esh-cmd esh-opt esh-ext esh-proc esh-arg
esh-groups eshell esh-module esh-mode esh-util ediff-merg ediff-wind
ediff-diff ediff-mult ediff-help ediff-init ediff-util ediff
magit-compat mc-separate-operations rectangular-region-mode mc-mark-pop
mc-mark-more mc-cycle-cursors css-eldoc-hash-table autoload lisp-mnt
tar-mode finder-inf mark-more-like-this mark-multiple mc-edit-lines
multiple-cursors-core warnings shell-toggle sql view rot13 disp-table
mail-extr org-colview yaml-mode json-mode sort mail-utils network-stream
starttls url-cache find-file phi-search vcursor repeat js json
goto-last-change debug browse-url url-handlers url-http tls url-auth
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums url-gw calc-aent calc-yank
calc-misc calc-alg calc-menu calc-ext calc calc-loaddefs calc-macs
org-table epa-file epa derived conf-mode vc-annotate log-view wgrep-ack
wgrep ack-and-a-half grep ibuf-ext ibuffer git-rebase-mode ruler-mode
hexl dired+ dired-x dired-aux dired ruby-mode git-commit-mode skeleton
etags-select etags ffap tramp-cache tramp-sh dabbrev vc-bzr vc-svn
vc-cvs vc-dir ewoc smerge-mode epg epg-config mule-util cal-move
parse-time superword subword artist picture reporter rect org-element
diff-mode diff nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util
rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap nxml-util nxml-glyph
nxml-enc xmltok misearch multi-isearch semantic/imenu semantic/sb
semantic/sort semantic/db-file data-debug cedet-files
semantic/wisent/python semantic/db-mode semantic/decorate/include
semantic/db-find semantic/db-ref semantic/decorate/mode
semantic/decorate pulse sage-view semantic/dep semantic/wisent/python-wy
semantic/wisent semantic/wisent/wisent sage-mode apropos sage-compat
hideshow python sh-script smie executable elide-head ede/cpp-root
ede/generic ede/shell eieio-opt ede/speedbar ede/files ede ede/base
ede/auto ede/source eieio-speedbar speedbar sb-image dframe eieio-custom
semantic/db eieio-base vc-git c-eldoc tempo url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse url-vars mailcap xml-parse doxymacs cc-langs info-look cc-mode
cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
cc-defs eldoc highlight-parentheses greedy-delete hl-sexp
highlight-symbol thingatpt gvol-light-theme tabify cal-iso org-mobile
reveal org-mouse org-irc org-habit org-jsinfo org-infojs org-html
org-info org-gnus org-docview org-ctags org-bibtex bibtex org-bbdb
org-archive org-id vc-hg tex-fold reftex-dcr reftex-auc reftex
reftex-vars tex-bar toolbar-x font-latex latex edmacro kmacro tex-style
sage-latex org-latex org-export-latex org-beamer footnote org-crypt
ob-python org-clock org-exp ob-exp org-exp-blocks org-agenda org
ob-tangle ob-ref ob-lob ob-table org-footnote org-src ob-comint ob-keys
org-pcomplete org-list org-faces org-entities noutline outline
org-version ob-emacs-lisp ob org-compat org-macs ob-eval org-loaddefs
find-func tex-buf tex crm time uniquify saveplace semantic/idle
semantic/format ezimage semantic/tag-ls semantic/find semantic/ctxt
which-func imenu paren semantic/util-modes semantic/util semantic
semantic/tag semantic/lex semantic/fw mode-local cedet msb mb-depth
icomplete hl-line delsel image-file cus-start cus-load diary-lib
diary-loaddefs cal-menu calendar cal-loaddefs autoinsert yasnippet
help-mode tramp tramp-compat auth-source eieio byte-opt bytecomp
byte-compile cconv eieio-core gnus-util mm-util mail-prsvr
password-cache tramp-loaddefs trampver shell pcomplete format-spec smex
ido visible-mark parenface fold commit-patch-buffer log-edit pcvs-util
add-log vc vc-dispatcher sage sage-load jka-compr recentf tree-widget
wid-edit rx xml flymake compile comint ansi-color ring tex-site desktop
frameset drag-stuff browse-kill-ring backtr command-frequency uptimes pp
server easy-mmode assoc advice windmove ac-math-autoloads
auto-complete-clang-autoloads c-eldoc-autoloads
command-frequency-autoloads etags-select-autoloads gap-mode-autoloads
goto-last-change-autoloads hl-sexp-autoloads jabber-autoloads
json-mode-autoloads info easymenu mainline-autoloads
mark-multiple-autoloads php+-mode-autoloads php-eldoc-autoloads
popup-autoloads help-fns cl-macs gv cl cl-loaddefs cl-lib
visible-mark-autoloads yaml-mode-autoloads package time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel ns-win tool-bar dnd fontset
image regexp-opt fringe tabulated-list newcomment lisp-mode prog-mode
register page menu-bar rfn-eshadow timer select scroll-bar mouse
jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process ns
multi-tty emacs)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2013-08-29 21:00 bug#15212: 24.3.50; c++-mode doesn't support raw string literals Ivan Andrus
@ 2016-03-30  3:14 ` Ivan Andrus
  2016-04-03 18:36   ` Alan Mackenzie
  2016-06-09 15:06   ` Alan Mackenzie
  0 siblings, 2 replies; 19+ messages in thread
From: Ivan Andrus @ 2016-03-30  3:14 UTC
  To: 15212; +Cc: Alan Mackenzie

Ivan Andrus <darthandrus@gmail.com> writes:

> C++11 allows fancy new raw string literals [1], but these strings aren't
> supported in c++-mode (e.g. fontification and movement by sexp's).
>
> In my experience such raw strings are fairly rare, and they are no doubt
> difficult to support.  But I thought I would report this since I
> didn't see a bug for it in debbugs.
>
> -Ivan
>
> [1] http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals

Any thoughts on this?  These are becoming more common in the code I work
on and some colleagues and I would like support, since they can destroy 
fontification of the rest of the buffer.  I'm hesitant to try and do it
myself because of the famed difficulty of cc-mode.  :-(  But I'm willing
to try if someone has ideas.

-Ivan

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-03-30  3:14 ` Ivan Andrus
@ 2016-04-03 18:36   ` Alan Mackenzie
  2016-05-24 17:12     ` Ivan Andrus
  2016-06-09 15:06   ` Alan Mackenzie
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Mackenzie @ 2016-04-03 18:36 UTC
  To: Ivan Andrus; +Cc: 15212

Hello, Ivan.

On Tue, Mar 29, 2016 at 09:14:44PM -0600, Ivan Andrus wrote:
> Ivan Andrus <darthandrus@gmail.com> writes:

> > C++11 allows fancy new raw string literals [1], but these strings aren't
> > supported in c++-mode (e.g. fontification and movement by sexp's).

> > In my experience such raw strings are fairly rare, and they are no doubt
> > difficult to support.  But I thought I would report this since I
> > didn't see a bug for it in debbugs.

> > -Ivan

> > [1] http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals

> Any thoughts on this?  These are becoming more common in the code I work
> on and some colleagues and I would like support, since they can destroy 
> fontification of the rest of the buffer.  I'm hesitant to try and do it
> myself because of the famed difficulty of cc-mode.  :-(  But I'm willing
> to try if someone has ideas.

OK, I'll have a go at adding these ASAP.  I've got a few ideas as to how
best to do this.

> -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-04-03 18:36   ` Alan Mackenzie
@ 2016-05-24 17:12     ` Ivan Andrus
  2016-05-28 14:40       ` Alan Mackenzie
  0 siblings, 1 reply; 19+ messages in thread
From: Ivan Andrus @ 2016-05-24 17:12 UTC
  To: Alan Mackenzie; +Cc: 15212

I've tried the following function which seems to work reasonably well
in my limited testing.  I'm not suggesting this as the final solution,
but I would be happy to do some revisions based on your feedback, or
abandon it altogether if you have a better method.  Alternatively, you
can use any pieces that you find useful.

(defun c++11-syntax-propertize-function (beg end)
  ;; interactive for easy testing
  (interactive (list (point-min) (point-max)))
  (save-excursion
    (if (not (nth 3 (syntax-ppss beg)))
        ;; Not in a string, so in particular, not in a raw string
        (goto-char beg)
      ;; We have to re-propertize a raw string, so move back to the
      ;; beginning of it.
      (goto-char (nth 8 (syntax-ppss beg)))
      (skip-syntax-backward "'"))
    ;; Look for raw strings in the area of interest
    (while (search-forward-regexp
            "\\(\\(?:L\\|u8\\|u\\|U\\)?R\\)\"\\([^(]*\\)(" end t)
      (let* ((qualifier (match-string-no-properties 1))
             (delimiter (match-string-no-properties 2))
             (beg-beg (match-beginning 0))
             (beg-quote (+ beg-beg (length qualifier)))
             (beg-quote-after (1+ beg-quote)))
        (let* ((ppss (syntax-ppss beg-beg))
               (in-string-or-comment (or (nth 3 ppss) (nth 4 ppss))))
          (if in-string-or-comment
              ;; Move past the match to avoid an infinite loop
              (goto-char (match-end 0))
            ;; Search for the end of the string
            (when (search-forward-regexp
                   (concat ")" delimiter "\"")
                   ;; I don't limit it to END because I'm afraid it
                   ;; might not be far enough.
                   nil t)
              (let ((end-end (match-end 0)))
                (remove-text-properties beg-beg end-end '(syntax-table nil))
                ;; Mark the qualifier as attaching to the next sexp
                (put-text-property beg-beg beg-quote
                                   'syntax-table
                                   (string-to-syntax "'"))
                ;; Mark the quotes appropriately
                (put-text-property beg-quote beg-quote-after
                                   'syntax-table
                                   ;; (string-to-syntax "\"")
                                   (string-to-syntax "|"))
                (put-text-property (1- end-end)
                                   end-end
                                   'syntax-table
                                   (string-to-syntax "|"))))))))))

;; Then in a c++ buffer...
(setq-local syntax-propertize-function #'c++11-syntax-propertize-function)
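The strategy of the function above also works outside Emacs. The C++ below is a rough, hypothetical analogue; find_raw_strings and Span are invented names. It finds each opener with a regular expression, then searches forward for the matching )delim" terminator. Unterminated openers are skipped, as in the elisp. It differs from the elisp in two ways:

- It does not check whether an opener sits inside a comment or an ordinary string.
- It restricts the delimiter to what the standard allows (at most 16 characters, excluding space, parentheses, backslash, tab, vertical tab, form feed and newline) instead of [^(]*.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Half-open [begin, end) span of one raw string literal, prefix included.
using Span = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: locate each raw string opener, then search forward
// for its ")delim\"" terminator. Comments and ordinary strings are NOT
// taken into account in this sketch.
std::vector<Span> find_raw_strings(const std::string& text) {
    // Optional prefix, R, quote, delimiter (0-16 allowed chars), open paren.
    static const std::regex opener(
        R"re((?:L|u8|u|U)?R"([^ ()\\\t\v\f\n]{0,16})\()re");
    std::vector<Span> spans;
    std::size_t pos = 0;
    std::smatch m;
    while (pos < text.size() &&
           std::regex_search(text.cbegin() + pos, text.cend(), m, opener)) {
        // position() is relative to where this search started.
        const std::size_t begin = pos + static_cast<std::size_t>(m.position(0));
        const std::size_t body = begin + static_cast<std::size_t>(m.length(0));
        const std::string terminator = ")" + m.str(1) + "\"";
        const std::size_t close = text.find(terminator, body);
        if (close == std::string::npos) {
            pos = body;  // Unterminated: step past the opener and keep going.
            continue;
        }
        const std::size_t end = close + terminator.size();
        spans.emplace_back(begin, end);
        pos = end;  // Resume after the literal so its body is not rescanned.
    }
    return spans;
}
```

Searching for the literal terminator string, rather than for the next ", is what makes a body such as (a"b) come out right. The elisp does the same with its second search-forward-regexp.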

-Ivan

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-24 17:12     ` Ivan Andrus
@ 2016-05-28 14:40       ` Alan Mackenzie
  2016-05-29 21:36         ` Alan Mackenzie
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Mackenzie @ 2016-05-28 14:40 UTC
  To: Ivan Andrus; +Cc: 15212

Hi, Ivan.

Thanks for the suggestion!  I've actually had an almost working solution
myself for just over a week.  Then I got confused with a bug in some CC
Mode "infrastructure" code.  Such is life!

The way I am fontifying these is thus:
(i) For a correctly terminated raw string, everything between the ( and )
inclusive gets string face, everything else just the default face:

            R"foo(bar)foo"
                 ^^^^^
          font-lock-string-face.

(ii) For a construct with a raw string opener, not correctly terminated,
I am putting warning face on the entire raw string opener, leaving the
rest of the string with string face, e.g.:

            R"baz(bar)foo"
            ^^^^^^
     font-lock-warning-face
                  ^^^^^^^^
              font-lock-string-face

Of course, that is subject to change if it doesn't work very well.
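Both cases can be written as a small function that reports which character ranges get which face. This is only an illustrative C++ sketch; raw_string_faces, Face and FaceRun are invented names, and it is not CC Mode's implementation. It assumes the opener starting at index r is well-formed. It also takes "the rest of the string" to run to the end of the text, an assumption made only for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative face names, not Emacs's actual representation.
enum class Face { String, Warning };

// Half-open character range [begin, end) that receives a face.
struct FaceRun {
    std::size_t begin, end;
    Face face;
};

// Hypothetical helper. `r` is the index of the R in a well-formed opener
// R"delim( (the first '(' after R" is the opening paren, because a
// delimiter may not contain '(').
std::vector<FaceRun> raw_string_faces(const std::string& text, std::size_t r) {
    const std::size_t open_paren = text.find('(', r + 2);
    const std::string delim = text.substr(r + 2, open_paren - (r + 2));
    const std::string terminator = ")" + delim + "\"";
    const std::size_t close = text.find(terminator, open_paren + 1);
    if (close != std::string::npos) {
        // Terminated: string face on ( ... ) inclusive; R, the delimiters
        // and the quotes keep the default face.
        return {{open_paren, close + 1, Face::String}};
    }
    // Unterminated: warning face on the whole opener R"delim(, and string
    // face on everything after it (to the end of `text` in this sketch).
    return {{r, open_paren + 1, Face::Warning},
            {open_paren + 1, text.size(), Face::String}};
}
```

On the two examples above, R"foo(bar)foo" yields a single string run over (bar). R"baz(bar)foo" yields a warning run over R"baz( followed by a string run over bar)foo". Both results match the carets in the diagrams.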

CC Mode doesn't actually use syntax-ppss and syntax-propertize-function,
since they don't allow enough control.  In particular, on a buffer
change, they erase all syntax-table text properties between point and the
end of the buffer, which is wasteful; it is never necessary to erase these
beyond the next end of statement, and they are quite expensive to apply.

Anyhow, we should be able to have this implemented and the bug closed
pretty soon.

-- 
Alan Mackenzie (Nuremberg, Germany).

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-28 14:40       ` Alan Mackenzie
@ 2016-05-29 21:36         ` Alan Mackenzie
  2016-05-31 14:22           ` Ivan Andrus
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Mackenzie @ 2016-05-29 21:36 UTC
  To: Ivan Andrus; +Cc: 15212

[-- Attachment #1: Type: text/plain, Size: 12669 bytes --]

Hi, Ivan.

I've now got a patch, which I'd be grateful if you could try out, both
to see if there are any bugs, and also to get your general impression.
I think there are one or two bugs left in the code, and it needs tidying
up quite a lot.  So this won't be the final version.

The patch will work only on the savannah master branch - sorry, but it
depends on the fix to the "infrastructure" bug which I committed to
master only this morning (timezone +0200).

I'm also attaching a small test file which might interest you.

On Sat, May 28, 2016 at 02:40:45PM +0000, Alan Mackenzie wrote:
> Thanks for the suggestion!  I've actually had an almost working solution
> myself for just over a week.  Then I got confused with a bug in some CC
> Mode "infrastructure" code.  Such is life!

> The way I am fontifying these is thus:
> (i) For a correctly terminated raw string, everything between the ( and )
> inclusive gets string face, everything else just the default face:

>             R"foo(bar)foo"
>                  ^^^^^
>           font-lock-string-face.

> (ii) For a construct with a raw string opener, not correctly terminated,
> I am putting warning face on the entire raw string opener, leaving the
> rest of the string with string face, e.g.:

>             R"baz(bar)foo"
>             ^^^^^^
>      font-lock-warning-face
>                   ^^^^^^^^
>             font-lock-string-face

> Of course, that is subject to change if it doesn't work very well.

> CC Mode doesn't actually use syntax-ppss and syntax-propertize-function,
> since they don't allow enough control.  In particular, on a buffer
> change, they erase all syntax-table text properties between point and the
> end of the buffer, which is wasteful; it is never necessary to erase these
> beyond the next end of statement, and they are quite expensive to apply.

> Anyhow, we should be able to have this implemented and the bug closed
> pretty soon.

The patch:



diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
index 4d6a120..460c773 100644
--- a/lisp/progmodes/cc-engine.el
+++ b/lisp/progmodes/cc-engine.el
@@ -2293,7 +2293,8 @@ c-state-pp-to-literal
   ;;     (STATE TYPE (BEG . END))     if TO is in a literal; or
   ;;     (STATE)                      otherwise,
   ;; where STATE is the parsing state at TO, TYPE is the type of the literal
-  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal.
+  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal,
+  ;; including the delimiters.
   ;;
   ;; Unless NOT-IN-DELIMITER is non-nil, when TO is inside a two-character
   ;; comment opener, this is recognized as being in a comment literal.
@@ -5777,6 +5778,139 @@ c-restore-<>-properties
 				       'c-decl-arg-start)))))))
       (or (c-forward-<>-arglist nil)
 	  (forward-char)))))
+
+\f
+;; Routines to handle C++ raw strings.
+(defun c-raw-string-pos ()
+  ;; Get POINT's relationship to any containing raw string.
+  ;; If point isn't in a raw string, return nil.
+  ;; Otherwise, return the following list:
+  ;;
+  ;;   (POS B\" B\( E\) E\")
+  ;;
+  ;; , where POS is the symbol `open-delim' if point is in the opening
+  ;; delimiter, the symbol `close-delim' if it's in the closing delimiter, and
+  ;; nil if it's in the string body.  B\", B\(, E\), E\" are the positions of
+  ;; the opening and closing quotes and parentheses of a correctly terminated
+  ;; raw string.  (N.B.: E\) and E\" are NOT on the "outside" of these
+  ;; characters.)  If the raw string is not terminated, E\) and E\" are set to
+  ;; nil.
+  ;;
+  ;; Note: this routine is dependent upon the correct syntax-table text
+  ;; properties being set.
+  (let* ((safe (c-state-semi-safe-place (point)))
+	 (state (c-state-pp-to-literal safe (point)))
+	 open-quote-pos open-paren-pos close-paren-pos close-quote-pos id)
+    (save-excursion
+      (when
+	  (and
+	   (cond
+	    ((null (cadr state))
+	     (or (eq (char-after) ?\")
+		 (search-backward "\"" (max (- (point) 17) (point-min)) t)))
+	    ((and (eq (cadr state) 'string)
+		  (goto-char (car (nth 2 state)))
+		  (or (eq (char-after) ?\")
+		      (search-backward "\"" (max (- (point) 17) (point-min)) t))
+		  (not (bobp)))))
+	   (eq (char-before) ?R)
+	   (looking-at "\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("))
+	(setq open-quote-pos (point)
+	      open-paren-pos (match-end 1)
+	      id (match-string-no-properties 1))
+	(goto-char (1+ open-paren-pos))
+	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
+		   (search-forward-regexp (concat ")" id "\"") nil t))
+	  (setq close-paren-pos (match-beginning 0)
+		close-quote-pos (1- (point))))))
+    (and open-quote-pos
+	 (list
+	  (cond
+	   ((<= (point) open-paren-pos)
+	    'open-delim)
+	   ((and close-paren-pos
+		 (> (point) close-paren-pos))
+	    'close-delim)
+	   (t nil))
+	  open-quote-pos open-paren-pos close-paren-pos close-quote-pos))))
+
+(defun c-clear-raw-string-syntax-table-properties (raw)
+  (if (nth 2 raw)
+      ;; Clear out punctuation syntax-table text props from the string body.
+      (c-clear-char-property-with-value
+       (cadr raw) (nth 2 raw) 'syntax-table '(1))
+    ;; unclosed raw string.
+    (c-clear-char-property (car raw) 'syntax-table)
+    (c-clear-char-property (cadr raw) 'syntax-table))
+  (setq c-new-BEG (min c-new-BEG (car raw)))
+  (setq c-new-END (max c-new-END (1+ (cadr raw)))))
+
+(defun c-before-change-check-c++-raw-strings (beg end)
+  ;; This function clears syntax-table text properties from C++ raw strings
+  ;; which are being changed, or are associated with a change.
+  (c-save-buffer-state
+      ((beg-rs (save-excursion (goto-char c-new-BEG) (c-raw-string-pos)))
+       (end-rs (save-excursion (goto-char c-new-END) (c-raw-string-pos)))
+       )
+    (cond
+     ;; Neither BEG nor END are in raw strings.
+     ((and (null beg-rs) (null end-rs)))
+     ;; BEG is in the opening delimiter or "body" of an unterminated string.
+     ((and beg-rs (null (nth 3 beg-rs)))
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs)))
+     ;; BEG and END are both in the body of the same raw string.
+     ((and (equal (cdr beg-rs) (cdr end-rs))
+	   (null (car beg-rs)) (null (car end-rs))))
+     ;; BEG and END are in the same raw string, (at least) one of them in a
+     ;; delimiter.
+     ((equal (cdr beg-rs) (cdr end-rs))
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs)))
+     ;; BEG is in some raw string, END isn't in it.
+     (beg-rs
+      (c-clear-raw-string-syntax-table-properties (cdr beg-rs))
+      (when end-rs
+	(c-clear-raw-string-syntax-table-properties (cdr end-rs))))
+     ;; BEG isn't in a raw string, END is.
+     (end-rs
+      (c-clear-raw-string-syntax-table-properties (cdr end-rs))))))
+
+(defun c-temp-before-change (beg end)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-before-change-check-c++-raw-strings beg end))
+
+(defun c-after-change-mark-raw-strings (beg end old-len)
+  ;; Put any needed text properties on raw strings.  This function is called
+  ;; as an after-change function.
+  (save-excursion
+    (c-save-buffer-state ()
+      (goto-char c-new-BEG)
+      (while (and (< (point) c-new-END)
+		  (c-syntactic-re-search-forward
+		   "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" c-new-END t))
+	(let ((id (match-string-no-properties 1))
+	      (open-quote (1+ (match-beginning 0)))
+	      (open-paren (match-end 1))
+	      )
+	  (if (search-forward-regexp (concat ")" id "\"") nil t)
+	      (let ((end-string (match-beginning 0))
+		    (after-quote (match-end 0))
+		    )
+		(goto-char open-paren)
+		(while (progn (skip-syntax-forward "^\"" end-string)
+			      (< (point) end-string))
+		  (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
+		  (forward-char))
+		(goto-char after-quote))
+	    (c-put-char-property open-quote 'syntax-table '(1)) ; punctuation
+	    (c-put-char-property open-paren 'syntax-table '(15))))) ; generic string
+
+      )))
+
+(defun c-temp-after-change (beg end old-len)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-after-change-mark-raw-strings beg end old-len))
 \f
 ;; Handling of small scale constructs like types and names.
 
diff --git a/lisp/progmodes/cc-fonts.el b/lisp/progmodes/cc-fonts.el
index 4e83d6d..fd8065a 100644
--- a/lisp/progmodes/cc-fonts.el
+++ b/lisp/progmodes/cc-fonts.el
@@ -723,6 +723,10 @@ c-font-lock-invalid-string
 	(concat ".\\(" c-string-limit-regexp "\\)")
 	'((c-font-lock-invalid-string)))
 
+      ;; Fontify C++ raw strings.
+      ,@(when (c-major-mode-is 'c++-mode)
+	  '(c-font-lock-c++-raw-strings))
+
       ;; Fontify keyword constants.
       ,@(when (c-lang-const c-constant-kwds)
 	  (let ((re (c-make-keywords-re nil (c-lang-const c-constant-kwds))))
@@ -1571,6 +1575,34 @@ c-font-lock-enclosing-decls
 	    (c-forward-syntactic-ws)
 	    (c-font-lock-declarators limit t in-typedef)))))))
 
+(defun c-font-lock-c++-raw-strings (limit)
+  ;; Fontify C++ raw strings.
+  ;;
+  ;; This function will be called from font-lock for a region bounded by POINT
+  ;; and LIMIT, as though it were to identify a keyword for
+  ;; font-lock-keyword-face.  It always returns NIL to inhibit this and
+  ;; prevent a repeat invocation.  See elisp/lispref page "Search-based
+  ;; Fontification".
+  (while (search-forward-regexp
+	  "R\\(\"\\)\\([^ ()\\\n\r\t]\\{,16\\}\\)(" limit t)
+    (when ;; (eq (c-get-char-property (1- (point)) 'face)
+	;;     'font-lock-string-face)
+	(or (and (eobp)
+		 (eq (c-get-char-property (1- (point)) 'face)
+		     'font-lock-warning-face))
+	    (eq (c-get-char-property (point) 'face) 'font-lock-string-face))
+      (if (c-get-char-property (1- (point)) 'syntax-table)
+	  (c-put-font-lock-face (match-beginning 0) (match-end 0)
+				'font-lock-warning-face)
+	(c-put-font-lock-face (match-beginning 1) (match-end 2)
+			      'default)
+	(when (search-forward-regexp
+	       (concat ")\\(" (match-string-no-properties 2) "\\)\"")
+	       limit t)
+	  (c-put-font-lock-face (match-beginning 1) (point)
+				'default)))))
+  nil)
+
 (c-lang-defconst c-simple-decl-matchers
   "Simple font lock matchers for types and declarations.  These are used
 on level 2 only and so aren't combined with `c-complex-decl-matchers'."
diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
index 6f4d1f1..8ba0c5c 100644
--- a/lisp/progmodes/cc-langs.el
+++ b/lisp/progmodes/cc-langs.el
@@ -474,9 +474,12 @@ c-populate-syntax-table
   ;; The value here may be a list of functions or a single function.
   t nil
   c++ '(c-extend-region-for-CPP
+	c-depropertize-region
+	c-before-change-check-c++-raw-strings
 	c-before-change-check-<>-operators
 	c-invalidate-macro-cache)
   (c objc) '(c-extend-region-for-CPP
+	     c-depropertize-region
 	     c-invalidate-macro-cache)
   ;; java 'c-before-change-check-<>-operators
   awk 'c-awk-record-region-clear-NL)
@@ -509,7 +512,8 @@ c-populate-syntax-table
   (c objc) '(c-extend-font-lock-region-for-macros
 	     c-neutralize-syntax-in-and-mark-CPP
 	     c-change-expand-fl-region)
-  c++ '(c-extend-font-lock-region-for-macros
+  c++ '(c-after-change-mark-raw-strings
+	c-extend-font-lock-region-for-macros
 	c-neutralize-syntax-in-and-mark-CPP
 	c-restore-<>-properties
 	c-change-expand-fl-region)
diff --git a/lisp/progmodes/cc-mode.el b/lisp/progmodes/cc-mode.el
index 9ab0480..53322cf 100644
--- a/lisp/progmodes/cc-mode.el
+++ b/lisp/progmodes/cc-mode.el
@@ -877,6 +877,16 @@ c-called-from-text-property-change-p
   (memq (cadr (backtrace-frame 3))
 	'(put-text-property remove-list-of-text-properties)))
 
+(defun c-depropertize-region (beg end)
+  ;; Remove the punctuation syntax-table text property from the region
+  ;; (c-new-BEG c-new-END).
+  ;;
+  ;; This function is in the C/C++/ObjC values of
+  ;; `c-get-state-before-change-functions' and is called exclusively as a
+  ;; before change function.
+  (c-clear-char-property-with-value
+   c-new-BEG c-new-END 'syntax-table '(1)))
+
 (defun c-extend-region-for-CPP (beg end)
   ;; Adjust `c-new-BEG', `c-new-END' respectively to the beginning and end of
   ;; any preprocessor construct they may be in. 
@@ -969,7 +979,7 @@ c-neutralize-syntax-in-and-mark-CPP
   ;; This function might make hidden buffer changes.
   (c-save-buffer-state (limits )
     ;; Clear 'syntax-table properties "punctuation":
-    (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
+    ;; (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
 
     ;; CPP "comment" markers:
     (if (eval-when-compile (memq 'category-properties c-emacs-features));Emacs.


-- 
Alan Mackenzie (Nuremberg, Germany).


[-- Attachment #2: raw-string.cc --]
[-- Type: text/x-c, Size: 373 bytes --]

char foo [] = R"(foo)";
char bar [] = R"bar(bar)bar";
char empty [] = R"()";
char quote1 [] = R"34(")34";
char quote2 [] = R"0x22(foo"bar)0x22";
char sixteen [] = R"0123456789abcdef(sixteen)0123456789abcdef";
char seventeen [] = R"0123456789abcdefg(seventeen)0123456789abcdefg";
char multi_line [] = R"(First line.
Second line.
Third line.)";
char baz [] = R"baz(baz)foo";

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-29 21:36         ` Alan Mackenzie
@ 2016-05-31 14:22           ` Ivan Andrus
  2016-05-31 21:32             ` Alan Mackenzie
  2016-05-31 22:21             ` Alan Mackenzie
  0 siblings, 2 replies; 19+ messages in thread
From: Ivan Andrus @ 2016-05-31 14:22 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 15212

[-- Attachment #1: Type: text/plain, Size: 3345 bytes --]

On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:


> Hi, Ivan.
>
> I've now got a patch, which I'd be grateful if you could try out, both
> to see if there are any bugs, and also to get your general impression.
> I think there are one or two bugs left in the code, and it needs tidying
> up quite a lot.  So this won't be the final version.


Awesome.  I’ll keep looking and let you know of any bugs I find.

I did find one.  According to
http://en.cppreference.com/w/cpp/language/string_literal the delimiter can
contain any characters except parentheses, backslash and spaces.  Using
square brackets confuses c++-mode though:

char brackets [] = R"0x22[(foobar)0x22[";

Now, I’ve never actually seen such a construct in the wild, but it would be
good to fix it regardless.  The *Messages* buffer shows

File mode specification error: (invalid-regexp Unmatched [ or [^)
Error during redisplay: (jit-lock-function 1013) signaled (invalid-regexp
"Unmatched [ or [^")

which seems to point to a missing regexp-quote, and indeed it thinks

  char bar [] = R"YYY*(bar)YYY";

is a valid string literal.
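
To see why splicing the delimiter into a regexp misbehaves, here is a small C++ analogue. It is only a sketch: `std::regex` stands in for Emacs regexps, and the helper names `closer_found_by_regex` and `closer_found_literally` are hypothetical. Pasting the delimiter into the pattern turns `*` into a quantifier and `[` into the start of an unterminated bracket expression, whereas a plain substring search treats every delimiter character literally:

```cpp
#include <regex>
#include <string>

// Naive approach: build the closer pattern )DELIM" by pasting the delimiter
// into a regex, much as (concat ")" id "\"") is handed to a regexp search.
// Metacharacters in DELIM are interpreted rather than matched literally,
// and an invalid fragment such as "0x22[" makes the constructor throw
// std::regex_error.
bool closer_found_by_regex(const std::string& body, const std::string& delim) {
    const std::regex closer("\\)" + delim + "\"");
    return std::regex_search(body, closer);
}

// Safer approach: look for the closer as a plain substring, which is what
// switching from a regexp search to an ordinary search amounts to.
bool closer_found_literally(const std::string& body, const std::string& delim) {
    return body.find(")" + delim + "\"") != std::string::npos;
}
```

With the delimiter `YYY*`, the regex version reports a closer in `bar)YYY"` (a false positive), and with `0x22[` it throws `std::regex_error`, the counterpart of the `invalid-regexp` error quoted above; the literal version handles both cases correctly.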

Moreover, I was somehow able to get it into a bad state where changing the
delimiters wouldn’t update fontification.  I’ll see if I can come up with a
recipe for how to reproduce it reliably.

> The patch will work only on the savannah master branch - sorry, but it
> depends on the fix to the "infrastructure" bug which I committed to
> master only this morning (timezone +0200).
>
> I'm also attaching a small test file which might interest you.
>
> On Sat, May 28, 2016 at 02:40:45PM +0000, Alan Mackenzie wrote:
>
> Thanks for the suggestion!  I've actually had an almost working solution
> myself for just over a week.  Then I got confused with a bug in some CC
> Mode "infrastructure" code.  Such is life!
>
> The way I am fontifying these is thus:
> (i) For a correctly terminated raw string, everything between the ( and )
> inclusive gets string face, everything else just the default face:
>
>            R"foo(bar)foo"
>                 ^^^^^
>          font-lock-string-face.


I was wondering how this would work.  It’s a little weird that regular
string delimiters are fontified with font-lock-string-face, but these
aren’t.  But I think I like this way better since it’s much easier to
confuse these delimiters with the contents of the string than normal string
delimiters.

> (ii) For a construct with a raw string opener, not correctly terminated,
> I am putting warning face on the entire raw string opener, leaving the
> rest of the string with string face, e.g.:
>
>            R"baz(bar)foo"
>            ^^^^^^
>     font-lock-warning-face
>                  ^^^^^^^^
>       font-lock-string-face
>
> Of course, that is subject to change if it doesn't work very well.
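
The two fontification cases can be expressed as a computation of character spans. The following C++ sketch is illustrative only: `raw_string_faces`, `Span` and `RawStringFaces` are hypothetical names, the sketch assumes the opener has already been validated, and it lets an unterminated string's body run to the end of the text:

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Half-open [begin, end) character offsets into the text.
struct Span {
    std::size_t begin;
    std::size_t end;
};

struct RawStringFaces {
    std::optional<Span> warning;  // R"delim( when the string is unterminated
    Span string_face;             // ( ... ) when terminated, else body to end
};

// R_POS is assumed to be the index of the 'R' of a valid opener R"delim(.
RawStringFaces raw_string_faces(const std::string& text, std::size_t r_pos) {
    const std::size_t quote = r_pos + 1;
    const std::size_t open_paren = text.find('(', quote);
    const std::string delim = text.substr(quote + 1, open_paren - quote - 1);
    const std::size_t close_paren =
        text.find(")" + delim + "\"", open_paren + 1);
    if (close_paren != std::string::npos) {
        // Case (i): string face on the parentheses and what lies between.
        return {std::nullopt, Span{open_paren, close_paren + 1}};
    }
    // Case (ii): warning face on the opener, string face on the rest.
    return {Span{r_pos, open_paren + 1}, Span{open_paren + 1, text.size()}};
}
```

For `R"foo(bar)foo"` the string-face span is `(bar)`; for `R"baz(bar)foo"` the warning span is `R"baz(` and the string-face span is `bar)foo"`, matching the two diagrams.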


> CC Mode doesn't actually use syntax-ppss and syntax-propertize-function,
> since they don't allow enough control.  In particular, on a buffer
> change, they erase all syntax-table text properties between point and end
> of buffer which is wasteful; it is never necessary to erase these beyond
> the next end of statement, and they are quite expensive to apply.
>
> Anyhow, we should be able to have this implemented and the bug closed
> pretty soon.


Thanks a bunch for your work on this.

-Ivan

[-- Attachment #2: Type: text/html, Size: 6088 bytes --]


* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-31 14:22           ` Ivan Andrus
@ 2016-05-31 21:32             ` Alan Mackenzie
  2016-05-31 23:52               ` Michael Welsh Duggan
  2016-05-31 22:21             ` Alan Mackenzie
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Mackenzie @ 2016-05-31 21:32 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: 15212

Hello again, Ivan.

On Tue, May 31, 2016 at 08:22:07AM -0600, Ivan Andrus wrote:
> On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:

>> I've now got a patch, which I'd be grateful if you could try out, both
>> to see if there are any bugs, and also to get your general impression.
>> I think there are one or two bugs left in the code, and it needs tidying
>> up quite a lot.  So this won't be the final version.


> Awesome.  I’ll keep looking and let you know of any bugs I find.

> I did find one.  According to
> http://en.cppreference.com/w/cpp/language/string_literal the delimiter can
> contain any characters except parentheses, backslash and spaces.

Yes, I've read that and got angry with it.  It's vague - it's not clear
what is meant by "any source character" - the C++11 page in Wikipedia
says that control characters are excluded.  In practice, I suspect it
won't matter all that much - most of the time the delimiter will just be
"\"(" - anybody trying to do anything fancy in the delimiter deserves
everything she gets.  ;-)

> Using square brackets confuses c++-mode though:

> char brackets [] = R"0x22[(foobar)0x22[";

> Now, I’ve never actually seen such a construct in the wild, but it would be
> good to fix it regardless.  The *Messages* buffer shows

> File mode specification error: (invalid-regexp Unmatched [ or [^)
> Error during redisplay: (jit-lock-function 1013) signaled (invalid-regexp
> "Unmatched [ or [^")

> which seems to point to a missing regexp-quote, and indeed it thinks

>   char bar [] = R"YYY*(bar)YYY";

> is a valid string literal.

Yes indeed!  I was doing a regexp search when an ordinary search was
needed (twice).  And a third occasion did indeed need a regexp-quote.
Here's a (supplementary) patch to fix these glitches:



--- cc-engine.el~	2016-05-29 22:21:06.000000000 +0000
+++ cc-engine.el	2016-05-31 20:49:48.000000000 +0000
@@ -5836,7 +5836,7 @@
 	      id (match-string-no-properties 1))
 	(goto-char (1+ open-paren-pos))
 	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
-		   (search-forward-regexp (concat ")" id "\"") nil t))
+		   (search-forward (concat ")" id "\"") nil t))
 	  (setq close-paren-pos (match-beginning 0)
 		close-quote-pos (1- (point))))))
     (and open-quote-pos
@@ -5908,7 +5908,7 @@
 	      (open-quote (1+ (match-beginning 0)))
 	      (open-paren (match-end 1))
 	      )
-	  (if (search-forward-regexp (concat ")" id "\"") nil t)
+	  (if (search-forward (concat ")" id "\"") nil t)
 	      (let ((end-string (match-beginning 0))
 		    (after-quote (match-end 0))
 		    )
--- cc-fonts.el~	2016-05-29 17:49:34.000000000 +0000
+++ cc-fonts.el	2016-05-31 21:01:14.000000000 +0000
@@ -1598,7 +1598,8 @@
 	(c-put-font-lock-face (match-beginning 1) (match-end 2)
 			      'default)
 	(when (search-forward-regexp
-	       (concat ")\\(" (match-string-no-properties 2) "\\)\"")
+	       (concat ")\\(" (regexp-quote (match-string-no-properties 2))
+		       "\\)\"")
 	       limit t)
 	  (c-put-font-lock-face (match-beginning 1) (point)
 				'default)))))



> Moreover, I was somehow able to get it into a bad state where changing the
> delimiters wouldn’t update fontification.  I’ll see if I can come up with a
> recipe for how to reproduce it reliably.

That's happened to me, too.  Maybe it's connected with the above error.
But then again, maybe not.  I'll keep trying to reproduce it, too.


>> The way I am fontifying these is thus:
>> (i) For a correctly terminated raw string, everything between the ( and )
>> inclusive gets string face, everything else just the default face:


>>            R"foo(bar)foo"
>>                 ^^^^^
>>          font-lock-string-face.


> I was wondering how this would work.  It’s a little weird that regular
> string delimiters are fontified with font-lock-string-face, but these
> aren’t.  But I think I like this way better since it’s much easier to
> confuse these delimiters with the contents of the string than normal string
> delimiters.

Glad you like it!

[ .... ]

>> Thanks a bunch for your work on this.

Thank you!

> -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-31 14:22           ` Ivan Andrus
  2016-05-31 21:32             ` Alan Mackenzie
@ 2016-05-31 22:21             ` Alan Mackenzie
  2016-06-01  5:21               ` Ivan Andrus
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Mackenzie @ 2016-05-31 22:21 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: 15212

Hello, yet again, Ivan!

On Tue, May 31, 2016 at 08:22:07AM -0600, Ivan Andrus wrote:
> On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:

[ .... ]

> Moreover, I was somehow able to get it into a bad state where changing the
> delimiters wouldn’t update fontification.  I’ll see if I can come up with a
> recipe for how to reproduce it reliably.

The following gets it into a bad state:
(i) Set up two separate valid raw strings with the same delimiter in
  both.
(ii) "Damage" the closing delimiter of the first string.  There is now
  just one raw string which extends to what used to be the end of the
  second raw string.
(iii) Restore the closing delimiter of the first string.  The
  syntax-table text properties and fontifications are now broken, and, I
  think, need the mode reinitialising to recover.
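
Step (ii) is disruptive because of the grammar: a raw string ends at the first `)delim"` after its opener, so damaging one closer makes the string run on to the next matching closer, and text that used to lie between two strings is suddenly inside one. A minimal C++ sketch (the helper `raw_string_close` is hypothetical) makes the jump visible:

```cpp
#include <cstddef>
#include <string>

// Offset of the ')' that closes the raw string whose opening '(' is at
// OPEN_PAREN, or std::string::npos if it is unterminated.  The body runs to
// the FIRST )DELIM" after the opener.
std::size_t raw_string_close(const std::string& text, std::size_t open_paren,
                             const std::string& delim) {
    return text.find(")" + delim + "\"", open_paren + 1);
}
```

In `a = R"x(one)x"; b = R"x(two)x";` the first string closes at offset 11; changing that closer's `x` to `y` moves the close to offset 27, inside what was the second string. Presumably any properties applied while the text is in that state must be removed over the whole old extent when the closer is restored, not just around the edited character.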

[ .... ]

> -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-31 21:32             ` Alan Mackenzie
@ 2016-05-31 23:52               ` Michael Welsh Duggan
  2016-06-02 16:36                 ` Alan Mackenzie
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Welsh Duggan @ 2016-05-31 23:52 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Ivan Andrus, 15212

Alan Mackenzie <acm@muc.de> writes:

> Hello again, Ivan.
>
> On Tue, May 31, 2016 at 08:22:07AM -0600, Ivan Andrus wrote:
>> On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:
>
>>> I've now got a patch, which I'd be grateful if you could try out, both
>>> to see if there are any bugs, and also to get your general impression.
>>> I think there are one or two bugs left in the code, and it needs tidying
>>> up quite a lot.  So this won't be the final version.
>
>
>> Awesome.  I’ll keep looking and let you know of any bugs I find.
>
>> I did find one.  According to
>> http://en.cppreference.com/w/cpp/language/string_literal the delimiter can
>> contain any characters except parentheses, backslash and spaces.
>
> Yes, I've read that and got angry with it.  It's vague - it's not clear
> what is meant by "any source character" - the C++11 page in Wikipedia
> says that control characters are excluded.  In practice, I suspect it
> won't matter all that much - most of the time the delimiter will just be
> "\"(" - anybody trying to do anything fancy in the delimiter deserves
> everything she gets.  ;-)

Here's what the standard says:

<raw-string>: 
  " <d-char-sequence(opt)> ( <r-char-sequence(opt)> ) <d-char-sequence(opt)> "

<r-char-sequence>:
  <r-char>
  <r-char-sequence> <r-char>

<r-char>:
  Any member of the source character set, except a right-parenthesis )
  followed by the initial <d-char-sequence> (which may be empty)
  followed by a double quote ".

<d-char-sequence>:
  <d-char>
  <d-char-sequence> <d-char>

<d-char>:
  any member of the basic source character set except:
    space, the left parenthesis (, the right parenthesis ), the
    backslash \, and the control characters representing horizontal tab,
    vertical tab, form feed, and newline.

Here's what it says about the basic source character set:

The basic source character set consists of 96 characters: the space
character, the control characters representing horizontal tab, vertical
tab, form feed, and new-line, plus the following 91 graphical
characters:

  a b c d e f g h i j k l m n o p q r s t u v w x y z
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  0 1 2 3 4 5 6 7 8 9
  _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
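
The `d-char` rule above, together with the C++11 limit of at most 16 characters in a `d-char-sequence` (reflected in the patch's `\{,16\}` and in the test file's `sixteen`/`seventeen` cases), can be checked directly. A C++ sketch, with hypothetical helper names:

```cpp
#include <cstddef>
#include <string>

// d-char: any of the 91 graphical basic source characters except '(', ')'
// and '\'.  Space, horizontal tab, vertical tab, form feed and newline are
// not in the graphical list, so the membership test rejects them as well.
bool is_d_char(char c) {
    static const std::string basic_graphic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    if (c == '(' || c == ')' || c == '\\') return false;
    return basic_graphic.find(c) != std::string::npos;
}

// Length of the d-char-sequence starting at POS if it has at most 16
// characters and is immediately followed by '(', otherwise -1.
int d_char_sequence_length(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    while (i < s.size() && i - pos <= 16 && is_d_char(s[i])) ++i;
    if (i - pos > 16 || i >= s.size() || s[i] != '(') return -1;
    return static_cast<int>(i - pos);
}
```

Unlike the bracket expression `[^ ()\\\n\r\t]` used in the patch, this sketch also rejects vertical tab, form feed, and any character outside the basic source character set.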


-- 
Michael Welsh Duggan
(md5i@md5i.com)






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-31 22:21             ` Alan Mackenzie
@ 2016-06-01  5:21               ` Ivan Andrus
  2016-06-02 16:07                 ` Alan Mackenzie
       [not found]                 ` <20160602160741.GC4067@acm.fritz.box>
  0 siblings, 2 replies; 19+ messages in thread
From: Ivan Andrus @ 2016-06-01  5:21 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 15212

On May 31, 2016, at 4:21 PM, Alan Mackenzie <acm@muc.de> wrote:
> 
> Hello, yet again, Ivan!
> 
> On Tue, May 31, 2016 at 08:22:07AM -0600, Ivan Andrus wrote:
>> On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:
> 
> [ .... ]
> 
>> Moreover, I was somehow able to get it into a bad state where changing the
>> delimiters wouldn’t update fontification.  I’ll see if I can come up with a
>> recipe for how to reproduce it reliably.
> 
> The following gets it into a bad state:
> (i) Set up two separate valid raw strings with the same delimiter in
>  both.
> (ii) "Damage" the closing delimiter of the first string.  There is now
>  just one raw string which extends to what used to be the end of the
>  second raw string.
> (iii) Restore the closing delimiter of the first string.  The
>  syntax-table text properties and fontifications are now broken, and, I
>  think, need the mode reinitialising to recover.

Good sleuthing.  That would fit with my experience.

-Ivan





* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-06-01  5:21               ` Ivan Andrus
@ 2016-06-02 16:07                 ` Alan Mackenzie
       [not found]                 ` <20160602160741.GC4067@acm.fritz.box>
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-02 16:07 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: 15212

Hello, Ivan.

On Tue, May 31, 2016 at 11:21:18PM -0600, Ivan Andrus wrote:
> On May 31, 2016, at 4:21 PM, Alan Mackenzie <acm@muc.de> wrote:

> > Hello, yet again, Ivan!

> > On Tue, May 31, 2016 at 08:22:07AM -0600, Ivan Andrus wrote:
> >> On May 29, 2016, at 3:36 PM, Alan Mackenzie <acm@muc.de> wrote:

> > [ .... ]

> >> Moreover, I was somehow able to get it into a bad state where changing the
> >> delimiters wouldn’t update fontification.  I’ll see if I can come up with a
> >> recipe for how to reproduce it reliably.

> > The following gets it into a bad state:
> > (i) Set up two separate valid raw strings with the same delimiter in
> >  both.
> > (ii) "Damage" the closing delimiter of the first string.  There is now
> >  just one raw string which extends to what used to be the end of the
> >  second raw string.
> > (iii) Restore the closing delimiter of the first string.  The
> >  syntax-table text properties and fontifications are now broken, and, I
> >  think, need the mode reinitialising to recover.

> Good sleuthing.  That would fit with my experience.

The following patch (a full patch which works with the savannah master
branch) should fix that problem.
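
As far as it can be read from the patch, the core of the fix is to widen the change region (`c-new-BEG`, `c-new-END`) so that any raw string containing either end is covered whole before its properties are cleared. Here is a C++ sketch of just that widening step. The types and names (`RawExtent`, `widen_change_region`) are hypothetical, and extents are half-open, running from the opening quote to one past the closing quote (or one past the opening parenthesis when the string is unterminated):

```cpp
#include <cstddef>
#include <vector>

struct RawExtent {
    std::size_t begin;  // opening '"'
    std::size_t end;    // one past the closing '"' (or past '(' if unterminated)
};

// Widen the changed region [BEG, END) so that a raw string containing either
// boundary is included whole, letting stale properties on it be cleared and
// re-applied.  Raw strings lying entirely inside the region need no widening.
void widen_change_region(std::size_t& beg, std::size_t& end,
                         const std::vector<RawExtent>& raws) {
    for (const RawExtent& r : raws) {
        if (r.begin <= beg && beg < r.end) beg = r.begin;
        if (r.begin < end && end < r.end) end = r.end;
    }
}
```

Widening a change at [8, 25) against raw strings at [5, 14) and [20, 30) yields [5, 30); a change that touches no raw string is left alone.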

It is not yet a workable patch, since it fails to take proper account of
macros and comments.  Indeed, the fontification fails when the raw string
is inside a macro.  CC Mode has become somewhat unwieldy in this area,
and I'll be working on it in the next few days.

Until then .....



diff -r d83a74c6ec31 cc-engine.el
--- a/cc-engine.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-engine.el	Thu Jun 02 16:01:26 2016 +0000
@@ -2295,7 +2295,8 @@
   ;;     (STATE TYPE (BEG . END))     if TO is in a literal; or
   ;;     (STATE)                      otherwise,
   ;; where STATE is the parsing state at TO, TYPE is the type of the literal
-  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal.
+  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal,
+  ;; including the delimiters.
   ;;
   ;; Unless NOT-IN-DELIMITER is non-nil, when TO is inside a two-character
   ;; comment opener, this is recognized as being in a comment literal.
@@ -5793,6 +5794,132 @@
 				       'c-decl-arg-start)))))))
       (or (c-forward-<>-arglist nil)
 	  (forward-char)))))
+
+\f
+;; Routines to handle C++ raw strings.
+(defun c-raw-string-pos ()
+  ;; Get POINT's relationship to any containing raw string.
+  ;; If point isn't in a raw string, return nil.
+  ;; Otherwise, return the following list:
+  ;;
+  ;;   (POS B\" B\( E\) E\")
+  ;;
+  ;; , where POS is the symbol `open-delim' if point is in the opening
+  ;; delimiter, the symbol `close-delim' if it's in the closing delimiter, and
+  ;; nil if it's in the string body.  B\", B\(, E\), E\" are the positions of
+  ;; the opening and closing quotes and parentheses of a correctly terminated
+  ;; raw string.  (N.B.: E\) and E\" are NOT on the "outside" of these
+  ;; characters.)  If the raw string is not terminated, E\) and E\" are set to
+  ;; nil.
+  ;;
+  ;; Note: this routine is dependent on the correct syntax-table text
+  ;; properties being set.
+  (let* ((safe (c-state-semi-safe-place (point)))
+	 (state (c-state-pp-to-literal safe (point)))
+	 open-quote-pos open-paren-pos close-paren-pos close-quote-pos id)
+    (save-excursion
+      (when
+	  (and
+	   (cond
+	    ((null (cadr state))
+	     (or (eq (char-after) ?\")
+		 (search-backward "\"" (max (- (point) 17) (point-min)) t)))
+	    ((and (eq (cadr state) 'string)
+		  (goto-char (car (nth 2 state)))
+		  (or (eq (char-after) ?\")
+		      (search-backward "\"" (max (- (point) 17) (point-min)) t))
+		  (not (bobp)))))
+	   (eq (char-before) ?R)
+	   (looking-at "\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("))
+	(setq open-quote-pos (point)
+	      open-paren-pos (match-end 1)
+	      id (match-string-no-properties 1))
+	(goto-char (1+ open-paren-pos))
+	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
+		   (search-forward (concat ")" id "\"") nil t))
+	  (setq close-paren-pos (match-beginning 0)
+		close-quote-pos (1- (point))))))
+    (and open-quote-pos
+	 (list
+	  (cond
+	   ((<= (point) open-paren-pos)
+	    'open-delim)
+	   ((and close-paren-pos
+		 (> (point) close-paren-pos))
+	    'close-delim)
+	   (t nil))
+	  open-quote-pos open-paren-pos close-paren-pos close-quote-pos))))
+
+(defun c-before-change-check-c++-raw-strings (beg end)
+  ;; This function clears syntax-table text properties from C++ raw strings in
+  ;; the region (c-new-BEG c-new-END).
+  (c-save-buffer-state
+      ((beg-rs (progn (goto-char c-new-BEG) (c-raw-string-pos)))
+       (end-rs (progn (goto-char c-new-END) (c-raw-string-pos))) ; FIXME!!!
+					; Optimize this so that we don't call
+					; `c-raw-string-pos' twice when once
+					; will do.  (2016-06-02).
+       )
+    (when beg-rs
+      (setq c-new-BEG (min c-new-BEG (cadr beg-rs)))
+      (if (nth 3 beg-rs)
+	  ;; We've got a terminated raw string.
+	  (when (< (nth 2 beg-rs) beg)
+	    (c-clear-char-property-with-value
+	     (1+ (nth 2 beg-rs)) beg 'syntax-table '(1)))
+	;; We've got an unmatched raw string opening delimiter.
+	(c-clear-char-property (cadr beg-rs) 'syntax-table)
+	(c-clear-char-property (nth 2 beg-rs) 'syntax-table)))
+    (when end-rs
+      (setq c-new-END (max c-new-END
+			   (1+ (or (nth 4 end-rs)
+				   (nth 2 end-rs)))))
+      (if (nth 3 end-rs)
+	  ;; We've got a terminated raw string.
+	  (when (< end (nth 3 end-rs))
+	    (c-clear-char-property-with-value
+	     end (nth 3 end-rs) 'syntax-table '(1)))
+	;; We've got an unmatched raw string opening delimiter.
+	(c-clear-char-property (cadr end-rs) 'syntax-table)
+	(c-clear-char-property (nth 2 end-rs) 'syntax-table)))))
+
+(defun c-temp-before-change (beg end)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-before-change-check-c++-raw-strings beg end))
+
+(defun c-after-change-mark-raw-strings (beg end old-len)
+  ;; Put any needed text properties on raw strings.  This function is called
+  ;; as an after-change function.
+  (save-excursion
+    (c-save-buffer-state ()
+      (goto-char c-new-BEG)
+      (while (and (< (point) c-new-END)
+		  (c-syntactic-re-search-forward
+		   "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" c-new-END t))
+	(let ((id (match-string-no-properties 1))
+	      (open-quote (1+ (match-beginning 0)))
+	      (open-paren (match-end 1))
+	      )
+	  (if (search-forward (concat ")" id "\"") nil t)
+	      (let ((end-string (match-beginning 0))
+		    (after-quote (match-end 0))
+		    )
+		(goto-char open-paren)
+		(while (progn (skip-syntax-forward "^\"" end-string)
+			      (< (point) end-string))
+		  (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
+		  (forward-char))
+		(goto-char after-quote))
+	    (c-put-char-property open-quote 'syntax-table '(1)) ; punctuation
+	    (c-put-char-property open-paren 'syntax-table '(15))))) ; generic string
+
+      )))
+
+(defun c-temp-after-change (beg end old-len)
+  (setq c-new-BEG beg
+	c-new-END end)
+  (c-after-change-mark-raw-strings beg end old-len))
 \f
 ;; Handling of small scale constructs like types and names.
 
diff -r d83a74c6ec31 cc-fonts.el
--- a/cc-fonts.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-fonts.el	Thu Jun 02 16:01:26 2016 +0000
@@ -717,6 +717,10 @@
 	(concat ".\\(" c-string-limit-regexp "\\)")
 	'((c-font-lock-invalid-string)))
 
+      ;; Fontify C++ raw strings.
+      ,@(when (c-major-mode-is 'c++-mode)
+	  '(c-font-lock-c++-raw-strings))
+
       ;; Fontify keyword constants.
       ,@(when (c-lang-const c-constant-kwds)
 	  (let ((re (c-make-keywords-re nil (c-lang-const c-constant-kwds))))
@@ -1572,6 +1576,35 @@
 	    (c-forward-syntactic-ws)
 	    (c-font-lock-declarators limit t in-typedef)))))))
 
+(defun c-font-lock-c++-raw-strings (limit)
+  ;; Fontify C++ raw strings.
+  ;;
+  ;; This function will be called from font-lock for a region bounded by POINT
+  ;; and LIMIT, as though it were to identify a keyword for
+  ;; font-lock-keyword-face.  It always returns NIL to inhibit this and
+  ;; prevent a repeat invocation.  See elisp/lispref page "Search-based
+  ;; Fontification".
+  (while (search-forward-regexp
+	  "R\\(\"\\)\\([^ ()\\\n\r\t]\\{,16\\}\\)(" limit t)
+    (when ;; (eq (c-get-char-property (1- (point)) 'face)
+	;;     'font-lock-string-face)
+	(or (and (eobp)
+		 (eq (c-get-char-property (1- (point)) 'face)
+		     'font-lock-warning-face))
+	    (eq (c-get-char-property (point) 'face) 'font-lock-string-face))
+      (if (c-get-char-property (1- (point)) 'syntax-table)
+	  (c-put-font-lock-face (match-beginning 0) (match-end 0)
+				'font-lock-warning-face)
+	(c-put-font-lock-face (match-beginning 1) (match-end 2)
+			      'default)
+	(when (search-forward-regexp
+	       (concat ")\\(" (regexp-quote (match-string-no-properties 2))
+		       "\\)\"")
+	       limit t)
+	  (c-put-font-lock-face (match-beginning 1) (point)
+				'default)))))
+  nil)
+
 (c-lang-defconst c-simple-decl-matchers
   "Simple font lock matchers for types and declarations.  These are used
 on level 2 only and so aren't combined with `c-complex-decl-matchers'."
diff -r d83a74c6ec31 cc-langs.el
--- a/cc-langs.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-langs.el	Thu Jun 02 16:01:26 2016 +0000
@@ -457,9 +457,12 @@
   ;; The value here may be a list of functions or a single function.
   t nil
   c++ '(c-extend-region-for-CPP
+	c-depropertize-region
+	c-before-change-check-c++-raw-strings
 	c-before-change-check-<>-operators
 	c-invalidate-macro-cache)
   (c objc) '(c-extend-region-for-CPP
+	     c-depropertize-region
 	     c-invalidate-macro-cache)
   ;; java 'c-before-change-check-<>-operators
   awk 'c-awk-record-region-clear-NL)
@@ -492,7 +495,8 @@
   (c objc) '(c-extend-font-lock-region-for-macros
 	     c-neutralize-syntax-in-and-mark-CPP
 	     c-change-expand-fl-region)
-  c++ '(c-extend-font-lock-region-for-macros
+  c++ '(c-after-change-mark-raw-strings
+	c-extend-font-lock-region-for-macros
 	c-neutralize-syntax-in-and-mark-CPP
 	c-restore-<>-properties
 	c-change-expand-fl-region)
diff -r d83a74c6ec31 cc-mode.el
--- a/cc-mode.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-mode.el	Thu Jun 02 16:01:26 2016 +0000
@@ -859,6 +859,16 @@
   (memq (cadr (backtrace-frame 3))
 	'(put-text-property remove-list-of-text-properties)))
 
+(defun c-depropertize-region (beg end)
+  ;; Remove the punctuation syntax-table text property from the region
+  ;; (c-new-BEG c-new-END).
+  ;;
+  ;; This function is in the C/C++/ObjC values of
+  ;; `c-get-state-before-change-functions' and is called exclusively as a
+  ;; before change function.
+  (c-clear-char-property-with-value
+   c-new-BEG c-new-END 'syntax-table '(1)))
+
 (defun c-extend-region-for-CPP (beg end)
   ;; Adjust `c-new-BEG', `c-new-END' respectively to the beginning and end of
   ;; any preprocessor construct they may be in. 
@@ -951,7 +961,7 @@
   ;; This function might make hidden buffer changes.
   (c-save-buffer-state (limits )
     ;; Clear 'syntax-table properties "punctuation":
-    (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
+    ;; (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
 
     ;; CPP "comment" markers:
     (if (memq 'category-properties c-emacs-features) ; GNU Emacs.
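
A note on the search that `c-font-lock-c++-raw-strings` performs: it finds `R"<id>(`, then searches forward for the matching `)<id>"`. The same two-step scan can be sketched in C++ with `std::regex`. This is a minimal illustration under stated assumptions, not CC Mode code. It treats the input as plain text with no comments, ordinary string literals or macros, and it ignores the `face` and `syntax-table` properties that the Lisp version consults. `RawString` and `find_raw_strings` are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One raw string found in the text (all positions are character indices).
struct RawString {
    std::size_t open_quote;   // the " after R
    std::size_t open_paren;   // the ( ending the opening delimiter
    std::size_t close_paren;  // the ) of )<id>", or npos if unterminated
    std::string id;           // the delimiter identifier, possibly empty
};

// Scan TEXT left to right for R"<id>( and, for each opener, search forward
// for the first )<id>".  The character class and the 16-character limit
// follow the regexp used in the patch.
std::vector<RawString> find_raw_strings(const std::string& text) {
    static const std::regex opener(R"re(R"([^ ()\\\n\r\t]{0,16})\()re");
    std::vector<RawString> found;
    std::size_t pos = 0;
    std::smatch m;
    while (pos < text.size() &&
           std::regex_search(text.cbegin() + pos, text.cend(), m, opener)) {
        const std::size_t start = pos + m.position(0);  // index of the R
        RawString rs;
        rs.open_quote = start + 1;
        rs.open_paren = start + m.length(0) - 1;
        rs.id = m.str(1);
        rs.close_paren = text.find(")" + rs.id + "\"", rs.open_paren + 1);
        found.push_back(rs);
        if (rs.close_paren == std::string::npos)
            break;  // unterminated: everything after the ( is string text
        pos = rs.close_paren + rs.id.size() + 2;  // just past the closing "
    }
    return found;
}
```

For `a = R"(foo)";` it reports one raw string with an empty id whose body is `foo`. An opener with no matching closer is reported with `close_paren == npos` and ends the scan, mirroring how an unterminated raw string takes over the rest of the buffer.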



> -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-05-31 23:52               ` Michael Welsh Duggan
@ 2016-06-02 16:36                 ` Alan Mackenzie
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-02 16:36 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: Ivan Andrus, 15212

Hello, Michael.

On Tue, May 31, 2016 at 07:52:55PM -0400, Michael Welsh Duggan wrote:
> Alan Mackenzie <acm@muc.de> writes:

[ .... ]

> >> I did find one.  According to
> >> http://en.cppreference.com/w/cpp/language/string_literal the delimiter can
> >> contain any characters except parentheses, backslash and spaces.

> > Yes, I've read that and got angry with it.  It's vague - it's not clear
> > what is meant by "any source character" - the C++11 page in Wikipedia
> > says that control characters are excluded.  In practice, I suspect it
> > won't matter all that much - most of the time the delimiter will just be
> > "\"(" - anybody trying to do anything fancy in the delimiter deserves
> > everything she gets.  ;-)

> Here's what the standard says:

> <raw-string>: 
>   " <d-char-sequence(opt)> ( <r-char-sequence(opt)> ) <d-char-sequence(opt)> "

> <r-char-sequence>:
>   <r-char>
>   <r-char-sequence> <r-char>

> <r-char>:
>   Any member of the source character set, except a right-parenthesis )
>   followed by the initial <d-char-sequence> (which may be empty)
>   followed by a double quote ".

> <d-char-sequence>:
>   <d-char>
>   <d-char-sequence> <d-char>

> <d-char>:
>   any member of the basic source character set except:
>     space, the left parenthesis (, the right parenthesis ), the
>     backslash \, and the control characters representing horizontal tab,
>     vertical tab, form feed, and newline.

> Here's what it says about the basic source character set:

> The basic source character set consists of 96 characters: the space
> character, the control characters representing horizontal tab, vertical
> tab, form feed, and new-line, plus the following 91 graphical
> characters:

>   a b c d e f g h i j k l m n o p q r s t u v w x y z
>   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
>   0 1 2 3 4 5 6 7 8 9
>   _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
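
The quoted grammar can be turned into a direct check of a raw string delimiter. Here is a minimal C++ sketch, assuming ASCII input. `is_d_char` and `is_valid_delimiter` are hypothetical helpers. The 16-character cap does not appear in the excerpt above, but it is stated in the standard's string-literal clause (a d-char-sequence shall consist of at most 16 characters). It also matches the `\\{,16\\}` in the patch's regexps. Under this grammar, the double quote is itself an acceptable d-char.

```cpp
#include <string>

// True if C is a d-char: a member of the basic source character set other
// than space, '(', ')', '\\', horizontal tab, vertical tab, form feed and
// newline.  Those whitespace characters are the only non-graphic members of
// the set, so it is enough to test membership in the 91 graphic characters
// and then exclude the three banned graphic ones.
bool is_d_char(char c) {
    static const std::string graphic =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    if (c == '(' || c == ')' || c == '\\') return false;
    return graphic.find(c) != std::string::npos;
}

// A d-char-sequence may be empty and may contain at most 16 d-chars.
bool is_valid_delimiter(const std::string& d) {
    if (d.size() > 16) return false;
    for (char c : d)
        if (!is_d_char(c)) return false;
    return true;
}
```

For example, a 16-character id passes and a 17-character one fails. Any id containing a space, a tab or a parenthesis also fails.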

Thanks, that's appreciated.

> -- 
> Michael Welsh Duggan
> (md5i@md5i.com)

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
       [not found]                 ` <20160602160741.GC4067@acm.fritz.box>
@ 2016-06-06 16:32                   ` Alan Mackenzie
       [not found]                   ` <20160606163203.GA19322@acm.fritz.box>
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-06 16:32 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: 15212

[-- Attachment #1: Type: text/plain, Size: 22866 bytes --]

Hello again, Ivan.

On Thu, Jun 02, 2016 at 04:07:41PM +0000, Alan Mackenzie wrote:

[ .... ]

> It is not yet a workable patch, since it fails to take proper account of
> macros and comments.  Indeed, the fontification fails when the raw string
> is inside a macro.  CC Mode has become somewhat unwieldy in this area,
> and I'll be working on it in the next few days.

> Until then .....

In this Email is a patch which does support macros, and (at least to
some extent) comments.  If there is an unbalanced raw string opener
inside a macro, the rest of the macro gets string face, but the text
after the macro should be unaffected by it.

The patch should be quite close to release condition.  As always, it
should apply cleanly to the savannah master branch.  Again, as always,
after applying the patch, please recompile (at least) cc-langs.el,
cc-fonts.el, cc-engine.el, and cc-mode.el.

Could you try it in your real code, please, and if you have time, try
and break it.  I'm enclosing the current edition of my test file again.
I look forward to hearing back from you.



diff -r d83a74c6ec31 cc-engine.el
--- a/cc-engine.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-engine.el	Mon Jun 06 16:15:05 2016 +0000
@@ -85,8 +85,9 @@
 ;;
 ;; 'syntax-table
 ;;   Used to modify the syntax of some characters.  It is used to
-;;   mark the "<" and ">" of angle bracket parens with paren syntax, and
-;;   to "hide" obtrusive characters in preprocessor lines.
+;;   mark the "<" and ">" of angle bracket parens with paren syntax, to
+;;   "hide" obtrusive characters in preprocessor lines, and to mark C++
+;;   raw strings to enable their fontification.
 ;;
 ;;   This property is used on single characters and is therefore
 ;;   always treated as front and rear nonsticky (or start and end open
@@ -2295,7 +2296,8 @@
   ;;     (STATE TYPE (BEG . END))     if TO is in a literal; or
   ;;     (STATE)                      otherwise,
   ;; where STATE is the parsing state at TO, TYPE is the type of the literal
-  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal.
+  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal,
+  ;; including the delimiters.
   ;;
   ;; Unless NOT-IN-DELIMITER is non-nil, when TO is inside a two-character
   ;; comment opener, this is recognized as being in a comment literal.
@@ -5673,6 +5675,9 @@
 ;; Set by c-common-init in cc-mode.el.
 (defvar c-new-BEG)
 (defvar c-new-END)
+;; Set by c-after-change in cc-mode.el.
+(defvar c-old-BEG)
+(defvar c-old-END)
 
 (defun c-before-change-check-<>-operators (beg end)
   ;; Unmark certain pairs of "< .... >" which are currently marked as
@@ -5793,6 +5798,329 @@
 				       'c-decl-arg-start)))))))
       (or (c-forward-<>-arglist nil)
 	  (forward-char)))))
+
+\f
+;; Functions to handle C++ raw strings.
+;;
+;; A valid C++ raw string looks like
+;;     R"<id>(<contents>)<id>"
+;; , where <id> is an identifier from 0 to 16 characters long, not containing
+;; spaces, control characters, double quote or left/right paren.  <contents>
+;; can include anything which isn't the terminating )<id>", including new
+;; lines, "s, parentheses, etc.
+;;
+;; CC Mode handles C++ raw strings by the use of `syntax-table' text
+;; properties as follows:
+;;
+;; (i) On a validly terminated raw string, no `syntax-table' text properties
+;;   are applied to the opening and closing delimiters, but any " in the
+;;   contents is given the property value "punctuation" (`(1)') to prevent it
+;;   interacting with the "s in the delimiters.
+;;
+;;   The font locking routine `c-font-lock-c++-raw-strings' (in cc-fonts.el)
+;;   recognizes valid raw strings, and fontifies the delimiters (apart from
+;;   the parentheses) with the default face and the parentheses and the
+;;   <contents> with font-lock-string-face.
+;;
+;; (ii) A valid, but unterminated, raw string opening delimiter gets the
+;;   "punctuation" value (`(1)') of the `syntax-table' text property, and the
+;;   open parenthesis gets the "string fence" value (`(15)').
+;;
+;;   `c-font-lock-c++-raw-strings' puts font-lock-warning-face on the entire
+;;   unmatched opening delimiter (from the R up to the open paren), and allows
+;;   the rest of the buffer to get font-lock-string-face, caused by the
+;;   unmatched "string fence" `syntax-table' text property value.
+;;
+;; (iii) Inside a macro, a valid raw string is handled as in (i).  An
+;;   unmatched opening delimiter is handled slightly differently.  In addition
+;;   to the "punctuation" and "string fence" properties on the delimiter,
+;;   another "string fence" `syntax-table' property is applied to the last
+;;   possible character of the macro before the terminating linefeed (if there
+;;   is such a character after the "(").  This "last possible" character is
+;;   never a backslash escaping the end of line.  If the character preceding
+;;   this "last possible" character is itself a backslash, this preceding
+;;   character gets a "punctuation" `syntax-table' value.  If the "(" is
+;;   already at the end of the macro, it gets the "punctuation" value, and no
+;;   "string fence"s are used.
+;;
+;;   The effect on the fontification of either of these tactics is that the rest of
+;;   the macro (if any) after the "(" gets font-lock-string-face, but the rest
+;;   of the file is fontified normally.
+
+
+(defun c-raw-string-pos ()
+  ;; Get POINT's relationship to any containing raw string.
+  ;; If point isn't in a raw string, return nil.
+  ;; Otherwise, return the following list:
+  ;;
+  ;;   (POS B\" B\( E\) E\")
+  ;;
+  ;; , where POS is the symbol `open-delim' if point is in the opening
+  ;; delimiter, the symbol `close-delim' if it's in the closing delimiter, and
+  ;; nil if it's in the string body.  B\", B\(, E\), E\" are the positions of
+  ;; the opening and closing quotes and parentheses of a correctly terminated
+  ;; raw string.  (N.B.: E\) and E\" are NOT on the "outside" of these
+  ;; characters.)  If the raw string is not terminated, E\) and E\" are set to
+  ;; nil.
+  ;;
+  ;; Note: this routine is dependent on the correct syntax-table text
+  ;; properties being set.
+  (let* ((safe (c-state-semi-safe-place (point)))
+	 (state (c-state-pp-to-literal safe (point)))
+	 open-quote-pos open-paren-pos close-paren-pos close-quote-pos id)
+    (save-excursion
+      (when
+	  (and
+	   (cond
+	    ((null (cadr state))
+	     (or (eq (char-after) ?\")
+		 (search-backward "\"" (max (- (point) 17) (point-min)) t)))
+	    ((and (eq (cadr state) 'string)
+		  (goto-char (car (nth 2 state)))
+		  (or (eq (char-after) ?\")
+		      (search-backward "\"" (max (- (point) 17) (point-min)) t))
+		  (not (bobp)))))
+	   (eq (char-before) ?R)
+	   (looking-at "\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("))
+	(setq open-quote-pos (point)
+	      open-paren-pos (match-end 1)
+	      id (match-string-no-properties 1))
+	(goto-char (1+ open-paren-pos))
+	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
+		   (search-forward (concat ")" id "\"") nil t))
+	  (setq close-paren-pos (match-beginning 0)
+		close-quote-pos (1- (point))))))
+    (and open-quote-pos
+	 (list
+	  (cond
+	   ((<= (point) open-paren-pos)
+	    'open-delim)
+	   ((and close-paren-pos
+		 (> (point) close-paren-pos))
+	    'close-delim)
+	   (t nil))
+	  open-quote-pos open-paren-pos close-paren-pos close-quote-pos))))
+
+(defun c-depropertize-raw-string (id open-quote open-paren bound)
+  ;; Point is immediately after a raw string opening delimiter.  Remove any
+  ;; `syntax-table' text properties associated with the delimiter (if it's
+  ;; unmatched) or the raw string.
+  ;;
+  ;; ID, a string, is the delimiter's identifier.  OPEN-QUOTE and OPEN-PAREN
+  ;; are the buffer positions of the delimiter's components.  BOUND is the
+  ;; bound for searching for a matching closing delimiter; it is usually nil,
+  ;; but if we're inside a macro, it's the end of the macro.
+  ;;
+  ;; Point is moved to after the (terminated) raw string, or left after the
+  ;; unmatched opening delimiter, as the case may be.  The return value is of
+  ;; no significance.
+  (let ((open-paren-prop (c-get-char-property open-paren 'syntax-table)))
+    (cond
+     ((null open-paren-prop)
+      ;; A terminated raw string
+      (if (search-forward (concat ")" id "\"") nil t)
+	  (c-clear-char-property-with-value
+	   (1+ open-paren) (match-beginning 0) 'syntax-table '(1))))
+     ((or (and (equal open-paren-prop '(15)) (null bound))
+	  (equal open-paren-prop '(1)))
+      ;; An unterminated raw string either not in a macro, or in a macro with
+      ;; the open parenthesis right up against the end of macro
+      (c-clear-char-property open-quote 'syntax-table)
+      (c-clear-char-property open-paren 'syntax-table))
+     (t
+      ;; An unterminated string in a macro, with at least one char after the
+      ;; open paren
+      (c-clear-char-property open-quote 'syntax-table)
+      (c-clear-char-property open-paren 'syntax-table)
+      (let ((string-fence-pos
+	     (save-excursion
+	       (goto-char (1+ open-paren))
+	       (c-search-forward-char-property 'syntax-table '(15) bound))))
+	(when string-fence-pos
+	  (c-clear-char-property string-fence-pos 'syntax-table)))
+      ))))
+
+(defun c-depropertize-raw-strings-in-region (start finish)
+  ;; Remove any `syntax-table' text properties associated with C++ raw strings
+  ;; contained in the region (START FINISH).  Point is undefined at entry and
+  ;; exit, and the return value has no significance.
+  (goto-char start)
+  (while (and (< (point) finish)
+	      (re-search-forward
+	       (concat "\\("				     ; 1
+		       c-anchored-cpp-prefix		     ; 2
+		       "\\)\\|\\("			     ; 3
+		       "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("   ; 4
+		       "\\)")
+	       finish t))
+    (when (save-excursion
+	    (goto-char (match-beginning 0)) (not (c-in-literal)))
+      (if (match-beginning 4)		; the id
+	  ;; We've found a raw string
+	  (c-depropertize-raw-string
+	   (match-string-no-properties 4) ; id
+	   (1+ (match-beginning 3))	  ; open quote
+	   (match-end 4)		  ; open paren
+	   nil)				  ; bound
+	;; We've found a CPP construct.  Search for raw strings within it.
+	(goto-char (match-beginning 2)) ; the "#"
+	(c-end-of-macro)
+	(let ((eom (point)))
+	  (goto-char (match-end 2))	; after the "#".
+	  (while (and (< (point) eom)
+		      (c-syntactic-re-search-forward
+		       "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" eom t))
+	    (c-depropertize-raw-string
+	     (match-string-no-properties 1) ; id
+	     (1+ (match-beginning 0))	    ; open quote
+	     (match-end 1)		    ; open paren
+	     eom)))))))			    ; bound.
+
+(defun c-before-change-check-c++-raw-strings (beg end)
+  ;; This function clears `syntax-table' text properties from C++ raw strings
+  ;; in the region (c-new-BEG c-new-END).  BEG and END are the standard
+  ;; arguments supplied to any before-change function.
+  ;;
+  ;; Point is undefined on both entry and exit, and the return value has no
+  ;; significance.
+  ;;
+  ;; This function is called as a before-change function solely due to its
+  ;; membership of the C++ value of `c-get-state-before-change-functions'.
+  (c-save-buffer-state
+      ((beg-rs (progn (goto-char beg) (c-raw-string-pos)))
+       (beg-plus (if (null beg-rs)
+		     beg
+		   (max beg
+			(1+ (or (nth 4 beg-rs) (nth 2 beg-rs))))))
+       (end-rs (progn (goto-char end) (c-raw-string-pos))) ; FIXME!!!
+					; Optimize this so that we don't call
+					; `c-raw-string-pos' twice when once
+					; will do.  (2016-06-02).
+       (end-minus (if (null end-rs)
+		      end
+		    (min end (cadr end-rs))))
+       )
+    (when beg-rs
+      (setq c-new-BEG (min c-new-BEG (1- (cadr beg-rs)))))
+    (c-depropertize-raw-strings-in-region c-new-BEG beg-plus)
+
+    (when end-rs
+      (setq c-new-END (max c-new-END
+			   (1+ (or (nth 4 end-rs)
+				   (nth 2 end-rs))))))
+    (c-depropertize-raw-strings-in-region end-minus c-new-END)))
+
+(defun c-propertize-raw-string-opener (id open-quote open-paren bound)
+  ;; Point is immediately after a raw string opening delimiter.  Apply any
+  ;; pertinent `syntax-table' text properties to the delimiter and also the
+  ;; raw string, should there be a valid matching closing delimiter.
+  ;;
+  ;; ID, a string, is the delimiter's identifier.  OPEN-QUOTE and OPEN-PAREN
+  ;; are the buffer positions of the delimiter's components.  BOUND is the
+  ;; bound for searching for a matching closing delimiter; it is usually nil,
+  ;; but if we're inside a macro, it's the end of the macro.
+  ;;
+  ;; Point is moved to after the (terminated) raw string, or left after the
+  ;; unmatched opening delimiter, as the case may be.  The return value is of
+  ;; no significance.
+  (if (search-forward (concat ")" id "\"") bound t)
+      (let ((end-string (match-beginning 0))
+	    (after-quote (match-end 0))
+	    )
+	(goto-char open-paren)
+	(while (progn (skip-syntax-forward "^\"" end-string)
+		      (< (point) end-string))
+	  (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
+	  (forward-char))
+	(goto-char after-quote))
+    (c-put-char-property open-quote 'syntax-table '(1))	     ; punctuation
+    (c-put-char-property open-paren 'syntax-table '(15))     ; generic string
+    (when bound
+      ;; In a CPP construct, we try to apply a generic-string `syntax-table'
+      ;; text property to the last possible character in the string, so that
+      ;; only characters within the macro get "stringed out".
+      (goto-char bound)
+      (if (save-restriction
+	    (narrow-to-region (1+ open-paren) (point-max))
+	    (re-search-backward
+	     (eval-when-compile
+	       (concat "\\("		; 1
+		       "\\(\\`[^\\]?\\|[^\\][^\\]\\)\\(\\\\\\(.\\|\n\\)\\)*" ; 2-4
+		       "\\(\\\\.\\)"	; 5
+		       "\\|"
+		       "\\(\\`\\|[^\\]\\|\\(\\`[^\\]?\\|[^\\][^\\]\\)\\(\\\\\\(.\\|\n\\)\\)+\\)" ; 6-9
+		       "\\([^\\]\\)"	; 10
+		       "\\)"
+		       "\\(\\\\\n\\)*\\=")) ; 11
+             (1+ open-paren) t))
+	  (if (match-beginning 10)
+	      (c-put-char-property (match-beginning 10) 'syntax-table '(15))
+	    (c-put-char-property (match-beginning 5) 'syntax-table '(1))
+	    (c-put-char-property (1+ (match-beginning 5)) 'syntax-table '(15)))
+	(c-put-char-property open-paren 'syntax-table '(1)))
+
+								; )
+      (goto-char bound))))
+
+(defun c-after-change-re-mark-raw-strings (beg end old-len)
+  ;; This function applies `syntax-table' text properties to C++ raw strings
+  ;; beginning in the region (c-new-BEG c-new-END).  BEG, END, and OLD-LEN are
+  ;; the standard arguments supplied to any after-change function.
+  ;;
+  ;; Point is undefined on both entry and exit, and the return value has no
+  ;; significance.
+  ;; 
+  ;; This function is called as an after-change function solely due to its
+  ;; membership of the C++ value of `c-before-font-lock-functions'.
+  (c-save-buffer-state ()
+    ;; If the region (c-new-BEG c-new-END) has expanded, remove
+    ;; `syntax-table' text-properties from the new piece(s).
+    (when (< c-new-BEG c-old-BEG)
+      (let ((beg-rs (progn (goto-char c-old-BEG) (c-raw-string-pos))))
+	(c-depropertize-raw-strings-in-region
+	 c-new-BEG
+	 (if beg-rs
+	     (1+ (or (nth 4 beg-rs) (nth 2 beg-rs)))
+	   c-old-BEG))))
+    (when (> c-new-END c-old-END)
+      (let ((end-rs (progn (goto-char c-old-END) (c-raw-string-pos))))
+	(c-depropertize-raw-strings-in-region
+	 (if end-rs
+	     (cadr end-rs)
+	   c-old-END)
+	 c-new-END)))
+    (goto-char c-new-BEG)
+    (while (and (< (point) c-new-END)
+		(re-search-forward
+		 (concat "\\("				       ; 1
+			 c-anchored-cpp-prefix		       ; 2
+			 "\\)\\|\\("			       ; 3
+			 "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("   ; 4
+			 "\\)")
+		 c-new-END t))
+      (when (save-excursion
+	      (goto-char (match-beginning 0)) (not (c-in-literal)))
+	(if (match-beginning 4)		; the id
+	    ;; We've found a raw string.
+	    (c-propertize-raw-string-opener
+	     (match-string-no-properties 4) ; id
+	     (1+ (match-beginning 3))	    ; open quote
+	     (match-end 4)		    ; open paren
+	     nil)			    ; bound
+	  ;; We've found a CPP construct.  Search for raw strings within it.
+	  (goto-char (match-beginning 2)) ; the "#"
+	  (c-end-of-macro)
+	  (let ((eom (point)))
+	    (goto-char (match-end 2))	; after the "#".
+	    (while (and (< (point) eom)
+			(c-syntactic-re-search-forward
+			 "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" eom t))
+	      (c-propertize-raw-string-opener
+	       (match-string-no-properties 1) ; id
+	       (1+ (match-beginning 0))	      ; open quote
+	       (match-end 1)		      ; open paren
+	       eom))))))))		      ; bound
+
 \f
 ;; Handling of small scale constructs like types and names.
 
diff -r d83a74c6ec31 cc-fonts.el
--- a/cc-fonts.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-fonts.el	Mon Jun 06 16:15:05 2016 +0000
@@ -717,6 +717,10 @@
 	(concat ".\\(" c-string-limit-regexp "\\)")
 	'((c-font-lock-invalid-string)))
 
+      ;; Fontify C++ raw strings.
+      ,@(when (c-major-mode-is 'c++-mode)
+	  '(c-font-lock-c++-raw-strings))
+
       ;; Fontify keyword constants.
       ,@(when (c-lang-const c-constant-kwds)
 	  (let ((re (c-make-keywords-re nil (c-lang-const c-constant-kwds))))
@@ -1572,6 +1576,44 @@
 	    (c-forward-syntactic-ws)
 	    (c-font-lock-declarators limit t in-typedef)))))))
 
+(defun c-font-lock-c++-raw-strings (limit)
+  ;; Fontify C++ raw strings.
+  ;;
+  ;; This function will be called from font-lock for a region bounded by POINT
+  ;; and LIMIT, as though it were to identify a keyword for
+  ;; font-lock-keyword-face.  It always returns NIL to inhibit this and
+  ;; prevent a repeat invocation.  See elisp/lispref page "Search-based
+  ;; Fontification".
+  (while (search-forward-regexp
+	  "R\\(\"\\)\\([^ ()\\\n\r\t]\\{,16\\}\\)(" limit t)
+    (when ;; (eq (c-get-char-property (1- (point)) 'face)
+	;;     'font-lock-string-face)
+	(or (and (eobp)
+		 (eq (c-get-char-property (1- (point)) 'face)
+		     'font-lock-warning-face))
+	    (eq (c-get-char-property (point) 'face) 'font-lock-string-face)
+	    (and (equal (c-get-char-property (match-end 2) 'syntax-table) '(1))
+		 (equal (c-get-char-property (match-beginning 1) 'syntax-table)
+			'(1))))
+      (let ((paren-prop (c-get-char-property (1- (point)) 'syntax-table)))
+	(if paren-prop
+	    (progn
+	      (c-put-font-lock-face (match-beginning 0) (match-end 0)
+				    'font-lock-warning-face)
+	      (when
+		  (and
+		   (equal paren-prop '(15))
+		   (not (c-search-forward-char-property 'syntax-table '(15) limit)))
+		(goto-char limit)))
+	  (c-put-font-lock-face (match-beginning 1) (match-end 2) 'default)
+	  (when (search-forward-regexp
+		 (concat ")\\(" (regexp-quote (match-string-no-properties 2))
+			 "\\)\"")
+		 limit t)
+	    (c-put-font-lock-face (match-beginning 1) (point)
+				  'default))))))
+  nil)
+
 (c-lang-defconst c-simple-decl-matchers
   "Simple font lock matchers for types and declarations.  These are used
 on level 2 only and so aren't combined with `c-complex-decl-matchers'."
diff -r d83a74c6ec31 cc-langs.el
--- a/cc-langs.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-langs.el	Mon Jun 06 16:15:05 2016 +0000
@@ -457,9 +457,12 @@
   ;; The value here may be a list of functions or a single function.
   t nil
   c++ '(c-extend-region-for-CPP
+	c-before-change-check-c++-raw-strings
 	c-before-change-check-<>-operators
+	c-depropertize-CPP
 	c-invalidate-macro-cache)
   (c objc) '(c-extend-region-for-CPP
+	     c-depropertize-CPP
 	     c-invalidate-macro-cache)
   ;; java 'c-before-change-check-<>-operators
   awk 'c-awk-record-region-clear-NL)
@@ -493,6 +496,7 @@
 	     c-neutralize-syntax-in-and-mark-CPP
 	     c-change-expand-fl-region)
   c++ '(c-extend-font-lock-region-for-macros
+	c-after-change-re-mark-raw-strings
 	c-neutralize-syntax-in-and-mark-CPP
 	c-restore-<>-properties
 	c-change-expand-fl-region)
diff -r d83a74c6ec31 cc-mode.el
--- a/cc-mode.el	Sun May 29 11:59:26 2016 +0000
+++ b/cc-mode.el	Mon Jun 06 16:15:05 2016 +0000
@@ -649,6 +649,14 @@
 (make-variable-buffer-local 'c-new-BEG)
 (defvar c-new-END 0)
 (make-variable-buffer-local 'c-new-END)
+;; The following two variables record the values of `c-new-BEG' and
+;; `c-new-END' just after `c-new-END' has been adjusted for the length of text
+;; inserted or removed.  They may be read by any after-change function (but
+;; should not be altered by one).
+(defvar c-old-BEG 0)
+(make-variable-buffer-local 'c-old-BEG)
+(defvar c-old-END 0)
+(make-variable-buffer-local 'c-old-END)
 
 (defun c-common-init (&optional mode)
   "Common initialization for all CC Mode modes.
@@ -859,6 +867,31 @@
   (memq (cadr (backtrace-frame 3))
 	'(put-text-property remove-list-of-text-properties)))
 
+(defun c-depropertize-CPP (beg end)
+  ;; Remove the punctuation syntax-table text property from the CPP parts of
+  ;; (c-new-BEG c-new-END).
+  ;;
+  ;; This function is in the C/C++/ObjC values of
+  ;; `c-get-state-before-change-functions' and is called exclusively as a
+  ;; before change function.
+  (goto-char c-new-BEG)
+  (while (and (< (point) beg)
+	      (search-forward-regexp c-anchored-cpp-prefix beg t))
+    (goto-char (match-beginning 1))
+    (let ((m-beg (point)))
+      (c-end-of-macro)
+      (c-clear-char-property-with-value
+       m-beg (min (point) beg) 'syntax-table '(1))))
+
+  (goto-char end)
+  (while (and (< (point) c-new-END)
+	      (search-forward-regexp c-anchored-cpp-prefix c-new-END t))
+    (goto-char (match-beginning 1))
+    (let ((m-beg (point)))
+      (c-end-of-macro)
+      (c-clear-char-property-with-value
+       m-beg (min (point) c-new-END) 'syntax-table '(1)))))
+
 (defun c-extend-region-for-CPP (beg end)
   ;; Adjust `c-new-BEG', `c-new-END' respectively to the beginning and end of
   ;; any preprocessor construct they may be in. 
@@ -949,9 +982,9 @@
   ;; Note: SPEED _MATTERS_ IN THIS FUNCTION!!!
   ;;
   ;; This function might make hidden buffer changes.
-  (c-save-buffer-state (limits )
+  (c-save-buffer-state (limits)
     ;; Clear 'syntax-table properties "punctuation":
-    (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
+    ;; (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
 
     ;; CPP "comment" markers:
     (if (memq 'category-properties c-emacs-features) ; GNU Emacs.
@@ -1101,8 +1134,8 @@
 
   ;; (c-new-BEG c-new-END) will be the region to fontify.  It may become
   ;; larger than (beg end).
-  ;; (setq c-new-BEG beg  c-new-END end)
   (setq c-new-END (- (+ c-new-END (- end beg)) old-len))
+  (setq c-old-BEG c-new-BEG  c-old-END c-new-END)
 
   (unless (c-called-from-text-property-change-p)
     (setq c-just-done-before-change nil)
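
The return value of `c-raw-string-pos`, (POS B\" B\( E\) E\"), can be illustrated with a C++ sketch that classifies a position in the same way. The sketch rests on several assumptions. Positions are Emacs-style gaps, so position p lies just before character p. The scan runs forward from the start of the text, whereas the Lisp code searches backwards from point. Comments, ordinary strings, macros and `syntax-table` properties are all ignored. `Where`, `RawStringPos` and `raw_string_pos` are hypothetical names, and `std::optional` needs C++17.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

enum class Where { OpenDelim, Body, CloseDelim };

struct RawStringPos {
    Where where;
    std::size_t b_quote, b_paren;  // opening " and ( (character indices)
    std::size_t e_paren, e_quote;  // closing ) and ", npos if unterminated
};

// POINT lies just before text[point].  Returns nullopt when POINT is not in
// a raw string, otherwise which part of the raw string it is in.
std::optional<RawStringPos> raw_string_pos(const std::string& text,
                                           std::size_t point) {
    static const std::regex opener(R"re(R"([^ ()\\\n\r\t]{0,16})\()re");
    const std::size_t npos = std::string::npos;
    std::size_t pos = 0;
    std::smatch m;
    while (pos < text.size() && pos <= point &&
           std::regex_search(text.cbegin() + pos, text.cend(), m, opener)) {
        const std::size_t b_quote = pos + m.position(0) + 1;
        const std::size_t b_paren = pos + m.position(0) + m.length(0) - 1;
        if (b_quote > point) break;  // the next raw string starts after point
        const std::string id = m.str(1);
        const std::size_t e_paren = text.find(")" + id + "\"", b_paren + 1);
        const std::size_t e_quote =
            e_paren == npos ? npos : e_paren + id.size() + 1;
        if (e_quote == npos || point <= e_quote) {
            // Same split as the cond at the end of c-raw-string-pos.
            Where w = point <= b_paren ? Where::OpenDelim
                    : (e_paren != npos && point > e_paren) ? Where::CloseDelim
                    : Where::Body;
            return RawStringPos{w, b_quote, b_paren, e_paren, e_quote};
        }
        pos = e_quote + 1;  // point is beyond this raw string; keep looking
    }
    return std::nullopt;
}
```

With `x = R"ab(hi)ab"`, positions from just before the opening quote up to just before the `(` give `OpenDelim`. Positions up to just before the closing `)` give `Body`, and those after it, up to just before the final `"`, give `CloseDelim`.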



> > -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).


[-- Attachment #2: raw-string.cc --]
[-- Type: text/x-c, Size: 754 bytes --]

char foo [] = R"(foo)";
char bar [] = // R"foo(bar)foo";
    R"foo(bar)foo";
char empty [] = R"()";
char quote1 [] = R"34(")34";
char quote2 [] = R"34(fooo"bar)34";
char sixteen [] = R"0123456789abcdef(sixteen)0123456789abcdef";
char seventeen [] = R"0123456789abcdefg(seventeen)0123456789abcdefg";
char multi_line [] = R"(First line.
#error: This should fontify as a CPP construct.
Second line.
Third line.)";
char brackets [] = R"0x22[(foobar)0x22[";

#define FOO(bar) char bar [] = R"bar(abcd"efgh)bar" R"baz(ij"kl)baz"
#define MULTILINE(bar) char bar [] = R"bar(\
\
        )bar"
\
)bar
#define BROKEN(bar) char bar [] = R"foobar(a\n\

char baz [] = R"baz(baz)baz";

char bar [] = R"foo(bar)foo";
char empty [] = R"()";
char baz [] = R"baz(baz)foo";
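
Cases (i) and (ii) of the `syntax-table` scheme described in the cc-engine.el comments above can be simulated on a plain string, which may help when reading this test file. This is only a sketch. It leaves out case (iii) (macros), does not skip comments or ordinary strings, and uses the integers 1 and 15 to stand for the `(1)` punctuation and `(15)` string-fence property values. `mark_raw_strings` is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Stand-ins for the syntax-table values: none, (1) punctuation, (15) fence.
constexpr int kNone = 0, kPunct = 1, kFence = 15;

// Returns one mark per character of TEXT.
std::vector<int> mark_raw_strings(const std::string& text) {
    static const std::regex opener(R"re(R"([^ ()\\\n\r\t]{0,16})\()re");
    std::vector<int> marks(text.size(), kNone);
    std::size_t pos = 0;
    std::smatch m;
    while (pos < text.size() &&
           std::regex_search(text.cbegin() + pos, text.cend(), m, opener)) {
        const std::size_t open_quote = pos + m.position(0) + 1;
        const std::size_t open_paren = pos + m.position(0) + m.length(0) - 1;
        const std::string closer = ")" + m.str(1) + "\"";
        const std::size_t close = text.find(closer, open_paren + 1);
        if (close == std::string::npos) {
            // (ii) Unterminated: neutralize the opening quote and make the
            // paren a string fence, so the rest of the text becomes string.
            marks[open_quote] = kPunct;
            marks[open_paren] = kFence;
            break;
        }
        // (i) Terminated: each " in the contents becomes punctuation so it
        // cannot pair with the quotes of the delimiters.
        for (std::size_t i = open_paren + 1; i < close; ++i)
            if (text[i] == '"') marks[i] = kPunct;
        pos = close + closer.size();
    }
    return marks;
}
```

On the `quote2` line, `R"34(fooo"bar)34"`, only the inner `"` is marked. On an unterminated opener such as `R"(x`, the opening quote becomes punctuation and the `(` becomes a string fence.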


* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
       [not found]                   ` <20160606163203.GA19322@acm.fritz.box>
@ 2016-06-07 22:06                     ` Michael Welsh Duggan
  2016-06-07 22:21                       ` Alan Mackenzie
  2016-06-09  1:38                     ` Ivan Andrus
  1 sibling, 1 reply; 19+ messages in thread
From: Michael Welsh Duggan @ 2016-06-07 22:06 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 15212

Since you are working on string handling I thought I would verify
whether you handle the following:

Every string literal, including raw string literals, may be preceded by
an encoding-prefix (no space separating).  The valid encoding-prefixes
are:

u8 u U L

Examples from the standard:

"..."
R"(...)"
u8"..."
u8R"**(...)**"
u"..."
uR"*~(..)*~"
U"..."
UR"zzz(...)zzz"
L"..."
LR"(...)"

The meanings of these prefixes are:

u8: UTF-8 string literal
u:  char16_t literal
U:  char32_t literal
L:  wchar_t literal
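
The prefixes listed above can be checked against the element types they produce with a few compile-time assertions. This is a minimal sketch, and the `elem_t` alias is a hypothetical helper that does not come from the thread. One caveat beyond the list above: C++20 changed `u8` literals to use `char8_t`, and the `#if` on the standard feature-test macro `__cpp_char8_t` accounts for that.

```cpp
#include <type_traits>

// Element type of a string literal's array, with const removed.
// decltype of a string literal is a reference to a const array.
template <typename T>
using elem_t = typename std::remove_cv<typename std::remove_extent<
    typename std::remove_reference<T>::type>::type>::type;

static_assert(std::is_same<elem_t<decltype(R"(...)")>, char>::value, "R");
#if defined(__cpp_char8_t)
// C++20 and later: u8 literals are arrays of char8_t.
static_assert(std::is_same<elem_t<decltype(u8R"**(...)**")>, char8_t>::value, "u8R");
#else
// C++11/14/17: u8 literals are arrays of char.
static_assert(std::is_same<elem_t<decltype(u8R"**(...)**")>, char>::value, "u8R");
#endif
static_assert(std::is_same<elem_t<decltype(uR"*~(..)*~")>, char16_t>::value, "uR");
static_assert(std::is_same<elem_t<decltype(UR"zzz(...)zzz")>, char32_t>::value, "UR");
static_assert(std::is_same<elem_t<decltype(LR"(...)")>, wchar_t>::value, "LR");

// The delimiter is not part of the value: "..." plus the terminating NUL.
static_assert(sizeof(UR"zzz(...)zzz") == 4 * sizeof(char32_t), "size");
```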

-- 
Michael Welsh Duggan
(md5i@md5i.com)






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-06-07 22:06                     ` Michael Welsh Duggan
@ 2016-06-07 22:21                       ` Alan Mackenzie
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-07 22:21 UTC (permalink / raw)
  To: Michael Welsh Duggan; +Cc: 15212

Hello, Michael.

On Tue, Jun 07, 2016 at 06:06:17PM -0400, Michael Welsh Duggan wrote:
> Since you are working on string handling I thought I would verify
> whether you handle the following:

> Every string literal, including raw string literals, may be preceded by
> an encoding-prefix (no space separating).  The valid encoding-prefixes
> are:

> u8 u U L

> Examples from the standard:

> "..."
> R"(...)"
> u8"..."
> u8R"**(...)**"
> u"..."
> uR"*~(..)*~"
> U"..."
> UR"zzz(...)zzz"
> L"..."
> LR"(...)"

> The meanings of these prefixes are:

> u8: UTF-8 string literal
> u:  char16_t literal
> U:  char32_t literal
> L:  wchar_t literal

Thanks!  I hadn't forgotten about them, I was more postponing them until
the more difficult stuff was done.  To be honest, I'm not sure how much,
if any, special handling they'll need - as far as CC Mode is concerned,
they have no syntactic significance, I think.  The only thing I can
think of at the moment (and it is after midnight here) is that one of
the prefixes prefixing an unterminated raw string might also get
font-lock-warning face, just like the raw string delimiter.  Maybe.
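
If the prefixes do need handling, one possible approach is to let the opener pattern accept an optional prefix. The prefix could then be given the same face as the delimiter. Here is a sketch using C++ `std::regex`, assuming the pattern is applied at a candidate start position. `match_raw_opener` is hypothetical, and this is not how CC Mode structures its search.

```cpp
#include <regex>
#include <string>

// Match a raw string opener at the very start of S, allowing an optional
// encoding prefix.  On success, PREFIX receives "", "u8", "u", "U" or "L",
// and ID receives the delimiter identifier.
bool match_raw_opener(const std::string& s, std::string& prefix,
                      std::string& id) {
    static const std::regex opener(
        R"re((u8|u|U|L)?R"([^ ()\\\n\r\t]{0,16})\()re");
    std::smatch m;
    if (!std::regex_search(s, m, opener,
                           std::regex_constants::match_continuous))
        return false;
    prefix = m.str(1);  // empty when the optional group did not match
    id = m.str(2);
    return true;
}
```

The sketch does not check whether the prefix is the tail of a longer identifier. In `xR"(a)"`, the lexer reads the identifier `xR` followed by an ordinary string, so a caller scanning a buffer would still have to look at the preceding character.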

By the way, I haven't forgotten about the state-cache bug, though I've
not made much progress on it, yet.

> -- 
> Michael Welsh Duggan
> (md5i@md5i.com)

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
       [not found]                   ` <20160606163203.GA19322@acm.fritz.box>
  2016-06-07 22:06                     ` Michael Welsh Duggan
@ 2016-06-09  1:38                     ` Ivan Andrus
  2016-06-09 15:04                       ` Alan Mackenzie
  1 sibling, 1 reply; 19+ messages in thread
From: Ivan Andrus @ 2016-06-09  1:38 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: 15212

I have been in a lot of meetings so I haven’t given it the normal usage treatment, but I did spend some time trying to break it. 

You may know about this, but if you comment the first line of a multi-line string, the closing quote will still match (for the purposes of backward-sexp) with the commented opening quote.  The parentheses on the other hand don't match (which is what I would expect).  Personally, I think it’s a small price to pay.

Thanks again for your work on this.  I’ll keep testing it, but as far as I’m concerned you can push it as soon as you want.

-Ivan

> On Jun 6, 2016, at 10:32 AM, Alan Mackenzie <acm@muc.de> wrote:
> 
> Hello again, Ivan.
> 
> On Thu, Jun 02, 2016 at 04:07:41PM +0000, Alan Mackenzie wrote:
> 
> [ .... ]
> 
>> It is not yet a workable patch, since it fails to take proper account of
>> macros and comments.  Indeed, the fontification fails when the raw string
>> is inside a macro.  CC Mode has become somewhat unwieldy in this area,
>> and I'll be working on it in the next few days.
> 
>> Until then .....
> 
> In this Email is a patch which does support macros, and (at least to
> some extent) comments.  If there is an unbalanced raw string opener
> inside a macro, the rest of the macro gets string face, but the text
> after the macro should be unaffected by it.
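
A minimal sketch of the macro behaviour just described, assuming LF line endings and C++17: the string face runs from just after the "(" up to the newline that ends the macro, where a newline preceded by a backslash continues the macro. The names macro_end, Region and unterminated_macro_string are hypothetical; CC Mode itself achieves this with `syntax-table' text properties, not with code like this.

```cpp
#include <cstddef>
#include <string_view>

// Offset of the newline ending the logical line that contains POS.  A newline
// preceded by a backslash is a continuation, so the macro goes on.  Returns
// text.size() if the buffer ends first.
constexpr std::size_t macro_end(std::string_view text, std::size_t pos) {
    for (std::size_t i = pos; i < text.size(); ++i)
        if (text[i] == '\n' && !(i > 0 && text[i - 1] == '\\'))
            return i;
    return text.size();
}

// Half-open region [begin, end) of text after an unmatched "(".
struct Region {
    std::size_t begin;
    std::size_t end;
};

// Region given string face for an unterminated raw string whose "(" is at
// OPEN_PAREN inside a macro.  It is empty when the "(" is the macro's last
// character; text from END onwards is fontified normally.
constexpr Region unterminated_macro_string(std::string_view text,
                                           std::size_t open_paren) {
    return Region{open_paren + 1, macro_end(text, open_paren)};
}
```

On #define GREETING R"x(hello \ followed by a continuation line, the region extends over the continuation line and stops at the newline before the next, unrelated line.
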
> 
> The patch should be quite close to release condition.  As always, it
> should apply cleanly to the savannah master branch.  Again, as always,
> after applying the patch, please recompile (at least) cc-langs.el,
> cc-fonts.el, cc-engine.el, and cc-mode.el.
> 
> Could you try it in your real code, please, and if you have time, try
> and break it.  I'm enclosing the current edition of my test file again.
> I look forward to hearing back from you.
> 
> 
> 
> diff -r d83a74c6ec31 cc-engine.el
> --- a/cc-engine.el	Sun May 29 11:59:26 2016 +0000
> +++ b/cc-engine.el	Mon Jun 06 16:15:05 2016 +0000
> @@ -85,8 +85,9 @@
> ;;
> ;; 'syntax-table
> ;;   Used to modify the syntax of some characters.  It is used to
> -;;   mark the "<" and ">" of angle bracket parens with paren syntax, and
> -;;   to "hide" obtrusive characters in preprocessor lines.
> +;;   mark the "<" and ">" of angle bracket parens with paren syntax, to
> +;;   "hide" obtrusive characters in preprocessor lines, and to mark C++
> +;;   raw strings to enable their fontification.
> ;;
> ;;   This property is used on single characters and is therefore
> ;;   always treated as front and rear nonsticky (or start and end open
> @@ -2295,7 +2296,8 @@
>   ;;     (STATE TYPE (BEG . END))     if TO is in a literal; or
>   ;;     (STATE)                      otherwise,
>   ;; where STATE is the parsing state at TO, TYPE is the type of the literal
> -  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal.
> +  ;; (one of 'c, 'c++, 'string) and (BEG . END) is the boundaries of the literal,
> +  ;; including the delimiters.
>   ;;
>   ;; Unless NOT-IN-DELIMITER is non-nil, when TO is inside a two-character
>   ;; comment opener, this is recognized as being in a comment literal.
> @@ -5673,6 +5675,9 @@
> ;; Set by c-common-init in cc-mode.el.
> (defvar c-new-BEG)
> (defvar c-new-END)
> +;; Set by c-after-change in cc-mode.el.
> +(defvar c-old-BEG)
> +(defvar c-old-END)
> 
> (defun c-before-change-check-<>-operators (beg end)
>   ;; Unmark certain pairs of "< .... >" which are currently marked as
> @@ -5793,6 +5798,329 @@
> 				       'c-decl-arg-start)))))))
>       (or (c-forward-<>-arglist nil)
> 	  (forward-char)))))
> +
> +\f
> +;; Functions to handle C++ raw strings.
> +;;
> +;; A valid C++ raw string looks like
> +;;     R"<id>(<contents>)<id>"
> +;; , where <id> is an identifier from 0 to 16 characters long, not containing
> +;; spaces, control characters, double quote or left/right paren.  <contents>
> +;; can include anything which isn't the terminating )<id>", including new
> +;; lines, "s, parentheses, etc.
> +;;
> +;; CC Mode handles C++ raw strings by the use of `syntax-table' text
> +;; properties as follows:
> +;;
> +;; (i) On a validly terminated raw string, no `syntax-table' text properties
> +;;   are applied to the opening and closing delimiters, but any " in the
> +;;   contents is given the property value "punctuation" (`(1)') to prevent it
> +;;   interacting with the "s in the delimiters.
> +;;
> +;;   The font locking routine `c-font-lock-c++-raw-strings' (in cc-fonts.el)
> +;;   recognizes valid raw strings, and fontifies the delimiters (apart from
> +;;   the parentheses) with the default face and the parentheses and the
> +;;   <contents> with font-lock-string-face.
> +;;
> +;; (ii) A valid, but unterminated, raw string opening delimiter gets the
> +;;   "punctuation" value (`(1)') of the `syntax-table' text property, and the
> +;;   open parenthesis gets the "string fence" value (`(15)').
> +;;
> +;;   `c-font-lock-c++-raw-strings' puts c-font-lock-warning-face on the entire
> +;;   unmatched opening delimiter (from the R up to the open paren), and allows
> +;;   the rest of the buffer to get font-lock-string-face, caused by the
> +;;   unmatched "string fence" `syntax-table' text property value.
> +;;
> +;; (iii) Inside a macro, a valid raw string is handled as in (i).  An
> +;;   unmatched opening delimiter is handled slightly differently.  In addition
> +;;   to the "punctuation" and "string fence" properties on the delimiter,
> +;;   another "string fence" `syntax-table' property is applied to the last
> +;;   possible character of the macro before the terminating linefeed (if there
> +;;   is such a character after the "(").  This "last possible" character is
> +;;   never a backslash escaping the end of line.  If the character preceding
> +;;   this "last possible" character is itself a backslash, this preceding
> +;;   character gets a "punctuation" `syntax-table' value.  If the "(" is
> +;;   already at the end of the macro, it gets the "punctuation" value, and no
> +;;   "string fence"s are used.
> +;;
> +;;   The effect on the fontification of either of these tactics is that the
> +;;   rest of the macro (if any) after the "(" gets font-lock-string-face, but the rest
> +;;   of the file is fontified normally.
> +
> +
> +(defun c-raw-string-pos ()
> +  ;; Get POINT's relationship to any containing raw string.
> +  ;; If point isn't in a raw string, return nil.
> +  ;; Otherwise, return the following list:
> +  ;;
> +  ;;   (POS B\" B\( E\) E\")
> +  ;;
> +  ;; , where POS is the symbol `open-delim' if point is in the opening
> +  ;; delimiter, the symbol `close-delim' if it's in the closing delimiter, and
> +  ;; nil if it's in the string body.  B\", B\(, E\), E\" are the positions of
> +  ;; the opening and closing quotes and parentheses of a correctly terminated
> +  ;; raw string.  (N.B.: E\) and E\" are NOT on the "outside" of these
> +  ;; characters.)  If the raw string is not terminated, E\) and E\" are set to
> +  ;; nil.
> +  ;;
> +  ;; Note: this routine is dependent upon the correct syntax-table text
> +  ;; properties being set.
> +  (let* ((safe (c-state-semi-safe-place (point)))
> +	 (state (c-state-pp-to-literal safe (point)))
> +	 open-quote-pos open-paren-pos close-paren-pos close-quote-pos id)
> +    (save-excursion
> +      (when
> +	  (and
> +	   (cond
> +	    ((null (cadr state))
> +	     (or (eq (char-after) ?\")
> +		 (search-backward "\"" (max (- (point) 17) (point-min)) t)))
> +	    ((and (eq (cadr state) 'string)
> +		  (goto-char (car (nth 2 state)))
> +		  (or (eq (char-after) ?\")
> +		      (search-backward "\"" (max (- (point) 17) (point-min)) t))
> +		  (not (bobp)))))
> +	   (eq (char-before) ?R)
> +	   (looking-at "\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("))
> +	(setq open-quote-pos (point)
> +	      open-paren-pos (match-end 1)
> +	      id (match-string-no-properties 1))
> +	(goto-char (1+ open-paren-pos))
> +	(when (and (not (c-get-char-property open-paren-pos 'syntax-table))
> +		   (search-forward (concat ")" id "\"") nil t))
> +	  (setq close-paren-pos (match-beginning 0)
> +		close-quote-pos (1- (point))))))
> +    (and open-quote-pos
> +	 (list
> +	  (cond
> +	   ((<= (point) open-paren-pos)
> +	    'open-delim)
> +	   ((and close-paren-pos
> +		 (> (point) close-paren-pos))
> +	    'close-delim)
> +	   (t nil))
> +	  open-quote-pos open-paren-pos close-paren-pos close-quote-pos))))
> +
> +(defun c-depropertize-raw-string (id open-quote open-paren bound)
> +  ;; Point is immediately after a raw string opening delimiter.  Remove any
> +  ;; `syntax-table' text properties associated with the delimiter (if it's
> +  ;; unmatched) or the raw string.
> +  ;;
> +  ;; ID, a string, is the delimiter's identifier.  OPEN-QUOTE and OPEN-PAREN
> +  ;; are the buffer positions of the delimiter's components.  BOUND is the
> +  ;; bound for searching for a matching closing delimiter; it is usually nil,
> +  ;; but if we're inside a macro, it's the end of the macro.
> +  ;;
> +  ;; Point is moved to after the (terminated) raw string, or left after the
> +  ;; unmatched opening delimiter, as the case may be.  The return value is of
> +  ;; no significance.
> +  (let ((open-paren-prop (c-get-char-property open-paren 'syntax-table)))
> +    (cond
> +     ((null open-paren-prop)
> +      ;; A terminated raw string
> +      (if (search-forward (concat ")" id "\"") nil t)
> +	  (c-clear-char-property-with-value
> +	   (1+ open-paren) (match-beginning 0) 'syntax-table '(1))))
> +     ((or (and (equal open-paren-prop '(15)) (null bound))
> +	  (equal open-paren-prop '(1)))
> +      ;; An unterminated raw string either not in a macro, or in a macro with
> +      ;; the open parenthesis right up against the end of macro
> +      (c-clear-char-property open-quote 'syntax-table)
> +      (c-clear-char-property open-paren 'syntax-table))
> +     (t
> +      ;; An unterminated string in a macro, with at least one char after the
> +      ;; open paren
> +      (c-clear-char-property open-quote 'syntax-table)
> +      (c-clear-char-property open-paren 'syntax-table)
> +      (let ((string-fence-pos
> +	     (save-excursion
> +	       (goto-char (1+ open-paren))
> +	       (c-search-forward-char-property 'syntax-table '(15) bound))))
> +	(when string-fence-pos
> +	  (c-clear-char-property string-fence-pos 'syntax-table)))
> +      ))))
> +
> +(defun c-depropertize-raw-strings-in-region (start finish)
> +  ;; Remove any `syntax-table' text properties associated with C++ raw strings
> +  ;; contained in the region (START FINISH).  Point is undefined at entry and
> +  ;; exit, and the return value has no significance.
> +  (goto-char start)
> +  (while (and (< (point) finish)
> +	      (re-search-forward
> +	       (concat "\\("				     ; 1
> +		       c-anchored-cpp-prefix		     ; 2
> +		       "\\)\\|\\("			     ; 3
> +		       "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("   ; 4
> +		       "\\)")
> +	       finish t))
> +    (when (save-excursion
> +	    (goto-char (match-beginning 0)) (not (c-in-literal)))
> +      (if (match-beginning 4)		; the id
> +	  ;; We've found a raw string
> +	  (c-depropertize-raw-string
> +	   (match-string-no-properties 4) ; id
> +	   (1+ (match-beginning 3))	  ; open quote
> +	   (match-end 4)		  ; open paren
> +	   nil)				  ; bound
> +	;; We've found a CPP construct.  Search for raw strings within it.
> +	(goto-char (match-beginning 2)) ; the "#"
> +	(c-end-of-macro)
> +	(let ((eom (point)))
> +	  (goto-char (match-end 2))	; after the "#".
> +	  (while (and (< (point) eom)
> +		      (c-syntactic-re-search-forward
> +		       "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" eom t))
> +	    (c-depropertize-raw-string
> +	     (match-string-no-properties 1) ; id
> +	     (1+ (match-beginning 0))	    ; open quote
> +	     (match-end 1)		    ; open paren
> +	     eom)))))))			    ; bound.
> +
> +(defun c-before-change-check-c++-raw-strings (beg end)
> +  ;; This function clears `syntax-table' text properties from C++ raw strings
> +  ;; in the region (c-new-BEG c-new-END).  BEG and END are the standard
> +  ;; arguments supplied to any before-change function.
> +  ;;
> +  ;; Point is undefined on both entry and exit, and the return value has no
> +  ;; significance.
> +  ;;
> +  ;; This function is called as a before-change function solely due to its
> +  ;; membership of the C++ value of `c-get-state-before-change-functions'.
> +  (c-save-buffer-state
> +      ((beg-rs (progn (goto-char beg) (c-raw-string-pos)))
> +       (beg-plus (if (null beg-rs)
> +		     beg
> +		   (max beg
> +			(1+ (or (nth 4 beg-rs) (nth 2 beg-rs))))))
> +       (end-rs (progn (goto-char end) (c-raw-string-pos))) ; FIXME!!!
> +					; Optimize this so that we don't call
> +					; `c-raw-string-pos' twice when once
> +					; will do.  (2016-06-02).
> +       (end-minus (if (null end-rs)
> +		      end
> +		    (min end (cadr end-rs))))
> +       )
> +    (when beg-rs
> +      (setq c-new-BEG (min c-new-BEG (1- (cadr beg-rs)))))
> +    (c-depropertize-raw-strings-in-region c-new-BEG beg-plus)
> +
> +    (when end-rs
> +      (setq c-new-END (max c-new-END
> +			   (1+ (or (nth 4 end-rs)
> +				   (nth 2 end-rs))))))
> +    (c-depropertize-raw-strings-in-region end-minus c-new-END)))
> +
> +(defun c-propertize-raw-string-opener (id open-quote open-paren bound)
> +  ;; Point is immediately after a raw string opening delimiter.  Apply any
> +  ;; pertinent `syntax-table' text properties to the delimiter and also the
> +  ;; raw string, should there be a valid matching closing delimiter.
> +  ;;
> +  ;; ID, a string, is the delimiter's identifier.  OPEN-QUOTE and OPEN-PAREN
> +  ;; are the buffer positions of the delimiter's components.  BOUND is the
> +  ;; bound for searching for a matching closing delimiter; it is usually nil,
> +  ;; but if we're inside a macro, it's the end of the macro.
> +  ;;
> +  ;; Point is moved to after the (terminated) raw string, or left after the
> +  ;; unmatched opening delimiter, as the case may be.  The return value is of
> +  ;; no significance.
> +  (if (search-forward (concat ")" id "\"") bound t)
> +      (let ((end-string (match-beginning 0))
> +	    (after-quote (match-end 0))
> +	    )
> +	(goto-char open-paren)
> +	(while (progn (skip-syntax-forward "^\"" end-string)
> +		      (< (point) end-string))
> +	  (c-put-char-property (point) 'syntax-table '(1)) ; punctuation
> +	  (forward-char))
> +	(goto-char after-quote))
> +    (c-put-char-property open-quote 'syntax-table '(1))	     ; punctuation
> +    (c-put-char-property open-paren 'syntax-table '(15))     ; generic string
> +    (when bound
> +      ;; In a CPP construct, we try to apply a generic-string `syntax-table'
> +      ;; text property to the last possible character in the string, so that
> +      ;; only characters within the macro get "stringed out".
> +      (goto-char bound)
> +      (if (save-restriction
> +	    (narrow-to-region (1+ open-paren) (point-max))
> +	    (re-search-backward
> +	     (eval-when-compile
> +	       (concat "\\("		; 1
> +		       "\\(\\`[^\\]?\\|[^\\][^\\]\\)\\(\\\\\\(.\\|\n\\)\\)*" ; 2-4
> +		       "\\(\\\\.\\)"	; 5
> +		       "\\|"
> +		       "\\(\\`\\|[^\\]\\|\\(\\`[^\\]?\\|[^\\][^\\]\\)\\(\\\\\\(.\\|\n\\)\\)+\\)" ; 6-9
> +		       "\\([^\\]\\)"	; 10
> +		       "\\)"
> +		       "\\(\\\\\n\\)*\\=")) ; 11
> +             (1+ open-paren) t))
> +	  (if (match-beginning 10)
> +	      (c-put-char-property (match-beginning 10) 'syntax-table '(15))
> +	    (c-put-char-property (match-beginning 5) 'syntax-table '(1))
> +	    (c-put-char-property (1+ (match-beginning 5)) 'syntax-table '(15)))
> +	(c-put-char-property open-paren 'syntax-table '(1)))
> +
> +								; )
> +      (goto-char bound))))
> +
> +(defun c-after-change-re-mark-raw-strings (beg end old-len)
> +  ;; This function applies `syntax-table' text properties to C++ raw strings
> +  ;; beginning in the region (c-new-BEG c-new-END).  BEG, END, and OLD-LEN are
> +  ;; the standard arguments supplied to any after-change function.
> +  ;;
> +  ;; Point is undefined on both entry and exit, and the return value has no
> +  ;; significance.
> +  ;; 
> +  ;; This function is called as an after-change function solely due to its
> +  ;; membership of the C++ value of `c-before-font-lock-functions'.
> +  (c-save-buffer-state ()
> +    ;; If the region (c-new-BEG c-new-END) has expanded, remove
> +    ;; `syntax-table' text-properties from the new piece(s).
> +    (when (< c-new-BEG c-old-BEG)
> +      (let ((beg-rs (progn (goto-char c-old-BEG) (c-raw-string-pos))))
> +	(c-depropertize-raw-strings-in-region
> +	 c-new-BEG
> +	 (if beg-rs
> +	     (1+ (or (nth 4 beg-rs) (nth 2 beg-rs)))
> +	   c-old-BEG))))
> +    (when (> c-new-END c-old-END)
> +      (let ((end-rs (progn (goto-char c-old-END) (c-raw-string-pos))))
> +	(c-depropertize-raw-strings-in-region
> +	 (if end-rs
> +	     (cadr end-rs)
> +	   c-old-END)
> +	 c-new-END)))
> +    (goto-char c-new-BEG)
> +    (while (and (< (point) c-new-END)
> +		(re-search-forward
> +		 (concat "\\("				       ; 1
> +			 c-anchored-cpp-prefix		       ; 2
> +			 "\\)\\|\\("			       ; 3
> +			 "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)("   ; 4
> +			 "\\)")
> +		 c-new-END t))
> +      (when (save-excursion
> +	      (goto-char (match-beginning 0)) (not (c-in-literal)))
> +	(if (match-beginning 4)		; the id
> +	    ;; We've found a raw string.
> +	    (c-propertize-raw-string-opener
> +	     (match-string-no-properties 4) ; id
> +	     (1+ (match-beginning 3))	    ; open quote
> +	     (match-end 4)		    ; open paren
> +	     nil)			    ; bound
> +	  ;; We've found a CPP construct.  Search for raw strings within it.
> +	  (goto-char (match-beginning 2)) ; the "#"
> +	  (c-end-of-macro)
> +	  (let ((eom (point)))
> +	    (goto-char (match-end 2))	; after the "#".
> +	    (while (and (< (point) eom)
> +			(c-syntactic-re-search-forward
> +			 "R\"\\([^ ()\\\n\r\t]\\{,16\\}\\)(" eom t))
> +	      (c-propertize-raw-string-opener
> +	       (match-string-no-properties 1) ; id
> +	       (1+ (match-beginning 0))	      ; open quote
> +	       (match-end 1)		      ; open paren
> +	       eom))))))))		      ; bound
> +
> 
> ;; Handling of small scale constructs like types and names.
> 
> diff -r d83a74c6ec31 cc-fonts.el
> --- a/cc-fonts.el	Sun May 29 11:59:26 2016 +0000
> +++ b/cc-fonts.el	Mon Jun 06 16:15:05 2016 +0000
> @@ -717,6 +717,10 @@
> 	(concat ".\\(" c-string-limit-regexp "\\)")
> 	'((c-font-lock-invalid-string)))
> 
> +      ;; Fontify C++ raw strings.
> +      ,@(when (c-major-mode-is 'c++-mode)
> +	  '(c-font-lock-c++-raw-strings))
> +
>       ;; Fontify keyword constants.
>       ,@(when (c-lang-const c-constant-kwds)
> 	  (let ((re (c-make-keywords-re nil (c-lang-const c-constant-kwds))))
> @@ -1572,6 +1576,44 @@
> 	    (c-forward-syntactic-ws)
> 	    (c-font-lock-declarators limit t in-typedef)))))))
> 
> +(defun c-font-lock-c++-raw-strings (limit)
> +  ;; Fontify C++ raw strings.
> +  ;;
> +  ;; This function will be called from font-lock for a region bounded by POINT
> +  ;; and LIMIT, as though it were to identify a keyword for
> +  ;; font-lock-keyword-face.  It always returns NIL to inhibit this and
> +  ;; prevent a repeat invocation.  See elisp/lispref page "Search-based
> +  ;; Fontification".
> +  (while (search-forward-regexp
> +	  "R\\(\"\\)\\([^ ()\\\n\r\t]\\{,16\\}\\)(" limit t)
> +    (when ;; (eq (c-get-char-property (1- (point)) 'face)
> +	;;     'font-lock-string-face)
> +	(or (and (eobp)
> +		 (eq (c-get-char-property (1- (point)) 'face)
> +		     'font-lock-warning-face))
> +	    (eq (c-get-char-property (point) 'face) 'font-lock-string-face)
> +	    (and (equal (c-get-char-property (match-end 2) 'syntax-table) '(1))
> +		 (equal (c-get-char-property (match-beginning 1) 'syntax-table)
> +			'(1))))
> +      (let ((paren-prop (c-get-char-property (1- (point)) 'syntax-table)))
> +	(if paren-prop
> +	    (progn
> +	      (c-put-font-lock-face (match-beginning 0) (match-end 0)
> +				    'font-lock-warning-face)
> +	      (when
> +		  (and
> +		   (equal paren-prop '(15))
> +		   (not (c-search-forward-char-property 'syntax-table '(15) limit)))
> +		(goto-char limit)))
> +	  (c-put-font-lock-face (match-beginning 1) (match-end 2) 'default)
> +	  (when (search-forward-regexp
> +		 (concat ")\\(" (regexp-quote (match-string-no-properties 2))
> +			 "\\)\"")
> +		 limit t)
> +	    (c-put-font-lock-face (match-beginning 1) (point)
> +				  'default))))))
> +  nil)
> +
> (c-lang-defconst c-simple-decl-matchers
>   "Simple font lock matchers for types and declarations.  These are used
> on level 2 only and so aren't combined with `c-complex-decl-matchers'."
> diff -r d83a74c6ec31 cc-langs.el
> --- a/cc-langs.el	Sun May 29 11:59:26 2016 +0000
> +++ b/cc-langs.el	Mon Jun 06 16:15:05 2016 +0000
> @@ -457,9 +457,12 @@
>   ;; The value here may be a list of functions or a single function.
>   t nil
>   c++ '(c-extend-region-for-CPP
> +	c-before-change-check-c++-raw-strings
> 	c-before-change-check-<>-operators
> +	c-depropertize-CPP
> 	c-invalidate-macro-cache)
>   (c objc) '(c-extend-region-for-CPP
> +	     c-depropertize-CPP
> 	     c-invalidate-macro-cache)
>   ;; java 'c-before-change-check-<>-operators
>   awk 'c-awk-record-region-clear-NL)
> @@ -493,6 +496,7 @@
> 	     c-neutralize-syntax-in-and-mark-CPP
> 	     c-change-expand-fl-region)
>   c++ '(c-extend-font-lock-region-for-macros
> +	c-after-change-re-mark-raw-strings
> 	c-neutralize-syntax-in-and-mark-CPP
> 	c-restore-<>-properties
> 	c-change-expand-fl-region)
> diff -r d83a74c6ec31 cc-mode.el
> --- a/cc-mode.el	Sun May 29 11:59:26 2016 +0000
> +++ b/cc-mode.el	Mon Jun 06 16:15:05 2016 +0000
> @@ -649,6 +649,14 @@
> (make-variable-buffer-local 'c-new-BEG)
> (defvar c-new-END 0)
> (make-variable-buffer-local 'c-new-END)
> +;; The following two variables record the values of `c-new-BEG' and
> +;; `c-new-END' just after `c-new-END' has been adjusted for the length of text
> +;; inserted or removed.  They may be read by any after-change function (but
> +;; should not be altered by one).
> +(defvar c-old-BEG 0)
> +(make-variable-buffer-local 'c-old-BEG)
> +(defvar c-old-END 0)
> +(make-variable-buffer-local 'c-old-END)
> 
> (defun c-common-init (&optional mode)
>   "Common initialization for all CC Mode modes.
> @@ -859,6 +867,31 @@
>   (memq (cadr (backtrace-frame 3))
> 	'(put-text-property remove-list-of-text-properties)))
> 
> +(defun c-depropertize-CPP (beg end)
> +  ;; Remove the punctuation syntax-table text property from the CPP parts of
> +  ;; (c-new-BEG c-new-END).
> +  ;;
> +  ;; This function is in the C/C++/ObjC values of
> +  ;; `c-get-state-before-change-functions' and is called exclusively as a
> +  ;; before change function.
> +  (goto-char c-new-BEG)
> +  (while (and (< (point) beg)
> +	      (search-forward-regexp c-anchored-cpp-prefix beg t))
> +    (goto-char (match-beginning 1))
> +    (let ((m-beg (point)))
> +      (c-end-of-macro)
> +      (c-clear-char-property-with-value
> +       m-beg (min (point) beg) 'syntax-table '(1))))
> +
> +  (goto-char end)
> +  (while (and (< (point) c-new-END)
> +	      (search-forward-regexp c-anchored-cpp-prefix c-new-END t))
> +    (goto-char (match-beginning 1))
> +    (let ((m-beg (point)))
> +      (c-end-of-macro)
> +      (c-clear-char-property-with-value
> +       m-beg (min (point) c-new-END) 'syntax-table '(1)))))
> +
> (defun c-extend-region-for-CPP (beg end)
>   ;; Adjust `c-new-BEG', `c-new-END' respectively to the beginning and end of
>   ;; any preprocessor construct they may be in. 
> @@ -949,9 +982,9 @@
>   ;; Note: SPEED _MATTERS_ IN THIS FUNCTION!!!
>   ;;
>   ;; This function might make hidden buffer changes.
> -  (c-save-buffer-state (limits )
> +  (c-save-buffer-state (limits)
>     ;; Clear 'syntax-table properties "punctuation":
> -    (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
> +    ;; (c-clear-char-property-with-value c-new-BEG c-new-END 'syntax-table '(1))
> 
>     ;; CPP "comment" markers:
>     (if (memq 'category-properties c-emacs-features) ; GNU Emacs.
> @@ -1101,8 +1134,8 @@
> 
>   ;; (c-new-BEG c-new-END) will be the region to fontify.  It may become
>   ;; larger than (beg end).
> -  ;; (setq c-new-BEG beg  c-new-END end)
>   (setq c-new-END (- (+ c-new-END (- end beg)) old-len))
> +  (setq c-old-BEG c-new-BEG  c-old-END c-new-END)
> 
>   (unless (c-called-from-text-property-change-p)
>     (setq c-just-done-before-change nil)
> 
> 
> 
>>> -Ivan
> 
> -- 
> Alan Mackenzie (Nuremberg, Germany).
> 
> <raw-string.cc>
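
To summarize cases (i) and (ii) from the comments in the patch above, here is a hypothetical C++17 model of which characters would be marked. Syntax::Punctuation and Syntax::StringFence stand in for the `(1)' and `(15)' property values, and Where mirrors the POS value returned by `c-raw-string-pos'. It is a sketch, not a translation of the Lisp; in particular it ignores the macro case (iii) and comments.

```cpp
#include <cstddef>
#include <string_view>

enum class Syntax { None, Punctuation, StringFence };  // nil, (1), (15)
enum class Where { OpenDelim, Body, CloseDelim };     // like POS in c-raw-string-pos

struct RawStringMarks {
    std::size_t open_quote = 0, open_paren = 0;
    bool terminated = false;
    std::size_t close_paren = 0, close_quote = 0;  // meaningful only if terminated
    Syntax open_quote_mark = Syntax::None;
    Syntax open_paren_mark = Syntax::None;
    std::size_t punctuated_quotes = 0;  // " characters in the contents marked (1)
};

// OPEN_QUOTE and OPEN_PAREN locate R"<id>( in TEXT.  Look for )<id>" and
// record the marks the opener and the contents would receive.
constexpr RawStringMarks mark_raw_string(std::string_view text,
                                         std::size_t open_quote,
                                         std::size_t open_paren) {
    RawStringMarks m;
    m.open_quote = open_quote;
    m.open_paren = open_paren;
    const std::string_view id =
        text.substr(open_quote + 1, open_paren - open_quote - 1);
    for (std::size_t i = open_paren + 1; i < text.size(); ++i) {
        const std::size_t q = i + 1 + id.size();  // where the closing " must be
        if (text[i] == ')' && q < text.size() &&
            text.substr(i + 1, id.size()) == id && text[q] == '"') {
            m.terminated = true;
            m.close_paren = i;
            m.close_quote = q;
            break;
        }
    }
    if (m.terminated) {
        // Case (i): delimiters stay unmarked; each " in the contents gets (1).
        for (std::size_t i = open_paren + 1; i < m.close_paren; ++i)
            if (text[i] == '"') ++m.punctuated_quotes;
    } else {
        // Case (ii): opening quote gets (1), open paren a string fence (15).
        m.open_quote_mark = Syntax::Punctuation;
        m.open_paren_mark = Syntax::StringFence;
    }
    return m;
}

// Where does POINT lie relative to the raw string described by M?
constexpr Where classify(const RawStringMarks& m, std::size_t point) {
    if (point <= m.open_paren) return Where::OpenDelim;
    if (m.terminated && point > m.close_paren) return Where::CloseDelim;
    return Where::Body;
}
```

On auto s = R"zz(say "hi")zz"; the model finds the terminator at offsets 22 to 25 and marks the two inner quotes; with )z" in place of )zz" it falls into case (ii).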







* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-06-09  1:38                     ` Ivan Andrus
@ 2016-06-09 15:04                       ` Alan Mackenzie
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-09 15:04 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: 15212

Hello, Ivan.

On Wed, Jun 08, 2016 at 07:38:01PM -0600, Ivan Andrus wrote:
> I have been in a lot of meetings ....

My sympathies.  ;-)

> .... so I haven’t given it the normal usage treatment, but I did spend
> some time trying to break it. 

> You may know about this, but if you comment the first line of a
> multi-line string, the closing quote will still match (for the purposes
> of backward-sexp) with the commented opening quote.  The parentheses on
> the other hand don't match (which is what I would expect).  Personally,
> I think it’s a small price to pay.

For what it's worth, on just an ordinary two line string (with an escaped
EOL), if the first line is commented out (and the backslash deleted), the
two quote marks also match each other for C-M-f and C-M-b.  So I agree
with you, this isn't such a big problem in raw strings.
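
For comparison, a minimal illustration of the two literal forms being contrasted, an ordinary literal continued with an escaped EOL and a raw literal spanning two lines. The names are hypothetical and the example assumes C++11 or later.

```cpp
// In the ordinary literal, the backslash-newline is removed before
// tokenization (a line splice), so the value is "first second".
constexpr char ordinary[] = "first \
second";

// A raw literal may span lines directly; the newline is part of the value.
constexpr char raw[] = R"(first
second)";
```

Both values are 12 characters long: the ordinary one has a space at offset 5, the raw one a newline.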

> Thanks again for your work on this.  I’ll keep testing it, but as far
> as I’m concerned you can push it as soon as you want.

Thanks for the testing.  I've now pushed the changes (to both savannah
master and CC Mode's c++11-0-1 branch), but if anything more comes up, I
can correct it.

> -Ivan

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#15212: 24.3.50; c++-mode doesn't support raw string literals
  2016-03-30  3:14 ` Ivan Andrus
  2016-04-03 18:36   ` Alan Mackenzie
@ 2016-06-09 15:06   ` Alan Mackenzie
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Mackenzie @ 2016-06-09 15:06 UTC (permalink / raw)
  To: 15212-done

Bug fixed (i.e. feature implemented) in the master branch.

-- 
Alan Mackenzie (Nuremberg, Germany).






end of thread (newest message: 2016-06-09 15:06 UTC)

Thread overview: 19+ messages
2013-08-29 21:00 bug#15212: 24.3.50; c++-mode doesn't support raw string literals Ivan Andrus
2016-03-30  3:14 ` Ivan Andrus
2016-04-03 18:36   ` Alan Mackenzie
2016-05-24 17:12     ` Ivan Andrus
2016-05-28 14:40       ` Alan Mackenzie
2016-05-29 21:36         ` Alan Mackenzie
2016-05-31 14:22           ` Ivan Andrus
2016-05-31 21:32             ` Alan Mackenzie
2016-05-31 23:52               ` Michael Welsh Duggan
2016-06-02 16:36                 ` Alan Mackenzie
2016-05-31 22:21             ` Alan Mackenzie
2016-06-01  5:21               ` Ivan Andrus
2016-06-02 16:07                 ` Alan Mackenzie
     [not found]                 ` <20160602160741.GC4067@acm.fritz.box>
2016-06-06 16:32                   ` Alan Mackenzie
     [not found]                   ` <20160606163203.GA19322@acm.fritz.box>
2016-06-07 22:06                     ` Michael Welsh Duggan
2016-06-07 22:21                       ` Alan Mackenzie
2016-06-09  1:38                     ` Ivan Andrus
2016-06-09 15:04                       ` Alan Mackenzie
2016-06-09 15:06   ` Alan Mackenzie

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
