After writing my original email I thought about something a bit different, and I managed (with suggestions and help from Anders Lindgren) to write a convincing (to me :) proof of concept.  The idea is to use a separate buffer to do the fontification.  I've attached the code; after loading it, it's enough to run

    (font-lock-add-keywords nil '(("^ *>>> \\(.*\\)" (0 (indirect-font-lock-highlighter 1 'python-mode)))))
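
For readers without the attachment, here is a minimal sketch of the general idea. It is not the attached code: the `my/` names are made up for illustration, and it assumes Emacs 25 or later for `font-lock-ensure`. The snippet is copied into a temporary buffer, fontified there by the target major mode, and the resulting `face` properties are copied back over the original text.

```elisp
;; Illustrative sketch only; the `my/' functions are hypothetical names,
;; not the ones defined in the attached file.

(defun my/fontify-as (mode text)
  "Return a copy of TEXT carrying the faces that MODE would give it.
MODE is a major-mode function such as `python-mode'."
  (with-temp-buffer
    (insert text)
    ;; Keep user mode hooks from running in the scratch buffer.
    (delay-mode-hooks (funcall mode))
    ;; Fontify the whole buffer, even though font-lock-mode is off here.
    (font-lock-ensure)
    (buffer-string)))

(defun my/copy-faces (fontified start)
  "Copy `face' properties of string FONTIFIED onto the current buffer.
The first character of FONTIFIED corresponds to buffer position START."
  (let ((pos 0)
        (len (length fontified)))
    (while (< pos len)
      (let ((next (next-single-property-change pos 'face fontified len))
            (face (get-text-property pos 'face fontified)))
        (when face
          (put-text-property (+ start pos) (+ start next) 'face face))
        (setq pos next)))))

(defun my/indirect-highlighter (group mode)
  "Fontify match GROUP as MODE would; for use as a font-lock face form.
Always returns nil, so font-lock itself applies no extra face."
  (let ((beg (match-beginning group))
        (end (match-end group)))
    (when beg
      ;; Running MODE clobbers the match data, so protect it.
      (save-match-data
        (my/copy-faces
         (my/fontify-as mode (buffer-substring-no-properties beg end))
         beg))))
  nil)
```

With these definitions, the keyword above could be written with `(0 (my/indirect-highlighter 1 'python-mode))` instead.  Creating a fresh temporary buffer and starting the major mode for every match is wasteful; reusing one long-lived buffer per mode would presumably be faster, but I left that out to keep the sketch short.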

Stefan (and emacs-devel!), do you think I should add this to ELPA?  Are there downsides I should be aware of?

Cheers,
Clément.

On 2016-10-15 11:19, Clément Pit--Claudel wrote:
> Hi emacs-devel,
> 
> Some languages have a way to quote code in comments.  Some examples:
> 
> * Python
> 
>     def example(foo, *bars):
>         """Foo some bars
> 
>         >>> example(1,
>         ...         2,
>         ...         3)
>         3
> 
>         >>> example(4, 8)
>         67
>         """
> 
> * Coq
> 
>     Definition example foo bars :=
>         (* [example foo bars] uses [foo] to foo some [bars].  For example:
>            <<
>              Compute (example 1 [2, 3]).
>              (* 3 *)
>            >> *)
> 
> In Python, ‘>>>’ indicates a doctest (a small bit of example code).  In Coq, ‘[…]’ and ‘<<…>>’ serve as markers (inside comments) of single-line (resp. multi-line) code snippets.  At the moment, Emacs doesn't highlight these snippets.  I originally asked about this in http://emacs.stackexchange.com/questions/19998/code-blocks-in-font-lock-comments , but received no answers.
> 
> There are multiple currently-available workarounds, but none of them that I know of are satisfactory:
> 
> * Duplicate all font-lock rules, creating anchored matchers that recognize code in comments.  The duplication is very unpleasant, and it will require adding ‘prepend’ to a bunch of font-lock rules, which will break some of them.
> 
> * Use a custom syntax-propertize-function to recognize these code snippets and escape out of strings.  This has some potential, but it confuses existing tools.  For example, in Python, one can use the code below; it works fine for ‘>>>’ in comments, but in strings it seems to break eldoc, among others (backtrace first, then the code):
> 
>     syntax-ppss()
>     python-util-forward-comment(1)
>     python-nav-end-of-defun()
>     python-info-current-defun()
>     (let ((current-defun (python-info-current-defun))) (if current-defun (progn (format "In: %s()" current-defun))))
> 
>     (defconst litpy--doctest-re
>       "^#*\\s-*\\(>>>\\|\\.\\.\\.\\)\\s-*\\(.+\\)$"
>       "Regexp matching doctests.")
> 
>     (defun litpy--syntax-propertize-function (start end)
>       "Mark doctests in START..END."
>       (goto-char start)
>       (while (re-search-forward litpy--doctest-re end t)
>         (let* ((old-syntax (save-excursion (syntax-ppss (match-beginning 1))))
>                (in-docstring-p (eq (nth 3 old-syntax) t))
>                (in-comment-p (eq (nth 4 old-syntax) t))
>                (closing-syntax (cond (in-docstring-p "|") (in-comment-p ">")))
>                (reopening-syntax (cond (in-docstring-p "|") (in-comment-p "<")))
>                (reopening-char (char-after (match-end 2)))
>                (no-reopen (eq (and reopening-char (char-syntax reopening-char))
>                               (cond (in-comment-p ?>)))))
>           (when closing-syntax
>             (put-text-property (1- (match-end 1)) (match-end 1)
>                                'syntax-table (string-to-syntax closing-syntax))
>             (when (and reopening-char (not no-reopen))
>               (put-text-property (match-end 2) (1+ (match-end 2))
>                                  'syntax-table (string-to-syntax reopening-syntax)))))))
> 
> 
> Maybe the second approach can be made to more-or-less work for Python, despite the issue above — I'm not entirely sure.  The idea there is to detect chunks of code, and mark their starting and ending characters in a way that escapes from the surrounding comment or string.
> 
> But this doesn't solve the problem for Coq, for example, because it confuses comment-forward and the like.  Some Coq tools depend on Emacs to identify comments and skip over them when running a file: code is sent bit by bit, so if ‘(* foo [some code here] bar *)’ is annotated with syntax properties to make Emacs think that it should be understood as ‘(* foo *) some code here (* bar *)’, then Proof General (a Coq IDE based on Emacs) won't realize that “some code here” is part of a comment, and things will break.
> 
> I'm not sure what the right approach is.  I guess there are two approaches:
> 
> * Mark embedded code in comments as actual code using syntax-propertize-function, and add a way for tools to detect this "code but not really code" situation.  Pros: things like company, eldoc, prettify-symbols-mode, etc. will work in embedded code comments without having to opt them in.  Cons: some things will break, and will need to be fixed (comment-forward, Proof General, Elpy, indentation functions…).
> 
> * Add new "code-block-starter"/"code-block-ender" syntax classes?  Then font-lock would know that it has to highlight these.  Pros: few things would break.  Cons: tools would have to be opted in (company-mode, eldoc, prettify-symbols-mode, …).
> 
> Am I missing another obvious solution?  Has this topic been discussed before?
> 
> Cheers,
> Clément.
> 
>