* Re: Font-lock of comments using comment tokens, does it work?
From: Stefan Monnier @ 2015-06-03 19:11 UTC
To: help-gnu-emacs

> I have a really complicated font-locking problem I'm trying to solve
> for a major mode. It's like I've tried everything but nothing
> works. Here is the question I asked on Stack Overflow and got some
> help with but it didn't go all the way:

The answer there gives you the technique to use.  AFAICT you only need
to adjust the regexp he used in exmark-syntax-propertize.

You'll want to read about syntax-tables in the Elisp manual to
understand what the "_" means in that function (for a quick
refresher on which char means what, C-h f modify-syntax-entry RET is
what I use, but you first need to read the Elisp reference to really
understand how that works).

        Stefan
* Re: Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-03 23:37 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs

2015-06-03 21:11 GMT+02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> I have a really complicated font-locking problem I'm trying to solve
>> for a major mode. It's like I've tried everything but nothing
>> works. Here is the question I asked on Stack Overflow and got some
>> help with but it didn't go all the way:
>
> The answer there gives you the technique to use.  AFAICT you only need
> to adjust the regexp he used in exmark-syntax-propertize.
>
> You'll want to read about syntax-tables in the Elisp manual to
> understand what the "_" means in that function (for a quick
> refresher on which char means what, C-h f modify-syntax-entry RET is
> what I use, but you first need to read the Elisp reference to really
> understand how that works).

If you think you know what it should be changed to, can you tell me?
I've tried a dozen different permutations of the regexp and none of
them produces the desired result. I've also read the syntactic
font-lock and syntax table sections of the manual several times and I
still don't get it.

(sorry for the double mail)

--
mvh/best regards Björn Lindqvist
* Re: Font-lock of comments using comment tokens, does it work?
From: Stefan Monnier @ 2015-06-04 3:42 UTC
To: help-gnu-emacs

> If you think you know what it should be changed to, can you tell me?

I don't know enough of the context to be sure.  Also, as Emacs
maintainer I have enough experience/knowledge to fix most users'
problems, but if I do that I'll just end up with more users with new
problems to fix.  So instead I'm better off trying to train them so they
can fix their problems themselves and even help me improve Emacs.

> I've tried a dozen different permutations of the regexp and none of
> them produces the desired result.

What have you tried?  What/where were the undesired results?

> I've also read the syntactic font-lock and syntax table sections of
> the manual several times and I still don't get it.

So you've covered the basics, good.

The thing you need to understand is that it all boils down to the
"syntax" given to the "!" character.  The default is set in the
buffer-local syntax-table, and this default is adjusted by
`syntax-table' text-properties which are applied via syntax-propertize.

So you can always go to a "!" and then hit C-u C-x = to see what is the
syntax of *this* particular "!" character, and whether that is the
desired syntax.  If it's not, then you can try

   M-: (re-search-forward "theregexp" nil t)

to see if the pattern you used does match or doesn't match this char.

        Stefan
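The check described above (look at the syntax of one particular "!")
can also be done from Lisp.  What follows is only a rough sketch:
`my-describe-syntax-at-point' is a made-up helper name, not something
from the thread or from Emacs.  It reports the `syntax-table' text
property on the character after point, plus the syntax class actually
in effect there.

```elisp
(defun my-describe-syntax-at-point ()
  "Report the syntax of the character after point.
Hypothetical helper, for illustration only."
  (interactive)
  ;; Make sure `syntax-propertize-function' has been run over this
  ;; position, so that any `syntax-table' text property is in place.
  (syntax-propertize (min (point-max) (1+ (point))))
  (let* ((prop (get-text-property (point) 'syntax-table))
         ;; `syntax-after' returns a raw syntax descriptor, or nil at
         ;; the end of the buffer.
         (raw (syntax-after (point)))
         (class (and raw (string (syntax-class-to-char
                                  (syntax-class raw))))))
    (message "syntax-table property: %S, effective class: %s"
             prop class)))
```

With point on a "!", an effective class of "<" means that this "!"
currently counts as a comment starter, and "_" means it is a symbol
constituent.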
* Re: Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-04 11:10 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs

2015-06-04 5:42 GMT+02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> If you think you know what it should be changed to, can you tell me?
>
> I don't know enough of the context to be sure.

What extra context can I provide you with? If there is something in my
problem description that is unclear I can try to explain it more
precisely.

> Also, as Emacs
> maintainer I have enough experience/knowledge to fix most users'
> problems, but if I do that I'll just end up with more users with new
> problems to fix.  So instead I'm better off trying to train them so they
> can fix their problems themselves and even help me improve Emacs.
>
>> I've tried a dozen different permutations of the regexp and none of
>> them produces the desired result.
>
> What have you tried?  What/where were the undesired results?

("[a-zA-Z0-9_]\\(! \\) " (1 "_")))
("\\(!\\)[a-zA-Z0-9_]" (1 "_")))
("\\(! \\)[a-zA-Z0-9_]" (1 "_")))
("[a-zA-Z0-9_]\\(!\\) " (1 "_ ")))
("[a-zA-Z0-9_]\\(!\\) " (1 " _ ")))

And so on. The undesired results were incorrect font-locking of
comments and regular tokens being falsely identified as comment
tokens.

--
mvh/best regards Björn Lindqvist
* Re: Font-lock of comments using comment tokens, does it work?
From: Stefan Monnier @ 2015-06-04 22:11 UTC
To: help-gnu-emacs

>> Also, as Emacs maintainer I have enough experience/knowledge to fix
>> most users' problems, but if I do that I'll just end up with more
>> users with new problems to fix.  So instead I'm better off trying to
>> train them so they can fix their problems themselves and even help me
>> improve Emacs.
>>> I've tried a dozen different permutations of the regexp and none of
>>> them produces the desired result.
>> What have you tried?  What/where were the undesired results?
> ("[a-zA-Z0-9_]\\(! \\) " (1 "_")))

IIUC you want all "!" that are surrounded by spaces to be treated as
comment starters.  And you've marked "!" as a comment starter by default
(i.e. in the mode's syntax-table), so you need to mark all "!" which are
not surrounded by spaces as being not-comment-starters.

The above regexp does part of the work, but only does it for those "!"
which are preceded by a latin letter or a number and are followed by
a space.  E.g. it will fail on those "!" which don't have a space
afterwards.

> ("\\(!\\)[a-zA-Z0-9_]" (1 "_")))

This one will fail on those "!" which are followed with a letter that's
neither a space nor a latin letter nor a number.  And it will fail on
those "!" which are followed by a space but are not preceded by a space.

To me, the translation into regexp of «all "!" which are not surrounded
by spaces» would look like "[^ ]![^ ]".  Have you tried something like
that?

Of course, it'll still probably require more tweaking because I suspect
that «all "!" which are not surrounded by spaces» is not actually
a precise description of all cases that matter.  E.g. I suspect that if
the "!" is preceded by a newline (i.e. is at the beginning of a line) it
should still be considered a comment starter.  Same thing if it's
preceded by a TAB.  Also it's likely that " !! " would also start
a comment, so "followed by a space" is too strict as well.  But then,
I don't know if " !!a" would be treated as starting a comment.
IOW, maybe you'll want something like "[^ \n\t]\\(!+\\)[^ \t\n]" instead.

One more thing: if "! as a normal char" is more common than "! as
a comment starter", it might be worthwhile to take the opposite approach
and define the syntax of "!" in the mode's syntax-table as being "_" and
then in syntax-propertize-function mark those "!" which start a comment
as having syntax "<".

Yet another thing: if you have trouble catching all cases with a single
regexp, you can use more rules, as in

   (syntax-propertize-rules
    ("[a-zA-Z0-9_]\\(! \\) " (1 "_"))
    ("\\(!\\)[a-zA-Z0-9_]" (1 "_")))

        Stefan
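The "opposite approach" mentioned above can be sketched roughly as
follows.  This is an illustration under assumptions rather than a
tested mode: the `bang-demo-' names are made up, and "free-standing"
is taken to mean "delimited by space, TAB, or a line boundary".  The
syntax table makes "!" an ordinary symbol constituent, and the
propertize function upgrades only the free-standing ones to "<".

```elisp
;; Hypothetical names; only the technique comes from the message above.
(defvar bang-demo-syntax-table
  (let ((table (make-syntax-table)))
    ;; By default "!" is just a symbol constituent ...
    (modify-syntax-entry ?! "_" table)
    ;; ... and a newline ends a comment.
    (modify-syntax-entry ?\n ">" table)
    table)
  "Syntax table in which \"!\" is not a comment starter by default.")

(defun bang-demo-syntax-propertize (start end)
  "Between START and END, mark free-standing \"!\" as comment starters."
  (funcall
   (syntax-propertize-rules
    ;; Group 1 is the "!" itself; the shy groups require whitespace or
    ;; a line boundary on both sides (an assumption about what counts
    ;; as a separate token).
    ("\\(?:^\\|[ \t]\\)\\(!\\)\\(?:[ \t]\\|$\\)" (1 "<")))
   start end))
```

A mode would install these with `set-syntax-table' (or the `:syntax-table'
argument of `define-derived-mode') and by setting
`syntax-propertize-function' to `bang-demo-syntax-propertize'.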
* Re: Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-05 3:29 UTC
To: Stefan Monnier; +Cc: help-gnu-emacs

2015-06-05 0:11 GMT+02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>>> Also, as Emacs maintainer I have enough experience/knowledge to fix
>>> most users' problems, but if I do that I'll just end up with more
>>> users with new problems to fix.  So instead I'm better off trying to
>>> train them so they can fix their problems themselves and even help me
>>> improve Emacs.
>>>> I've tried a dozen different permutations of the regexp and none of
>>>> them produces the desired result.
>>> What have you tried?  What/where were the undesired results?
>
>> ("[a-zA-Z0-9_]\\(! \\) " (1 "_")))
>
> IIUC you want all "!" that are surrounded by spaces to be treated as
> comment starters.

No. I want two strings, FOO and BAR (or ! doesn't matter, same
principle) to start comments iff they are separate tokens. Look at my
examples if the definition isn't so precise. FOO written at the top of
the buffer and followed by a newline would therefore start a comment.

> The above regexp does part of the work, but only does it for those "!"
> which are preceded by a latin letter or a number and are followed by
> a space.  E.g. it will fail on those "!" which don't have a space
> afterwards.
>
>> ("\\(!\\)[a-zA-Z0-9_]" (1 "_")))
>
> This one will fail on those "!" which are followed with a letter that's
> neither a space nor a latin letter nor a number.  And it will fail on
> those "!" which are followed by a space but are not preceded by a space.
>
> To me, the translation into regexp of «all "!" which are not surrounded
> by spaces» would look like "[^ ]![^ ]".  Have you tried something like
> that?

That turns the comment face off if the ! is in the middle, but not if
it prefixes or suffixes the token. abcFOO is wrongly interpreted as a
comment starter.

> Also it's likely that " !! " would also start
> a comment, so "followed by a space" is too strict as well.  But then,
> I don't know if " !!a" would be treated as starting a comment.
> IOW, maybe you'll want something like "[^ \n\t]\\(!+\\)[^ \t\n]" instead.

No. In "!!" and "!!a" the comment token is not separate, so no comment.

> Yet another thing: if you have trouble catching all cases with a single
> regexp, you can use more rules, as in
>
>    (syntax-propertize-rules
>     ("[a-zA-Z0-9_]\\(! \\) " (1 "_"))
>     ("\\(!\\)[a-zA-Z0-9_]" (1 "_")))

It still messes up the comment font-locking. BTW I've noticed that if
the regexp is "test\\(!\\)" emacs correctly does not use comment face
on "test!". But if it is "\\(!\\)test" then "!test" is still seen as a
comment. That is inconsistent with what you have explained and the
elisp manual. So I think it is a bug.

--
mvh/best regards Björn Lindqvist
* Re: Font-lock of comments using comment tokens, does it work?
From: tomas @ 2015-06-05 6:53 UTC
To: Björn Lindqvist; +Cc: help-gnu-emacs, Stefan Monnier

On Fri, Jun 05, 2015 at 05:29:36AM +0200, Björn Lindqvist wrote:
> 2015-06-05 0:11 GMT+02:00 Stefan Monnier <monnier@iro.umontreal.ca>:

[...]

> No. I want two strings, FOO and BAR (or ! doesn't matter, same
> principle) to start comments iff they are separate tokens.

Sorry to jump in the middle. I've been lurking in case I could help
(and to learn about font-lock).

Björn: you are assuming that everyone knows what's a "token" to you.
And you are assuming that everyone has the time to read and grasp
all your examples, first time. I for one don't know what your tokens
are. To put one extreme example, to C, the string 'a+b' are three
tokens, '++a' are two; for Lisp, the first and the second example are
both just *one* token.

Given that, you can't expect Stefan to even come near a regular
expression useful to you, since what they are doing is exactly
*separate tokens*.

> > To me, the translation into regexp of «all "!" which are not surrounded
> > by spaces» would look like "[^ ]![^ ]".  Have you tried something like
> > that?
>
> That turns the comment face of if the ! is in the middle, but not if
> it prefixes or suffixes the token. abcFOO is wrongly interpreted as a
> comment starter.

You mean when FOO is at the end of the line? Then no character would
be there and the second '[^ ]' wouldn't match? That's what Stefan said,
you'll have to tweak this. Use an alternative '\|', something like
"\(^\|[^ ]\)!\([^ ]\|$\)" (i.e. match at beginning-of-line-or-space,
then "!" then space-or-end-of-line). You can use "\(?: ... \)" if you
want non-capturing groups. Watch out for those backslashes: you want
to double them when writing them as an Elisp string [1].

> > Also it's likely that " !! " would also start
> > a comment [...]

> No. In "!!" and "!!a" the comment token is not separate, so no comment.

See? we are all guessing at what your tokens are. Designing the
nitty-gritties of your regexps and testing them can only be your
work, because we'd be all fighting phantoms.

> > Yet another thing: if you have trouble catching all cases with a single
> > regexp, you can use more rules, as in
> >
> >    (syntax-propertize-rules
> >     ("[a-zA-Z0-9_]\\(! \\) " (1 "_"))
> >     ("\\(!\\)[a-zA-Z0-9_]" (1 "_")))
>
> It still messes up the comment font-locking. BTW I've noticed that if
> the regexp is "test\\(!\\)" emacs correctly does not use comment face
> on "test!". But if it is "\\(!\\)test" then "!test" is still seen as a
> comment. That is inconsistent with what you have explained and the
> elisp manual. So I think it is a bug.

I don't understand you. What is this "test", where does it come from
and what is it doing *in* the regular expression? (and in which one:
in syntax-propertize rules, as in Stefan's example above, or somewhere
else?)

Regards
-- tomás
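To make the backslash doubling concrete, here is a small sketch.  The
constant name `bang-demo-standalone-re' is invented for illustration,
and the regexp encodes just one possible reading of "separate token"
(whitespace or a line boundary on each side); it is not the exact
expression from the message above.

```elisp
;; What the regexp engine sees:   \(?:^\|[ \t]\)\(!\)\(?:[ \t]\|$\)
;; What has to be typed in Elisp: every backslash doubled.
(defconst bang-demo-standalone-re
  "\\(?:^\\|[ \t]\\)\\(!\\)\\(?:[ \t]\\|$\\)"
  "Regexp matching a \"!\" delimited by whitespace or a line boundary.
Group 1 is the \"!\" itself.  Hypothetical name, for illustration.")

;; Quick checks that can be evaluated with C-x C-e or M-:
(string-match bang-demo-standalone-re "code ! comment") ; non-nil
(string-match bang-demo-standalone-re "foo!bar")        ; nil
(string-match bang-demo-standalone-re "!!a")            ; nil
```

The shy groups `\(?: ... \)' keep the group numbering simple, so the
"!" stays in group 1 no matter how many alternatives are added around
it.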
* Re: Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-05 19:37 UTC
To: tomas; +Cc: help-gnu-emacs, Stefan Monnier

>> No. I want two strings, FOO and BAR (or ! doesn't matter, same
>> principle) to start comments iff they are separate tokens.
>
> Sorry to jump in the middle. I've been lurking in case I could help
> (and to learn about font-lock).
>
> Björn: you are assuming that everyone knows what's a "token" to you.
> And you are assuming that everyone has the time to read and grasp
> all your examples, first time. I for one don't know what your tokens
> are. To put one extreme example, to C, the string 'a+b' are three
> tokens, '++a' are two; for Lisp, the first and the second example are
> both just *one* token.

It's not necessary to know what the tokenization rules for my language
are to help me. Though the rules are very simple, each
whitespace-separated sequence of characters is one token. But if you
can just come up with the required syntax-propertize-rules and syntax
table setup to make the first four lines of my example highlight as
comments and the last four *not* highlight as comments, I would be
happy with that:

random code ,FOO random comment stuff
,BAR comment "with stuff"
,BAR FOO BAR
,FOO
FOObar random come
BARFOO random code
random code xyFOOzw
random code "with string FOO " etc ...

Here I've added a comma to show where the start of the comment-face
should be. The comma is not present in the output.

You can even take this skeleton mode I wrote and just figure out what
to write inside the regexp:

(defun mm-syntax-propertize (start end)
  (funcall (syntax-propertize-rules
            ("WHAT HERE?" (1 "_")))
           start end))

(defvar mm-mode-syntax-table
  (let ((table (make-syntax-table prog-mode-syntax-table)))
    (modify-syntax-entry ?\n "> " table)
    (modify-syntax-entry ?! "< " table)
    table))

(define-derived-mode mm-mode prog-mode "Foo"
  (setq-local font-lock-defaults '(()))
  (setq-local syntax-propertize-function 'mm-syntax-propertize))

Here "!" is the comment token, but I'm also trying to get it work for
N arbitrary comment tokens.

Also I've worked on this problem for a long time (check the date on my
stackoverflow posting) and tried dozens of different approaches. So
spitballing guesses (I've tried all guesses so far and many
permutations of them and none have worked) or telling me to rtfm again
is pointless.

--
mvh/best regards Björn Lindqvist
* Re: Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-07 3:58 UTC
To: tomas; +Cc: help-gnu-emacs, Stefan Monnier

Ok so I finally almost figured it out. The key part was that you must
invert the logic so that instead of "unmarking" in
syntax-propertize-rules, you use a regexp that adds the comment
starter property "<" to the matched strings. Something is bugged with
the unmarking approach, it's like it stops looking when it found the
comment character. Anyway:

(syntax-propertize-rules
 ("\\(^\\| \\|\t\\)\\(FOO\\|BAR\\)\\($\\| \\|\t\\)" (2 "< ")))

Does almost exactly what I want. Amusingly enough the matching is
case-INsensitive so bAR, Bar, foo, etc matches the regexp. But it's a
small flaw which I can live with.

--
mvh/best regards Björn Lindqvist
* Re: Font-lock of comments using comment tokens, does it work?
From: tomas @ 2015-06-07 11:34 UTC
To: Björn Lindqvist; +Cc: help-gnu-emacs, Stefan Monnier

On Sun, Jun 07, 2015 at 05:58:45AM +0200, Björn Lindqvist wrote:
> Ok so I finally almost figured it out. The key part was that you must
> invert the logic so that instead of "unmarking" in
> syntax-propertize-rules, you use a regexp that adds the comment
> starter property "<" to the matched strings. Something is bugged with
> the unmarking approach, it's like it stops looking when it found the
> comment character. Anyway:
>
> (syntax-propertize-rules
>  ("\\(^\\| \\|\t\\)\\(FOO\\|BAR\\)\\($\\| \\|\t\\)" (2 "< ")))

Got it.

> Does almost exactly what I want. Amusingly enough the matching is
> case-INsensitive so bAR, Bar, foo, etc matches the regexp. But it's a
> small flaw which I can live with.

This is most probably related to the (buffer local) variable
case-fold-search, which controls whether the regexp search functions
are case sensitive or not. By default, it's set to t (that's what
you usually expect interactively). You could try to set this variable
to nil and see whether it works better.

Alas, there doesn't seem to be a way to control that in the simple
syntax-propertize-rules -- possibly you'll have to go the way of
defining a syntax-propertize-function, where you can set this
variable dynamically for the relevant call.

Ducking around, here's a little snippet which might help to get
you started:

  <http://www.lunaryorn.com/feed.atom#hooking-into-syntactic-analyses>

HTH, regards
-- tomás
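One way to act on the case-fold-search suggestion is to wrap the rules
in a function that binds the variable around the call.  The sketch
below is an assumption-laden illustration, not a verified fix:
`mm-demo-syntax-propertize' is a made-up name, the regexp is the one
from the 2015-06-07 message rewritten with shy groups, and it relies
on the function generated by `syntax-propertize-rules' searching with
the dynamically bound value of `case-fold-search'.

```elisp
(defun mm-demo-syntax-propertize (start end)
  "Mark free-standing FOO and BAR between START and END as comment starters.
Hypothetical function name, for illustration only."
  ;; `case-fold-search' is a special variable, so this `let' affects
  ;; the regexp searches performed inside the generated function.
  (let ((case-fold-search nil))
    (funcall
     (syntax-propertize-rules
      ;; Group 1 is the token; it must be delimited by space, TAB, or
      ;; a line boundary on both sides.
      ("\\(?:^\\|[ \t]\\)\\(FOO\\|BAR\\)\\(?:$\\|[ \t]\\)" (1 "<")))
     start end)))
```

In the mode definition this would be installed with
(setq-local syntax-propertize-function #'mm-demo-syntax-propertize),
together with a syntax table in which newline has syntax ">" so that
the comment ends at the end of the line.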
* Font-lock of comments using comment tokens, does it work?
From: Björn Lindqvist @ 2015-06-03 16:46 UTC
To: help-gnu-emacs

Hello emacs,

I have a really complicated font-locking problem I'm trying to solve
for a major mode. It's like I've tried everything but nothing
works. Here is the question I asked on Stack Overflow and got some
help with but it didn't go all the way:

http://stackoverflow.com/questions/29973458/avoid-font-locking-interfering-inside-of-comments

I want font-lock to understand that two short strings, e.g. FOO and
BAR, are the comment tokens. The tokens themselves should be
font-locked as comments and everything following them until the end of
line should also be comments. The problem is that the strings only
start comments if they are free-standing tokens. So on these four
lines there are comments:

random code FOO random comment stuff
BAR comment "with stuff"
BAR FOO BAR
FOO

On these four lines there are NO comments:

FOObar random come
BARFOO random code
random code xyFOOzw
random code "with string FOO " etc ...

Because the comment tokens are not separate. I'm suspecting that I've
found a limitation in emacs font-locking and that this is impossible
to get completely right. I'd love to be proven wrong though. :)

--
mvh/best regards Björn Lindqvist