* Regex Problem => "overlapping words"
@ 2005-12-02 19:35 Tim Johnson
  2005-12-02 20:56 ` Tim Johnson
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Johnson @ 2005-12-02 19:35 UTC (permalink / raw)


I have a regex problem: the form containing the regex should appear
as one line between two lines of asterisks:
******************************************************************************************************************************
'("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(ef\\|oes\\)\\|func\\(tion\\)\\|has\\|sub?\\)\\>" (1 prepend) (2 font-lock-keyword-face))
******************************************************************************************************************************

The intent is that the following 'words': def, does, func, function,
has, sub

Should be highlighted as per 'font-lock-keyword-face.

'function' is *not* highlighted, so I have not handled
the "overlapping" words properly. 

All others are displayed as intended

I hope that someone can show me what I have done wrong.
thanks
tim

-- 
Tim Johnson <tim@johnsons-web.com>
      http://www.alaska-internet-solutions.com


* Re: Regex Problem => "overlapping words"
  2005-12-02 19:35 Tim Johnson
@ 2005-12-02 20:56 ` Tim Johnson
  0 siblings, 0 replies; 9+ messages in thread
From: Tim Johnson @ 2005-12-02 20:56 UTC (permalink / raw)


I may have solved it... Being a regex noob, I might not use all of the
appropriate wording here, but essentially, I was not defining the word
boundaries properly.

I arrived at what I think is the correct expression by evaluating the
following in *scratch*:

(regexp-opt '("def" "does" "func" "function" "has" "sub"))
;; which gave me
"d\\(?:ef\\|oes\\)\\|func\\(?:tion\\)?\\|has\\|sub"
;; and led me to change the form to:

************************************************************************************************************************************
'("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(?:ef\\|oes\\)\\|func\\(?:tion\\)?\\|has\\|sub?\\)\\>" (1 prepend) (2 font-lock-keyword-face))
************************************************************************************************************************************
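
A minimal sketch of the same approach: rather than pasting the output of
regexp-opt into the form by hand, the keyword group can be built when the
form is evaluated. The names my/rebol-def-keywords, my/rebol-def-regexp and
my/rebol-def-keyword are made up for this illustration and are not part of
the mode. The exact string regexp-opt returns can differ between Emacs
versions, so the sketch only checks what the result matches.

```elisp
;; Hypothetical names; not part of the Rebol mode file.
(defconst my/rebol-def-keywords
  '("def" "does" "func" "function" "has" "sub")
  "Words that should get `font-lock-keyword-face' after \"name:\".")

(defconst my/rebol-def-regexp
  (concat "\\([^][ \t\r\n{}()]+\\):[ ]*"       ; group 1: the word before the colon
          (regexp-opt my/rebol-def-keywords t) ; group 2: one of the keywords
          "\\>")                               ; end of word
  "Regexp matching \"name: KEYWORD\".")

(defun my/rebol-def-keyword (text)
  "Return the keyword matched in TEXT (group 2), or nil if none."
  ;; Use the standard syntax table so that \\> does not depend on
  ;; whatever buffer happens to be current.
  (with-syntax-table (standard-syntax-table)
    (when (string-match my/rebol-def-regexp text)
      (match-string 2 text))))

(my/rebol-def-keyword "hello: function [s]") ; "function"
(my/rebol-def-keyword "hello: func [s]")     ; "func"
(my/rebol-def-keyword "hello: su")           ; nil, "su" is not in the list
```

Passing t as the second argument to regexp-opt wraps the result in a
numbered group, which is what lets the font-lock form refer to it as
group 2.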

Since I'm still pretty wet behind the ears with regex, I still welcome
any comments or suggestions should anyone find anything incorrect ....
thanks
tim

-- 
Tim Johnson <tim@johnsons-web.com>
      http://www.alaska-internet-solutions.com


* Re: Regex Problem => "overlapping words"
       [not found] <mailman.17680.1133552219.20277.help-gnu-emacs@gnu.org>
@ 2005-12-03  3:04 ` Stefan Monnier
  2005-12-03 20:24   ` Tim Johnson
       [not found]   ` <mailman.17779.1133641535.20277.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Monnier @ 2005-12-03  3:04 UTC (permalink / raw)


> '("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(ef\\|oes\\)\\|func\\(tion\\)\\|has\\|sub?\\)\\>" (1 prepend) (2 font-lock-keyword-face))

You want to remove the ? after `sub' (it causes your regexp to match both
`sub' and `su') and you want to add a ? right after the \\(tion\\) group.
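
To see what the two changes do, both patterns can be evaluated with
string-match outside font-lock. This is only a sketch: the constant and
function names are invented for the example, and the standard syntax table
is assumed so that \\> behaves the same wherever the code is evaluated.

```elisp
;; Hypothetical names, used only for this comparison.
(defconst my/original-regexp
  "\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(ef\\|oes\\)\\|func\\(tion\\)\\|has\\|sub?\\)\\>"
  "The pattern as first posted.")

(defconst my/fixed-regexp
  "\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(?:ef\\|oes\\)\\|func\\(?:tion\\)?\\|has\\|sub\\)\\>"
  "The pattern with `?' moved from `sub' to the `tion' group.")

(defun my/captured-keyword (regexp text)
  "Return what group 2 of REGEXP captures in TEXT, or nil if no match."
  (with-syntax-table (standard-syntax-table)
    (when (string-match regexp text)
      (match-string 2 text))))

(my/captured-keyword my/original-regexp "f: function [x]") ; "function"
(my/captured-keyword my/original-regexp "f: func [x]")     ; nil
(my/captured-keyword my/original-regexp "f: su")           ; "su"

(my/captured-keyword my/fixed-regexp "f: function [x]")    ; "function"
(my/captured-keyword my/fixed-regexp "f: func [x]")        ; "func"
(my/captured-keyword my/fixed-regexp "f: su")              ; nil
```

In the original pattern the \\(tion\\) group is mandatory, so a bare
`func' cannot match, while `sub?' makes the `b' optional. With the `?'
moved, both `func' and `function' match and `su' no longer does.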


        Stefan


* Re: Regex Problem => "overlapping words"
  2005-12-03  3:04 ` Regex Problem => "overlapping words" Stefan Monnier
@ 2005-12-03 20:24   ` Tim Johnson
       [not found]   ` <mailman.17779.1133641535.20277.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Tim Johnson @ 2005-12-03 20:24 UTC (permalink / raw)


* Stefan Monnier <monnier@iro.umontreal.ca> [051202 18:31]:
Hi Stefan:
> > '("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(ef\\|oes\\)\\|func\\(tion\\)\\|has\\|sub?\\)\\>" (1 prepend) (2 font-lock-keyword-face))
> 
> You want to remove the ? after `sub' (it causes your regexp to match both
> `sub' and `su') and you want to add a ? right after the \\(tion\\) group.
   Understood. And done. Thanks!
It now looks as follows:
'("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(?:ef\\|oes\\)\\|func\\(?:tion\\)?\\|has\\|sub\\)\\>" (1 prepend) (2 font-lock-keyword-face))

There is still a problem, though: Emacs seems to treat "-" as a
word or symbol boundary. For example, type "def" and it is highlighted.
Add a hyphen, so that we now have "def-", and the "def" substring
remains highlighted.

Perhaps this is not a regex issue, but some other setting of the mode?
For example, in c-mode, "_" is considered part of a symbol, and in lisp
mode "-" should be considered part of a symbol. In this mode, which is
called "Rebol Mode", "-" should also be part of a symbol. I've hacked on
this mode, which appears to have been abandoned by the original author,
and I should probably be looking for some expression that defines which
characters make up symbols.

Am I correct? If so, what should I be looking for?

Thanks again.
cheers
tim

-- 
Tim Johnson <tim@johnsons-web.com>
      http://www.alaska-internet-solutions.com


* Re: Regex Problem => "overlapping words"
       [not found]   ` <mailman.17779.1133641535.20277.help-gnu-emacs@gnu.org>
@ 2005-12-03 20:50     ` Johan Bockgård
  2005-12-03 21:52     ` Stefan Monnier
  1 sibling, 0 replies; 9+ messages in thread
From: Johan Bockgård @ 2005-12-03 20:50 UTC (permalink / raw)


Tim Johnson <tim@johnsons-web.com> writes:

> Perhaps this is not a regex issue, but some other setting to the
> mode? Example, in c-mode, "_" is considered part of a symbol? In
> lisp mode "-" should be considered part of a symbol. In this mode,
> which is called "Rebol Mode" "-" should be part of a symbol. I've
> hacked on this mode, which appears to have been abandoned by the
> original author, and I should probably be looking for some
> expression that defines what constitutes characters that make up
> symbols.
>
> Am I correct? If so, what should I be looking for.

"syntax-table"

-- 
Johan Bockgård


* Re: Regex Problem => "overlapping words"
       [not found]   ` <mailman.17779.1133641535.20277.help-gnu-emacs@gnu.org>
  2005-12-03 20:50     ` Johan Bockgård
@ 2005-12-03 21:52     ` Stefan Monnier
  2005-12-03 23:09       ` Tim Johnson
       [not found]       ` <mailman.17796.1133651456.20277.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 9+ messages in thread
From: Stefan Monnier @ 2005-12-03 21:52 UTC (permalink / raw)


> There is still a problem tho': Emacs seems to think that "-" is a
> word or symbol boundary.

This depends on the syntax-table settings.

Please don't change the syntax of ?- in the main syntax-table, but only in
the syntax-table used during font-locking:

  Take a look at the place where font-lock-defaults is set.
  Then look at the docstring of font-lock-defaults.
  Then locate the SYNTAX-ALIST element and set it so that the character ?-
  has syntax "w".
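
As a sketch of what that could look like: SYNTAX-ALIST is the fourth
element of font-lock-defaults, a list of (CHARACTER . SYNTAX-STRING) pairs
that apply only while fontifying. The variable and function names below are
hypothetical; rebol-font-lock-keywords is assumed to be the keyword list
defined by the mode. The helper only illustrates why the syntax class of ?-
matters to \\>.

```elisp
;; What the mode's setting might become: the 4th element is SYNTAX-ALIST.
(defvar my/rebol-font-lock-defaults
  '(rebol-font-lock-keywords nil nil ((?- . "w")))
  "Hypothetical value for `font-lock-defaults' in the mode function.")

(defun my/def-ends-word-p (hyphen-syntax)
  "Return t if `def' is followed by a word end in the text \"def-\".
HYPHEN-SYNTAX is the syntax descriptor given to the hyphen."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- hyphen-syntax table)
    (with-syntax-table table
      (and (string-match "def\\>" "def-") t))))

(my/def-ends-word-p "_") ; t: the word ends before a symbol-constituent hyphen
(my/def-ends-word-p "w") ; nil: "def-" is a single word, so no word end after "def"
```

Setting the syntax through font-lock-defaults leaves the buffer's main
syntax table alone, so commands such as forward-word keep their usual
behaviour.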


        Stefan


* Re: Regex Problem => "overlapping words"
  2005-12-03 21:52     ` Stefan Monnier
@ 2005-12-03 23:09       ` Tim Johnson
       [not found]       ` <mailman.17796.1133651456.20277.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 9+ messages in thread
From: Tim Johnson @ 2005-12-03 23:09 UTC (permalink / raw)


* Stefan Monnier <monnier@iro.umontreal.ca> [051203 13:20]:
> > There is still a problem tho': Emacs seems to think that "-" is a
> > word or symbol boundary.
> 
> This depends on the syntax-table settings.
> 
> Please don't change the syntax of ?- in the main syntax-table, but only in
> the syntax-table used during font-locking:
> 
>   Take a look at the place where font-lock-defaults is set.
>   Then look at the docstring of font-lock-defaults.
>   Then locate the SYNTAX-ALIST element and set it so that the character ?-
>   has syntax "w".

    Thanks Stefan: Below are the two elisp forms that contain the symbol
    'font-lock-defaults:
     (make-local-variable 'font-lock-defaults)
     (setq font-lock-defaults '(rebol-font-lock-keywords nil nil))

     ;; there does not appear to be a docstring and there is no
     ;; symbol syntax-alist (or SYNTAX-ALIST) in the mode file
     
I'm going to put a copy of the code that I believe sets the mode-specific
syntax table at the end of this message, but I am beginning to suspect that the
problem may remain at the regex stage.

Let me first describe the functionality and the symptoms: Rebol uses a sort of
lambda calculus, and the words that create subroutines are themselves
subroutines (sort of like 'defmacro in lisp), which can be defined with
different interfaces and scope rules. Adding a colon to any symbol binds that
symbol to the definition that follows. The colon must immediately follow the
symbol with no intervening whitespace.

Bearing in mind that the keywords for creating subroutines in rebol are as
follows (for my setup) "def" "does" "function" "func" "has" "sub", then we
define a symbol as one of these subroutine constructs in the following example:

hello: def[S][print ["hello " S]]

so that 
hello "Stefan" => "hello Stefan"

The specific highlight occurs when a colon and a space precede that keyword.

Now if I type another keyword, say 'print which is defined with another
font-lock group and *then* add a hyphen, the highlight for the symbol is turned
off (as it should be). Because I can only partially follow the code in the
syntax-table, this is my long-winded way of saying that perhaps this remains a
regex problem.

So.... following is the regex form as I now have it coded
**********************************************************************************************************************************
'("\\([^][ \t\r\n{}()]+\\):[ ]*\\(d\\(?:ef\\|oes\\)\\|func\\(?:tion\\)?\\|has\\|sub\\)\\>" (1 prepend) (2 font-lock-keyword-face)) 
**********************************************************************************************************************************
  My thanks again. This is no big deal, just helping to edify me about 
  elisp.        
;; syntax table code follows
(defvar rebol-mode-syntax-table nil 
  "Syntax table for REBOL buffers.")

(if (not rebol-mode-syntax-table)
    (let ((i 0))
      (setq rebol-mode-syntax-table (make-syntax-table))
      (set-syntax-table rebol-mode-syntax-table)

      ;; Default is symbol constituent ("_"), not word.
      (while (< i 256)
        (modify-syntax-entry i "_   ")
        (setq i (1+ i)))

      ;; Digits are word components.
      (setq i ?0)
      (while (<= i ?9)
        (modify-syntax-entry i "w   ")
        (setq i (1+ i)))

      ;; As are upper and lower case.
      (setq i ?A)
      (while (<= i ?Z)
        (modify-syntax-entry i "w   ")
        (setq i (1+ i)))
      (setq i ?a)
      (while (<= i ?z)
        (modify-syntax-entry i "w   ")
        (setq i (1+ i)))

      ;; Whitespace
      (modify-syntax-entry ?\t "    ")
      (modify-syntax-entry ?\n ">   ")
      (modify-syntax-entry ?\f "    ")
      (modify-syntax-entry ?\r "    ")
      (modify-syntax-entry ?  "    ")

      ;; Delimiters
      (modify-syntax-entry ?[ "(]  ")
      (modify-syntax-entry ?] ")[  ")
      (modify-syntax-entry ?\( "()  ")
      (modify-syntax-entry ?\) ")(  ")

      ;; comments
      (modify-syntax-entry ?\; "<   ")
      (modify-syntax-entry ?\" "\"    ")
      (modify-syntax-entry ?{ "    ")
      (modify-syntax-entry ?} "    ")
      (modify-syntax-entry ?' "  p")
      (modify-syntax-entry ?` "  p")

      (modify-syntax-entry ?^ "\\   ")))
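
A quick way to check how a table like this classifies a character is
char-syntax. The sketch below rebuilds only the relevant part of the table
above under a made-up function name, passing the table explicitly instead
of relying on the current buffer. The range form of modify-syntax-entry
assumes a reasonably recent Emacs.

```elisp
(defun my/make-rebol-like-syntax-table ()
  "Return a table where letters and digits are words, the rest symbols.
A hypothetical, cut-down version of the table defined above."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry '(0 . 255) "_" table)   ; default: symbol constituent
    (modify-syntax-entry '(?0 . ?9) "w" table)   ; digits are word constituents
    (modify-syntax-entry '(?A . ?Z) "w" table)   ; as are upper case letters
    (modify-syntax-entry '(?a . ?z) "w" table)   ; and lower case letters
    table))

(with-syntax-table (my/make-rebol-like-syntax-table)
  (list (char-syntax ?-)    ; ?_  (symbol constituent)
        (char-syntax ?d)))  ; ?w  (word constituent)
```

With these settings the hyphen is part of a symbol but not part of a word,
so \\> still sees a word end between the "f" and the "-" of "def-".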
-- 
Tim Johnson <tim@johnsons-web.com>
      http://www.alaska-internet-solutions.com


* Re: Regex Problem => "overlapping words"
       [not found]       ` <mailman.17796.1133651456.20277.help-gnu-emacs@gnu.org>
@ 2005-12-04 15:50         ` Stefan Monnier
  2005-12-04 17:13           ` Tim Johnson
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2005-12-04 15:50 UTC (permalink / raw)


>> Take a look at the place where font-lock-defaults is set.
>> Then look at the docstring of font-lock-defaults.
>> Then locate the SYNTAX-ALIST element and set it so that the character ?-
>> has syntax "w".

>     Thanks Stefan: Below are the two elisp forms that contain the symbol
>     'font-lock-defaults:
>      (make-local-variable 'font-lock-defaults)
>      (setq font-lock-defaults '(rebol-font-lock-keywords nil nil))

>      ;; there does not appear to be a docstring and there is no
>      ;; symbol syntax-alist (or SYNTAX-ALIST) in the mode file

I must say I don't understand your message.
It seems you managed the first step (finding the place where
font-lock-defaults is set).  Have you managed to do the second step (look
at the docstring of font-lock-defaults)?  What about the third?
     

        Stefan


* Re: Regex Problem => "overlapping words"
  2005-12-04 15:50         ` Stefan Monnier
@ 2005-12-04 17:13           ` Tim Johnson
  0 siblings, 0 replies; 9+ messages in thread
From: Tim Johnson @ 2005-12-04 17:13 UTC (permalink / raw)


* Stefan Monnier <monnier@iro.umontreal.ca> [051204 07:22]:
> >> Take a look at the place where font-lock-defaults is set.
> >> Then look at the docstring of font-lock-defaults.
> >> Then locate the SYNTAX-ALIST element and set it so that the character ?-
> >> has syntax "w".
> 
> >     Thanks Stefan: Below are the two elisp forms that contain the symbol
> >     'font-lock-defaults:
> >      (make-local-variable 'font-lock-defaults)
> >      (setq font-lock-defaults '(rebol-font-lock-keywords nil nil))
> 
> >      ;; there does not appear to be a docstring and there is no
> >      ;; symbol syntax-alist (or SYNTAX-ALIST) in the mode file
> 
> I must say I don't understand your message.
   
    Sorry ...

> It seems you managed the first step (finding the place where
> font-lock-defaults is set).  
> Have you managed to do the second step (look at the docstring of font-lock-defaults)?  

  There is none that I can find. Where else would I look? There are no
  other references in the file.

> What about the third?
      
  What I stated in my message is that from all appearances, the hyphen
  *IS* properly contained in the syntax table. 
  
  The substrings above can actually be in two different font-lock groups
  depending on whether or not they are used to create a subroutine and
  are preceded by a colon. (Please see the regex).

  To test, I placed "def" in another font-lock group
  (rebol-user-functions). Now when
  I type in "def" *without* a preceding colon, it is highlighted as per
  the 'rebol-user-functions group. If I then type a hyphen, so we
  have "def-", the highlight is disabled. CORRECT.

  Now, if I type in "def" following the colon, it is first highlighted
  as per 'rebol-user-functions. *Then*, if I add a hyphen, the
  highlighting for the "def" substring changes to the color of
  'font-lock-keyword-face AND the color of the hyphen and every
  character following it is "painted" the 'default color. WRONG.

  =======================================================================
   This suggests to me that my regex is still lacking something, and the
   problem is probably *not* in the syntax table.
  =======================================================================
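
  One thing that might be worth trying, and which has not come up in the
  thread: if ?- stays a symbol constituent, the symbol-end construct \\_>
  (available in Emacs 22 and later) could replace \\> at the end of the
  pattern. A small sketch, with a hypothetical helper name and a throwaway
  syntax table in which ?- has symbol syntax:

```elisp
(defun my/def-boundary-match-p (boundary text)
  "Return t if \"def\" followed by BOUNDARY matches somewhere in TEXT.
BOUNDARY is a regexp fragment such as \"\\\\>\" or \"\\\\_>\"."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?- "_" table)   ; hyphen is part of a symbol
    (with-syntax-table table
      (and (string-match (concat "def" boundary) text) t))))

(my/def-boundary-match-p "\\>"  "x: def-")  ; t: the word ends before the hyphen
(my/def-boundary-match-p "\\_>" "x: def-")  ; nil: the symbol continues past "def"
(my/def-boundary-match-p "\\_>" "x: def [") ; t: the symbol ends at the space
```

  Whether this or the SYNTAX-ALIST route fits better depends on how the
  other keyword groups in the mode are written.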

  I appreciate your help, but this is no big deal. It does not affect
  functionality.

  Thank you very much.
  tim


-- 
Tim Johnson <tim@johnsons-web.com>
      http://www.alaska-internet-solutions.com
