all messages for Emacs-related lists mirrored at yhetil.org
* c-mode syntax strings and regexp word boundaries
@ 2013-09-06  5:14 Jon Dufresne
  2013-09-06 12:21 ` Andreas Röhler
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jon Dufresne @ 2013-09-06  5:14 UTC (permalink / raw)
  To: emacs-devel

Hi,

I am trying to modify a major mode derived from c-mode. I am adding support
for an alternative string syntax (PHP heredoc). To do this I am using
"syntax-propertize-function" and
"syntax-propertize-extend-region-functions". (As an aside I am not sure
this is the best approach, but it is the best I have come up with so far.)

When trying to extend the propertize region, a regexp fails, but I am not
clear as to why. I have isolated the problem with the following test case.

---
(with-temp-buffer
  (c-mode)
  (insert "END;\n")
  (goto-char (point-min))
  (message "Search forward first time")
  (re-search-forward "^END\\b")
  (put-text-property (1- (point)) (point)
                     'syntax-table (string-to-syntax "|"))
  (goto-char (point-min))
  (message "Search forward second time")
  (re-search-forward "^END\\b"))
---

Running this gives me the output:

Search forward first time
Search forward second time
Search failed: "^END\\b"

I do not understand why the second regexp fails. I suppose it has to do
with the text property string boundary. What is the correct way to look for
the boundary in this case?

Thanks for any help.


* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06  5:14 c-mode syntax strings and regexp word boundaries Jon Dufresne
@ 2013-09-06 12:21 ` Andreas Röhler
  2013-09-06 14:07 ` Stefan Monnier
  2013-09-06 20:19 ` Alan Mackenzie
  2 siblings, 0 replies; 9+ messages in thread
From: Andreas Röhler @ 2013-09-06 12:21 UTC (permalink / raw)
  To: emacs-devel; +Cc: Jon Dufresne

On 06.09.2013 07:14, Jon Dufresne wrote:
> Hi,
>
> I am trying to modify a major mode derived from c-mode. I am adding support
> for an alternative string syntax (PHP heredoc). To do this I am using
> "syntax-propertize-function" and
> "syntax-propertize-extend-region-functions". (As an aside I am not sure
> this is the best approach, but it is the best I have come up with so far.)
>
> When trying to extend the propertize region, a regexp fails, but I am not
> clear as to why. I have isolated the problem with the following test case.
>
> ---
> (with-temp-buffer
>    (c-mode)
>    (insert "END;\n")
>    (goto-char (point-min))
>    (message "Search forward first time")
>    (re-search-forward "^END\\b")
>    (put-text-property (1- (point)) (point)
>                       'syntax-table (string-to-syntax "|"))
>    (goto-char (point-min))
>    (message "Search forward second time")
>    (re-search-forward "^END\\b"))
> ---
>
> Running this gives me the output:
>
> Search forward first time
> Search forward second time
> Search failed: "^END\\b"
>
> I do not understand why the second regexp fails. I suppose it has to do
> with the text property string boundary. What is the correct way to look for
> the boundary in this case?
>
> Thanks for any help.
>

It looks strange to me too.

If the second search is instead

(re-search-forward "^END\\s.")

it does match.

I can't see why the word boundary is no longer recognised.

Andreas




* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06  5:14 c-mode syntax strings and regexp word boundaries Jon Dufresne
  2013-09-06 12:21 ` Andreas Röhler
@ 2013-09-06 14:07 ` Stefan Monnier
  2013-09-06 14:55   ` Jon Dufresne
  2013-09-06 20:19 ` Alan Mackenzie
  2 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2013-09-06 14:07 UTC (permalink / raw)
  To: Jon Dufresne; +Cc: emacs-devel

>   (re-search-forward "^END\\b")
>   (put-text-property (1- (point)) (point)
>                      'syntax-table (string-to-syntax "|"))

This changes the syntax of this "D" from "w" to "|".

>   (goto-char (point-min))
>   (message "Search forward second time")
>   (re-search-forward "^END\\b"))

The char before ";" is "D" which is not a word char any more, and the
";" is not a word char either, so the point between the two is not
a word boundary any more.
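
A minimal sketch of that explanation, assuming the same temp buffer as the
test case and that "D" sits at buffer position 3 (both are illustrative
choices, not part of the original report). It prints the syntax class of
"D" before and after the property is applied; class 2 is "word" and class
15 is "generic string fence":

```elisp
(with-temp-buffer
  (c-mode)
  (insert "END;\n")
  (let ((d-pos 3)                          ; buffer position of "D"
        (parse-sexp-lookup-properties t))  ; make syntax-after see the property
    ;; Before: "D" is an ordinary word constituent, so this prints 2.
    (message "before: %S" (syntax-class (syntax-after d-pos)))
    (put-text-property d-pos (1+ d-pos)
                       'syntax-table (string-to-syntax "|"))
    ;; After: the text property overrides the table, so this prints 15.
    (message "after: %S" (syntax-class (syntax-after d-pos)))))
```

`parse-sexp-lookup-properties' is bound to t here only so that
`syntax-after' consults the text property; as far as I know CC Mode
enables it in its buffers anyway.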


        Stefan




* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06 14:07 ` Stefan Monnier
@ 2013-09-06 14:55   ` Jon Dufresne
  2013-09-06 15:44     ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Dufresne @ 2013-09-06 14:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Thanks.

On Fri, Sep 6, 2013 at 7:07 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> >   (re-search-forward "^END\\b")
> >   (put-text-property (1- (point)) (point)
> >                      'syntax-table (string-to-syntax "|"))
>
> This changes the syntax of this "D" from "w" to "|".
>

In this simplified example I *want* the entire string "END" to be a single
end of string syntax. It is not part of the actual string, only marks the
end of one. Instead, I went with only marking the "D" because marking all
characters appears (to me) to create three string fences when I only want
one.


>
> >   (goto-char (point-min))
> >   (message "Search forward second time")
> >   (re-search-forward "^END\\b"))
>
> The char before ";" is "D" which is not a word char any more, and the
> ";" is not a word char either, so the point between the two is not
> a word boundary any more.
>
>
Ah, ok. Understood. Is it possible to do regexp search of the buffer with
all text properties ignored? Or is there a better way to search for the
"END" delimiter? I'll need it to work before the buffer is propertized and
after the propertized buffer changed.

The current implementation (as you might be able to guess) works when the
buffer is first propertized, but not when the buffer is modified (as the
regexp in the test example fails).

Thanks,
Jon


* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06 14:55   ` Jon Dufresne
@ 2013-09-06 15:44     ` Stefan Monnier
  2013-09-06 17:04       ` Jon Dufresne
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2013-09-06 15:44 UTC (permalink / raw)
  To: Jon Dufresne; +Cc: emacs-devel

>> This changes the syntax of this "D" from "w" to "|".
> In this simplified example I *want* the entire string "END" to be
> a single end of string syntax.

Yes, I understand.

> It is not part of the actual string, only marks the end of one.
> Instead, I went with only marking the "D" because marking all
> characters appears (to me) to create three string fences when I only
> want one.

Depending on the particular constraints imposed on you, you might get
away with marking the character *after* END as being the string
delimiter.  But that tends to come with its own set of problems, so most
likely, you'll have to live with what you're currently using.

> Ah, ok. Understood. Is it possible to do regexp search of the buffer with
> all text properties ignored?

No.  We have `parse-sexp-lookup-properties' to control whether
syntax-table text-properties are ignored or not by parsing functions
(like forward-sexp), but there's no such control for regexp matching.

> Or is there a better way to search for the "END" delimiter? I'll need
> it to work before the buffer is propertized and after the propertized
> buffer changed.

Don't rely on syntax (i.e. things like \b, \>, \<, \s), and instead use
something like "^END\\(?:[^[:alnum:]]\\|\\'\\)".
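
A hedged sketch of that suggestion applied to the original test case. The
property on "D" is set up by hand at position 3 (an assumption made for
illustration), and the search uses only character classes, so the result
should not depend on the syntax table or on syntax-table properties:

```elisp
(with-temp-buffer
  (c-mode)
  (insert "END;\n")
  ;; Mark "D" (position 3) as a generic string fence, as in the test case.
  (put-text-property 3 4 'syntax-table (string-to-syntax "|"))
  (goto-char (point-min))
  ;; NOERROR is t, so a failed search returns nil instead of signalling.
  (if (re-search-forward "^END\\(?:[^[:alnum:]]\\|\\'\\)" nil t)
      (message "Found END delimiter, match ends at %d" (point))
    (message "END delimiter not found")))
```

Note that this pattern also consumes the character after END (here the
";"), so `(match-beginning 0)' plus the length of the keyword is the
position to use if you need the spot right after "END".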


        Stefan




* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06 15:44     ` Stefan Monnier
@ 2013-09-06 17:04       ` Jon Dufresne
  0 siblings, 0 replies; 9+ messages in thread
From: Jon Dufresne @ 2013-09-06 17:04 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Fri, Sep 6, 2013 at 8:44 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> Don't rely on syntax (i.e. things like \b, \>, \<, \s), and instead use
> something like "^END\\(?:[^[:alnum:]]\\|\\'\\)".
>

Thanks. That was very helpful and informative. I will give your suggestion
a shot and see how things go.

Cheers,
Jon


* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06  5:14 c-mode syntax strings and regexp word boundaries Jon Dufresne
  2013-09-06 12:21 ` Andreas Röhler
  2013-09-06 14:07 ` Stefan Monnier
@ 2013-09-06 20:19 ` Alan Mackenzie
  2013-09-06 22:38   ` Jon Dufresne
  2013-09-11  8:57   ` Andreas Röhler
  2 siblings, 2 replies; 9+ messages in thread
From: Alan Mackenzie @ 2013-09-06 20:19 UTC (permalink / raw)
  To: Jon Dufresne; +Cc: emacs-devel

Hi, Jon.

On Thu, Sep 05, 2013 at 10:14:16PM -0700, Jon Dufresne wrote:
> Hi,

> I am trying to modify a major mode derived from c-mode. I am adding support
> for an alternative string syntax (PHP heredoc). To do this I am using
> "syntax-propertize-function" and
> "syntax-propertize-extend-region-functions". (As an aside I am not sure
> this is the best approach, but it is the best I have come up with so far.)

> When trying to extend the propertize region, a regexp fails, but I am not
> clear as to why. I have isolated the problem with the following test case.

> ---
> (with-temp-buffer
>   (c-mode)
>   (insert "END;\n")
>   (goto-char (point-min))
>   (message "Search forward first time")
>   (re-search-forward "^END\\b")
>   (put-text-property (1- (point)) (point)
>                      'syntax-table (string-to-syntax "|"))
>   (goto-char (point-min))
>   (message "Search forward second time")
>   (re-search-forward "^END\\b"))
> ---

> Thanks for any help.

Not what you asked, but as a suggestion, you might want to use the CC
Mode macros which deal with text properties on single characters.  For
example, you could have used `c-put-char-property' instead of
`put-text-property'.

This has the following pros/cons: (i) these macros are slightly less
cumbersome to use, since they take only a single position parameter; (ii)
They also work in XEmacs; (iii) there is some effort involved in learning
about them and how they work.

If you're interested, search cc-defs.el for

    Macros/functions to handle so-called "char properties",

and read the definitions in that page.
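
As a rough illustration only, assuming the macro takes (POS PROPERTY VALUE)
with PROPERTY given as a quoted constant, which is how I read cc-defs.el,
the call from the test case might become something like the following.
Position 3 for "D" is hard-coded purely for the example:

```elisp
(require 'cc-defs)   ; defines the `c-put-char-property' macro

(with-temp-buffer
  (c-mode)
  (insert "END;\n")
  ;; Same intent as
  ;;   (put-text-property 3 4 'syntax-table (string-to-syntax "|"))
  ;; but with a single position argument.  Position 3 is "D".
  (c-put-char-property 3 'syntax-table (string-to-syntax "|"))
  (message "syntax-table at D: %S" (get-text-property 3 'syntax-table)))
```

Check the definition in your own copy of cc-defs.el before relying on
this; the details differ between Emacs and XEmacs.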

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06 20:19 ` Alan Mackenzie
@ 2013-09-06 22:38   ` Jon Dufresne
  2013-09-11  8:57   ` Andreas Röhler
  1 sibling, 0 replies; 9+ messages in thread
From: Jon Dufresne @ 2013-09-06 22:38 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

On Fri, Sep 6, 2013 at 1:19 PM, Alan Mackenzie <acm@muc.de> wrote:

> Not what you asked, but as a suggestion, you might want to use the CC
> Mode macros which deal with text properties on single characters.  For
> example, you could have used `c-put-char-property' instead of
> `put-text-property'.
>
> This has the following pros/cons: (i) these macros are slightly less
> cumbersome to use, since they take only a single position parameter; (ii)
> They also work in XEmacs; (iii) there is some effort involved in learning
> about them and how they work.
>
> If you're interested, search cc-defs.el for
>
>     Macros/functions to handle so-called "char properties",
>
> and read the definitions in that page.
>

Thanks for the tip. I will check that out.


* Re: c-mode syntax strings and regexp word boundaries
  2013-09-06 20:19 ` Alan Mackenzie
  2013-09-06 22:38   ` Jon Dufresne
@ 2013-09-11  8:57   ` Andreas Röhler
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Röhler @ 2013-09-11  8:57 UTC (permalink / raw)
  To: emacs-devel

On 06.09.2013 22:19, Alan Mackenzie wrote:
> Hi, Jon.
>
> On Thu, Sep 05, 2013 at 10:14:16PM -0700, Jon Dufresne wrote:
>> Hi,
>
>> I am trying to modify a major mode derived from c-mode. I am adding support
>> for an alternative string syntax (PHP heredoc). To do this I am using
>> "syntax-propertize-function" and
>> "syntax-propertize-extend-region-functions". (As an aside I am not sure
>> this is the best approach, but it is the best I have come up with so far.)
>
>> When trying to extend the propertize region, a regexp fails, but I am not
>> clear as to why. I have isolated the problem with the following test case.
>
>> ---
>> (with-temp-buffer
>>    (c-mode)
>>    (insert "END;\n")
>>    (goto-char (point-min))
>>    (message "Search forward first time")
>>    (re-search-forward "^END\\b")
>>    (put-text-property (1- (point)) (point)
>>                       'syntax-table (string-to-syntax "|"))
>>    (goto-char (point-min))
>>    (message "Search forward second time")
>>    (re-search-forward "^END\\b"))
>> ---
>
>> Thanks for any help.
>
> Not what you asked, but as a suggestion, you might want to use the CC
> Mode macros which deal with text properties on single characters.  For
> example, you could have used `c-put-char-property' instead of
> `put-text-property'.
>
> This has the following pros/cons: (i) these macros are slightly less
> cumbersome to use, since they take only a single position parameter; (ii)
> They also work in XEmacs; (iii) there is some effort involved in learning
> about them and how they work.
>
> If you're interested, search cc-defs.el for
>
>      Macros/functions to handle so-called "char properties",
>
> and read the definitions in that page.
>


Just for the record:

The error in the present case was the wrong position; it would fail with a
single character too.

Instead of

  (put-text-property (1- (point)) (point)

this

  (put-text-property (point) (1+ (point))

should work.
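
A small sketch of that variant, reusing the test case from the start of the
thread. It puts the fence on the ";" that follows END rather than on the
"D", so "D" keeps its word syntax and the second search can still find a
word boundary. Whether a fence on the following character suits a real
heredoc implementation is a separate question, as noted earlier in the
thread:

```elisp
(with-temp-buffer
  (c-mode)
  (insert "END;\n")
  (goto-char (point-min))
  (re-search-forward "^END\\b")
  ;; Point is now between "D" and ";".  Mark the ";" instead of the "D".
  (put-text-property (point) (1+ (point))
                     'syntax-table (string-to-syntax "|"))
  (goto-char (point-min))
  ;; "D" is still a word character, so \b can match right after it.
  (if (re-search-forward "^END\\b" nil t)
      (message "Second search succeeded, point at %d" (point))
    (message "Second search failed")))
```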

Cheers



