unofficial mirror of emacs-devel@gnu.org 
* Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
@ 2011-12-03 23:23 Alan Mackenzie
  2011-12-03 23:40 ` Daniel Colascione
  2011-12-04  3:39 ` Stefan Monnier
  0 siblings, 2 replies; 16+ messages in thread
From: Alan Mackenzie @ 2011-12-03 23:23 UTC (permalink / raw)
  To: emacs-devel

Hi, Emacs.

There's a problem with parse-partial-sexp.  If one scans to the middle
of a comment opener  /*
                      ^
                      |
, parse-partial-sexp gives no indication that we might be half inside a
comment.  In particular, checking (nth 3 state) and (nth 4 state) is
insufficient to know that one is at a "safe place".
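
A minimal sketch of the situation described above, assuming a hand-made C-like syntax table rather than the real CC Mode one. The name my-half-opener-demo is hypothetical and exists only for this illustration.

```elisp
(defun my-half-opener-demo ()
  "Return (STRING-FLAG COMMENT-FLAG COMMENT-FLAG-ONE-CHAR-LATER)."
  (with-temp-buffer
    ;; Assumed syntax: "/*" opens a comment, "*/" closes it.
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert "a /* b */")
    (let ((mid   (parse-partial-sexp (point-min) 4))   ; between "/" and "*"
          (after (parse-partial-sexp (point-min) 5)))  ; just after "/*"
      (list (nth 3 mid)
            (nth 4 mid)
            (and (nth 4 after) t)))))

(my-half-opener-demo)   ; => (nil nil t)
```

At position 4 neither the string element nor the comment element is set, although one character later the scan is inside a comment.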

parse-partial-sexp does, however, notify the caller when it is just
after a quote (escape) character such as a backslash, a somewhat
analogous situation.
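
The existing behaviour for quote (escape) characters can be seen in a small sketch. It assumes a syntax table in which backslash has escape syntax; my-quote-flag-demo is a hypothetical name.

```elisp
(defun my-quote-flag-demo ()
  "Return element 5 of the state just after a backslash and one char later."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?\\ "\\" st)   ; backslash is an escape character
      (set-syntax-table st))
    (insert "a\\n")                       ; buffer text: a \ n
    (list (nth 5 (parse-partial-sexp (point-min) 3))    ; right after "\"
          (nth 5 (parse-partial-sexp (point-min) 4))))) ; after the escaped "n"

(my-quote-flag-demo)   ; => (t nil)
```

Element 5 is t only while the scan has stopped directly after the escape character.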

No doubt there is some record of this state hidden away in (nth 9
state).

I think it would be a good idea to provide a function to test for this
"half comment" state, somewhat like `syntax-ppss-toplevel-pos'.  This
new defun could be called something like
`syntax-ppss-comment-half-opener' and calling it would return nil
usually, but ?/ in these circumstances.

What do other people think?

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-03 23:23 Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe Alan Mackenzie
@ 2011-12-03 23:40 ` Daniel Colascione
  2011-12-04  3:39 ` Stefan Monnier
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Colascione @ 2011-12-03 23:40 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

On 12/3/11 3:23 PM, Alan Mackenzie wrote:
> I think it would be a good idea to provide a function to test for this
> "half comment" state, somewhat like `syntax-ppss-toplevel-pos'.  This
> new defun could be called something like
> `syntax-ppss-comment-half-opener' and calling it would return nil
> usually, but ?/ in these circumstances.
> 
> What do other people think?

Why not just scan one character further ahead, using the previous position and
parse state, to see whether you then enter a comment?
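
A sketch of this look-ahead idea, under stated assumptions: the caller still has an earlier position and its parse state, and the scan is simply repeated from there to one character past the position of interest. my-half-comment-opener-p and my-look-ahead-demo are hypothetical names, not existing Emacs functions.

```elisp
(defun my-half-comment-opener-p (from from-state pos)
  "Return non-nil if POS is inside a two-character comment opener.
FROM is an earlier position whose parse state is FROM-STATE (nil when
FROM is at top level)."
  (save-excursion
    (let ((here (parse-partial-sexp from pos nil nil from-state)))
      (and (not (nth 3 here))            ; not in a string
           (not (nth 4 here))            ; not (yet) in a comment
           (< pos (point-max))           ; there is a next character
           (let ((next (parse-partial-sexp from (1+ pos) nil nil from-state)))
             ;; One character later we are in a comment that began
             ;; before POS, so POS sits in the middle of its opener.
             (and (nth 4 next)
                  (nth 8 next)
                  (< (nth 8 next) pos)))))))

(defun my-look-ahead-demo ()
  "Apply the check at positions 3, 4 and 5 of \"a /* b */\"."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)   ; assumed C-like comment syntax
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert "a /* b */")
    (mapcar (lambda (pos) (my-half-comment-opener-p (point-min) nil pos))
            '(3 4 5))))

(my-look-ahead-demo)   ; => (nil t nil)
```

Rescanning from the earlier position, rather than resuming from the state at POS, avoids relying on that state remembering the half-scanned "/".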



* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-03 23:23 Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe Alan Mackenzie
  2011-12-03 23:40 ` Daniel Colascione
@ 2011-12-04  3:39 ` Stefan Monnier
  2011-12-04 10:41   ` martin rudalics
  1 sibling, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2011-12-04  3:39 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: emacs-devel

> There's a problem with parse-partial-sexp.  If one scans to the middle
> of a comment opener  /*
>                       ^
>                       |
> , parse-partial-sexp gives no indication that we might be half inside a
[...]
> No doubt there is some record of this state hidden away in (nth 9 state).

IIRC you're just a bit too optimistic: parse-partial-sexp does not
record this info anywhere.  And yes, if my recollection is right, that
means it's got a bug.

The better way to fix it is probably to change the (nth 5 ppss) value so
it holds something like "buffer position actually described by PPSS in
case the requested buffer position is in the middle of a lexeme" and
so it can be used for both backslashes and multi-char comment markers.


        Stefan




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04  3:39 ` Stefan Monnier
@ 2011-12-04 10:41   ` martin rudalics
  2011-12-04 15:21     ` Stefan Monnier
  0 siblings, 1 reply; 16+ messages in thread
From: martin rudalics @ 2011-12-04 10:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, emacs-devel

 > The better way to fix it is probably to change the (nth 5 ppss) value so
 > it holds something like "buffer position actually described by PPSS in
 > case the requested buffer position is in the middle of a lexeme" and
 > so it can be used for both backslashes and multi-char comment markers.

If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
is unreliable in this special case.  I think Daniel is right here: Check
whether the character following TO completes a comment begin (or comment
end) lexeme and in that case return consistently the in-comment value.

martin




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04 10:41   ` martin rudalics
@ 2011-12-04 15:21     ` Stefan Monnier
  2011-12-04 17:06       ` martin rudalics
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2011-12-04 15:21 UTC (permalink / raw)
  To: martin rudalics; +Cc: Alan Mackenzie, emacs-devel

>> The better way to fix it is probably to change the (nth 5 ppss) value so
>> it holds something like "buffer position actually described by PPSS in
>> case the requested buffer position is in the middle of a lexeme" and
>> so it can be used for both backslashes and multi-char comment markers.
> If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
> is unreliable in this special case.

Not if (nth 5 ppss) says that the buffer position is the one *after* the
"/*" sequence.  Of course for "*/" we'd conversely want to use the state
*before* "*/".


        Stefan




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04 15:21     ` Stefan Monnier
@ 2011-12-04 17:06       ` martin rudalics
  2011-12-04 20:47         ` Andreas Röhler
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: martin rudalics @ 2011-12-04 17:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, emacs-devel

 >> If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
 >> is unreliable in this special case.
 >
 > Not if (nth 5 ppss) says that the buffer position is the one *after* the
 > "/*" sequence.  Of course for "*/" we'd conversely want to use the state
 > *before* "*/".

What I meant was that the caller would have to care about (nth 5 ppss)
too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).  If
we say that a comment is everything in between and including both
delimiters she won't have to care about (nth 5 ppss) in the first place.

Admittedly, it's not entirely trivial to implement.  But the fact that
between "/" and "*" we are not in a comment whilst between "*" and "/"
we are doesn't strike me as very intuitive.
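
The asymmetry can be made visible with a small sketch that records the comment flag at every position of a short buffer. It assumes a hand-made C-like syntax table; my-comment-flags is a hypothetical helper.

```elisp
(defun my-comment-flags (text)
  "Return, for each position in TEXT, whether (nth 4 state) is set there."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)   ; assumed C-like comment syntax
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert text)
    (let (flags)
      (dotimes (i (1+ (buffer-size)))      ; positions 1 .. point-max
        (push (and (nth 4 (parse-partial-sexp (point-min) (1+ i))) t)
              flags))
      (nreverse flags))))

;; Positions: before "/", between "/" and "*", after "/*", before "*/",
;; between "*" and "/", after "*/".
(my-comment-flags "/* */")   ; => (nil nil t t t nil)
```

The second element (between "/" and "*") is nil, while the fifth (between "*" and "/") is t.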

martin




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04 17:06       ` martin rudalics
@ 2011-12-04 20:47         ` Andreas Röhler
  2011-12-05  3:33         ` Stefan Monnier
  2011-12-05 11:25         ` Alan Mackenzie
  2 siblings, 0 replies; 16+ messages in thread
From: Andreas Röhler @ 2011-12-04 20:47 UTC (permalink / raw)
  To: emacs-devel

Am 04.12.2011 18:06, schrieb martin rudalics:
>  >> If you change (nth 5 ppss) you would still have to say that (nth 4
> ppss)
>  >> is unreliable in this special case.
>  >
>  > Not if (nth 5 ppss) says that the buffer position is the one *after* the
>  > "/*" sequence. Of course for "*/" we'd conversely want to use the state
>  > *before* "*/".
>
> What I meant was that the caller would have to care about (nth 5 ppss)
> too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss). If
> we say that a comment is everything in between and including both
> delimiters she won't have to care about (nth 5 ppss) in the first place.
>
> Admittedly, it's not entirely trivial to implement. But the fact that
> between "/" and "*" we are not in a comment whilst between "*" and "/"
> we are doesn't strike me as very intuitive.
>
> martin
>
>

Hi,

A more striking example might be comments in HTML:

<!-- base href="https://blub+index" -->

I think it's only the comment start that needs handling beyond what
pps provides.

I worked around it with

- looking-at comment-start
- a check, using string-match, whether point is inside the comment-start string
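
A rough sketch of such a check, not the code actually used: it tests with looking-at (rather than string-match) whether a position lies strictly inside an occurrence of the opener string. my-inside-opener and my-inside-opener-demo are hypothetical names.

```elisp
(defun my-inside-opener (pos opener)
  "Return the start of OPENER if POS lies strictly inside an occurrence of it."
  (save-excursion
    (let ((k 1)
          (len (length opener))
          (re (regexp-quote opener))
          found)
      ;; Try every start position that would put POS inside the opener.
      (while (and (not found) (< k len))
        (let ((beg (- pos k)))
          (when (>= beg (point-min))
            (goto-char beg)
            (when (looking-at re)
              (setq found beg))))
        (setq k (1+ k)))
      found)))

(defun my-inside-opener-demo ()
  "Check positions before, inside and after \"<!--\"."
  (with-temp-buffer
    (insert "x <!-- y -->")
    (mapcar (lambda (pos) (my-inside-opener pos "<!--")) '(3 5 7))))

(my-inside-opener-demo)   ; => (nil 3 nil)
```

This is purely textual and does not consult the parse state, so it would also fire inside strings or other comments.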

Andreas






* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04 17:06       ` martin rudalics
  2011-12-04 20:47         ` Andreas Röhler
@ 2011-12-05  3:33         ` Stefan Monnier
  2011-12-05  7:41           ` martin rudalics
  2011-12-05 11:35           ` Alan Mackenzie
  2011-12-05 11:25         ` Alan Mackenzie
  2 siblings, 2 replies; 16+ messages in thread
From: Stefan Monnier @ 2011-12-05  3:33 UTC (permalink / raw)
  To: martin rudalics; +Cc: Alan Mackenzie, emacs-devel

>>> If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
>>> is unreliable in this special case.
>> Not if (nth 5 ppss) says that the buffer position is the one *after* the
>> "/*" sequence.  Of course for "*/" we'd conversely want to use the state
>> *before* "*/".
> What I meant was that the caller would have to care about (nth 5 ppss)
> too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).

That's what I understood and my suggestion does address this issue (tho
it means that (nth 5 ppss) will sometimes refer to a buffer position
after (point) and sometimes before).

A case that needs to work is "/*/" in C mode, for example.


        Stefan




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-05  3:33         ` Stefan Monnier
@ 2011-12-05  7:41           ` martin rudalics
  2011-12-05 14:01             ` Stefan Monnier
  2011-12-05 11:35           ` Alan Mackenzie
  1 sibling, 1 reply; 16+ messages in thread
From: martin rudalics @ 2011-12-05  7:41 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, emacs-devel

 >> What I meant was that the caller would have to care about (nth 5 ppss)
 >> too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).
 >
 > That's what I understood and my suggestion does address this issue (tho
 > it means that (nth 5 ppss) will sometimes refer to a buffer position
 > after (point) and sometimes before).

I still don't see what you need (nth 5 ppss) for here.  Is it for providing
the OLDSTATE argument in another call to `parse-partial-sexp'?

martin




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-04 17:06       ` martin rudalics
  2011-12-04 20:47         ` Andreas Röhler
  2011-12-05  3:33         ` Stefan Monnier
@ 2011-12-05 11:25         ` Alan Mackenzie
  2011-12-06 10:15           ` martin rudalics
  2 siblings, 1 reply; 16+ messages in thread
From: Alan Mackenzie @ 2011-12-05 11:25 UTC (permalink / raw)
  To: martin rudalics; +Cc: Stefan Monnier, emacs-devel

Hello, Martin.

On Sun, Dec 04, 2011 at 06:06:16PM +0100, martin rudalics wrote:
>  >> If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
>  >> is unreliable in this special case.

>  > Not if (nth 5 ppss) says that the buffer position is the one *after* the
>  > "/*" sequence.  Of course for "*/" we'd conversely want to use the state
>  > *before* "*/".

> What I meant was that the caller would have to care about (nth 5 ppss)
> too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).  If
> we say that a comment is everything in between and including both
> delimiters she won't have to care about (nth 5 ppss) in the first place.

The parse-partial scanner works strictly left to right.  If (nth 5 ppss)
records the left hand bit of "/*", we are not yet in a comment.  We're
probably about to do a division.  Similarly, after * of "*/", we're still
in the comment, probably just passed a comment prefix.

Admittedly CC Mode records the entire comment, including /* and */.

> Admittedly, it's not entirely trivial to implement.  But the fact that
> between "/" and "*" we are not in a comment whilst between "*" and "/"
> we are doesn't strike me as very intuitive.

I disagree.  I think keeping the strictly L to R invariant of the parse is
critically important (but don't ask me why :-).

> martin

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-05  3:33         ` Stefan Monnier
  2011-12-05  7:41           ` martin rudalics
@ 2011-12-05 11:35           ` Alan Mackenzie
  1 sibling, 0 replies; 16+ messages in thread
From: Alan Mackenzie @ 2011-12-05 11:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: martin rudalics, emacs-devel

Hello, Stefan,

On Sun, Dec 04, 2011 at 10:33:37PM -0500, Stefan Monnier wrote:
> >>> If you change (nth 5 ppss) you would still have to say that (nth 4 ppss)
> >>> is unreliable in this special case.
> >> Not if (nth 5 ppss) says that the buffer position is the one *after* the
> >> "/*" sequence.  Of course for "*/" we'd conversely want to use the state
> >> *before* "*/".
> > What I meant was that the caller would have to care about (nth 5 ppss)
> > too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).

> That's what I understood and my suggestion does address this issue (tho
> it means that (nth 5 ppss) will sometimes refer to a buffer position
> after (point) and sometimes before).

I think this is very wrong, and will lead to unwanted complications.  I
would suggest this:

  5. `t' if point is just after a quote character.  The character just
  scanned if that might be part of a double character comment boundary.

This should be straightforward to hack.
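
Until such a change exists, the idea can be approximated from Lisp. The sketch below is a heuristic only: it looks at the syntax flags of the character just scanned and does not distinguish a "/" that has just closed a comment, or one that was escaped. The names my-comment-half-opener and my-half-opener-char-demo are hypothetical.

```elisp
(defun my-comment-half-opener (pos state)
  "Return the char before POS if it may be the first half of a comment opener.
STATE is the parse state at POS.  Return nil otherwise."
  (and (not (nth 3 state))               ; not in a string
       (not (nth 4 state))               ; not in a comment
       (> pos (point-min))
       (let ((syn (syntax-after (1- pos))))
         ;; Bit 16 of the raw syntax code is flag `1':
         ;; "first character of a two-character comment start".
         (and syn
              (/= 0 (logand (car syn) (ash 1 16)))
              (char-before pos)))))

(defun my-half-opener-char-demo ()
  "Apply the heuristic at positions 3, 4 and 5 of \"a /* b */\"."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)   ; assumed C-like comment syntax
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert "a /* b */")
    (mapcar (lambda (pos)
              (my-comment-half-opener
               pos (parse-partial-sexp (point-min) pos)))
            '(3 4 5))))

(my-half-opener-char-demo)   ; => (nil ?/ nil), i.e. (nil 47 nil)
```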

However, there will be crazy hackers who have tested (nth 5 ppss) as
being non-nil, rather than looking for t.  :-(  I say, tough on them.

> A case that needs to work is "/*/" in C mode, for example.

The above suggestion would handle this appropriately.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-05  7:41           ` martin rudalics
@ 2011-12-05 14:01             ` Stefan Monnier
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2011-12-05 14:01 UTC (permalink / raw)
  To: martin rudalics; +Cc: Alan Mackenzie, emacs-devel

>>> What I meant was that the caller would have to care about (nth 5 ppss)
>>> too, wherever she now looked only at (nth 3 ppss) and (nth 4 ppss).
>> That's what I understood and my suggestion does address this issue (tho
>> it means that (nth 5 ppss) will sometimes refer to a buffer position
>> after (point) and sometimes before).
> I still don't see what you need (nth 5 ppss) for here.  Is it for providing
> the OLDSTATE argument in another call to `parse-partial-sexp'?

Yes.
Think of calling parse-partial-sexp twice, passing the first result to
the second call, where the first result is in the middle of a "/*/".
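
This scenario can be tried with a short sketch, assuming a hand-made C-like syntax table. It compares a single scan over "/*/" with a scan that stops between "/" and "*" and then resumes from the returned state. my-one-shot-vs-resumed is a hypothetical name.

```elisp
(defun my-one-shot-vs-resumed (text split end)
  "Return (ONE-SHOT RESUMED): the comment flag at END for two scans of TEXT.
ONE-SHOT scans from the start to END in one call.  RESUMED stops at
SPLIT and passes the resulting state as OLDSTATE to a second call."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)   ; assumed C-like comment syntax
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert text)
    (let* ((one-shot (parse-partial-sexp (point-min) end))
           (first    (parse-partial-sexp (point-min) split))
           (resumed  (parse-partial-sexp split end nil nil first)))
      (list (and (nth 4 one-shot) t)
            (and (nth 4 resumed) t)))))

;; Stop between "/" and "*" (position 2), then continue to just after "/*/".
(my-one-shot-vs-resumed "/*/ x" 2 4)
```

The one-shot scan reports being inside a comment at position 4. Whether the resumed scan agrees depends on whether the state returned at position 2 carries any record of the half-scanned "/", which is the point under discussion, so no result is asserted for it here.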


        Stefan




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-05 11:25         ` Alan Mackenzie
@ 2011-12-06 10:15           ` martin rudalics
  2011-12-06 10:33             ` Alan Mackenzie
  0 siblings, 1 reply; 16+ messages in thread
From: martin rudalics @ 2011-12-06 10:15 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel

 > The parse-partial scanner works strictly left to right.  If (nth 5 ppss)
 > records the left hand bit of "/*", we are not yet in a comment.  We're
 > probably about to do a division.  Similarly, after * of "*/", we're still
 > in the comment, probably just passed a comment prefix.

If we can look ahead by one character, there is no probability but
certainty.  And the latter is what you want in (nth 4 ppss).  The
remaining case is with an "/" at the end of a buffer and that case
wouldn't trouble me.

 > I disagree.  I think keeping the strictly L to R invariant of the parse is
 > critically important (but don't ask me why :-).

Why would looking ahead violate a L to R rule?

martin




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-06 10:15           ` martin rudalics
@ 2011-12-06 10:33             ` Alan Mackenzie
  2011-12-06 13:39               ` martin rudalics
  2011-12-06 13:50               ` Stefan Monnier
  0 siblings, 2 replies; 16+ messages in thread
From: Alan Mackenzie @ 2011-12-06 10:33 UTC (permalink / raw)
  To: martin rudalics; +Cc: Stefan Monnier, emacs-devel

Hello, Martin.

On Tue, Dec 06, 2011 at 11:15:22AM +0100, martin rudalics wrote:
>  > The parse-partial scanner works strictly left to right.  If (nth 5
>  > ppss) records the left hand bit of "/*", we are not yet in a
>  > comment.  We're probably about to do a division.  Similarly, after *
>  > of "*/", we're still in the comment, probably just passed a comment
>  > prefix.

> If we can look ahead by one character, there is no probability but
> certainty.  And the latter is what you want in (nth 4 ppss).  The
> remaining case is with an "/" at the end of a buffer and that case
> wouldn't trouble me.

One can delete anything inside a comment and it is still a comment.  We
(i.e. I :-) don't want to introduce an extra special case about the first
character of a comment.

>  > I disagree.  I think keeping the strictly L to R invariant of the
>  > parse is critically important (but don't ask me why :-).

> Why would looking ahead violate a L to R rule?

Think of it as the direction one's head is turned on a British street
when about to cross it suicidally.  At the moment, parse-partial-sexp
looks only at the characters to the left; it never pays any attention
whatsoever to characters on the right.

p-p-s is a finite state machine.  If it starts looking to the right, it
will still be a fsm, but with many more states.

Again, what of "/*/" mentioned by Stefan?  If we're already in the
comment after the first "/", then we're apparently looking at a comment
ender.  This complication (and it is complicated) surely condemns the
approach.

I think we should use the same approach as for escape characters: record
the fact in (nth 5 state) that we've passed one, but otherwise take no
action.

> martin

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-06 10:33             ` Alan Mackenzie
@ 2011-12-06 13:39               ` martin rudalics
  2011-12-06 13:50               ` Stefan Monnier
  1 sibling, 0 replies; 16+ messages in thread
From: martin rudalics @ 2011-12-06 13:39 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: Stefan Monnier, emacs-devel

 > One can delete anything inside a comment and it is still a comment.  We
 > (i.e. I :-) don't want to introduce an extra special case about the first
 > character of a comment.

What is "the first character of a comment"?  With current Emacs sources
the first character of a "/* ... */" comment is the leading "/" when
looking at (nth 8 ppss).  But at the position to the right of that
character we're still not "within" that comment.  Doesn't that strike
you as paradoxical at least?
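
A small sketch of the two values involved, assuming a hand-made C-like syntax table; my-comment-start-demo is a hypothetical name.

```elisp
(defun my-comment-start-demo ()
  "Return (COMMENT-START-SEEN-FROM-INSIDE COMMENT-FLAG-AFTER-THE-SLASH)."
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (modify-syntax-entry ?/ ". 14" st)   ; assumed C-like comment syntax
      (modify-syntax-entry ?* ". 23" st)
      (set-syntax-table st))
    (insert "a /* b */")                   ; the "/" of "/*" is at position 3
    (let ((inside      (parse-partial-sexp (point-min) 5))   ; just after "/*"
          (after-slash (parse-partial-sexp (point-min) 4)))  ; just after "/"
      (list (nth 8 inside)
            (nth 4 after-slash)))))

(my-comment-start-demo)   ; => (3 nil)
```

From inside the comment, (nth 8 ppss) names position 3 as its start, yet at position 4 the comment flag is still nil.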

 > p-p-s is a finite state machine.  If it starts looking to the right, it
 > will still be a fsm, but with many more states.

I think there won't be any more states than with your proposal.

 > Again, what of "/*/" mentioned by Stefan?  If we're already in the
 > comment after the first "/", then we're apparently looking at a comment
 > ender.  This complication (and it is complicated) surely condemns the
 > approach.

This complication exists already as you can verify by looking at the
corresponding code.  The value of the last comment start position (the
position before the leading "/") is IMHO sufficient to handle this case
well.

 > I think we should use the same approach as for escape characters: record
 > the fact in (nth 5 state) that we've passed one, but otherwise take no
 > action.

Since you're the person most affected, the choice should be yours.
Nevertheless, I think that your initial claim

   In particular, checking (nth 3 state) and (nth 4 state) is
   insufficient to know that one is at a "safe place".

could be easily corrected.

martin




* Re: Musings: Supposed places of safety, guaranteed by parse-partial-sexp are not safe.
  2011-12-06 10:33             ` Alan Mackenzie
  2011-12-06 13:39               ` martin rudalics
@ 2011-12-06 13:50               ` Stefan Monnier
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2011-12-06 13:50 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: martin rudalics, emacs-devel

> I think we should use the same approach as for escape characters: record
> the fact in (nth 5 state) that we've passed one, but otherwise take no
> action.

I think I agree.  But I suspect it's going to be painful to write the
patch for it.  It's probably going to be easier to store in
(nth 5 state) a buffer position from where to pick up the parse.


        Stefan



