unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#31290: Fundamental bugs in syntax-propertize
@ 2018-04-27 21:08 Alan Mackenzie
  2018-05-08 12:35 ` Dmitry Gutov
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Mackenzie @ 2018-04-27 21:08 UTC
  To: 31290

Hello, Emacs.

There are fundamental bugs in syntax-propertize and
syntax-propertize-function.  The doc string of the latter states:

    The specified function may call `syntax-ppss' on any position before
    END, ....

This is untrue.  What is true is that syntax-ppss can be called on a position
only up to syntax-propertize--done.  After this point, the syntax-table
properties haven't been applied, so calling syntax-ppss is, in general,
going to give a false result.

At least that would be true if syntax-propertize--done hadn't been
prematurely and spuriously increased, crudely to prevent an infinite
recursion, falsely indicating to the syntax-ppss infrastructure that the
syntax-table properties have already been applied to the region (BEGIN
END).

    .... but it should not call `syntax-ppss-flush-cache', ....

Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
back to its true value, allowing the wrongly allowed syntax-ppss calls at
a later position to cause a recursive loop.

    .... which means that it should not call `syntax-ppss' on some
    position and later modify the buffer on some earlier position.

This is a bad restriction, because sometimes syntax-table properties can
only be correctly determined by examining the syntax of later buffer
positions.  An example of this is giving the string-fence syntax-table
text property to an unbalanced opening string quote, but not to correctly
matched quotes.
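Here is a minimal sketch of the unbalanced-quote example. It is written in Python rather than Emacs Lisp so that it runs standalone, and the function `unbalanced_quote_positions` is hypothetical: it touches no Emacs API. It shows why the decision at an earlier position (the opening quote) can only be made after scanning later positions, up to the next unescaped EOL:

```python
def unbalanced_quote_positions(text, quote='"'):
    """Return offsets of opening quotes still open at an unescaped EOL.

    Hypothetical illustration: whether the quote at offset i is
    unbalanced is only known after scanning forward to the next
    newline, i.e. the property for an EARLIER position depends on
    LATER positions.  A backslash escapes the following character
    (including a newline, which then continues the string).
    """
    fences = []
    open_pos = None  # offset of the currently open quote, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == quote:
            # an opening quote if none is open, otherwise a closing one
            open_pos = i if open_pos is None else None
        elif ch == "\n" and open_pos is not None:
            # unescaped EOL inside a string: the earlier quote is unbalanced
            fences.append(open_pos)
            open_pos = None
        i += 1
    if open_pos is not None:  # string still open at end of text
        fences.append(open_pos)
    return fences


print(unbalanced_quote_positions('a = "ok";\nb = "oops;\nc = "x"'))  # [14]
```

In a real syntax-propertize-function, the returned positions would be the ones to receive a string-fence syntax-table property (e.g. `(string-to-syntax "|")`). That step is not modelled here.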

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The plain fact is that (syntax-ppss pos) calls (syntax-propertize pos),
so syntax-propertize cannot itself use syntax-ppss because of the
recursive loop thus created.
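The recursion can be seen in a toy model. It is again in Python and entirely hypothetical: `ToyBuffer`, `ppss`, `propertize` and `flush_cache` only mimic syntax-ppss, syntax-propertize and syntax-ppss-flush-cache, and chunking, real parse states and text properties are omitted. The model assumes, as described above, that the "done" marker is advanced before the propertize function runs, and that flushing the cache moves it back:

```python
MAX_DEPTH = 50  # stand-in for "infinite": stop the demo after 50 re-entries


class ToyBuffer:
    """Hypothetical model of syntax-ppss / syntax-propertize; not Emacs code."""

    def __init__(self, size, propertize_fn, advance_done_early=True):
        self.size = size
        self.done = 0  # plays the role of syntax-propertize--done
        self.propertize_fn = propertize_fn
        self.advance_done_early = advance_done_early
        self.depth = 0  # how deeply propertize has re-entered itself

    def ppss(self, pos):
        # Like syntax-ppss: first ensure properties are applied up to POS.
        self.propertize(pos)
        return ("state", pos)  # placeholder for a real parse state

    def propertize(self, pos):
        if self.done >= pos:
            return  # properties believed to be present already
        self.depth += 1
        if self.depth > MAX_DEPTH:
            raise RecursionError("propertize re-entered itself")
        start, end = self.done, self.size
        if self.advance_done_early:
            # The crude guard: claim START..END is done before it is.
            self.done = end
        self.propertize_fn(self, start, end)
        self.done = end
        self.depth -= 1

    def flush_cache(self, pos):
        # Like syntax-ppss-flush-cache: move the "done" marker back to POS.
        self.done = min(self.done, pos)


def reads_later_state(buf, start, end):
    buf.ppss(end)  # consult the parse state inside the region being propertized


def flushes_then_reads(buf, start, end):
    buf.flush_cache(start)  # e.g. after changing text at START
    buf.ppss(end)


def demo(fn, early):
    try:
        ToyBuffer(100, fn, advance_done_early=early).ppss(100)
        return "ok"
    except RecursionError:
        return "infinite recursion"


print(demo(reads_later_state, early=False))  # infinite recursion
print(demo(reads_later_state, early=True))   # ok
print(demo(flushes_then_reads, early=True))  # infinite recursion
```

The middle case returns without error. However, the inner ppss call ran before any properties had been applied, which is the false result described above.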

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Proposed solutions:

1. Major modes' syntax-propertize-function's are somehow given read
access to syntax-propertize--done, and may call syntax-ppss up to that
point only.  syntax-propertize--done is updated only after the
syntax-table properties have been applied.  Or....

2. syntax-propertize-function's are banned from using syntax-ppss, the
documentation instead directing them to use parse-partial-sexp directly.

In either solution, the restriction on using syntax-ppss-flush-cache
would no longer be necessary, and there would be no restriction on
setting syntax-table text properties at an earlier position than the one
currently being analysed.

I think solution 2 is the better one.

--
Alan Mackenzie (Nuremberg, Germany).






* bug#31290: Fundamental bugs in syntax-propertize
  2018-04-27 21:08 bug#31290: Fundamental bugs in syntax-propertize Alan Mackenzie
@ 2018-05-08 12:35 ` Dmitry Gutov
  2018-05-12 11:26   ` Alan Mackenzie
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Gutov @ 2018-05-08 12:35 UTC
  To: Alan Mackenzie, 31290

On 4/28/18 12:08 AM, Alan Mackenzie wrote:

> At least that would be true if syntax-propertize--done hadn't been
> prematurely and spuriously increased, crudely to prevent an infinite
> recursion, falsely indicating to the syntax-ppss infrastructure that the
> syntax-table properties have already been applied to the region (BEGIN
> END).
> 
>      .... but it should not call `syntax-ppss-flush-cache', ....
> 
> Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
> back to its true value, allowing the wrongly allowed syntax-ppss calls at
> a later position to cause a recursive loop.

Maybe we should "allow" it to loop in certain cases, leaving it to the
programmer to make sure the result doesn't infloop even if these rules
are violated?

>      .... which means that it should not call `syntax-ppss' on some
>      position and later modify the buffer on some earlier position.
> 
> This is a bad restriction, because sometimes syntax-table properties can
> only be correctly determined by examining the syntax of later buffer
> positions.  An example of this is giving the string-fence syntax-table
> text property to an unbalanced opening string quote, but not to correctly
> matched quotes.

I'm not exactly convinced by the given example (why would we use the 
string-fence in that case?), but it might be better if something like 
this was possible, indeed.

> 2. syntax-propertize-function's are banned from using syntax-ppss, the
> documentation instead directing them to use parse-partial-sexp directly.

The ones that currently call syntax-ppss can't simply switch over to
parse-partial-sexp without becoming slower, due to the lack of a cache.

Before tackling this bug, I'd rather we see a real-world problem that it 
caused, and pick a particular approach based on it.

But off the top of my head, we could introduce a "stricter but somewhat 
slower" variation of syntax-ppss to be called inside 
syntax-propertize-function's, which would treat the values in question 
more carefully, somehow.






* bug#31290: Fundamental bugs in syntax-propertize
  2018-05-08 12:35 ` Dmitry Gutov
@ 2018-05-12 11:26   ` Alan Mackenzie
  2018-05-13  7:33     ` Andreas Röhler
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Mackenzie @ 2018-05-12 11:26 UTC
  To: Dmitry Gutov; +Cc: 31290

Hello, Dmitry.

On Tue, May 08, 2018 at 15:35:14 +0300, Dmitry Gutov wrote:
> On 4/28/18 12:08 AM, Alan Mackenzie wrote:

> > At least that would be true if syntax-propertize--done hadn't been
> > prematurely and spuriously increased, crudely to prevent an infinite
> > recursion, falsely indicating to the syntax-ppss infrastructure that the
> > syntax-table properties have already been applied to the region (BEGIN
> > END).

> >      .... but it should not call `syntax-ppss-flush-cache', ....

> > Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
> > back to its true value, allowing the wrongly allowed syntax-ppss calls at
> > a later position to cause a recursive loop.

> Maybe we should "allow" it to loop in certain cases, leaving it to the
> programmer to make sure the result doesn't infloop even if these rules
> are violated?

I'm not sure how this could work.  We would need to formalise the rules
very carefully, to avoid the need to read syntax.{c,el}'s source code.

> >      .... which means that it should not call `syntax-ppss' on some
> >      position and later modify the buffer on some earlier position.

> > This is a bad restriction, because sometimes syntax-table properties can
> > only be correctly determined by examining the syntax of later buffer
> > positions.  An example of this is giving the string-fence syntax-table
> > text property to an unbalanced opening string quote, but not to correctly
> > matched quotes.

> I'm not exactly convinced by the given example (why would we use the 
> string-fence in that case?), but it might be better if something like 
> this was possible, indeed.

String fence can be used to signal to font lock that the delimiter
(together with the "mismatching" unescaped EOL) should be fontified in
warning face.

A better example might be C++ Mode's marking of a "< ... >" pair with
paren syntax.  This isn't done with syntax-propertize-function (as you
know), but it would be nice if this were possible.

> > 2. syntax-propertize-function's are banned from using syntax-ppss, the
> > documentation instead directing them to use parse-partial-sexp directly.

> The ones that currently call syntax-ppss can't simply switch over to
> parse-partial-sexp without becoming slower, due to the lack of a cache.

The cache at the pertinent buffer position doesn't exist at the time:
consistent syntax-table properties aren't on the preceding buffer
positions.

> Before tackling this bug, I'd rather we see a real-world problem that it 
> caused, and pick a particular approach based on it.

My enhancements for bug#30393: "24.4; cperl-mode: indentation failure -
Documentation enhancements", where (almost) any change which affects the
syntactic state is programmed to call syntax-ppss-flush-cache from the C
level, clashes with the mechanism in this bug report.  Most of the time
it's fine, but when a change affecting the syntactic state is made from
inside a syntax-propertize-function, Emacs goes into an infinite recursive
loop.

This isn't good.

> But off the top of my head, we could introduce a "stricter but somewhat 
> slower" variation of syntax-ppss to be called inside 
> syntax-propertize-function's, which would treat the values in question 
> more carefully, somehow.

That's an idea worth exploring.

-- 
Alan Mackenzie (Nuremberg, Germany).






* bug#31290: Fundamental bugs in syntax-propertize
  2018-05-12 11:26   ` Alan Mackenzie
@ 2018-05-13  7:33     ` Andreas Röhler
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Röhler @ 2018-05-13  7:33 UTC
  To: 31290

On 12.05.2018 13:26, Alan Mackenzie wrote:
> Hello, Dmitry.
> 
> On Tue, May 08, 2018 at 15:35:14 +0300, Dmitry Gutov wrote:
>> On 4/28/18 12:08 AM, Alan Mackenzie wrote:
> 
>>> At least that would be true if syntax-propertize--done hadn't been
>>> prematurely and spuriously increased, crudely to prevent an infinite
>>> recursion, falsely indicating to the syntax-ppss infrastructure that the
>>> syntax-table properties have already been applied to the region (BEGIN
>>> END).
> 
>>>       .... but it should not call `syntax-ppss-flush-cache', ....
> 
>>> Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
>>> back to its true value, allowing the wrongly allowed syntax-ppss calls at
>>> a later position to cause a recursive loop.
> 
>> Maybe we should "allow" it to loop in certain cases, leaving it to the
>> programmer to make sure the result doesn't infloop even if these rules
>> are violated?
> 
> I'm not sure how this could work.  We would need to formalise the rules
> very carefully, to avoid the need to read syntax.{c,el}'s source code.
> 
>>>       .... which means that it should not call `syntax-ppss' on some
>>>       position and later modify the buffer on some earlier position.
> 
>>> This is a bad restriction, because sometimes syntax-table properties can
>>> only be correctly determined by examining the syntax of later buffer
>>> positions.  An example of this is giving the string-fence syntax-table
>>> text property to an unbalanced opening string quote, but not to correctly
>>> matched quotes.
> 
>> I'm not exactly convinced by the given example (why would we use the
>> string-fence in that case?), but it might be better if something like
>> this was possible, indeed.
> 
> String fence can be used to signal to font lock that the delimiter
> (together with the "mismatching" unescaped EOL) should be fontified in
> warning face.
> 
> A better example might be C++ Mode's marking of a "< ... >" pair with
> paren syntax.  This isn't done with syntax-propertize-function (as you
> know), but it would be nice if this were possible.
> 
>>> 2. syntax-propertize-function's are banned from using syntax-ppss, the
>>> documentation instead directing them to use parse-partial-sexp directly.
> 
>> The ones that currently call syntax-ppss can't simply switch over to
>> parse-partial-sexp without becoming slower, due to the lack of a cache.
> 
> The cache at the pertinent buffer position doesn't exist at the time:
> consistent syntax-table properties aren't on the preceding buffer
> positions.
> 
>> Before tackling this bug, I'd rather we see a real-world problem that it
>> caused, and pick a particular approach based on it.
> 
> My enhancements for bug#30393: "24.4; cperl-mode: indentation failure -
> Documentation enhancements", where (almost) any change which affects the
> syntactic state is programmed to call syntax-ppss-flush-cache from the C
> level, clashes with the mechanism in this bug report.  Most of the time
> it's fine, but when a change affecting the syntactic state is made from
> inside a syntax-propertize-function, Emacs goes into an infinite recursive
> loop.
> 
> This isn't good.
> 
>> But off the top of my head, we could introduce a "stricter but somewhat
>> slower" variation of syntax-ppss to be called inside
>> syntax-propertize-function's, which would treat the values in question
>> more carefully, somehow.
> 
> That's an idea worth exploring.
> 

Hi folks,

From what I saw a month ago, I can only underline the word "fundamental".
I gave up following the details of the check-ins made.

That part needs someone who will deal with the Gordian knot as it
deserves...









end of thread

Thread overview: 4+ messages
2018-04-27 21:08 bug#31290: Fundamental bugs in syntax-propertize Alan Mackenzie
2018-05-08 12:35 ` Dmitry Gutov
2018-05-12 11:26   ` Alan Mackenzie
2018-05-13  7:33     ` Andreas Röhler

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
