Writing syntax-propertize-function for strings in code in strings, etc

unofficial mirror of emacs-devel@gnu.org 

* Writing syntax-propertize-function for strings in code in strings, etc
@ 2012-09-08  3:23 Dmitry Gutov
  2012-09-08 19:31 ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Gutov @ 2012-09-08  3:23 UTC
  To: emacs-devel

Hi all,

I've been looking into this bug: http://bugs.ruby-lang.org/issues/6090

To elaborate, Ruby allows arbitrary code between string interpolation
braces, and even unlimited nesting of those.
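For readers unfamiliar with the feature, here is a small illustration of what "arbitrary code" and "unlimited nesting" mean in practice. The variable names are made up for the example; only the interpolation syntax itself comes from Ruby.

```ruby
# Interpolation braces may contain arbitrary Ruby code,
# including further string literals with their own interpolation.
name = "world"
simple = "hello #{name.upcase}"

# A string inside code inside a string, several levels deep.
nested = "outer #{"middle #{"inner #{1 + 2}"}"}"

# The same works in percent literals.
percent = %(sum: #{[1, 2, 3].map { |n| "<#{n}>" }.join})

puts simple   # hello WORLD
puts nested   # outer middle inner 3
puts percent  # sum: <1><2><3>
```

Each `#{` switches back from "string contents" to "code", which is what a highlighter or syntax scanner has to follow.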

Sublime Text handles these aspects rather excellently, and even
highlights the code inside as code, not string contents:
http://i.imgur.com/NH1Ye.png

Is there a proper way to do so in Emacs?

My first idea was, when propertizing interpolation, to see what kind of
string we're inside, and apply the appropriate syntax to the enclosing
braces, thus splitting the literal in two.  But (a) string quotes class
doesn't work that way (text characters on both ends of a literal must
be the same), (b) if we're inside a percent literal (syntax class:
generic string), and the literal spans several lines, we need to be able
to jump to its real beginning position from its end, but with this
approach (nth 8 (syntax-ppss)) will just return the beginning of the
last piece.  Saving buffer positions to text properties does not look very
reliable, since the respective text may be deleted and re-inserted.

Suggestions?

A quick and dirty way is to limit the support to double-quoted strings,
no change in highlighting, and no nesting, but that would be the last
resort.

--Dmitry


* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-09-08  3:23 Writing syntax-propertize-function for strings in code in strings, etc Dmitry Gutov
@ 2012-09-08 19:31 ` Stefan Monnier
  2012-09-09  0:13   ` Dmitry Gutov
       [not found]   ` <504FE870.7070002@yandex.ru>
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Monnier @ 2012-09-08 19:31 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> Sublime Text handles these aspects rather excellently, and even
> highlights the code inside as code, not string contents:
> http://i.imgur.com/NH1Ye.png
> Is there a proper way to do so in Emacs?

Currently, it's pretty difficult for Emacs to handle it like in the
picture above.

> My first idea was, when propertizing interpolation, to see what kind of
> string we're inside, and apply the appropriate syntax to the enclosing
> braces, thus splitting the literal in two.  But (a) string quotes class
> doesn't work that way (text characters on both ends of a literal must
> be the same), (b) if we're inside a percent literal (syntax class:
> generic string), and the literal spans several lines, we need to be able
> to jump to its real beginning position from its end, but with this
> approach (nth 8 (syntax-ppss)) will just return the beginning of the
> last piece.  Saving buffer positions to text properties does not look very
> reliable, since the respective text may be deleted and re-inserted.

> Suggestions?

I think the better approach is to extend syntax.c with such a notion of
"syntax within strings".  This could hopefully be used for:
- Strings within strings (e.g. Postscript nested strings).
- Comments within strings (I think some regexps allow comments).
- Code within strings (as here and in shell scripts).
I'm not sure what that would look like concretely.  Maybe a new string
quote syntax which specifies a syntax-table to use within the string?


        Stefan




* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-09-08 19:31 ` Stefan Monnier
@ 2012-09-09  0:13   ` Dmitry Gutov
       [not found]   ` <504FE870.7070002@yandex.ru>
  1 sibling, 0 replies; 9+ messages in thread
From: Dmitry Gutov @ 2012-09-09  0:13 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 08.09.2012 23:31, Stefan Monnier wrote:
>> Sublime Text handles these aspects rather excellently, and even
>> highlights the code inside as code, not string contents:
>> http://i.imgur.com/NH1Ye.png
>> Is there a proper way to do so in Emacs?
>
> Currently, it's pretty difficult for Emacs to handle it like in the
> picture above.
>
>> My first idea was, when propertizing interpolation, to see what kind of
>> string we're inside, and apply the appropriate syntax to the enclosing
>> braces, thus splitting the literal in two.  But (a) string quotes class
>> doesn't work that way (text characters on both ends of a literal must
>> be the same), (b) if we're inside a percent literal (syntax class:
>> generic string), and the literal spans several lines, we need to be able
>> to jump to its real beginning position from its end, but with this
>> approach (nth 8 (syntax-ppss)) will just return the beginning of the
>> last piece.  Saving buffer positions to text properties does not look very
>> reliable, since the respective text may be deleted and re-inserted.
>
>> Suggestions?
>
> I think the better approach is to extend syntax.c with such a notion of
> "syntax within strings".  This could hopefully be used for:
> - Strings within strings (e.g. Postscript nested strings).
> - Comments within strings (I think some regexps allow comments).
> - Code within strings (as here and in shell scripts).
> I'm not sure what that would look like concretely.  Maybe a new string
> quote syntax which specifies a syntax-table to use within the string?

In the current case, the syntactic meanings of characters are the same 
as outside the string, except a certain character should end the "inner" 
region and return the state after it to "inside string" (*).

Maybe just two new classes, similar to open and close parenthesis (to 
support nesting)?

* Preferably, only when it's not inside an "inner" string or comment.
At least, that's how it works in Ruby 1.9:

irb(main):011:0> %(#{"})"})
=> "})"

irb(main):013:0> %(#{#})
irb(main):014:0> })
=> ""

The above examples also won't work with the current handling of percent
literals, but that's less important, I think.

parse-partial-sexp will probably need to keep some sort of stack for 
string-related data, so that when we're after the end of an "inner" 
region, we could find out what is the outer string's type and where it 
started.
And when inside the inner region, the position of its start.
Use the 9th state element and bump the total number to 10?
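The stack idea can be illustrated with a toy scanner. This is only a sketch under simplifying assumptions: it is written in Ruby rather than in syntax.c, it handles only double-quoted strings, `#` comments and braces (no percent literals), and the names `scan_state`, `:interp` and `:brace` are invented for the example. It does show the two properties discussed above: a `}` inside an inner string or comment does not end the embedded region, and the state keeps the start positions of every enclosing string.

```ruby
# Toy scanner: tracks nested "code in string in code" contexts with a stack.
# Each stack entry is [kind, start], where kind is :string, :interp,
# :brace or :comment.  An empty stack means top-level code.
def scan_state(src, upto = src.length)
  stack = []
  i = 0
  while i < upto
    ch = src[i]
    kind = stack.empty? ? :code : stack.last[0]
    case kind
    when :string
      if ch == "\\"
        i += 1                      # skip the escaped character
      elsif ch == '"'
        stack.pop                   # end of this string literal
      elsif ch == "#" && src[i + 1] == "{"
        stack.push([:interp, i])    # "push": enter embedded code
        i += 1
      end
    when :comment
      stack.pop if ch == "\n"       # comments end at end of line
    else                            # :code, :interp or :brace
      case ch
      when '"' then stack.push([:string, i])
      when "#" then stack.push([:comment, i])
      when "{" then stack.push([:brace, i])
      when "}" then stack.pop unless stack.empty?  # "pop": closes :brace or :interp
      end
    end
    i += 1
  end
  stack
end

src = '"#{"})"}"'
p scan_state(src, 5)  # => [[:string, 0], [:interp, 1], [:string, 3]]
p scan_state(src)     # => []
```

The first call stops just after the `}` that sits inside the inner string: the stack still shows the inner string (started at 3), the embedded region (started at 1) and the outer string (started at 0), which is the information a single "start of string" position cannot carry.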




* Re: Writing syntax-propertize-function for strings in code in strings, etc
       [not found]     ` <jwvlietxls1.fsf-monnier+emacs@gnu.org>
@ 2012-10-26 19:18       ` Dmitry Gutov
  2012-10-26 20:41         ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Gutov @ 2012-10-26 19:18 UTC
  To: Stefan Monnier, emacs-devel

On 26.10.2012 20:19, Stefan Monnier wrote:
>>> I think the better approach is to extend syntax.c with such a notion of
>>> "syntax within strings".  This could hopefully be used for:
>>> - Strings within strings (e.g. Postscript nested strings).
>>> - Comments within strings (I think some regexps allow comments).
>>> - Code within strings (as here and in shell scripts).
>>> I'm not sure what that would look like concretely.  Maybe a new string
>>> quote syntax which specifies a syntax-table to use within the string?
>> In the current case, the syntactic meanings of characters are the same as
>> outside the string, except a certain character should end the "inner" region
>> and return the state after it to "inside string" (*).
>
> Right, that's the "code within string" case, where you just need one
> char to mean "pop last state".

Or that last character's text would just be assigned a class from the 
syntax-propertize-function, no different syntax table required.
Not sure how useful the first option would be.

>> Maybe just two new classes, similar to open and close parenthesis (to
>> support nesting)?
>
> Yes, one "push <inner-syntax-table>" and one "pop".

So, I don't see the usefulness of the <inner-syntax-table> value in the 
simple case of embedding code in the same language.

Unless we're doing something like the "multiple-modes" use case, which 
we discussed in another thread. This looks like a more general solution.

> Of course, this is fine for parse-partial-sexp, but it's a different
> matter for backward-sexp, where the "pop" would also need to know the
> <inner-syntax-table>.

Maybe in the latter case the scanning function, when encountering the 
"pop" syntax property, would just skip ahead until it finds the 
corresponding "push"?
Unless we want to support intersecting subregions, like ([{])}.

>> parse-partial-sexp will probably need to keep some sort of stack for
>> string-related data, so that when we're after the end of an "inner" region,
>> we could find out what is the outer string's type and where it started.
>> And when inside the inner region, the position of its start.
>> Use the 9th state element and bump the total number to 10?
>
> The total number is already 10.  And yes, I think we can use the 9th
> element.  Currently, the 9th element is a stack of open-paren positions.
> So, I think we can reuse it (presumably we'd want parens and nested
> strings to be "mutually properly nested").

--Dmitry




* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-10-26 19:18       ` Dmitry Gutov
@ 2012-10-26 20:41         ` Stefan Monnier
  2012-10-26 21:52           ` Dmitry Gutov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2012-10-26 20:41 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

>> Yes, one "push <inner-syntax-table>" and one "pop".
> So, I don't see the usefulness of the <inner-syntax-table> value in the
> simple case of embedding code in the same language.

It's for the other cases: strings within strings and comments within strings.

> Unless we're doing something like the "multiple-modes" use case, which we
> discussed in another thread.

Yes, it potentially could be used for m-m-m, tho it would only be
a piece of the puzzle (and it's not clear how useful that piece would be
in the end, once we have the whole puzzle).

>> Of course, this is fine for parse-partial-sexp, but it's a different
>> matter for backward-sexp, where the "pop" would also need to know the
>> <inner-syntax-table>.
> Maybe in the latter case the scanning function, when encountering the "pop"
> syntax property, would just skip ahead until it finds the corresponding
> "push"?

Without knowing the inner syntax table, it's pretty difficult to know
what can be skipped (unless we assume that the "push" can only be marked
with a `syntax-table' text-property).

> Unless we want to support intersecting subregions, like ([{])}.

No, I don't think we have much hope to support that at this stage.

        Stefan


* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-10-26 20:41         ` Stefan Monnier
@ 2012-10-26 21:52           ` Dmitry Gutov
  2012-10-28 15:46             ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Gutov @ 2012-10-26 21:52 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 27.10.2012 0:41, Stefan Monnier wrote:
>>> Yes, one "push <inner-syntax-table>" and one "pop".
>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>> simple case of embedding code in the same language.
>
> It's for the other cases: strings within strings and comments within strings.

Okay. I guess I just don't know [well enough] any languages with 
different embedded syntaxes.

>> Unless we're doing something like the "multiple-modes" use case, which we
>> discussed in another thread.
>
> Yes, it potentially could be used for m-m-m, tho it would only be
> a piece of the puzzle (and it's not clear how useful that piece would be
> in the end, once we have the whole puzzle).

At least it will help third-party frameworks, as we discussed back then,
since (syntax-ppss) will return reasonable values.

>>> Of course, this is fine for parse-partial-sexp, but it's a different
>>> matter for backward-sexp, where the "pop" would also need to know the
>>> <inner-syntax-table>.
>> Maybe in the latter case the scanning function, when encountering the "pop"
>> syntax property, would just skip ahead until it finds the corresponding
>> "push"?
>
> Without knowing the inner syntax table, it's pretty difficult to know
> what can be skipped (unless we assume that the "push" can only be marked
> with a `syntax-table' text-property).

Indeed. But I think it's a reasonable assumption. In all cases I can 
think about the "region opener" is at least two characters long, and it 
often depends on the context (like only inside a string).

But suppose "push" characters can be set inside a syntax table.
Let's move point inside an embedded code region, maybe several levels 
deep. Now we want to call `forward-sexp'. How will it know the effective 
syntax-table value at that position?


* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-10-26 21:52           ` Dmitry Gutov
@ 2012-10-28 15:46             ` Stefan Monnier
  2012-11-02  4:54               ` Dmitry Gutov
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2012-10-28 15:46 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

>>>> Yes, one "push <inner-syntax-table>" and one "pop".
>>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>>> simple case of embedding code in the same language.
>> It's for the other cases: strings within strings and comments within strings.
> Okay. I guess I just don't know [well enough] any languages with different
> embedded syntaxes.

I mentioned Postscript as a language that allows strings within strings
(Postscript strings are delimited by parentheses).
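To make the Postscript case concrete: inside a parenthesized string, balanced parentheses may appear unescaped, so finding the end of the string requires counting nesting depth. A minimal sketch, with `ps_string_end` being a hypothetical helper written for this illustration (not anything from Emacs):

```ruby
# Return the index of the ")" that closes the PostScript string starting
# at src[start], or nil if the string is unterminated.
def ps_string_end(src, start)
  raise ArgumentError, "no string at #{start}" unless src[start] == "("
  depth = 0
  i = start
  while i < src.length
    case src[i]
    when "\\" then i += 1          # backslash escapes the next character
    when "("  then depth += 1      # opening or nested parenthesis
    when ")"
      depth -= 1
      return i if depth.zero?      # back at depth 0: the string ends here
    end
    i += 1
  end
  nil
end

p ps_string_end('(a (b) c) show', 0)  # => 8
p ps_string_end('(a \) b)', 0)        # => 7
```

Here the "push" and "pop" characters are plain single characters that also occur in ordinary code, which is why they could not be limited to positions marked by a text property.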

>> Without knowing the inner syntax table, it's pretty difficult to know
>> what can be skipped (unless we assume that the "push" can only be marked
>> with a `syntax-table' text-property).
> Indeed. But I think it's a reasonable assumption.

It's not for Postscript strings, I think.

> In all cases I can think about the "region opener" is at least two
> characters long, and it often depends on the context (like only inside
> a string).

OTOH, if we really need to find the inner syntax-table, we could ask
syntax-ppss to give the state right before the "pop", which will also
immediately tell us where is the matching push.
So maybe it's an acceptable workaround (provide a config variable to
either use syntax-ppss or assume that a push can only be within
a syntax-table text-property).
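The workaround can be modelled in a few lines: rather than scanning backward from the "pop", run the forward parse up to it and read the matching "push" off the top of the stack. In this sketch the `events` array stands in for buffer positions carrying push/pop syntax, and `state_before` plays the role that syntax-ppss would play; both names, and `matching_push`, are assumptions made for the illustration.

```ruby
# events: array of [position, :push or :pop], sorted by position.
# Returns the stack of still-open "push" positions just before `pos`.
def state_before(events, pos)
  stack = []
  events.each do |at, kind|
    break if at >= pos
    kind == :push ? stack.push(at) : stack.pop
  end
  stack
end

# The push matching the pop at `pop_pos` is the innermost region
# still open right before that pop.
def matching_push(events, pop_pos)
  state_before(events, pop_pos).last
end

events = [[2, :push], [5, :push], [9, :pop], [14, :pop]]
p state_before(events, 7)    # => [2, 5]
p matching_push(events, 9)   # => 5
p matching_push(events, 14)  # => 2
```

The cost is a forward parse from some earlier known state, which is the trade-off the proposed config variable would let a mode choose.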

> Let's move point inside an embedded code region, maybe several levels
> deep.  Now we want to call `forward-sexp'.  How will it know the
> effective syntax-table value at that position?

That's not a new problem, actually, since if you start a `forward-sexp'
from within a string or within a comment you already get
similar problems.

        Stefan

PS: My secret longer-term agenda for world domination might include
    using push/pop for all strings and comments as well.


* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-10-28 15:46             ` Stefan Monnier
@ 2012-11-02  4:54               ` Dmitry Gutov
  2012-11-15  1:40                 ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Gutov @ 2012-11-02  4:54 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 28.10.2012 19:46, Stefan Monnier wrote:
>>>>> Yes, one "push <inner-syntax-table>" and one "pop".
>>>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>>>> simple case of embedding code in the same language.
>>> It's for the other cases: strings within strings and comments within strings.
>> Okay. I guess I just don't know [well enough] any languages with different
>> embedded syntaxes.
>
> I mentioned Postscript as a language that allows strings within strings
> (Postscript strings are delimited by parentheses).

I see.

>>> Without knowing the inner syntax table, it's pretty difficult to know
>>> what can be skipped (unless we assume that the "push" can only be marked
>>> with a `syntax-table' text-property).
>> Indeed. But I think it's a reasonable assumption.
>
> It's not for Postscript strings, I think.
>
>> In all cases I can think about the "region opener" is at least two
>> characters long, and it often depends on the context (like only inside
>> a string).
>
> OTOH, if we really need to find the inner syntax-table, we could ask
> syntax-ppss to give the state right before the "pop", which will also
> immediately tell us where is the matching push.
> So maybe it's an acceptable workaround (provide a config variable to
> either use syntax-ppss or assume that a push can only be within
> a syntax-table text-property).

Sounds good to me.

>> Let's move point inside an embedded code region, maybe several levels
>> deep.  Now we want to call `forward-sexp'.  How will it know the
>> effective syntax-table value at that position?
>
> That's not a new problem, actually, since if you start a `forward-sexp'
> from within a string or within a comment you already get
> similar problems.

True, although with those it's mostly a non-problem, AFAICT: jumping 
forward or back from comments works just fine, and jumping solely 
between strings makes a certain amount of sense.

> PS: My secret longer-term agenda for world domination might include
>      using push/pop for all strings and comments as well.

Could you point to the use case?
I think this might be good for e.g. comments in regexps, but that's 
already going to be one of the uses for "code in strings", no?

P.S. Sorry for the off-topic question, but for the past few weeks I've been
receiving "message undelivered" notices like the one below from
mailer-daemon@yandex.ru, one for each email I sent you, each after a
considerable delay:

<monnier@IRO.UMontreal.CA>: connect to
     perlin.IRO.UMontreal.CA[132.204.24.51]:25: Connection timed out

Is this a problem on your end, or should I switch to another SMTP server?

--Dmitry




* Re: Writing syntax-propertize-function for strings in code in strings, etc
  2012-11-02  4:54               ` Dmitry Gutov
@ 2012-11-15  1:40                 ` Stefan Monnier
  0 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2012-11-15  1:40 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

>> That's not a new problem, actually, since if you start
>> a `forward-sexp' from within a string or within a comment you already
>> get similar problems.
> True, although with those it's mostly a non-problem, AFAICT: jumping forward
> or back from comments works just fine, and jumping solely between strings
> makes a certain amount of sense.

Both cases are problematic, but you're right that for multi-mode cases
those same problems become more acute.

>> PS: My secret longer-term agenda for world domination might include
>> using push/pop for all strings and comments as well.
> Could you point to the use case?

World domination, obviously.

> <monnier@IRO.UMontreal.CA>: connect to
>     perlin.IRO.UMontreal.CA[132.204.24.51]:25: Connection timed out
> Is this a problem on your end, or should I switch to another SMTP server?

No idea.  But I wouldn't worry too much about it: the sysadmins here
like to use those nasty blacklisting services, so it's quite possible
that your server got on one of those blacklists for some idiotic reason,
and then it'll take a while to get off of it.  But any/all servers get
there at some point (even GNU's mailing-list server), so the best for
you is to hope it'll pass.

        Stefan
