* Writing syntax-propertize-function for strings in code in strings, etc
From: Dmitry Gutov @ 2012-09-08 3:23 UTC
To: emacs-devel

Hi all,

I've been looking into this bug: http://bugs.ruby-lang.org/issues/6090

To elaborate, Ruby allows arbitrary code between string interpolation braces, and even unlimited nesting of those. Sublime Text handles these aspects rather excellently, and even highlights the code inside as code, not string contents: http://i.imgur.com/NH1Ye.png

Is there a proper way to do so in Emacs?

My first idea was, when propertizing interpolation, to see what kind of string we're inside, and apply the appropriate syntax to the enclosing braces, thus splitting the literal in two. But (a) string quotes class doesn't work that way (text characters on both ends of a literal must be the same), (b) if we're inside a percent literal (syntax class: generic string), and the literal spans several lines, we need to be able to jump to its real beginning position from its end, but with this approach (nth 8 (syntax-ppss)) will just return the beginning of the last piece. Saving buffer positions to text properties looks not very reliable, since the respective text may be deleted and re-inserted.

Suggestions?

A quick and dirty way is to limit the support to double-quoted strings, no change in highlighting, and no nesting, but that would be the last resort.

--Dmitry

* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Stefan Monnier @ 2012-09-08 19:31 UTC
To: Dmitry Gutov; +Cc: emacs-devel

> Sublime Text handles these aspects rather excellently, and even
> highlights the code inside as code, not string contents:
> http://i.imgur.com/NH1Ye.png
> Is there a proper way to do so in Emacs?

Currently, it's pretty difficult for Emacs to handle it like in the
picture above.

> My first idea was, when propertizing interpolation, to see what kind of
> string we're inside, and apply the appropriate syntax to the enclosing
> braces, thus splitting the literal in two. But (a) string quotes class
> doesn't work that way (text characters on both ends of a literal must
> be the same), (b) if we're inside a percent literal (syntax class:
> generic string), and the literal spans several lines, we need to be able
> to jump to its real beginning position from its end, but with this
> approach (nth 8 (syntax-ppss)) will just return the beginning of the
> last piece. Saving buffer positions to text properties looks not very
> reliable, since the respective text may be deleted and re-inserted.

> Suggestions?

I think the better approach is to extend syntax.c with such a notion of
"syntax within strings". This could hopefully be used for:
- Strings within strings (e.g. Postscript nested strings).
- Comments within strings (I think some regexps allow comments).
- Code within strings (as here and in shell scripts).
I'm not sure what that would look like concretely. Maybe a new string
quote syntax which specifies a syntax-table to use within the string?

Stefan

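A toy model of Stefan's suggestion, for illustration only: each "push" names the syntax table to use inside the region it opens, and the scanner keeps a stack of (table, start) frames, so the innermost frame always says which table is in effect. It uses PostScript-style nested strings, delimited by parentheses. The names TABLES and nesting_at are hypothetical and say nothing about how syntax.c would actually store this; backslash escapes inside PostScript strings are ignored.

```ruby
# Hypothetical model: a "push" carries the syntax table to use inside the
# region it opens; a "pop" returns to the enclosing table.  The table
# names below are made up; PostScript-style escapes are ignored.
TABLES = {
  # In code, "(" opens a PostScript string.
  code: { '(' => [:push, :ps_string] },
  # Inside a string, "(" opens a nested string and ")" closes one level.
  ps_string: { '(' => [:push, :ps_string], ')' => :pop }
}.freeze

# Returns the stack of [table_name, start_pos] frames in effect at LIMIT;
# the last frame's table is the effective syntax table at that point.
def nesting_at(text, limit = text.length)
  stack = [[:code, nil]]
  text[0...limit].each_char.with_index do |ch, pos|
    action = TABLES.fetch(stack.last[0])[ch]
    if action.is_a?(Array) && action[0] == :push
      stack.push([action[1], pos])
    elsif action == :pop
      stack.pop
    end
  end
  stack
end

p nesting_at('(outer (inner) tail) x', 8)
# => [[:code, nil], [:ps_string, 0], [:ps_string, 7]]
```

At the end of the same text the stack is back to [[:code, nil]], since both closing parens pop one level each.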
* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Dmitry Gutov @ 2012-09-09 0:13 UTC
To: Stefan Monnier; +Cc: emacs-devel

On 08.09.2012 23:31, Stefan Monnier wrote:
>> Sublime Text handles these aspects rather excellently, and even
>> highlights the code inside as code, not string contents:
>> http://i.imgur.com/NH1Ye.png
>> Is there a proper way to do so in Emacs?
>
> Currently, it's pretty difficult for Emacs to handle it like in the
> picture above.
>
>> My first idea was, when propertizing interpolation, to see what kind of
>> string we're inside, and apply the appropriate syntax to the enclosing
>> braces, thus splitting the literal in two. But (a) string quotes class
>> doesn't work that way (text characters on both ends of a literal must
>> be the same), (b) if we're inside a percent literal (syntax class:
>> generic string), and the literal spans several lines, we need to be able
>> to jump to its real beginning position from its end, but with this
>> approach (nth 8 (syntax-ppss)) will just return the beginning of the
>> last piece. Saving buffer positions to text properties looks not very
>> reliable, since the respective text may be deleted and re-inserted.
>
>> Suggestions?
>
> I think the better approach is to extend syntax.c with such a notion of
> "syntax within strings". This could hopefully be used for:
> - Strings within strings (e.g. Postscript nested strings).
> - Comments within strings (I think some regexps allow comments).
> - Code within strings (as here and in shell scripts).
> I'm not sure what that would look like concretely. Maybe a new string
> quote syntax which specifies a syntax-table to use within the string?

In the current case, the syntactic meanings of characters are the same as
outside the string, except a certain character should end the "inner" region
and return the state after it to "inside string" (*).

Maybe just two new classes, similar to open and close parenthesis (to
support nesting)?

* Preferably, only when it's not inside an "inner" string or comment. At least, that's how it works in Ruby 1.9:

irb(main):011:0> %(#{"})"})
=> "})"
irb(main):013:0> %(#{#})
irb(main):014:0> })
=> ""

The above examples also won't work with current percent literals handling, but that's less important, I think.

parse-partial-sexp will probably need to keep some sort of stack for
string-related data, so that when we're after the end of an "inner" region,
we could find out what is the outer string's type and where it started.
And when inside the inner region, the position of its start.
Use the 9th state element and bump the total number to 10?

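A rough Ruby sketch of the stack Dmitry describes, assuming only double-quoted strings, #{...} interpolation, plain braces and # comments matter (escapes and % literals are ignored). scan_state and Frame are hypothetical names; each frame records a region's kind and where it started, which is exactly what is needed to recover the outer string's type and start, or the start of the current inner region.

```ruby
# Toy model of the string-related stack kept in the parse state.
# Only double-quoted strings, #{...} interpolation, plain braces and
# "#" comments are recognized; escapes and %-literals are ignored.
Frame = Struct.new(:kind, :start)

def scan_state(src, limit = src.length)
  stack = [Frame.new(:code, 0)]
  i = 0
  while i < limit
    ch = src[i]
    case stack.last.kind
    when :string
      if ch == '"'
        stack.pop                          # closing quote: back to the enclosing code
      elsif ch == '#' && src[i + 1] == '{'
        stack.push(Frame.new(:interp, i))  # "open": code region inside the string
        i += 1                             # skip the '{'
      end
    when :comment
      stack.pop if ch == "\n"              # a comment ends at end of line
    else                                   # :code, :interp and :brace all scan as code
      case ch
      when '"' then stack.push(Frame.new(:string, i))
      when '#' then stack.push(Frame.new(:comment, i))
      when '{' then stack.push(Frame.new(:brace, i))
      when '}' then stack.pop if stack.length > 1 # "close": ends a brace or an interpolation
      end
    end
    i += 1
  end
  stack
end

src = '"#{"})"}"' # double-quoted analogue of the first irb example
p scan_state(src, 5).map(&:kind)  # => [:code, :string, :interp, :string]
p scan_state(src, 5).map(&:start) # => [0, 0, 1, 3]
p scan_state(src).map(&:kind)     # => [:code]
```

Because the inner string is its own frame, its "}" does not end the interpolation; likewise a # comment inside the interpolation hides a "}" until the end of the line, as in the second irb example.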
* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Dmitry Gutov @ 2012-10-26 19:18 UTC
To: Stefan Monnier, emacs-devel

On 26.10.2012 20:19, Stefan Monnier wrote:
>>> I think the better approach is to extend syntax.c with such a notion of
>>> "syntax within strings". This could hopefully be used for:
>>> - Strings within strings (e.g. Postscript nested strings).
>>> - Comments within strings (I think some regexps allow comments).
>>> - Code within strings (as here and in shell scripts).
>>> I'm not sure what that would look like concretely. Maybe a new string
>>> quote syntax which specifies a syntax-table to use within the string?
>> In the current case, the syntactic meanings of characters are the same as
>> outside the string, except a certain character should end the "inner" region
>> and return the state after it to "inside string" (*).
>
> Right, that's the "code within string" case, where you just need one
> char to mean "pop last state".

Or that last character would just be assigned a class by the syntax-propertize-function, with no different syntax table required. I'm not sure how useful the first option would be.

>> Maybe just two new classes, similar to open and close parenthesis (to
>> support nesting)?
>
> Yes, one "push <inner-syntax-table>" and one "pop".

So, I don't see the usefulness of the <inner-syntax-table> value in the
simple case of embedding code in the same language.

Unless we're doing something like the "multiple-modes" use case, which we
discussed in another thread. This looks like a more general solution.

> Of course, this is fine for parse-partial-sexp, but it's a different
> matter for backward-sexp, where the "pop" would also need to know the
> <inner-syntax-table>.

Maybe in the latter case the scanning function, when encountering the "pop"
syntax property, would just skip ahead until it finds the corresponding
"push"?

Unless we want to support intersecting subregions, like ([{])}.

>> parse-partial-sexp will probably need to keep some sort of stack for
>> string-related data, so that when we're after the end of an "inner" region,
>> we could find out what is the outer string's type and where it started.
>> And when inside the inner region, the position of its start.
>> Use the 9th state element and bump the total number to 10?
>
> The total number is already 10. And yes, I think we can use the 9th
> element. Currently, the 9th element is a stack of open-paren positions.
> So, I think we can reuse it (presumably we'd want parens and nested
> strings to be "mutually properly nested").

--Dmitry

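A small sketch of what "mutually properly nested" could mean if parens and pushed regions shared one stack, as Stefan suggests for the 9th state element. Here ( ) and [ ] are ordinary parens and { } merely stand in for push and pop; open_positions is a hypothetical helper that returns the positions still open (compare the list of open-paren positions in the parse state) and rejects intersecting regions such as ([{])}.

```ruby
# Hypothetical model of one shared stack for parens and pushed regions.
# "(" ")" and "[" "]" are ordinary parens; "{" "}" stand in for push/pop.
PAIRS = { '(' => ')', '[' => ']', '{' => '}' }.freeze
CLOSE_TO_OPEN = PAIRS.invert.freeze

# Returns the positions still open at the end of TEXT, raising on any
# closer that does not match the innermost opener (intersecting regions).
def open_positions(text)
  stack = []
  text.each_char.with_index do |ch, pos|
    if PAIRS.key?(ch)
      stack.push(pos)
    elsif CLOSE_TO_OPEN.key?(ch)
      opener = stack.empty? ? nil : text[stack.last]
      unless opener == CLOSE_TO_OPEN[ch]
        raise ArgumentError, "improperly nested #{ch} at #{pos}"
      end
      stack.pop
    end
  end
  stack
end

p open_positions('(a {b (c) d') # => [0, 3]
```

With one stack, open_positions('([{])}') raises ArgumentError at the "]", since the innermost open region at that point is the "{" push; that is the intersecting case Stefan says there is little hope of supporting.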
* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Stefan Monnier @ 2012-10-26 20:41 UTC
To: Dmitry Gutov; +Cc: emacs-devel

>> Yes, one "push <inner-syntax-table>" and one "pop".
> So, I don't see the usefulness of the <inner-syntax-table> value in the
> simple case of embedding code in the same language.

It's for the other cases: strings within strings and comments within strings.

> Unless we're doing something like the "multiple-modes" use case, which we
> discussed in another thread.

Yes, it potentially could be used for m-m-m, tho it would only be
a piece of the puzzle (and it's not clear how useful that piece would be
in the end, once we have the whole puzzle).

>> Of course, this is fine for parse-partial-sexp, but it's a different
>> matter for backward-sexp, where the "pop" would also need to know the
>> <inner-syntax-table>.
> Maybe in the latter case the scanning function, when encountering the "pop"
> syntax property, would just skip ahead until it finds the corresponding
> "push"?

Without knowing the inner syntax table, it's pretty difficult to know
what can be skipped (unless we assume that the "push" can only be marked
with a `syntax-table' text-property).

> Unless we want to support intersecting subregions, like ([{])}.

No, I don't think we have much hope to support that at this stage.

Stefan

* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Dmitry Gutov @ 2012-10-26 21:52 UTC
To: Stefan Monnier; +Cc: emacs-devel

On 27.10.2012 0:41, Stefan Monnier wrote:
>>> Yes, one "push <inner-syntax-table>" and one "pop".
>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>> simple case of embedding code in the same language.
>
> It's for the other cases: strings within strings and comments within strings.

Okay. I guess I just don't know [well enough] any languages with different
embedded syntaxes.

>> Unless we're doing something like the "multiple-modes" use case, which we
>> discussed in another thread.
>
> Yes, it potentially could be used for m-m-m, tho it would only be
> a piece of the puzzle (and it's not clear how useful that piece would be
> in the end, once we have the whole puzzle).

It would at least help with third-party frameworks, which is what we discussed back then: (syntax-ppss) would return reasonable values there.

>>> Of course, this is fine for parse-partial-sexp, but it's a different
>>> matter for backward-sexp, where the "pop" would also need to know the
>>> <inner-syntax-table>.
>> Maybe in the latter case the scanning function, when encountering the "pop"
>> syntax property, would just skip ahead until it finds the corresponding
>> "push"?
>
> Without knowing the inner syntax table, it's pretty difficult to know
> what can be skipped (unless we assume that the "push" can only be marked
> with a `syntax-table' text-property).

Indeed. But I think it's a reasonable assumption.
In all cases I can think about the "region opener" is at least two
characters long, and it often depends on the context (like only inside
a string).

But suppose "push" characters can be set inside a syntax table.
Let's move point inside an embedded code region, maybe several levels
deep. Now we want to call `forward-sexp'. How will it know the
effective syntax-table value at that position?

* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Stefan Monnier @ 2012-10-28 15:46 UTC
To: Dmitry Gutov; +Cc: emacs-devel

>>>> Yes, one "push <inner-syntax-table>" and one "pop".
>>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>>> simple case of embedding code in the same language.
>> It's for the other cases: strings within strings and comments within strings.
> Okay. I guess I just don't know [well enough] any languages with different
> embedded syntaxes.

I mentioned Postscript as a language that allows strings within strings
(Postscript strings are delimited by parentheses).

>> Without knowing the inner syntax table, it's pretty difficult to know
>> what can be skipped (unless we assume that the "push" can only be marked
>> with a `syntax-table' text-property).
> Indeed. But I think it's a reasonable assumption.

It's not for Postscript strings, I think.

> In all cases I can think about the "region opener" is at least two
> characters long, and it often depends on the context (like only inside
> a string).

OTOH, if we really need to find the inner syntax-table, we could ask
syntax-ppss to give the state right before the "pop", which will also
immediately tell us where is the matching push.
So maybe it's an acceptable workaround (provide a config variable to
either use syntax-ppss or assume that a push can only be within
a syntax-table text-property).

> Let's move point inside an embedded code region, maybe several levels
> deep. Now we want to call `forward-sexp'. How will it know the
> effective syntax-table value at that position?

That's not a new problem, actually, since if you start a `forward-sexp'
from within a string or within a comment you already get
similar problems.

Stefan

PS: My secret longer-term agenda for world domination might include
using push/pop for all strings and comments as well.

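To make the trade-off concrete, here is a hypothetical model of the two strategies Stefan lists for a backward scan that meets a "pop": trust only text properties, or parse forward from the start (as syntax-ppss would, ignoring its caching) and read the innermost open push off the stack. PROPS stands in for syntax-table text properties and TABLE for characters that a syntax table marks as push or pop; all names are made up for this sketch.

```ruby
# Hypothetical model of a backward scan that meets a "pop" at POP_POS.
# PROPS maps positions to :push / :pop (standing in for syntax-table text
# properties); TABLE maps characters to :push / :pop (standing in for
# entries in a syntax table).

# Effective class at POS: a text property wins, else the table entry.
def syntax_class(text, props, pos, table)
  props.fetch(pos) { table[text[pos]] }
end

# Strategy 1: assume pushes/pops only ever come from text properties,
# so walking backward over PROPS alone is enough.
def push_by_properties(props, pop_pos)
  depth = 0
  props.keys.select { |pos| pos < pop_pos }.sort.reverse_each do |pos|
    if props[pos] == :pop
      depth += 1          # an inner region closed; skip its push too
    elsif depth.zero?
      return pos          # innermost unmatched push
    else
      depth -= 1
    end
  end
  nil
end

# Strategy 2: parse forward from the start (as syntax-ppss would, minus
# its caching) and read the innermost open push just before POP_POS.
def push_by_forward_parse(text, props, pop_pos, table)
  stack = []
  (0...pop_pos).each do |pos|
    case syntax_class(text, props, pos, table)
    when :push then stack.push(pos)
    when :pop  then stack.pop
    end
  end
  stack.last
end

text  = 'a<b(c)d>e'
props = { 1 => :push, 7 => :pop }
table = { '(' => :push, ')' => :pop }
p push_by_forward_parse(text, props, 5, table) # => 3
p push_by_properties(props, 5)                 # => 1
```

For the property-marked pop at 7 both strategies find position 1. For the table-derived pop at 5, only the forward parse finds the push at 3; the property-only walk never sees the "(" at 3, which is why that assumption does not hold for Postscript strings.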
* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Dmitry Gutov @ 2012-11-02 4:54 UTC
To: Stefan Monnier; +Cc: emacs-devel

On 28.10.2012 19:46, Stefan Monnier wrote:
>>>>> Yes, one "push <inner-syntax-table>" and one "pop".
>>>> So, I don't see the usefulness of the <inner-syntax-table> value in the
>>>> simple case of embedding code in the same language.
>>> It's for the other cases: strings within strings and comments within strings.
>> Okay. I guess I just don't know [well enough] any languages with different
>> embedded syntaxes.
>
> I mentioned Postscript as a language that allows strings within strings
> (Postscript strings are delimited by parentheses).

I see.

>>> Without knowing the inner syntax table, it's pretty difficult to know
>>> what can be skipped (unless we assume that the "push" can only be marked
>>> with a `syntax-table' text-property).
>> Indeed. But I think it's a reasonable assumption.
>
> It's not for Postscript strings, I think.
>
>> In all cases I can think about the "region opener" is at least two
>> characters long, and it often depends on the context (like only inside
>> a string).
>
> OTOH, if we really need to find the inner syntax-table, we could ask
> syntax-ppss to give the state right before the "pop", which will also
> immediately tell us where is the matching push.
> So maybe it's an acceptable workaround (provide a config variable to
> either use syntax-ppss or assume that a push can only be within
> a syntax-table text-property).

Sounds good to me.

>> Let's move point inside an embedded code region, maybe several levels
>> deep. Now we want to call `forward-sexp'. How will it know the
>> effective syntax-table value at that position?
>
> That's not a new problem, actually, since if you start a `forward-sexp'
> from within a string or within a comment you already get
> similar problems.

True, although with those it's mostly a non-problem, AFAICT: jumping forward
or back from comments works just fine, and jumping solely between strings
makes a certain amount of sense.

> PS: My secret longer-term agenda for world domination might include
> using push/pop for all strings and comments as well.

Could you point to the use case?

I think this might be good for e.g. comments in regexps, but that's already going to be one of the uses for "code in strings", no?

P.S. Sorry for the off-topic question, but for the past few weeks I've been receiving "message undelivered" notices from mailer-daemon@yandex.ru, one for each email I sent you, each with considerable delay:

<monnier@IRO.UMontreal.CA>: connect to
perlin.IRO.UMontreal.CA[132.204.24.51]:25: Connection timed out

Is this a problem on your end, or should I switch to another SMTP server?

--Dmitry

* Re: Writing syntax-propertize-function for strings in code in strings, etc
From: Stefan Monnier @ 2012-11-15 1:40 UTC
To: Dmitry Gutov; +Cc: emacs-devel

>> That's not a new problem, actually, since if you start
>> a `forward-sexp' from within a string or within a comment you already
>> get similar problems.
> True, although with those it's mostly a non-problem, AFAICT: jumping forward
> or back from comments works just fine, and jumping solely between strings
> makes a certain amount of sense.

Both cases are problematic, but you're right that for multi-mode cases those same problems become more acute.

>> PS: My secret longer-term agenda for world domination might include
>> using push/pop for all strings and comments as well.
> Could you point to the use case?

World domination, obviously.

> <monnier@IRO.UMontreal.CA>: connect to
> perlin.IRO.UMontreal.CA[132.204.24.51]:25: Connection timed out
> Is this a problem on your end, or should I switch to another SMTP server?

No idea. But I wouldn't worry too much about it: the sysadmins here like to use those nasty blacklisting services, so it's quite possible that your server got on one of those blacklists for some idiotic reason, and then it'll take a while to get off of it. But any/all servers get there at some point (even GNU's mailing-list server), so the best for you is to hope it'll pass.

Stefan