* Handling extensions of programming languages
@ 2021-03-19 18:53 Harald Jörg
  2021-03-20 17:02 ` Matt Armstrong
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-19 18:53 UTC (permalink / raw)
  To: Emacs Developer List

Hello List,

today I'm looking for advice or hints how to deal with a task for
CPerl mode which might have been solved for other programming
languages: How to handle extensions of the language.

That's not about user-defined functions, but about extensions that
change what needs to be included in imenu, or which affect
highlighting (cumbersome but straightforward) and indentation
(tricky).

 * Is it a good idea to implement each of them as a minor mode which
   only makes sense in CPerl mode buffers?

 * Or should the extensions be loaded by a command from CPerl mode?

 * Should that be one multi-file package or should each extension go
   into a package of its own?  Or even a mixture of both, to allow
   contributions from ELPA and NonGNU ELPA?

 * Are there templates or conventions to follow (beyond the rules for
   building packages, I'm aware of these)?

Background: In Perl, adding new syntax to the language is easy enough
so that many developers have done this and published their work as
extension modules on CPAN.  Some of these extensions have become very
popular, some are quite exotic.  Occasionally they are competing with
each other for the same keywords, but with different syntax.
Eventually, popular keywords might make it into the Perl core, with
yet another syntax.

My first approach was to keep all the code in one place and evaluate
all the font-lock and indenting variables at runtime, as buffer-local
variables, for the different versions.  This works to some extent for
highlighting, but fails if an extension needs different logic for
indentation.
-- 
Cheers,
haj

^ permalink raw reply [flat|nested] 19+ messages in thread

* Re: Handling extensions of programming languages
From: Matt Armstrong @ 2021-03-20 17:02 UTC
To: Harald Jörg, Emacs Developer List

haj@posteo.de (Harald Jörg) writes:

> today I'm looking for advice or hints how to deal with a task for
> CPerl mode which might have been solved for other programming
> languages: How to handle extensions of the language.
[...]
> Background: In Perl, adding new syntax to the language is easy enough
> so that many developers have done this
[...]
> My first approach was to keep all the code in one place and evaluate
> all the font-lock and indenting variables at runtime, as buffer-local
> variables, for the different versions.  This works to some extent for
> highlighting, but fails if an extension needs different logic for
> indentation.

I'm not an expert in this topic as it pertains to Emacs itself, but
I've always found editor and development tools interesting and so have
paid attention to these issues over the years.

Very good Emacs support for languages with flexible syntax, which have a
high level of faithfulness to the language, or even "perfect"
faithfulness, all seem to rely on tools native to the language and
external to Emacs, usually by way of some sort of external server.

Examples: SLIME and Sly for Common Lisp, https://www.racket-mode.com/
for Racket, and, to a lesser degree of functionality, every language
with LSP support, especially C++ (which is known to be effectively
impossible to parse faithfully without what amounts to an entire
compiler frontend).  Indentation (formatting) of source code is part
of the LSP protocol.

The common theme seems to be using the interpreter/compiler itself to
parse, without relying on the editor to understand the code deeply.

For a different approach, you have examples of complete or nearly
complete parsers written in Emacs Lisp.  There is at least one parser
for JavaScript that was at one time fully compliant with the language
standard to the point of providing a full parse tree to Lisp
(https://elpa.gnu.org/packages/js2-mode.html).  The CEDET package has
some complex parser technology.  cc-mode for the C family of languages
is surprisingly good.

The drawback here is that, by design, any syntax extensions and local
mini-DSLs, etc., must also have parsers written in Emacs Lisp.  You
see this issue with js2-mode, where it lags the current language
standard a bit.

(info "(ccmode)Custom Macros") is an example of how cc-mode supports a
limited form of syntax extension.

I think most modes in Emacs Lisp take a pragmatic approach, using
heuristics that get the job done most of the time without being too
computationally expensive.  The SMIE package is a generalization of
this idea, see (info "(elisp)SMIE").

I am not aware of anything like SMIE that allows for language
extensions to be "plugged in" in a general way.

In languages that support 'embedding' other languages in sub-sections
of code (e.g. CSS or PHP in HTML), the kinds of approaches seen at
https://www.emacswiki.org/emacs/MultipleModes have been tried.

* Re: Handling extensions of programming languages
From: Harald Jörg @ 2021-03-20 23:40 UTC
To: Matt Armstrong; +Cc: Emacs Developer List

Matt Armstrong <matt@rfc20.org> writes:

> I'm not an expert in this topic as it pertains to Emacs itself, but
> I've always found editor and development tools interesting and so have
> paid attention to these issues over the years.

Thanks for sharing your insights!

> [...]
> Very good Emacs support for languages with flexible syntax, which have a
> high level of faithfulness to the language, or even "perfect"
> faithfulness, all seem to rely on tools native to the language and
> external to Emacs, usually by way of some sort of external server.
> [...] The common theme seems to be using the interpreter/compiler
> itself to parse, without relying on the editor to understand the code
> deeply.

This is fine.  For Perl, this has some limitations since you actually
need to run parts of the code to find out whether it compiles (or,
more precisely, whether it can be interpreted correctly).  This might
be undesired, e.g. for security reasons with "unknown" code.

> For a different approach, you have examples of complete or nearly
> complete parsers written in Emacs Lisp.
> [...]
> The drawback here is that, by design, any syntax extensions and
> local mini-DSLs, etc., must also have parsers written in Emacs Lisp.

Exactly!

  "How hard can that be?"
    -- Damian Conway, in a presentation which shows, among other
       tricks, a ~2000-line Perl regular expression which matches
       (not actually parses) Perl code.

I *guess* that Emacs Lisp is well suited for a pragmatic/heuristic
approach, and I want to give it a try.

> (info "(ccmode)Custom Macros") is an example of how cc-mode supports a
> limited form of syntax extension.

Many thanks!  This is the sort of pointers I'm after.
I'll take a look at how this is implemented.

> I think most modes in Emacs Lisp take a pragmatic approach, using
> heuristics that get the job done most of the time without being too
> computationally expensive.  The SMIE package is a generalization of this
> idea, see (info "(elisp)SMIE").
> I am not aware of anything like SMIE that allows for language
> extensions to be "plugged in" in a general way.

Well, I have my doubts that Perl is a good candidate for SMIE, and
trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have
to do the job.
-- 
Cheers, and again thanks for your time,
haj

* Re: Handling extensions of programming languages
From: Clément Pit-Claudel @ 2021-03-21 2:18 UTC
To: emacs-devel

On 3/20/21 7:40 PM, Harald Jörg wrote:
> Well, I have my doubts that Perl is a good candidate for SMIE, and
> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have
> to do the job.

I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not
sure how it relates to font-lock-add-keywords and hooks, so maybe
we're not thinking of the same thing?

* Re: Handling extensions of programming languages
From: Harald Jörg @ 2021-03-21 11:41 UTC
To: Clément Pit-Claudel; +Cc: emacs-devel

Clément Pit-Claudel <cpitclaudel@gmail.com> writes:

> On 3/20/21 7:40 PM, Harald Jörg wrote:
>> Well, I have my doubts that Perl is a good candidate for SMIE, and
>> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
>> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have
>> to do the job.
>
> I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not
> sure how it relates to font-lock-add-keywords and hooks, so maybe
> we're not thinking of the same thing?

I admit that I don't know much about SMIE, so maybe I'm wrong here.
Most of Perl is pretty similar to C or Java, but there are cases where
Perl's syntax just can't be parsed statically.

About the relation to font-lock-add-keywords - let me show an example.
"Traditional" Perl has no keywords for object oriented programming,
but there are dozens of extensions which add them.  For example, with
Object::Pad you can write (I apologize for the nonsensical example):

  class Coffee::Machine extends Lawn::Mower
  {
      has $grinder :reader :writer(replace_grinder)
      method grind { ...; }
  }

If I want to support that with CPerl mode, I need to:

 - highlight class, extends, method and some more I haven't included
   in that example as keywords.  That's where font-lock-add-keywords
   comes into play.  Also, "Dishwasher" and "clean_up" should be
   highlighted like package and sub names.

 - add "Dishwasher" and "clean_up" to the imenu index.

 - make sure that indentation recognizes that the closing braces end a
   statement after "class" and "method".  Perl syntax has various cases
   where it doesn't.  I guess this is the part where SMIE would help.

For the latter two tasks, I need to "hook" the logic somehow into
CPerl's implementations of `imenu-create-index-function' and the
various indentation functions.  The current indentation code in CPerl
mode is... a bit messy, and some old bugs call for attention anyway.

If, however, that same class would be defined using the Dios
extension, it would look like this:

  class Coffee::Machine is Lawn::Mower
  {
      has $.grinder is rw
      method grind { ... }
  }

...and also offer the keywords "func" and "submethod" for stuff that
should go into imenu, and "lex" for declaring variables.  There's a
dozen or more other extensions providing OO frameworks for Perl.

So, if the Emacs support for an extension could be done by a separate
.el file, these could be developed within GNU Emacs, in GNU ELPA, but
also contributed via NonGNU ELPA, MELPA or GitHub.

I am aware that probably as soon as such an extension mechanism is
available, _someone_ will publish a Perl extension which can't be
covered :)
-- 
Cheers,
haj

* Re: Handling extensions of programming languages
From: Stefan Monnier @ 2021-03-21 12:39 UTC
To: Harald Jörg; +Cc: Clément Pit-Claudel, emacs-devel

>>> Well, I have my doubts that Perl is a good candidate for SMIE, and
>>> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
>>> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have
>>> to do the job.
>>
>> I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not
>> sure how it relates to font-lock-add-keywords and hooks, so maybe
>> we're not thinking of the same thing?

FWIW, I'm sure SMIE could be made to work, but I highly doubt it would
"work wonderfully" in the sense that it would likely take a fair bit
of effort to make SMIE indent Perl code as well as the current
indentation code in `cperl-mode` or in `perl-mode`.

>   class Coffee::Machine extends Lawn::Mower
>   {
>       has $grinder :reader :writer(replace_grinder)
>       method grind { ...; }
>   }
[...]
>  - add "Dishwasher" and "clean_up" to the imenu index.

That seems to require AI (unless you're talking about a slightly
different example than the one quoted above ;-).

>  - make sure that indentation recognizes that the closing braces end a
>    statement after "class" and "method".  Perl syntax has various cases
>    where it doesn't.  I guess this is the part where SMIE would help.

Actually, the closing brace which also closes a statement is one of the
major pain points in `sm-c-mode`, so it would be one of the parts where
you'd need extra work to make SMIE understand what's going on.

> For the latter two tasks, I need to "hook" the logic somehow into
> CPerl's implementations of `imenu-create-index-function' and the various
> indentation functions.  The current indentation code in CPerl mode
> is... a bit messy, and some old bugs call for attention anyway.

AFAIK font-lock and imenu are easy.  For font-lock there's
`font-lock-add-keywords` and for imenu, you should be able to make it
work fairly well with just `add-function` to
`imenu-create-index-function`.

For indentation, it's fundamentally harder (for the same reason that
combining two LALR grammars doesn't necessarily give you an LALR
grammar), so it will have to be done in a somewhat ad-hoc way.

I suspect that if the base mode uses SMIE, it would make it
significantly easier to add extensions (because the structure of SMIE
imposes constraints that expose the "compositional" aspect of the
grammar, in some sense), but that's not what you have to work with
currently, so you're going to have to dig into the indentation code and
try and figure out how to make it work with your extension(s) and then
how to express the changes "from outside" (e.g. by using hooks,
`add-function`, or `advice-add`; we can of course add hooks
to `cperl-mode.el` or `perl-mode.el` to make that easier).


        Stefan

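The font-lock half of this advice can be sketched as a small minor
mode.  This is only an illustration under stated assumptions: the names
`perl-object-pad-mode' and `perl-object-pad--keywords' are made up and
exist neither in Emacs nor in CPerl mode, and the rule is deliberately
naive (it makes no attempt to avoid false positives such as a hash key
named "class").

```elisp
;; Hypothetical names throughout: nothing here is part of cperl-mode.el
;; or perl-mode.el.

(defvar perl-object-pad--keywords
  `((,(rx symbol-start
          (group (or "class" "extends" "method" "has"))
          symbol-end)
     (1 'font-lock-keyword-face)))
  "Deliberately naive font-lock rules for a few Object::Pad keywords.")

(define-minor-mode perl-object-pad-mode
  "Highlight some Object::Pad keywords in the current buffer."
  :lighter " Pad"
  (if perl-object-pad-mode
      ;; A nil MODE argument means: add to the current buffer only.
      (font-lock-add-keywords nil perl-object-pad--keywords)
    (font-lock-remove-keywords nil perl-object-pad--keywords))
  ;; Refontify so that the change becomes visible immediately.
  (when font-lock-mode
    (font-lock-flush)))
```

Because the rules are added per buffer and removed again when the mode
is switched off, such a mode could in principle be enabled from a
file-local variable or a mode hook in either Perl major mode, without
touching the major mode's own keyword lists.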
* Re: Handling extensions of programming languages
From: Harald Jörg @ 2021-03-21 15:48 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> [...]
>>   class Coffee::Machine extends Lawn::Mower
>>   {
>>       has $grinder :reader :writer(replace_grinder)
>>       method grind { ...; }
>>   }
> [...]
>>  - add "Dishwasher" and "clean_up" to the imenu index.
>
> That seems to require AI (unless you're talking about a slightly
> different example than the one quoted above ;-).

Ouch.  I goofed up when deleting stuff from my test file to make the
example shorter :)  I wanted to add "grind" instead of "clean_up".

But, jokes aside: I actually consider adding entries to the imenu index
_which aren't there._  In the example above, Object::Pad will silently
create the methods `grinder' and `replace_grinder'.  I think these
*should* go to imenu because if your code in another source calls
$cm->grinder you might otherwise have a hard time finding where that
routine is declared.

>>  - make sure that indentation recognizes that the closing braces end a
>>    statement after "class" and "method".  Perl syntax has various cases
>>    where it doesn't.  I guess this is the part where SMIE would help.
>
> Actually, the closing brace which also closes a statement is one of the
> major pain points in `sm-c-mode`, so it would be one of the parts where
> you'd need extra work to make SMIE understand what's going on.

Given the effort CPerl mode spends to distinguish these two I guessed
so.  There are some open bugs regarding indentation in CPerl mode
(Bug#8077, Bug#11773, Bug#28640) which I'd like to fix while on the way.
Also, a few days ago there was a discussion in the Perlmonks forum
where CPerl mode guesses horribly wrong:
https://www.perlmonks.org/?node_id=11129870

>> For the latter two tasks, I need to "hook" the logic somehow into
>> CPerl's implementations of `imenu-create-index-function' and the various
>> indentation functions.  The current indentation code in CPerl mode
>> is... a bit messy, and some old bugs call for attention anyway.
>
> AFAIK font-lock and imenu are easy.  For font-lock there's
> `font-lock-add-keywords` and for imenu, you should be able to make it
> work fairly well with just `add-function` to
> `imenu-create-index-function`.

For certain values of easy :).  But yes, that's the plan.

The font-lock mechanism is indeed very powerful.  For Object::Pad, the
keyword declaration takes about 120 lines (in rx notation, which is
rather wordy) - mostly due to the effort to avoid false positives.

For imenu, adding regexps to the list of `or'ed expressions to search
for seems to be an alternative.  Or maybe it doesn't, if I want to add
entries which can't be easily searched for.

> For indentation, it's fundamentally harder (for the same reason that
> combining two LALR grammars doesn't necessarily give you an LALR
> grammar), so it will have to be done in a somewhat ad-hoc way.

Indeed.  Indentation needs more "context".

> I suspect that if the base mode uses SMIE, it would make it
> significantly easier to add extensions (because the structure of SMIE
> imposes constraints that expose the "compositional" aspect of the
> grammar, in some sense), but that's not what you have to work with
> currently, so you're going to have to dig into the indentation code and
> try and figure out how to make it work with your extension(s) and then
> how to express the changes "from outside" (e.g. by using hooks,
> `add-function`, or `advice-add`; we can of course add hooks
> to `cperl-mode.el` or `perl-mode.el` to make that easier).

Your last parens touch another interesting aspect: Can that stuff be
used by cperl-mode.el _and_ perl-mode.el?

Well, as it turns out, the font-lock stuff "works" for both.  It looks
a bit weird with Perl mode because the "new" keywords like `method'
have different faces than the "old" ones like `my'.

For imenu, things are different: Perl mode uses
`imenu-generic-expression', whereas CPerl mode uses a rather complex
`imenu-create-index-function', so that it can prepend the current
namespace to the name of functions.

And as for indentation... I'd say the code in both modes needs to catch
up with current Perl before we consider extensions.  Maybe they could
share functions or regular expressions how to find the beginning of a
function, or how to identify closing braces which terminate a
statement: The specification for this logic comes from Perl and should
be the same for both modes.
-- 
Cheers,
haj

* Re: Handling extensions of programming languages
From: Stefan Monnier @ 2021-03-21 17:59 UTC
To: Harald Jörg; +Cc: emacs-devel

> But, jokes aside: I actually consider adding entries to the imenu index
> _which aren't there._  In the example above, Object::Pad will silently
> create the methods `grinder' and `replace_grinder'.  I think these
> *should* go to imenu because if your code in another source calls
> $cm->grinder you might otherwise have a hard time finding where that
> routine is declared.

I don't see any problem with that.  You could even argue that they
*are* there.

>>> For the latter two tasks, I need to "hook" the logic somehow into
>>> CPerl's implementations of `imenu-create-index-function' and the various
>>> indentation functions.  The current indentation code in CPerl mode
>>> is... a bit messy, and some old bugs call for attention anyway.
>> AFAIK font-lock and imenu are easy.  For font-lock there's
>> `font-lock-add-keywords` and for imenu, you should be able to make it
>> work fairly well with just `add-function` to
>> `imenu-create-index-function`.
> For certain values of easy :).

I meant "easy" in the sense that once you've figured out how to match
those constructs and how to put the right face on the various parts,
adding it modularly (e.g. from a minor mode) should be reasonably easy,
because it shouldn't interact in too complex ways with the rest of the
font-lock rules.

> Your last parens touch another interesting aspect: Can that stuff be
> used by cperl-mode.el _and_ perl-mode.el?

For imenu and font-lock, I can't see why not.

> Well, as it turns out, the font-lock stuff "works" for both.
> It looks a bit weird with Perl mode because the "new" keywords like
> `method' have different faces than the "old" ones like `my'.

I'm not sure why that would be: AFAICT, both `perl-mode` and
`cperl-mode` highlight keywords (like `sub`, `if`, `for`, ...) using the
`font-lock-keyword-face` (they generally use fairly different faces, but
this is a part where they agree ;-).

> For imenu, things are different: Perl mode uses
> `imenu-generic-expression', whereas CPerl mode uses a rather complex
> `imenu-create-index-function', so that it can prepend the current
> namespace to the name of functions.

If your code uses `add-function` on `imenu-create-index-function` it
should work in both cases (`perl-mode` simply keeps the default value
of `imenu-create-index-function` which is the function that implements
`imenu-generic-expression`).

> And as for indentation... I'd say the code in both modes needs to catch
> up with current Perl before we consider extensions.  Maybe they could
> share functions or regular expressions how to find the beginning of a
> function, or how to identify closing braces which terminate a statement:
> The specification for this logic comes from Perl and should be the same
> for both modes.

Consolidation between the two modes is progress, so you got my vote.


        Stefan

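The `add-function` route for imenu can be sketched as follows.  All
names starting with `perl-ext-' are hypothetical, and the regexp is a
toy: it only finds "method NAME" at the start of a line and will also
match inside comments, strings and POD.

```elisp
(require 'imenu)

(defun perl-ext--imenu-methods ()
  "Return an imenu alist of `method NAME' declarations in the buffer.
Hypothetical helper with a deliberately simple regexp."
  (let (entries)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "^[ \t]*method[ \t]+\\([[:alpha:]_][[:alnum:]_]*\\)" nil t)
        (push (cons (match-string-no-properties 1) (match-beginning 1))
              entries)))
    (nreverse entries)))

(defun perl-ext--imenu-around (orig-fun &rest args)
  "Call ORIG-FUN with ARGS, then append the extension's entries."
  (append (apply orig-fun args) (perl-ext--imenu-methods)))

(define-minor-mode perl-ext-imenu-mode
  "Add `method' declarations to the imenu index of this buffer."
  :lighter nil
  (if perl-ext-imenu-mode
      ;; (local 'VAR) restricts the change to the current buffer.
      (add-function :around (local 'imenu-create-index-function)
                    #'perl-ext--imenu-around)
    (remove-function (local 'imenu-create-index-function)
                     #'perl-ext--imenu-around)))
```

Since the wrapper only post-processes whatever the major mode's index
function returns, it does not care whether that function is the
default one driven by `imenu-generic-expression' or a custom one.  The
sketch simply appends its entries at the top level; fitting them into
a structured index is left open here.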
* Re: Handling extensions of programming languages (Perl)
From: Harald Jörg @ 2021-03-22 14:08 UTC
To: Stefan Monnier; +Cc: emacs-devel

[I've added "Perl" to the subject since this is getting specific]

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> [...]
>> Your last parens touch another interesting aspect: Can that stuff be
>> used by cperl-mode.el _and_ perl-mode.el?
>
> For imenu and font-lock, I can't see why not.

Nice.  How would the set of shared functions be distributed?  In a new,
separate file which both modes `require'?  That file would then also be
part of a CPerl distribution on ELPA, I guess?  In my opinion, it would
make sense in any case to use the "perl" prefix for shared stuff.

>> Well, as it turns out, the font-lock stuff "works" for both.  It looks a
>> bit weird with Perl mode because the "new" keywords like `method' have
>> different faces than the "old" ones like `my'.
>
> I'm not sure why that would be: AFAICT, both `perl-mode` and
> `cperl-mode` highlight keywords (like `sub`, `if`, `for`, ...) using the
> `font-lock-keyword-face` (they generally use fairly different faces, but
> this is a part where they agree ;-).

Overall they agree, but there are differences in details (some might
even be unintended).  For new keywords and syntax there's indeed no
need to use different faces, but they should be somewhat consistent
with existing highlighting.  Some results from first tests and
debugging:

 - Declarators (like "my") are type-face in perl-mode, keyword-face in
   cperl-mode.  I noticed this because the new "has" is fontified by
   perl-mode (though it isn't part of Perl) and the additions don't
   override it.  CPerl mode abuses type-face for builtin functions, I
   wonder how much stir it makes if this is changed.

 - Names of packages are not fontified in perl-mode when they are
   `use'd or `require'd (on closer inspection, this is probably
   unintended: The first capture group is either an empty string or
   punctuation/space and should be shy).

 - In cperl-mode, ':' is a symbol, but a punctuation character in
   perl-mode.  This makes interpretation of "\\_<" different.  Perhaps
   changing cperl-mode's syntax table to make ':' punctuation would be
   the way to go - but punctuation also has its deficits for
   perl-mode, as apparent in "package Foo::Bar", so it would need more
   work.  I haven't investigated further.

>> For imenu, things are different: Perl mode uses
>> `imenu-generic-expression', whereas CPerl mode uses a rather complex
>> `imenu-create-index-function', so that it can prepend the current
>> namespace to the name of functions.
>
> If your code uses `add-function` on `imenu-create-index-function` it
> should work in both cases (`perl-mode` simply keeps the default value
> of `imenu-create-index-function` which is the function that implements
> `imenu-generic-expression`).

Ah, yes, of course.  I didn't think of that (nor read the docs).

The two modes have different styles how they present their results,
though.  Adding new entries needs some "rearrangement" to put it into
the right place(s) in the index.

>> And as for indentation... I'd say the code in both modes needs to catch
>> up with current Perl before we consider extensions.  Maybe they could
>> share functions or regular expressions how to find the beginning of a
>> function, or how to identify closing braces which terminate a statement:
>> The specification for this logic comes from Perl and should be the same
>> for both modes.
>
> Consolidation between the two modes is progress, so you got my vote.

Thanks!
-- 
Cheers,
haj

* Re: Handling extensions of programming languages (Perl)
From: Stefan Monnier @ 2021-03-22 14:48 UTC
To: Harald Jörg; +Cc: emacs-devel

>> For imenu and font-lock, I can't see why not.
> Nice.

Beware: I might just be blinded by optimism.

> How would the set of shared functions be distributed?

Good question.  I guess it largely depends on the size, the way you
intend to distribute it, the possible bad interaction with other Perl
extensions, ...

E.g. for an extension which doesn't collide with any other known Perl
extension, you could imagine enabling it by default (and maybe even
forgo offering a way to disable it).

I think the most natural/convenient form of an extension that can be
enabled or not would be as a minor mode.

And as for where to put the code, it could be in a completely separate
file, or directly in `perl-mode.el` (which `cperl-mode.el` could
require: it's a mere 50kB compared to `cperl-mode.el`'s 300kB).

> Overall they agree, but there are differences in details (some might
> even be unintended).  For new keywords and syntax there's indeed no need
> to use different faces, but they should be somewhat consistent with
> existing highlighting.  Some results from first tests and debugging:
>
>  - Declarators (like "my") are type-face in perl-mode, keyword-face in
>    cperl-mode.  I noticed this because the new "has" is fontified by
>    perl-mode (though it isn't part of Perl) and the additions don't
>    override it.

I think such discrepancies are just misfeatures, so it would be nice to
fix them (ideally by choosing the one that seems less arbitrary).
Using type-face for `my` or `local` doesn't seem useful, so we
should probably change them to keyword.

>    CPerl mode abuses type-face for builtin functions, I
>    wonder how much stir it makes if this is changed.

Try it ;-)
Unsurprisingly, I vote for using the font-lock-builtin-face for them.

>  - Names of packages are not fontified in perl-mode when they are `use'd
>    or `require'd (on closer inspection, this is probably unintended: The
>    first capture group is either an empty string or punctuation/space
>    and should be shy).

Sounds like a bug, indeed.

>  - In cperl-mode, ':' is a symbol, but a punctuation character in perl-mode.

Ah, right, this could make it significantly harder to share code between
the two major modes.  I don't think either choice is clearly superior,
but we should make them agree on the syntax-table.

>    This makes interpretation of "\\_<" different.  Perhaps changing
>    cperl-mode's syntax table to make ':' punctuation would be the
>    way to go - but punctuation also has its deficits for perl-mode, as
>    apparent in "package Foo::Bar", so it would need more work.
>    I haven't investigated further.

I suspect it can also impact other parts of the code (since it impacts
things like `forward-sexp`).  I think my recommendation would be to
change `perl-mode` to agree with `cperl-mode` since `perl-mode.el` is
much smaller so the amount of breakage should be correspondingly smaller.
[ Also, from a user's point of view it's good that `C-M-f` skips over the
  whole of "Foo::bar" instead of stopping after "Foo".  ]

> The two modes have different styles how they present their results,
> though.  Adding new entries needs some "rearrangement" to put it into
> the right place(s) in the index.

Then again, you could focus on making it "work well" for one of the modes
(presumably `cperl-mode`) and content yourself with "works" for the
other ;-)


        Stefan

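The effect of the two syntax classes for ':' can be reproduced in
isolation.  The sketch below does not load either Perl mode; it builds
a throwaway syntax table derived from the standard one and only varies
the entry for ':'.  The `perl-ext--' names are hypothetical.

```elisp
(defmacro perl-ext--with-colon-syntax (colon-syntax text &rest body)
  "Run BODY at the start of a temp buffer holding TEXT.
COLON-SYNTAX is the syntax descriptor for `:', e.g. \"_\" (symbol
constituent) or \".\" (punctuation)."
  (declare (indent 2))
  `(with-temp-buffer
     (let ((table (make-syntax-table)))   ; inherits the standard table
       (modify-syntax-entry ?: ,colon-syntax table)
       (set-syntax-table table))
     (insert ,text)
     (goto-char (point-min))
     ,@body))

(defun perl-ext--bar-starts-symbol-p (colon-syntax)
  "Return t if \"\\\\_<Bar\" matches inside \"package Foo::Bar;\"."
  (perl-ext--with-colon-syntax colon-syntax "package Foo::Bar;"
    (and (re-search-forward "\\_<Bar\\_>" nil t) t)))

(defun perl-ext--sexp-text (colon-syntax)
  "Return the text `forward-sexp' skips from the start of \"Foo::Bar;\"."
  (perl-ext--with-colon-syntax colon-syntax "Foo::Bar;"
    (forward-sexp 1)
    (buffer-substring (point-min) (point))))

;; With `:' as punctuation, "Bar" is a symbol of its own:
;;   (perl-ext--bar-starts-symbol-p ".")  ; => t
;;   (perl-ext--sexp-text ".")            ; => "Foo"
;; With `:' as symbol constituent, "Foo::Bar" is a single symbol:
;;   (perl-ext--bar-starts-symbol-p "_")  ; => nil
;;   (perl-ext--sexp-text "_")            ; => "Foo::Bar"
```

A regexp anchored with "\\_<" that is meant to be shared between the
two modes therefore has to be written with both outcomes in mind, or
the syntax tables have to be made to agree first.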
* Re: Handling extensions of programming languages (Perl)
From: Harald Jörg @ 2021-03-22 17:32 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> For imenu and font-lock, I can't see why not.
>> Nice.
>
> Beware: I might just be blinded by optimism.

Optimism is my second name!  (My first name is "Unwarranted")

>> How would the set of shared functions be distributed?
>
> Good question.  I guess it largely depends on the size, the way you
> intend to distribute it, the possible bad interaction with other Perl
> extensions, ...
>
> E.g. for an extension which doesn't collide with any other known Perl
> extension, you could imagine enabling it by default (and maybe even
> forgo offering a way to disable it).

I would prefer this behavior (not being able to disable it) for things
that come with new Perl versions, but not for extensions.  If you run
an older Perl version (which happens all the time because distributions
take time to catch up) it might behave strangely, but I think that this
is ok as a reminder that "at some time you will need to change this
anyway".

For extensions it is really difficult to find out whether they might
collide with another extension.  Sometimes new extensions quickly rise
in popularity but need different handling.  A typical example is
exception handling with Try::Tiny which is lightweight and nice and
all, but it comes with the pitfall that the try block, unlike in all
other extensions for exception handling with try/catch/finally,
_requires_ a semicolon.

> I think the most natural/convenient form of an extension that can be
> enabled or not would be as a minor mode.

I've been hoping that this is acceptable.  It brings a lot of
infrastructure and therefore consistency for users.

> And as for where to put the code, it could be in a completely separate
> file, or directly in `perl-mode.el` (which `cperl-mode.el` could
> require: it's a mere 50kB compared to `cperl-mode.el`s 300kB).

I am leaning towards a completely separate file, but maybe not right
now. In both cases the adventurous users who're using cperl-mode
directly from the repository will then need to pick two files instead of
one. If, one day, cperl-mode is made available via ELPA, this should
not necessarily require moving perl-mode to ELPA as well.

>> [...about discrepancies in syntax highlighting ...]
>
> I think such discrepancies are just misfeatures, so it would be nice to
> fix them (ideally by choosing the one that seems less arbitrary).

Agreed.

> Using type-face for `my` or `local` doesn't seem useful, so we
> should probably change them to keyword.

That was my first thought as well. But then, the declarators appear in
places where other languages have their types. And then again, there
are (various, of course) Perl extensions which provide a type system for
Perl, so you can write "my Str $string" or even "my ArrayRef[Int] $ref".
I guess I need to *see* it for some time to find out whether in "my Str"
both parts should have the same color or better shouldn't.

>> CPerl mode abuses type-face for builtin functions, I
>> wonder how much stir it makes if this is changed.
>
> Try it ;-)
> Unsurprisingly, I vote for using the font-lock-builtin-face for them.

I agree. CPerl mode uses different faces for "overridable" and
"nonoverridable" builtins, but this distinction isn't that valuable
these days (and it also changes between Perl versions). I've also
received feedback that this distinction should go away. IIRC the
non-"standard" colors of CPerl mode occasionally annoyed people who
use Emacs with many different programming languages.

> [...]
>> - In cperl-mode, ':' is a symbol, but a punctuation character in perl-mode.
>
> Ah, right, this could make it significantly harder to share code between
> the two major modes. I don't think either choice is clearly superior,
> but we should make them agree on the syntax-table.
>
>> [...]
>
> I suspect it can also impact other parts of the code (since it impacts
> things like `forward-sexp`). I think my recommendation would be to
> change `perl-mode` to agree with `cperl-mode` since `perl-mode.el` is
> much smaller so the amount of breakage should be correspondingly smaller.
> [ Also, from a user's point of view it's good that `C-M-f` skips over the
>   whole of "Foo::bar" instead of stopping after "Foo". ]

Good point! For the moment I'll be a coward and avoid that decision by
honing the regular expressions with that difference in mind :)

>> The two modes have different styles how they present their results,
>> though. Adding new entries needs some "rearrangement" to put it into
>> the right place(s) in the index.
>
> Then again, you could focus on making it "work well" for one of the modes
> (presumably `cperl-mode`) and content yourself with "works" for the
> other ;-)

That's probably an acceptable way forward. Time to roll up my sleeves
... and grab some more coffee.
--
Cheers,
haj
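The effect of the two syntax classes for ':' on "\\_<" and "\\_>" can be
checked outside either mode. This is a minimal sketch, assuming a
throwaway syntax table rather than the real cperl-mode or perl-mode
tables; the helper name `my-perl-symbol-at-start` is hypothetical.

```elisp
;; Hypothetical helper: uses a scratch syntax table, not the real
;; cperl-mode or perl-mode tables.
(defun my-perl-symbol-at-start (colon-syntax)
  "Return the symbol matched at the start of \"Foo::Bar\".
COLON-SYNTAX is the syntax descriptor to use for `:', for example
\"_\" (symbol constituent) or \".\" (punctuation)."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?: colon-syntax table)
      (set-syntax-table table)
      (insert "Foo::Bar")
      (goto-char (point-min))
      ;; Match one whole symbol: word or symbol constituents between
      ;; a symbol start and a symbol end.
      (when (looking-at "\\_<\\(?:\\sw\\|\\s_\\)+\\_>")
        (match-string 0)))))

(my-perl-symbol-at-start "_")   ; → "Foo::Bar"  (':' as symbol)
(my-perl-symbol-at-start ".")   ; → "Foo"       (':' as punctuation)
```

The same regular expression therefore finds different things in the two
modes, which is the difference any shared regular expressions would have
to be written around.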
* Re: Handling extensions of programming languages (Perl)
From: Stefan Monnier @ 2021-03-22 18:27 UTC
To: Harald Jörg; +Cc: emacs-devel

>> E.g. for an extension which doesn't collide with any other known Perl
>> extension, you could imagine enabling it by default (and maybe even
>> forego offering a way to disable it).
> I would prefer this behavior (not being able to disable it) for things
> that come with new Perl versions, but not for extensions.

I wasn't recommending any particular choice. Just mentioning what
I would consider as acceptable.

>> And as for where to put the code, it could be in a completely separate
>> file, or directly in `perl-mode.el` (which `cperl-mode.el` could
>> require: it's a mere 50kB compared to `cperl-mode.el`s 300kB).
> I am leaning towards a completely separate file, but maybe not right
> now. In both cases the adventurous users who're using cperl-mode
> directly from the repository will then need to pick two files instead of
> one. If, one day, cperl-mode is made available via ELPA, this should
> not necessarily require moving perl-mode to ELPA as well.

I don't see any problem with a :core `cperl-mode` package which comes
bundled with its own version of `perl-mode.el` (nor would I find it
problematic to export `perl-mode.el` into its own :core GNU ELPA
package).

> That was my first thought as well. But then, the declarators appear in
> places where other languages have their types.

[ I think you use a very restricted definition of "other languages" here.
  It's definitely not the case for most of the statically typed languages
  I've used, except for C.
  I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]

        Stefan
* Re: Handling extensions of programming languages (Perl)
From: Harald Jörg @ 2021-03-22 19:31 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier writes:

> [...]
> I don't see any problem with a :core `cperl-mode` package which comes
> bundled with its own version of `perl-mode.el` (nor would I find it
> problematic to export `perl-mode.el` into its own :core GNU ELPA
> package).

That's great! So that shouldn't be an issue at all. I've yet to get
familiar with the procedures around GNU ELPA.

>> That was my first thought as well. But then, the declarators appear in
>> places where other languages have their types.
>
> [ I think you use a very restricted definition of "other languages" here.
>   It's definitely not the case for most of the statically typed languages
>   I've used, except for C.
>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]

Guilty, your honor. In the last years I've dealt with Emacs Lisp (only
very recently), Perl, C, Java, JavaScript ... and before that with a
dialect of PL/1, assembly languages (68000, x86, /390) ... and before
that with FORTRAN, where everyone's type system seemed to be IMPLICIT
INTEGER I-N. So indeed, almost no intersection with your list.
--
Cheers,
haj
* [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
From: Stefan Monnier @ 2021-03-22 19:58 UTC
To: Harald Jörg; +Cc: emacs-devel

>>> That was my first thought as well. But then, the declarators appear in
>>> places where other languages have their types.
>>
>> [ I think you use a very restricted definition of "other languages" here.
>>   It's definitely not the case for most of the statically typed languages
>>   I've used, except for C.
>>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]
>
> Guilty, your honor. In the last years I've dealt with Emacs Lisp (only
> very recently), Perl, C, Java, JavaScript ... and before that with a
> dialect of PL/1, assembly languages (68000, x86, /390) ... and before
> that with FORTRAN, where everyone's type system seemed to be IMPLICIT
> INTEGER I-N. So indeed, almost no intersection with your list.

Of those the only ones that are statically typed seem to be C, Java,
Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
the type is placed at a location comparable to where `my` is placed in
Perl, IMO.

        Stefan
* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
From: Harald Jörg @ 2021-03-22 22:05 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>>> That was my first thought as well. But then, the declarators appear in
>>>> places where other languages have their types.
>>>
>>> [ I think you use a very restricted definition of "other languages" here.
>>>   It's definitely not the case for most of the statically typed languages
>>>   I've used, except for C.
>>>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]
>>
>> Guilty, your honor. In the last years I've dealt with Emacs Lisp (only
>> very recently), Perl, C, Java, JavaScript ... and before that with a
>> dialect of PL/1, assembly languages (68000, x86, /390) ... and before
>> that with FORTRAN, where everyone's type system seemed to be IMPLICIT
>> INTEGER I-N. So indeed, almost no intersection with your list.
>
> Of those the only ones that are statically typed seem to be C, Java,
> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
> the type is placed at a location comparable to where `my` is placed in
> Perl, IMO.

Fortran, too, unless you do the IMPLICIT trick.

But anyway: Looking at the Emacs modes for Java and C, all keywords like
"private" and "static" (which, similar to "my" in Perl, define scope
rather than type) are in keyword-face. This would indicate that
keyword-face is to be preferred for the declarators, and type-face for
the types.
--
Cheers,
haj
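The face split suggested above (keyword-face for the declarator,
type-face for an optional type name) could be written as a font-lock rule
along these lines. This is only a sketch under assumptions: the variable
name `my-perl-declarator-font-lock-keywords` is hypothetical, the regular
expression is not taken from cperl-mode or perl-mode, and it handles a
simple type name such as "Str" but not a parameterized one such as
"ArrayRef[Int]".

```elisp
;; Hypothetical sketch, not the rule used by cperl-mode or perl-mode.
(defvar my-perl-declarator-font-lock-keywords
  `((,(rx symbol-start
          (group (or "my" "our" "local" "state"))   ; 1: declarator
          (+ space)
          ;; 2: optional capitalized type name, followed by whitespace
          (? (group (any "A-Z") (* (any "A-Za-z0-9_:"))) (+ space))
          (any "$@%"))                              ; sigil of the variable
     (1 font-lock-keyword-face)
     ;; The trailing t (LAXMATCH) keeps font-lock quiet when there is
     ;; no type name and group 2 did not match.
     (2 font-lock-type-face nil t)))
  "Font-lock rule: declarators as keywords, optional type names as types.")
```

Such a list might be tried in a buffer with
`(font-lock-add-keywords nil my-perl-declarator-font-lock-keywords)`, to
see for a while whether the two colors in "my Str $string" help or
distract.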
* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
From: Stefan Monnier @ 2021-03-22 22:24 UTC
To: Harald Jörg; +Cc: emacs-devel

>> Of those the only ones that are statically typed seem to be C, Java,
>> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
>> the type is placed at a location comparable to where `my` is placed in
>> Perl, IMO.
>
> Fortran, too, unless you do the IMPLICIT trick.

OK, I must admit that my knowledge of Fortran syntax is poor, so I had
done a quick search and found
https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap02/declare.html
which seems to suggest that the syntax is "TYPE :: VARS", which seemed
different enough from "TYPE VAR".

OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)

        Stefan
* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
From: Harald Jörg @ 2021-03-22 23:43 UTC
To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier writes:

>>> Of those the only ones that are statically typed seem to be C, Java,
>>> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
>>> the type is placed at a location comparable to where `my` is placed in
>>> Perl, IMO.
>>
>> Fortran, too, unless you do the IMPLICIT trick.
>
> OK, I must admit that my knowledge of Fortran syntax is poor, so I had
> done a quick search and found
> https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap02/declare.html
> which seems to suggest that the syntax is "TYPE :: VARS", which seemed
> different enough from "TYPE VAR".
>
> OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)

I'm happy with that! Also, I might sort of deserve half points. As I
wrote, Fortran was my first computer language. I should have specified:
I started with Fortran 4 and left with FORTRAN 77. Both had
declarations like "INTEGER N"
(https://web.stanford.edu/class/me200c/tutorial_77/). "TYPE :: VARS"
came only with Fortran 90. And now I feel old.

https://www.rickmurphy.net/advent/ADVENT.FT is an example of FORTRAN
code from that era (the legendary Colossal Cave Adventure).

Emacs, of course, supports both dialects!
--
A HOLLOW VOICE SAYS "PLUGH".
haj
* Re: [OFFTOPIC] Handling extensions of programming languages (Perl)
From: Stefan Monnier @ 2021-03-23 3:49 UTC
To: Harald Jörg; +Cc: emacs-devel

>> OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)
> I'm happy with that! Also, I might sort of deserve half points. As I
> wrote, Fortran was my first computer language. I should have specified:
> I started with Fortran 4 and left with FORTRAN 77. Both had
> declarations like "INTEGER N".
> (https://web.stanford.edu/class/me200c/tutorial_77/). "TYPE :: VARS"
> came only with Fortran 90. And now I feel old.

Very good point. This "::" syntax indeed didn't remind me of anything,
which you've now explained (my experience with Fortran is quite limited
and moreover limited to FORTRAN 77).

So I guess I'm forced to concede that you're up to 75%. Damn!

        Stefan
* Re: Handling extensions of programming languages
From: Stephen Leake @ 2021-03-30 18:41 UTC
To: Harald Jörg; +Cc: Stefan Monnier, emacs-devel

haj@posteo.de (Harald Jörg) writes:

>> For indentation, it's fundamentally harder (for the same reason that
>> combining two LALR grammars doesn't necessarily give you an LALR
>> grammar), so it will have to be done in a somewhat ad-hoc way.
>
> Indeed. Indentation needs more "context".

The GNU ELPA package 'wisi' provides a way to declare indentation in the
grammar as actions; that provides all the context needed. The wisi
parsers also have excellent error correction, so the grammar actions
operate on a complete syntax tree (or fail utterly when the input is
really bad).

I have not tried to use wisi for Perl; it works for Ada and Java.

This does not address your issue of extending a language with new
syntax; as far as wisi is concerned, that is a new language, and needs
an entirely new grammar file. This is true for any LR parser. It may not
be true for a packrat parser, although the base parser would have to
provide hooks in each nonterminal parsing routine.

In wisi, it might be possible to extend the grammar file syntax with
something like:

    #base_grammar <grammar file>

but it would still generate separate parsers for the base and extended
languages.

As long as the extended language is a superset of the base language, it
mostly doesn't hurt to always use the extended language parser. The
ada-mode parser implements a language that is an extension of standard
Ada 2012; that reduces conflicts and simplifies specifying indentation.

One downside of using an extended parser: it will not report syntax
errors for extended syntax in a file that is not supposed to contain any.
For ada-mode this is not a significant problem; the extensions allow
things that no Ada programmer would write even by mistake, and the real
compiler catches them soon enough.

> And as for indentation... I'd say the code in both modes needs to catch
> up with current perl before we consider extensions. Maybe they could
> share functions or regular expressions how to find the beginning of a
> function, or how to identify closing braces which terminate a statement:
> The specification for this logic comes from Perl and should be the same
> for both modes.

The reason I started the wisi package and WisiToken parser generator was
to migrate ada-mode away from ad-hoc code to grammar based code, to
support Ada 2012.

To work well, the parser needs to be error correcting. SMIE is inherently
more error tolerant than an LR parser without error correction, but I
doubt it's good enough for indent.

--
-- Stephe
Thread overview: 19+ messages
2021-03-19 18:53 Handling extensions of programming languages Harald Jörg
2021-03-20 17:02 ` Matt Armstrong
2021-03-20 23:40 ` Harald Jörg
2021-03-21  2:18 ` Clément Pit-Claudel
2021-03-21 11:41 ` Harald Jörg
2021-03-21 12:39 ` Stefan Monnier
2021-03-21 15:48 ` Harald Jörg
2021-03-21 17:59 ` Stefan Monnier
2021-03-22 14:08 ` Handling extensions of programming languages (Perl) Harald Jörg
2021-03-22 14:48 ` Stefan Monnier
2021-03-22 17:32 ` Harald Jörg
2021-03-22 18:27 ` Stefan Monnier
2021-03-22 19:31 ` Harald Jörg
2021-03-22 19:58 ` [OFFTOPIC] " Stefan Monnier
2021-03-22 22:05 ` Harald Jörg
2021-03-22 22:24 ` Stefan Monnier
2021-03-22 23:43 ` Harald Jörg
2021-03-23  3:49 ` [OFFTOPIC] " Stefan Monnier
2021-03-30 18:41 ` Handling extensions of programming languages Stephen Leake