unofficial mirror of emacs-devel@gnu.org 
* Handling extensions of programming languages
@ 2021-03-19 18:53 Harald Jörg
  2021-03-20 17:02 ` Matt Armstrong
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-19 18:53 UTC (permalink / raw)
  To: Emacs Developer List

Hello List,

today I'm looking for advice or hints how to deal with a task for CPerl
mode which might have been solved for other programming languages: How
to handle extensions of the language.  That's not about user-defined
functions, but about extensions that change what needs to be included in
imenu, or which affect highlighting (cumbersome but straightforward) and
indentation (tricky).

   * Is it a good idea to implement each of them as a minor mode which
     only makes sense in CPerl mode buffers?

   * Or should the extensions be loaded by a command from CPerl mode?

   * Should that be one multi-file package or should each extension go
     into a package of its own?  Or even a mixture of both, to allow
     contributions from ELPA and Non-GNU ELPA?

   * Are there templates or conventions to follow (beyond the rules how
     to build packages, I'm aware of these)?

Background: In Perl, adding new syntax to the language is easy enough so
that many developers have done this and published their work as
extension modules on CPAN.  Some of these extensions have become very
popular, some are quite exotic.  Occasionally they are competing with
each other for the same keywords, but with different syntax.
Eventually, popular keywords might make it into the Perl core, with yet
another syntax.

My first approach was to keep all the code in one place and evaluate all
the font-lock and indenting variables at runtime, as buffer-local
variables, for the different versions.  This works to some extent for
highlighting, but fails if an extension needs different logic for
indentation.
-- 
Cheers,
haj




* Re: Handling extensions of programming languages
  2021-03-19 18:53 Handling extensions of programming languages Harald Jörg
@ 2021-03-20 17:02 ` Matt Armstrong
  2021-03-20 23:40   ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Matt Armstrong @ 2021-03-20 17:02 UTC (permalink / raw)
  To: Harald Jörg, Emacs Developer List

haj@posteo.de (Harald Jörg) writes:

> today I'm looking for advice or hints how to deal with a task for
> CPerl mode which might have been solved for other programming
> languages: How to handle extensions of the language.

[...]

> Background: In Perl, adding new syntax to the language is easy enough
> so that many developers have done this

[...]

> My first approach was to keep all the code in one place and evaluate
> all the font-lock and indenting variables at runtime, as buffer-local
> variables, for the different versions.  This works to some extent for
> highlighting, but fails if an extension needs different logic for
> indentation.

I'm not an expert in this topic as it pertains to Emacs itself, but I've
always found editor and development tools interesting and so have paid
attention to these issues over the years.

Very good Emacs support for languages with flexible syntax, which have a
high level of faithfulness to the language, or even "perfect"
faithfulness, all seem to rely on tools native to the language and
external to Emacs, usually by way of some sort of external server.

Examples: SLIME and Sly for Common Lisp, https://www.racket-mode.com/
for Racket, and, to a lesser degree of functionality, every language
with LSP support, especially C++ (which is known to be effectively
impossible to parse faithfully without what amounts to an entire
compiler frontend).  Indentation (formatting) source code is part of the
LSP protocol.  The common theme seems to be using the
interpreter/compiler itself to parse, without relying on the editor to
understand the code deeply.

For a different approach, you have examples of complete or nearly
complete parsers written in Emacs Lisp.  There is at least one parser
for Javascript that was at one time fully compliant with the language
standard to the point of providing a full parse tree to Lisp
(https://elpa.gnu.org/packages/js2-mode.html).  The CEDET package has
some complex parser technology.  cc-mode for the C family of languages
is surprisingly good.  The drawback here is that, by design, any syntax
extensions and local mini-DSLs, etc., must also have parsers written in
Emacs Lisp.  You see this issue with js2-mode, where it lags the current
language standard a bit.

(info "(ccmode)Custom Macros") is an example of how cc-mode supports a
limited form of syntax extension.

I think most modes in Emacs Lisp take a pragmatic approach, using
heuristics that get the job done most of the time without being too
computationally expensive.  The SMIE package is a generalization of this
idea, see (info "(elisp)SMIE").

I am not aware of anything like SMIE that allows for language
extensions to be "plugged in" in a general way.

In languages that support 'embedding' other languages in sub-sections of
code (e.g. CSS or PHP in HTML), the kinds of approaches seen at
https://www.emacswiki.org/emacs/MultipleModes have been tried.




* Re: Handling extensions of programming languages
  2021-03-20 17:02 ` Matt Armstrong
@ 2021-03-20 23:40   ` Harald Jörg
  2021-03-21  2:18     ` Clément Pit-Claudel
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-20 23:40 UTC (permalink / raw)
  To: Matt Armstrong; +Cc: Emacs Developer List

Matt Armstrong <matt@rfc20.org> writes:

> I'm not an expert in this topic as it pertains to Emacs itself, but I've
> always found editor and development tools interesting and so have paid
> attention to these issues over the years.

Thanks for sharing your insights!

> [...]
> Very good Emacs support for languages with flexible syntax, which have a
> high level of faithfulness to the language, or even "perfect"
> faithfulness, all seem to rely on tools native to the language and
> external to Emacs, usually by way of some sort of external server.
> [...] The common theme seems to be using the interpreter/compiler
> itself to parse, without relying on the editor to understand the code
> deeply.

This is fine.  For Perl, this has some limitations since you actually
need to run parts of the code to find out whether it compiles (or, more
precisely, whether it can be interpreted correctly).  This might be
undesired, e.g. for security reasons with "unknown" code.

> For a different approach, you have examples of complete or nearly
> complete parsers written in Emacs Lisp.
> [...]
> The drawback here is that, by design, any syntax extensions and
> local mini-DSLs, etc., must also have parsers written in Emacs Lisp.

Exactly!  "How hard can that be?" -- Damian Conway, in a presentation
which shows, among other tricks, a ~2000-line Perl regular expression
which matches (not actually parses) Perl code.

I *guess* that Emacs Lisp is well suited for a pragmatic/heuristic
approach, and I want to give it a try.

> (info "(ccmode)Custom Macros") is an example of how cc-mode supports a
> limited form of syntax extension.

Many thanks!  This is the sort of pointer I'm after.  I'll take a look
at how this is implemented.

> I think most modes in Emacs Lisp take a pragmatic approach, using
> heuristics that get the job done most of the time without being too
> computationally expensive.  The SMIE package is a generalization of this
> idea, see (info "(elisp)SMIE").

> I am not aware of anything like SMIE that allows for language
> extensions to be "plugged in" in a general way.

Well, I have my doubts that Perl is a good candidate for SMIE, and
trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have to
do the job.

-- 
Cheers, and again thanks for your time,
haj




* Re: Handling extensions of programming languages
  2021-03-20 23:40   ` Harald Jörg
@ 2021-03-21  2:18     ` Clément Pit-Claudel
  2021-03-21 11:41       ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Clément Pit-Claudel @ 2021-03-21  2:18 UTC (permalink / raw)
  To: emacs-devel

On 3/20/21 7:40 PM, Harald Jörg wrote:
> Well, I have my doubts that Perl is a good candidate for SMIE, and
> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have to
> do the job.

I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not sure how it relates to font-lock-add-keywords and hooks, so maybe we're not thinking of the same thing?




* Re: Handling extensions of programming languages
  2021-03-21  2:18     ` Clément Pit-Claudel
@ 2021-03-21 11:41       ` Harald Jörg
  2021-03-21 12:39         ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-21 11:41 UTC (permalink / raw)
  To: Clément Pit-Claudel; +Cc: emacs-devel

Clément Pit-Claudel <cpitclaudel@gmail.com> writes:

> On 3/20/21 7:40 PM, Harald Jörg wrote:
>> Well, I have my doubts that Perl is a good candidate for SMIE, and
>> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
>> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have to
>> do the job.
>
> I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not
> sure how it relates to font-lock-add-keywords and hooks, so maybe
> we're not thinking of the same thing?

I admit that I don't know much about SMIE, so maybe I'm wrong here.
Most of Perl is pretty similar to C or Java, but there are cases where
Perl's syntax just can't be parsed statically.

About the relation to font-lock-add-keywords - let me show an example.
"Traditional" Perl has no keywords for object oriented programming, but
there are dozens of extensions which add them.  For example, with
Object::Pad you can write (I apologize for the nonsensical example):

  class Coffee::Machine extends Lawn::Mower
  {
     has $grinder :reader :writer(replace_grinder)
     method grind { ...; }
  } 

If I want to support that with CPerl mode, I need to:

 - highlight class, extends, method and some more I haven't included in
   that example as keywords.  That's where font-lock-add-keywords comes
   into play.  Also, "Dishwasher" and "clean_up" should be highlighted
   like package and sub names.

 - add "Dishwasher" and "clean_up" to the imenu index.
 
 - make sure that indentation recognizes that the closing braces end a
   statement after "class" and "method".  Perl syntax has various cases
   where it doesn't.  I guess this is the part where SMIE would help.

For the latter two tasks, I need to "hook" the logic somehow into
CPerl's implementations of `imenu-create-index-function' and the various
indentation functions.  The current indentation code in CPerl mode
is... a bit messy, and some old bugs call for attention anyway.

If, however, that same class were defined using the Dios extension,
it would look like this:

  class Coffee::Machine is Lawn::Mower
  {
     has $.grinder is rw
     method grind { ... }
  } 

...and also offer the keywords "func" and "submethod" for stuff that
should go into imenu, and "lex" for declaring variables.  There's a
dozen or more other extensions providing OO frameworks for Perl.

So, if the Emacs support for an extension could be done by a separate
.el file, these could be developed within GNU Emacs, in GNU ELPA, but
also contributed via NonGNU ELPA, MELPA or GitHub.

I am aware that probably as soon as such an extension mechanism is
available, _someone_ will publish a Perl extension which can't be
covered :)
-- 
Cheers,
haj




* Re: Handling extensions of programming languages
  2021-03-21 11:41       ` Harald Jörg
@ 2021-03-21 12:39         ` Stefan Monnier
  2021-03-21 15:48           ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-21 12:39 UTC (permalink / raw)
  To: Harald Jörg; +Cc: Clément Pit-Claudel, emacs-devel

>>> Well, I have my doubts that Perl is a good candidate for SMIE, and
>>> trying to use SMIE in CPerl mode would be a major rewrite anyway.  I
>>> guess the Emacs Lisp basics (font-lock-add-keywords, hooks) will have to
>>> do the job.
>>
>> I'm pretty sure SMIE would work wonderfully for Perl, but I'm also not
>> sure how it relates to font-lock-add-keywords and hooks, so maybe
>> we're not thinking of the same thing?

FWIW, I'm sure SMIE could be made to work, but I highly doubt it would
"work wonderfully" in the sense that it would likely take a fair bit of
effort to make SMIE indent Perl code as well as the current indentation
code in `cperl-mode` or in `perl-mode`.

>   class Coffee::Machine extends Lawn::Mower
>   {
>      has $grinder :reader :writer(replace_grinder)
>      method grind { ...; }
>   } 
[...]
>  - add "Dishwasher" and "clean_up" to the imenu index.

That seems to require AI (unless you're talking about a slightly
different example than the one quoted above ;-).

>  - make sure that indentation recognizes that the closing braces end a
>    statement after "class" and "method".  Perl syntax has various cases
>    where it doesn't.  I guess this is the part where SMIE would help.

Actually, the closing brace which also closes a statement is one of the
major pain points in `sm-c-mode`, so it would be one of the parts where
you'd need extra work to make SMIE understand what's going on.

> For the latter two tasks, I need to "hook" the logic somehow into
> CPerl's implementations of `imenu-create-index-function' and the various
> indentation functions.  The current indentation code in CPerl mode
> is... a bit messy, and some old bugs call for attention anyway.

AFAIK font-lock and imenu are easy.  For font-lock there's
`font-lock-add-keywords` and for imenu, you should be able to make it
work fairly well with just `add-function` to
`imenu-create-index-function`.
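
A minimal sketch of the font-lock half of this, assuming a hypothetical
buffer-local minor mode (the names `my-perl-object-pad-mode' and
`my-perl-object-pad-font-lock-keywords' are invented for illustration and
exist in neither perl-mode nor cperl-mode).  The keyword list covers only
the constructs from the Object::Pad example in this thread and makes no
attempt to avoid false positives:

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names throughout; only `font-lock-add-keywords',
;; `font-lock-remove-keywords', `font-lock-flush', `rx' and
;; `define-minor-mode' are standard Emacs facilities.

(defvar my-perl-object-pad-font-lock-keywords
  `(;; The new keywords themselves.
    (,(rx symbol-start (or "class" "extends" "method" "has") symbol-end)
     . font-lock-keyword-face)
    ;; The class name after "class", highlighted like a package name.
    (,(rx symbol-start "class" (+ space)
          (group (+ (any "A-Za-z0-9_:"))))
     1 font-lock-type-face)
    ;; The method name after "method", highlighted like a sub name.
    (,(rx symbol-start "method" (+ space)
          (group (any "A-Za-z_") (* (any "A-Za-z0-9_"))))
     1 font-lock-function-name-face))
  "Illustrative font-lock rules for a few Object::Pad constructs.")

(define-minor-mode my-perl-object-pad-mode
  "Toggle highlighting of a few Object::Pad keywords in this buffer."
  :lighter " Pad"
  (if my-perl-object-pad-mode
      ;; A nil MODE argument applies the keywords to the current buffer only.
      (font-lock-add-keywords nil my-perl-object-pad-font-lock-keywords)
    (font-lock-remove-keywords nil my-perl-object-pad-font-lock-keywords))
  (font-lock-flush))
```

Because the rules are added and removed per buffer, a competing extension
(for example one for Dios) could supply its own list in the same way
without touching the major mode's own font-lock setup.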

For indentation, it's fundamentally harder (for the same reason that
combining two LALR grammars doesn't necessarily give you an LALR
grammar), so it will have to be done in a somewhat ad-hoc way.
I suspect that if the base mode uses SMIE, it would make it
significantly easier to add extensions (because the structure of SMIE
imposes constraints that expose the "compositional" aspect of the
grammar, in some sense), but that's not what you have to work with
currently, so you're going to have to dig into the indentation code and
try and figure out how to make it work with your extension(s) and then
how to express the changes "from outside" (e.g. by using hooks,
`add-function`, or `advice-add`; we can of course add hooks
to `cperl-mode.el` or `perl-mode.el` to make that easier).


        Stefan





* Re: Handling extensions of programming languages
  2021-03-21 12:39         ` Stefan Monnier
@ 2021-03-21 15:48           ` Harald Jörg
  2021-03-21 17:59             ` Stefan Monnier
  2021-03-30 18:41             ` Handling extensions of programming languages Stephen Leake
  0 siblings, 2 replies; 19+ messages in thread
From: Harald Jörg @ 2021-03-21 15:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> [...]
>>   class Coffee::Machine extends Lawn::Mower
>>   {
>>      has $grinder :reader :writer(replace_grinder)
>>      method grind { ...; }
>>   } 
> [...]
>>  - add "Dishwasher" and "clean_up" to the imenu index.
>
> That seems to require AI (unless you're talking about a slightly
> different example than the one quoted above ;-).

Ouch.  I goofed up when deleting stuff from my test file to make the
example shorter :)  I wanted to add "grind" instead of "clean_up".

But, jokes aside: I actually consider adding entries to the imenu index
_which aren't there._ In the example above, Object::Pad will silently
create the methods `grinder' and `replace_grinder'.  I think these
*should* go to imenu because if your code in another source calls
$cm->grinder you might otherwise have a hard time finding where that
routine is declared.
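
A rough sketch of how such "invisible" entries could be collected,
assuming a hypothetical helper `my-perl-pad-accessor-entries' (not part
of any existing mode).  It only understands single-line `has'
declarations as in the example above, and the default writer name
`set_FIELD' used when `:writer' has no argument is an assumption about
Object::Pad's naming rules that should be checked against its
documentation:

```elisp
;;; -*- lexical-binding: t -*-
(defun my-perl-pad-accessor-entries ()
  "Return imenu entries (NAME . POSITION) for accessors implied by `has'.
Hypothetical helper: scans the current buffer for single-line `has'
declarations carrying :reader and/or :writer attributes."
  (let ((name-re "\\(?:(\\([A-Za-z_][A-Za-z0-9_]*\\))\\)?")
        entries)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "^[ \t]*has[ \t]+[$@%]\\([A-Za-z_][A-Za-z0-9_]*\\)\\(.*\\)$"
              nil t)
        ;; Copy everything out of the match data first: the
        ;; `string-match' calls below overwrite it.
        (let ((field (match-string-no-properties 1))
              (attrs (match-string-no-properties 2))
              (pos (match-beginning 0)))
          ;; :reader or :reader(NAME); without NAME use the field name.
          (when (string-match (concat ":reader" name-re) attrs)
            (push (cons (or (match-string 1 attrs) field) pos) entries))
          ;; :writer or :writer(NAME); the "set_" default is an assumption.
          (when (string-match (concat ":writer" name-re) attrs)
            (push (cons (or (match-string 1 attrs) (concat "set_" field)) pos)
                  entries)))))
    (nreverse entries)))
```

Each entry points at the `has' line, which is the closest thing to a
declaration site for the generated method.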

>>  - make sure that indentation recognizes that the closing braces end a
>>    statement after "class" and "method".  Perl syntax has various cases
>>    where it doesn't.  I guess this is the part where SMIE would help.
>
> Actually, the closing brace which also closes a statement is one of the
> major pain points in `sm-c-mode`, so it would be one of the parts where
> you'd need extra work to make SMIE understand what's going on.

Given the effort CPerl mode spends to distinguish these two I guessed
so.  There are some open bugs regarding indentation in CPerl mode
(Bug#8077, Bug#11773, Bug#28640) which I'd like to fix while on the way.

Also, a few days ago there was a discussion in the Perlmonks forum where
CPerl mode guesses horribly wrong:
https://www.perlmonks.org/?node_id=11129870

>> For the latter two tasks, I need to "hook" the logic somehow into
>> CPerl's implementations of `imenu-create-index-function' and the various
>> indentation functions.  The current indentation code in CPerl mode
>> is... a bit messy, and some old bugs call for attention anyway.
>
> AFAIK font-lock and imenu are easy.  For font-lock there's
> `font-lock-add-keywords` and for imenu, you should be able to make it
> work fairly well with just `add-function` to
> `imenu-create-index-function`.

For certain values of easy :).  But yes, that's the plan.  The font-lock
mechanism is indeed very powerful.  For Object::Pad, the keyword
declaration takes about 120 lines (in rx notation, which is rather
wordy) - mostly due to the effort to avoid false positives.

For imenu, adding regexps to the list of `or'ed expressions to search
for seems to be an alternative.  Or maybe it doesn't, if I want to add
entries which can't be easily searched for.

> For indentation, it's fundamentally harder (for the same reason that
> combining two LALR grammars doesn't necessarily give you an LALR
> grammar), so it will have to be done in a somewhat ad-hoc way.

Indeed.  Indentation needs more "context".

> I suspect that if the base mode uses SMIE, it would make it
> significantly easier to add extensions (because the structure of SMIE
> imposes constraints that expose the "compositional" aspect of the
> grammar, in some sense), but that's not what you have to work with
> currently, so you're going to have to dig into the indentation code and
> try and figure out how to make it work with your extension(s) and then
> how to express the changes "from outside" (e.g. by using hooks,
> `add-function`, or `advice-add`; we can of course add hooks
> to `cperl-mode.el` or `perl-mode.el` to make that easier).

Your last parens touch another interesting aspect: Can that stuff be
used by cperl-mode.el _and_ perl-mode.el?

Well, as it turns out, the font-lock stuff "works" for both.  It looks a
bit weird with Perl mode because the "new" keywords like `method' have
different faces than the "old" ones like `my'.

For imenu, things are different: Perl mode uses
`imenu-generic-expression', whereas CPerl mode uses a rather complex
`imenu-create-index-function', so that it can prepend the current
namespace to the name of functions.

And as for indentation...  I'd say the code in both modes needs to catch
up with current perl before we consider extensions.  Maybe they could
share functions or regular expressions how to find the beginning of a
function, or how to identify closing braces which terminate a statement:
The specification for this logic comes from Perl and should be the same
for both modes.
-- 
Cheers,
haj




* Re: Handling extensions of programming languages
  2021-03-21 15:48           ` Harald Jörg
@ 2021-03-21 17:59             ` Stefan Monnier
  2021-03-22 14:08               ` Handling extensions of programming languages (Perl) Harald Jörg
  2021-03-30 18:41             ` Handling extensions of programming languages Stephen Leake
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-21 17:59 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

> But, jokes aside: I actually consider adding entries to the imenu index
> _which aren't there._ In the example above, Object::Pad will silently
> create the methods `grinder' and `replace_grinder'.  I think these
> *should* go to imenu because if your code in another source calls
> $cm->grinder you might otherwise have a hard time finding where that
> routine is declared.

I don't see any problem with that.  You could even argue that they
*are* there.

>>> For the latter two tasks, I need to "hook" the logic somehow into
>>> CPerl's implementations of `imenu-create-index-function' and the various
>>> indentation functions.  The current indentation code in CPerl mode
>>> is... a bit messy, and some old bugs call for attention anyway.
>> AFAIK font-lock and imenu are easy.  For font-lock there's
>> `font-lock-add-keywords` and for imenu, you should be able to make it
>> work fairly well with just `add-function` to
>> `imenu-create-index-function`.
> For certain values of easy :).

I meant "easy" in the sense that once you've figured out how to match
those constructs and how to put the right face on the various parts,
adding it modularly (e.g. from a minor mode) should be reasonably easy,
because it shouldn't interact in too complex ways with the rest of the
font-lock rules.

> Your last parens touch another interesting aspect: Can that stuff be
> used by cperl-mode.el _and_ perl-mode.el?

For imenu and font-lock, I can't see why not.

> Well, as it turns out, the font-lock stuff "works" for both.  It looks a
> bit weird with Perl mode because the "new" keywords like `method' have
> different faces than the "old" ones like `my'.

I'm not sure why that would be: AFAICT, both `perl-mode` and
`cperl-mode` highlight keywords (like `sub`, `if`, `for`, ...) using the
`font-lock-keyword-face` (they generally use fairly different faces, but
this is a part where they agree ;-).

> For imenu, things are different: Perl mode uses
> `imenu-generic-expression', whereas CPerl mode uses a rather complex
> `imenu-create-index-function', so that it can prepend the current
> namespace to the name of functions.

If your code uses `add-function` on `imenu-create-index-function` it
should work in both cases (`perl-mode` simply keeps the default value
of `imenu-create-index-function` which is the function that implements
`imenu-generic-expression`).
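
A minimal sketch of that idea, assuming hypothetical function names
(`my-perl-method-entries', `my-perl-imenu-with-methods' and
`my-perl-imenu-setup' are invented here).  It wraps whatever index
function the major mode has installed and appends a "Methods" submenu,
without trying to match either mode's presentation style:

```elisp
;;; -*- lexical-binding: t -*-
(require 'imenu)
(require 'nadvice)

(defun my-perl-method-entries ()
  "Return a list of (NAME . POSITION) for `method NAME' declarations."
  (let (entries)
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward
              "^[ \t]*method[ \t]+\\([A-Za-z_][A-Za-z0-9_]*\\)" nil t)
        (push (cons (match-string-no-properties 1) (match-beginning 1))
              entries)))
    (nreverse entries)))

(defun my-perl-imenu-with-methods (orig-fun &rest args)
  "Call ORIG-FUN with ARGS, then append a \"Methods\" submenu if non-empty."
  (let ((index (apply orig-fun args))
        (methods (my-perl-method-entries)))
    (if methods
        (append index (list (cons "Methods" methods)))
      index)))

(defun my-perl-imenu-setup ()
  "Wrap the buffer-local value of `imenu-create-index-function'."
  (add-function :around (local 'imenu-create-index-function)
                #'my-perl-imenu-with-methods))
```

The setup function could be called from a minor mode body or added to
both `perl-mode-hook' and `cperl-mode-hook'; the matching
`remove-function' call with the same `(local ...)' place undoes it.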

> And as for indentation...  I'd say the code in both modes needs to catch
> up with current perl before we consider extensions.  Maybe they could
> share functions or regular expressions how to find the beginning of a
> function, or how to identify closing braces which terminate a statement:
> The specification for this logic comes from Perl and should be the same
> for both modes.

Consolidation between the two modes is progress, so you got my vote.


        Stefan





* Re: Handling extensions of programming languages (Perl)
  2021-03-21 17:59             ` Stefan Monnier
@ 2021-03-22 14:08               ` Harald Jörg
  2021-03-22 14:48                 ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-22 14:08 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[I've added "Perl" to the subject since this is going specific]

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> [...]

>> Your last parens touch another interesting aspect: Can that stuff be
>> used by cperl-mode.el _and_ perl-mode.el?
>
> For imenu and font-lock, I can't see why not.

Nice.  How would the set of shared functions be distributed?  In a new,
separate file which both modes `require'?  That file would then also be
part of a CPerl distribution on ELPA, I guess?  In my opinion, it would
make sense in any case to use the "perl" prefix for shared stuff.

>> Well, as it turns out, the font-lock stuff "works" for both.  It looks a
>> bit weird with Perl mode because the "new" keywords like `method' have
>> different faces than the "old" ones like `my'.
>
> I'm not sure why that would be: AFAICT, both `perl-mode` and
> `cperl-mode` highlight keywords (like `sub`, `if`, `for`, ...) using the
> `font-lock-keyword-face` (they generally use fairly different faces, but
> this is a part where they agree ;-).

Overall they agree, but there are differences in details (some might
even be unintended).  For new keywords and syntax there's indeed no need
to use different faces, but they should be somewhat consistent with
existing highlighting.  Some results from first tests and debugging:

 - Declarators (like "my") are type-face in perl-mode, keyword-face in
   cperl-mode.  I noticed this because the new "has" is fontified by
   perl-mode (though it isn't part of Perl) and the additions don't
   override it.  CPerl mode abuses type-face for builtin functions, I
   wonder how much stir it makes if this is changed.

 - Names of packages are not fontified in perl-mode when they are `use'd
   or `require'd (on closer inspection, this is probably unintended: The
   first capture group is either an empty string or punctuation/space
   and should be shy).

 - In cperl-mode, ':' is a symbol, but a punctuation character in
   perl-mode.  This makes interpretation of "\\_<" different.  Perhaps
   changing cperl-mode's syntax table to make ':' punctuation would be
   the way to go - but punctuation also has its deficits for perl-mode, as
   apparent in "package Foo::Bar", so it would need more work.  I haven't
   investigated further.
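
The effect of the syntax class on "\\_<" can be reproduced in isolation.
The following is only a sketch with a hypothetical helper name
(`my-symbol-start-matches-p'); it builds a throwaway syntax table instead
of using either mode's real one:

```elisp
;;; -*- lexical-binding: t -*-
(defun my-symbol-start-matches-p (colon-syntax)
  "Return t if a symbol-start regexp finds Bar in \"package Foo::Bar;\".
COLON-SYNTAX is the syntax descriptor to give `:', for example \"_\"
for symbol constituent or \".\" for punctuation."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?: colon-syntax table)
      (set-syntax-table table)
      (insert "package Foo::Bar;")
      (goto-char (point-min))
      (and (re-search-forward "\\_<Bar\\_>" nil t) t))))

(my-symbol-start-matches-p "_")  ; `:' as symbol constituent => nil
(my-symbol-start-matches-p ".")  ; `:' as punctuation        => t
```

With ':' as a symbol constituent, "Foo::Bar" is a single symbol and "Bar"
does not start one, so a shared regexp anchored with "\\_<" behaves
differently in the two modes.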

>> For imenu, things are different: Perl mode uses
>> `imenu-generic-expression', whereas CPerl mode uses a rather complex
>> `imenu-create-index-function', so that it can prepend the current
>> namespace to the name of functions.
>
> If your code uses `add-function` on `imenu-create-index-function` it
> should work in both cases (`perl-mode` simply keeps the default value
> of `imenu-create-index-function` which is the function that implements
> `imenu-generic-expression`).

Ah, yes, of course.  I didn't think of that (nor read the docs).

The two modes have different styles how they present their results,
though.  Adding new entries needs some "rearrangement" to put it into
the right place(s) in the index.

>> And as for indentation...  I'd say the code in both modes needs to catch
>> up with current perl before we consider extensions.  Maybe they could
>> share functions or regular expressions how to find the beginning of a
>> function, or how to identify closing braces which terminate a statement:
>> The specification for this logic comes from Perl and should be the same
>> for both modes.
>
> Consolidation between the two modes is progress, so you got my vote.

Thanks!
-- 
Cheers,
haj




* Re: Handling extensions of programming languages (Perl)
  2021-03-22 14:08               ` Handling extensions of programming languages (Perl) Harald Jörg
@ 2021-03-22 14:48                 ` Stefan Monnier
  2021-03-22 17:32                   ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-22 14:48 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

>> For imenu and font-lock, I can't see why not.
> Nice.

Beware: I might just be blinded by optimism.

> How would the set of shared functions be distributed?

Good question.  I guess it largely depends on the size, the way you
intend to distribute it, the possible bad interaction with other Perl
extensions, ...

E.g. for an extension which doesn't collide with any other known Perl
extension, you could imagine enabling it by default (and maybe even
forego offering a way to disable it).

I think the most natural/convenient form of an extension that can be
enabled or not would be as a minor mode.

And as for where to put the code, it could be in a completely separate
file, or directly in `perl-mode.el` (which `cperl-mode.el` could
require: it's a mere 50kB compared to `cperl-mode.el`'s 300kB).

> Overall they agree, but there are differences in details (some might
> even be unintended).  For new keywords and syntax there's indeed no need
> to use different faces, but they should be somewhat consistent with
> existing highlighting.  Some results from first tests and debugging:
>
>  - Declarators (like "my") are type-face in perl-mode, keyword-face in
>    cperl-mode.  I noticed this because the new "has" is fontified by
>    perl-mode (though it isn't part of Perl) and the additions don't
>    override it.

I think such discrepancies are just misfeatures, so it would be nice to
fix them (ideally by choosing the one that seems less arbitrary).
Using type-face for `my` or `local` doesn't seem useful, so we
should probably change them to keyword.

>    CPerl mode abuses type-face for builtin functions, I
>    wonder how much stir it makes if this is changed.

Try it ;-)
Unsurprisingly, I vote for using the font-lock-builtin-face for them.

>  - Names of packages are not fontified in perl-mode when they are `use'd
>    or `require'd (on closer inspection, this is probably unintended: The
>    first capture group is either an empty string or punctuation/space
>    and should be shy).

Sounds like a bug, indeed.

>  - In cperl-mode, ':' is a symbol, but a punctuation character in perl-mode.

Ah, right, this could make it significantly harder to share code between
the two major modes.  I don't think either choice is clearly superior,
but we should make them agree on the syntax-table.

>    This makes interpretation of "\\_<" different.  Perhaps changing
>    cperl-mode's syntax table to make ':' punctuation would be the
>    way to go - but punctuation also has its deficits for perl-mode, as
>    apparent in "package Foo::Bar", so it would need more work.
>    I haven't investigated further.

I suspect it can also impact other parts of the code (since it impacts
things like `forward-sexp`).  I think my recommendation would be to
change `perl-mode` to agree with `cperl-mode` since `perl-mode.el` is
much smaller so the amount of breakage should be correspondingly smaller.
[ Also, from a user's point of view it's good that `C-M-f` skips over the
  whole of "Foo::bar" instead of stopping after "Foo".  ]

> The two modes have different styles how they present their results,
> though.  Adding new entries needs some "rearrangement" to put it into
> the right place(s) in the index.

Then again, you could focus on making it "work well" for one of the modes
(presumably `cperl-mode`) and content yourself with "works" for the
other ;-)


        Stefan





* Re: Handling extensions of programming languages (Perl)
  2021-03-22 14:48                 ` Stefan Monnier
@ 2021-03-22 17:32                   ` Harald Jörg
  2021-03-22 18:27                     ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-22 17:32 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> For imenu and font-lock, I can't see why not.
>> Nice.
>
> Beware: I might just be blinded by optimism.

Optimism is my second name!  (My first name is "Unwarranted")

>> How would the set of shared functions be distributed?
>
> Good question.  I guess it largely depends on the size, the way you
> intend to distribute it, the possible bad interaction with other Perl
> extensions, ...
>
> E.g. for an extension which doesn't collide with any other known Perl
> extension, you could imagine enabling it by default (and maybe even
> forego offering a way to disable it).

I would prefer this behavior (not being able to disable it) for things
that come with new Perl versions, but not for extensions.  If you run an
older Perl version (which happens all the time because distributions
take time to catch up) it might behave strangely, but I think that this is
ok as a reminder that "at some time you will need to change this
anyway".

For extensions it is really difficult to find out whether they might
collide with another extension.  Sometimes new extensions quickly rise
in popularity but need different handling.  A typical example is
exception handling with Try::Tiny which is lightweight and nice and all,
but it comes with the pitfall that the try block, unlike in all
other extensions for exception handling with try/catch/finally,
_requires_ a semicolon.

> I think the most natural/convenient form of an extension that can be
> enabled or not would be as a minor mode.

I've been hoping that this is acceptable.  It brings a lot of
infrastructure and therefore consistency for users.
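
A minimal sketch of what such a minor mode could look like, assuming an
extension that only contributes the keywords try/catch/finally.  The
names `perl-ext-try-mode` and `perl-ext-try-keywords` are made up for
illustration and exist in neither cperl-mode nor perl-mode; only
highlighting is covered, indentation is left out entirely.

```elisp
;; Hypothetical font-lock rule for an extension providing try/catch/finally.
(defvar perl-ext-try-keywords
  '(("\\_<\\(try\\|catch\\|finally\\)\\_>" 1 font-lock-keyword-face))
  "Font-lock keywords added by `perl-ext-try-mode' (illustrative only).")

;; Buffer-local minor mode: enabling it adds the rule to the current
;; buffer, disabling it removes the rule again.
(define-minor-mode perl-ext-try-mode
  "Highlight try/catch/finally in the current Perl buffer."
  :lighter " Try"
  (if perl-ext-try-mode
      (font-lock-add-keywords nil perl-ext-try-keywords)
    (font-lock-remove-keywords nil perl-ext-try-keywords))
  ;; Refontify so the change becomes visible immediately.
  (when font-lock-mode
    (font-lock-flush)))
```

A user could then opt in per major mode with something like
`(add-hook 'cperl-mode-hook #'perl-ext-try-mode)`, and two competing
extensions would simply be two minor modes that are not enabled together.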

> And as for where to put the code, it could be in a completely separate
> file, or directly in `perl-mode.el` (which `cperl-mode.el` could
> require: it's a mere 50kB compared to `cperl-mode.el`'s 300kB).

I am leaning towards a completely separate file, but maybe not right
now.  In both cases the adventurous users who're using cperl-mode
directly from the repository will then need to pick two files instead of
one.  If, one day, cperl-mode is made available via ELPA, this should
not necessarily require moving perl-mode to ELPA as well.

>> [...about discrepancies in syntax highlighting ...]
>
> I think such discrepancies are just misfeatures, so it would be nice to
> fix them (ideally by choosing the one that seems less arbitrary).

Agreed.

> Using type-face for `my` or `local` doesn't seem useful, so we
> should probably change them to keyword.

That was my first thought as well.  But then, the declarators appear in
places where other languages have their types.  And then again, there
are (various, of course) Perl extensions which provide a type system
for Perl, so you can write "my Str $string" or even "my ArrayRef[Int] $ref".

I guess I need to *see* it for some time to find out whether in "my Str"
both parts should have the same color or better shouldn't.

>>    CPerl mode abuses type-face for builtin functions, I
>>    wonder how much stir it makes if this is changed.
>
> Try it ;-)
> Unsurprisingly, I vote for using the font-lock-builtin-face for them.

I agree.  CPerl mode uses different faces for "overridable" and
"nonoverridable" builtins, but this distinction isn't that valuable
these days (and it also changes between Perl versions).  I've also
received feedback that this distinction should go away.  IIRC the
non-"standard" colors of CPerl mode occasionally annoyed people who
use Emacs with many different programming languages.

> [...]
>>  - In cperl-mode, ':' is a symbol, but a punctuation character in perl-mode.
>
> Ah, right, this could make it significantly harder to share code between
> the two major modes.  I don't think either choice is clearly superior,
> but we should make them agree on the syntax-table.
>
>> [...]
>
> I suspect it can also impact other parts of the code (since it impacts
> things like `forward-sexp`).  I think my recommendation would be to
> change `perl-mode` to agree with `cperl-mode` since `perl-mode.el` is
> much smaller so the amount of breakage should be correspondingly smaller.
> [ Also, from a user's point of view it's good that `C-M-f` skips over the
>   whole of "Foo::bar" instead of stopping after "Foo".  ]

Good point!  For the moment I'll be a coward and avoid that decision by
honing the regular expressions with that difference in mind :)
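
One way to picture the difference, and a regexp that sidesteps it: the
sketch below matches a package-qualified name at the start of "Foo::Bar"
under both syntax classes for ':'.  The function `perl-ext-demo-match`
and both regexps are assumptions for illustration, not code taken from
either mode.

```elisp
;; Regexp relying on the syntax table: a run of word/symbol constituents
;; delimited by symbol boundaries.
(defconst perl-ext-demo-syntax-re "\\_<\\(?:\\sw\\|\\s_\\)+\\_>")

;; Regexp spelling out the characters explicitly, so it does not depend
;; on the syntax class of ':'.
(defconst perl-ext-demo-explicit-re
  "[[:alpha:]_][[:alnum:]_]*\\(?:::[[:alpha:]_][[:alnum:]_]*\\)*")

(defun perl-ext-demo-match (regexp colon-syntax)
  "Return what REGEXP matches at the start of \"Foo::Bar\".
COLON-SYNTAX is the syntax descriptor given to the colon,
e.g. \"_\" (symbol) or \".\" (punctuation)."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?: colon-syntax table)
      (set-syntax-table table))
    (insert "Foo::Bar")
    (goto-char (point-min))
    (when (looking-at regexp)
      (match-string 0))))

(perl-ext-demo-match perl-ext-demo-syntax-re "_")    ; => "Foo::Bar"
(perl-ext-demo-match perl-ext-demo-syntax-re ".")    ; => "Foo"
(perl-ext-demo-match perl-ext-demo-explicit-re "_")  ; => "Foo::Bar"
(perl-ext-demo-match perl-ext-demo-explicit-re ".")  ; => "Foo::Bar"
```

The explicit variant is more verbose, but it can be shared between the
two modes as long as their syntax tables disagree.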

>> The two modes have different styles of presenting their results,
>> though.  Adding new entries needs some "rearrangement" to put it into
>> the right place(s) in the index.
>
> Then again, you could focus on making it "work well" for one of the modes
> (presumably `cperl-mode`) and content yourself with "works" for the
> other ;-)

That's probably an acceptable way forward.

Time to roll up my sleeves ... and grab some more coffee.
-- 
Cheers,
haj




* Re: Handling extensions of programming languages (Perl)
  2021-03-22 17:32                   ` Harald Jörg
@ 2021-03-22 18:27                     ` Stefan Monnier
  2021-03-22 19:31                       ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-22 18:27 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

>> E.g. for an extension which doesn't collide with any other known Perl
>> extension, you could imagine enabling it by default (and maybe even
>> forego offering a way to disable it).
> I would prefer this behavior (not being able to disable it) for things
> that come with new Perl versions, but not for extensions.

I wasn't recommending any particular choice.
Just mentioning what I would consider as acceptable.

>> And as for where to put the code, it could be in a completely separate
>> file, or directly in `perl-mode.el` (which `cperl-mode.el` could
>> require: it's a mere 50kB compared to `cperl-mode.el`'s 300kB).
> I am leaning towards a completely separate file, but maybe not right
> now.  In both cases the adventurous users who're using cperl-mode
> directly from the repository will then need to pick two files instead of
> one.  If, one day, cperl-mode is made available via ELPA, this should
> not necessarily require moving perl-mode to ELPA as well.

I don't see any problem with a :core `cperl-mode` package which comes
bundled with its own version of `perl-mode.el` (nor would I find it
problematic to export `perl-mode.el` into its own :core GNU ELPA
package).

> That was my first thought as well.  But then, the declarators appear in
> places where other languages have their types.

[ I think you use a very restricted definition of "other languages" here.
  It's definitely not the case for most of the statically typed languages
  I've used, except for C.
  I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]


        Stefan





* Re: Handling extensions of programming languages (Perl)
  2021-03-22 18:27                     ` Stefan Monnier
@ 2021-03-22 19:31                       ` Harald Jörg
  2021-03-22 19:58                         ` [OFFTOPIC] " Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-22 19:31 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier writes:

> [...]
> I don't see any problem with a :core `cperl-mode` package which comes
> bundled with its own version of `perl-mode.el` (nor would I find it
> problematic to export `perl-mode.el` into its own :core GNU ELPA
> package).

That's great!  So that shouldn't be an issue at all.  I've yet to get
familiar with the procedures around GNU ELPA.

>> That was my first thought as well.  But then, the declarators appear in
>> places where other languages have their types.
>
> [ I think you use a very restricted definition of "other languages" here.
>   It's definitely not the case for most of the statically typed languages
>   I've used, except for C.
>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]

Guilty, your honor.  In the last years I've dealt with Emacs Lisp (only
very recently), Perl, C, Java, JavaScript ... and before that with a
dialect of PL/1, assembly languages (68000, x86, /390) ... and before
that with FORTRAN, where everyone's type system seemed to be IMPLICIT
INTEGER I-N.  So indeed, almost no intersection with your list.
-- 
Cheers,
haj




* [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
  2021-03-22 19:31                       ` Harald Jörg
@ 2021-03-22 19:58                         ` Stefan Monnier
  2021-03-22 22:05                           ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-22 19:58 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

>>> That was my first thought as well.  But then, the declarators appear in
>>> places where other languages have their types.
>>
>> [ I think you use a very restricted definition of "other languages" here.
>>   It's definitely not the case for most of the statically typed languages
>>   I've used, except for C.
>>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]
>
> Guilty, your honor.  In the last years I've dealt with Emacs Lisp (only
> very recently), Perl, C, Java, JavaScript ... and before that with a
> dialect of PL/1, assembly languages (68000, x86, /390) ... and before
> that with FORTRAN, where everyone's type system seemed to be IMPLICIT
> INTEGER I-N.  So indeed, almost no intersection with your list.

Of those the only ones that are statically typed seem to be C, Java,
Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
the type is placed at a location comparable to where `my` is placed in
Perl, IMO.


        Stefan





* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
  2021-03-22 19:58                         ` [OFFTOPIC] " Stefan Monnier
@ 2021-03-22 22:05                           ` Harald Jörg
  2021-03-22 22:24                             ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-22 22:05 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>>> That was my first thought as well.  But then, the declarators appear in
>>>> places where other languages have their types.
>>>
>>> [ I think you use a very restricted definition of "other languages" here.
>>>   It's definitely not the case for most of the statically typed languages
>>>   I've used, except for C.
>>>   I'm thinking of OCaml, SML, Haskell, Agda, Coq, Modula-2, Pascal, Ada, ... ]
>>
>> Guilty, your honor.  In the last years I've dealt with Emacs Lisp (only
>> very recently), Perl, C, Java, JavaScript ... and before that with a
>> dialect of PL/1, assembly languages (68000, x86, /390) ... and before
>> that with FORTRAN, where everyone's type system seemed to be IMPLICIT
>> INTEGER I-N.  So indeed, almost no intersection with your list.
>
> Of those the only ones that are statically typed seem to be C, Java,
> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
> the type is placed at a location comparable to where `my` is placed in
> Perl, IMO.

Fortran, too, unless you do the IMPLICIT trick.

But anyway: Looking at the Emacs modes for Java and C, all keywords like
"private" and "static" (which, similar to "my" in Perl, define scope
rather than type) are in keyword-face.  This would indicate that
keyword-face is to be preferred for the declarators, and type-face for
the types.
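
Expressed as font-lock rules, that preference could look roughly like the
sketch below: the declarator gets keyword-face, and an optional type name
following it (as in "my Str $string") gets type-face.  The names
`perl-ext-declarator-keywords` and `perl-ext-demo-faces` and the regexp
are assumptions for illustration; the type part only covers a plain
name, so in "my ArrayRef[Int] $ref" only "ArrayRef" would be matched.

```elisp
;; Hypothetical rule: group 1 is the declarator, group 2 an optional
;; type name between the declarator and the variable.
(defvar perl-ext-declarator-keywords
  '(("\\_<\\(my\\|our\\|local\\|state\\)\\_>\\(?:[ \t]+\\([[:alpha:]_][[:alnum:]_:]*\\)\\)?"
     (1 font-lock-keyword-face)
     ;; LAXMATCH is t because group 2 is absent in a plain `my $x'.
     (2 font-lock-type-face nil t)))
  "Illustrative font-lock keywords for Perl declarators and types.")

(defun perl-ext-demo-faces (line)
  "Fontify LINE with `perl-ext-declarator-keywords' and return it."
  (with-temp-buffer
    (insert line)
    ;; Second element t: keyword fontification only, no strings/comments.
    (setq-local font-lock-defaults '(perl-ext-declarator-keywords t))
    (font-lock-ensure)
    (buffer-string)))

(get-text-property 0 'face (perl-ext-demo-faces "my Str $string"))
;; => font-lock-keyword-face
(get-text-property 3 'face (perl-ext-demo-faces "my Str $string"))
;; => font-lock-type-face
```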
-- 
Cheers,
haj




* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
  2021-03-22 22:05                           ` Harald Jörg
@ 2021-03-22 22:24                             ` Stefan Monnier
  2021-03-22 23:43                               ` Harald Jörg
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Monnier @ 2021-03-22 22:24 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

>> Of those the only ones that are statically typed seem to be C, Java,
>> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
>> the type is placed at a location comparable to where `my` is placed in
>> Perl, IMO.
>
> Fortran, too, unless you do the IMPLICIT trick.

OK, I must admit that my knowledge of Fortran syntax is poor, so I had
done a quick search and found
https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap02/declare.html
which seems to suggest that the syntax is "TYPE :: VARS", which seemed
different enough from "TYPE VAR".

OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)


        Stefan





* Re: [OFFTOPIC] Re: Handling extensions of programming languages (Perl)
  2021-03-22 22:24                             ` Stefan Monnier
@ 2021-03-22 23:43                               ` Harald Jörg
  2021-03-23  3:49                                 ` [OFFTOPIC] " Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Harald Jörg @ 2021-03-22 23:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier writes:

>>> Of those the only ones that are statically typed seem to be C, Java,
>>> Fortran and PL/1; and AFAICT only 50% (C and Java) use a syntax where
>>> the type is placed at a location comparable to where `my` is placed in
>>> Perl, IMO.
>>
>> Fortran, too, unless you do the IMPLICIT trick.
>
> OK, I must admit that my knowledge of Fortran syntax is poor, so I had
> done a quick search and found
> https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap02/declare.html
> which seems to suggest that the syntax is "TYPE :: VARS", which seemed
> different enough from "TYPE VAR".
>
> OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)

I'm happy with that!  Also, I might sort of deserve half points.  As I
wrote, Fortran was my first computer language.  I should have specified:
I started with FORTRAN IV and left with FORTRAN 77.  Both had
declarations like "INTEGER N"
(https://web.stanford.edu/class/me200c/tutorial_77/).  "TYPE :: VARS"
came only with Fortran 90.  And now I feel old.

https://www.rickmurphy.net/advent/ADVENT.FT is an example of FORTRAN
code from that era (the legendary Colossal Cave Adventure).

Emacs, of course, supports both dialects!
-- 
A HOLLOW VOICE SAYS "PLUGH".
haj




* Re: [OFFTOPIC] Handling extensions of programming languages (Perl)
  2021-03-22 23:43                               ` Harald Jörg
@ 2021-03-23  3:49                                 ` Stefan Monnier
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2021-03-23  3:49 UTC (permalink / raw)
  To: Harald Jörg; +Cc: emacs-devel

>> OK, I'll grant you half points for Fortran, so you're up to 62.5% ;-)
> I'm happy with that!  Also, I might sort of deserve half points.  As I
> wrote, Fortran was my first computer language.  I should have specified:
> I started with FORTRAN IV and left with FORTRAN 77.  Both had
> declarations like "INTEGER N"
> (https://web.stanford.edu/class/me200c/tutorial_77/).  "TYPE :: VARS"
> came only with Fortran 90.  And now I feel old.

Very good point.  This "::" syntax indeed didn't remind me of anything,
which you've now explained (my experience with Fortran is quite limited
and moreover limited to FORTRAN 77).

So I guess I'm forced to concede that you're up to 75%.  Damn!


        Stefan





* Re: Handling extensions of programming languages
  2021-03-21 15:48           ` Harald Jörg
  2021-03-21 17:59             ` Stefan Monnier
@ 2021-03-30 18:41             ` Stephen Leake
  1 sibling, 0 replies; 19+ messages in thread
From: Stephen Leake @ 2021-03-30 18:41 UTC (permalink / raw)
  To: Harald Jörg; +Cc: Stefan Monnier, emacs-devel

haj@posteo.de (Harald Jörg) writes:

>> For indentation, it's fundamentally harder (for the same reason that
>> combining two LALR grammars doesn't necessarily give you an LALR
>> grammar), so it will have to be done in a somewhat ad-hoc way.
>
> Indeed.  Indentation needs more "context".

The GNU ELPA package 'wisi' provides a way to declare indentation in the
grammar as actions; that provides all the context needed.

The wisi parsers also have excellent error correction, so the grammar
actions operate on a complete syntax tree (or fail utterly when the
input is really bad).

I have not tried to use wisi for Perl; it works for Ada and Java.

This does not address your issue of extending a language with new
syntax; as far as wisi is concerned, that is a new language, and needs
an entirely new grammar file. This is true for any LR parser.
It may not be true for a packrat parser, although the base parser would
have to provide hooks in each nonterminal parsing routine.

In wisi, it might be possible to extend the grammar file syntax with
something like:

#base_grammar <grammar file>

but it would still generate separate parsers for the base and extended
languages.

As long as the extended language is a superset of the base language, it
mostly doesn't hurt to always use the extended language parser. The
ada-mode parser implements a language that is an extension of standard
Ada 2012; that reduces conflicts and simplifies specifying indentation.

One downside of using an extended parser: it will not report syntax
errors for extended syntax in a file that is not supposed to contain
any. For ada-mode this is not a significant problem; the extensions
allow things that no Ada programmer would write even by mistake, and the
real compiler catches them soon enough.

> And as for indentation...  I'd say the code in both modes needs to catch
> up with current perl before we consider extensions.  Maybe they could
> share functions or regular expressions how to find the beginning of a
> function, or how to identify closing braces which terminate a statement:
> The specification for this logic comes from Perl and should be the same
> for both modes.

The reason I started the wisi package and WisiToken parser generator was
to migrate ada-mode away from ad-hoc code to grammar based code, to
support Ada 2012. To work well, the parser needs to be error correcting.
SMIE is inherently more error tolerant than an LR parser without error
correction, but I doubt it's good enough for indent.

-- 
-- Stephe




end of thread, other threads:[~2021-03-30 18:41 UTC | newest]

Thread overview: 19+ messages
2021-03-19 18:53 Handling extensions of programming languages Harald Jörg
2021-03-20 17:02 ` Matt Armstrong
2021-03-20 23:40   ` Harald Jörg
2021-03-21  2:18     ` Clément Pit-Claudel
2021-03-21 11:41       ` Harald Jörg
2021-03-21 12:39         ` Stefan Monnier
2021-03-21 15:48           ` Harald Jörg
2021-03-21 17:59             ` Stefan Monnier
2021-03-22 14:08               ` Handling extensions of programming languages (Perl) Harald Jörg
2021-03-22 14:48                 ` Stefan Monnier
2021-03-22 17:32                   ` Harald Jörg
2021-03-22 18:27                     ` Stefan Monnier
2021-03-22 19:31                       ` Harald Jörg
2021-03-22 19:58                         ` [OFFTOPIC] " Stefan Monnier
2021-03-22 22:05                           ` Harald Jörg
2021-03-22 22:24                             ` Stefan Monnier
2021-03-22 23:43                               ` Harald Jörg
2021-03-23  3:49                                 ` [OFFTOPIC] " Stefan Monnier
2021-03-30 18:41             ` Handling extensions of programming languages Stephen Leake
