Guidelines for the "symbol" syntax class

unofficial mirror of emacs-devel@gnu.org 

* Guidelines for the "symbol" syntax class
@ 2016-01-03  5:09 Dmitry Gutov
  2016-01-03 22:56 ` John Wiegley
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-03  5:09 UTC
  To: emacs-devel; +Cc: Stefan Monnier

Hi all and Stefan,

I intend to make some changes to the syntax of `:' in ruby-mode, and I'm
wondering how far that change should go. I can remove it from the syntax
table, but still apply it via syntax-propertize-function in other cases,
see below.

Do we have any solid guidelines for that?

Context: my two main uses of the notion of symbol are 1) "all symbols in
all buffers" completion candidates, 2) filtering the results of
xref-find-references by checking that the match begins and ends at a
symbol boundary. Currently, neither of these features works well in
ruby-mode.

First, "M::C" is interpreted as one symbol. If I just search for 
references to "C", this won't match. And vice versa, this qualified name 
usually corresponds to the definition like this:

module M
   class C
   end
end

so if I search for references to "M::C", this won't match either. So `:'
should simply become "punctuation". Then the simplest approach will
lead to false positives, but no false negatives.
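The effect on reference filtering can be sketched in Ruby itself. This is only an approximation under stated assumptions: `symbol_matches` is a hypothetical helper, not anything in Emacs or ruby-mode, and it models the "symbol" syntax class as a plain regexp character class rather than a real syntax table.

```ruby
# Hypothetical helper (not Emacs code): returns the offsets where `name`
# occurs in `text` with a symbol boundary on both sides. `constituents` is
# the body of a character class listing the symbol-constituent characters.
def symbol_matches(text, name, constituents)
  pattern = /(?<![#{constituents}])#{Regexp.escape(name)}(?![#{constituents}])/
  offsets = []
  pos = 0
  while (m = pattern.match(text, pos))
    offsets << m.begin(0)
    pos = m.end(0)
  end
  offsets
end

source = "module M\n  class C\n  end\nend\n\nM::C.new\n"

with_colon    = 'A-Za-z0-9_:'  # `:' counted as a symbol constituent
without_colon = 'A-Za-z0-9_'   # `:' treated as punctuation

# With `:' as a constituent, only the definition of C is found.
p symbol_matches(source, "C", with_colon).length
# With `:' as punctuation, the use in M::C is found as well.
p symbol_matches(source, "C", without_colon).length
```

With `:' as punctuation, a search for "C" would also match an unrelated `Other::C`, which is the kind of false positive mentioned above.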

There is another way `:' is used in Ruby: Ruby Symbols (I'm going to
mention those only using a capital S, to distinguish). They are like a
weird syntax for interned strings, but they're often used to refer to
method names: for introspection, or when defining a method dynamically,
or to dispatch a call dynamically. Examples:

class C
   def foo
   end
end

C.instance_method(:foo) # => #<UnboundMethod: C#foo>

class C
   define_method(:foo) do
     3
   end
end

C.new.send(:foo) # => 3

Consequently, if somewhere in my Ruby program there's a method foo_bar, 
it might be beneficial to be able to complete a Symbol :fo to :foo_bar 
as well, or for xref-find-references, when looking for references to 
this method, include the usages of Symbol :foo_bar.

Or take this example:

class C
   # attr_reader is a macro, kinda.
   # Define a method C#foo that simply returns the value
   # of the instance variable with the same name:
   attr_reader :foo

   def initialize(foo)
     # Assign that instance variable.
     @foo = foo
   end

   def do_something
     # Call the previously defined method (parens are optional)
     # and then call a method on the returned value:
     foo.do_something_amazing
   end
end

After writing the attr_reader call, it would be handy if I could use the 
name of the symbol in completion when writing the name of the argument, 
and the name of the variable (so there's also a question of whether @ 
should have the "symbol" syntax; it currently doesn't). And then later, 
when calling the method.

Another argument in favor of not having `:' be a symbol constituent in
Symbol literals is that we have two ways to write a Hash (associative
array) literal with Symbol keys:

{:key => value} and {key: value},

where the latter is syntactic sugar for the former. If `:' is not a
symbol constituent, we won't have two superficially "different" symbols
in the buffer, and the "find references" search will easily find both.
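A small sketch of that point, assuming a hypothetical `symbol_tokens` helper that splits text into maximal runs of "symbol constituent" characters (again modelled as a regexp character class, not a real Emacs syntax table):

```ruby
# Hypothetical tokenizer: maximal runs of symbol-constituent characters.
def symbol_tokens(text, constituents)
  text.scan(/[#{constituents}]+/)
end

old_style = "{:key => value}"
new_style = "{key: value}"

# Both spellings build the same Hash.
p({:key => 1} == {key: 1})

# With `:' as a constituent, the key appears under two spellings.
p symbol_tokens(old_style, 'A-Za-z0-9_:')  # [":key", "value"]
p symbol_tokens(new_style, 'A-Za-z0-9_:')  # ["key:", "value"]

# With `:' as punctuation, both literals yield the same token.
p symbol_tokens(old_style, 'A-Za-z0-9_')   # ["key", "value"]
p symbol_tokens(new_style, 'A-Za-z0-9_')   # ["key", "value"]
```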

Or, should I stop trying to make the simplest general approaches work in 
ruby-mode, and write a dedicated xref backend for Ruby? One that would 
use etags and Grep, but use a bit smarter filtering.

What should company-dabbrev-code do? Should it use 
dabbrev-abbrev-char-regexp, which ruby-mode will then set?

Should both company-dabbrev-code and ruby-mode make use of 
dabbrev-abbrev-skip-leading-regexp? Note that it still won't help to 
avoid making {:key and {key: look like different symbols.

And if I do all that, what *will* be the purpose of making `:' remain
a symbol constituent inside Symbol literals?

Thanks all,
especially to those who've read all this ;-)


* Re: Guidelines for the "symbol" syntax class
  2016-01-03  5:09 Guidelines for the "symbol" syntax class Dmitry Gutov
@ 2016-01-03 22:56 ` John Wiegley
  2016-01-04  0:46   ` Dmitry Gutov
  0 siblings, 1 reply; 13+ messages in thread
From: John Wiegley @ 2016-01-03 22:56 UTC
  To: Dmitry Gutov; +Cc: Stefan Monnier, emacs-devel

>>>>> Dmitry Gutov <dgutov@yandex.ru> writes:

> Or, should I stop trying to make the simplest general approaches work in
> ruby-mode, and write a dedicated xref backend for Ruby? One that would use
> etags and Grep, but use a bit smarter filtering.

Does removing ':' from the symbol class for ruby solve all of your problems,
and create no new ones? :)

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: Guidelines for the "symbol" syntax class
  2016-01-03 22:56 ` John Wiegley
@ 2016-01-04  0:46   ` Dmitry Gutov
  2016-01-04  0:51     ` Stefan Monnier
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  0:46 UTC
  To: emacs-devel, Stefan Monnier

On 01/04/2016 12:56 AM, John Wiegley wrote:

> Does removing ':' from the symbol class for ruby solve all of your problems,
> and create no new ones? :)

I wish.

With fewer (hopefully none) false negatives returned by 
xref-collect-references, there will come more false positive matches.

One false negative comes to mind: if `:' is not a symbol constituent, 
and the user searches for all references to `:foo', using the current 
implementation, they will get none. The default input suggested by 
xref-find-references will be `foo', though (when point is on `:foo').

In completion, if we continue to simply collect all symbols from all
buffers, the user will start typing a method name, try completion and
will get offered the names of all methods and Symbols they ever typed
anywhere in the code. While Symbols are used to refer to method names,
they're also used for method keyword arguments (in Ruby 2.0+), and you 
even often see them in business logic. For instance, in most web 
applications there will be Symbols :username, :password, :account_id, 
and so on, referring to the HTTP request parameters. So, there will be a 
lot of false positives here as well.
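As a rough illustration of how these false positives arise, here is a sketch of a naive "collect every symbol" completion source. The `completion_candidates` helper and the sample code are made up for this example; this is not how company-dabbrev-code is implemented.

```ruby
# Hypothetical naive completion source: every identifier-like token in the
# text is a candidate, whether it names a method, a Symbol or a keyword.
def completion_candidates(text, prefix)
  text.scan(/[A-Za-z_][A-Za-z0-9_]*/).uniq.select { |t| t.start_with?(prefix) }
end

source = <<~RUBY
  def user_name
    params[:username]
  end

  login(user: current_user, password: params[:password])
RUBY

# Only user_name is a method defined here; the other candidates come from
# a Symbol literal and a keyword argument.
p completion_candidates(source, "user")  # ["user_name", "username", "user"]
```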

I don't know how to fight that, except by using a smarter program, one 
that loads the application and/or parses the code, etc, but there will 
always be some use cases that are not handled by "smart" logic already 
written, and being able to write a quick-and-simple solution is often handy.

Further, I'm sure there are a lot of third-party packages out there, 
some of them language-agnostic, which deal with source code and use the 
notion of a symbol.

One example that I do use is `easy-kill'. If my cursor is at the end of 
a Symbol :foo, currently calling this command will select `:foo', 
including the colon, which is handy to be able to copy and paste, or 
kill, that value as a whole. If `:' is no longer a symbol constituent, 
either I'll have to live with always additionally typing or deleting
these colons in those kinds of situations, or I will have to provide the
"boundary of thing" info to easy-kill additionally somehow.

That's why I'm asking if there are any existing guidelines, formal or 
informal, that I can take into consideration. That might also inform 
changes to xref-collect-references and company-dabbrev-code; not just
ruby-mode.

Would anyone care for another wall of text?


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  0:46   ` Dmitry Gutov
@ 2016-01-04  0:51     ` Stefan Monnier
  2016-01-04  0:58       ` Dmitry Gutov
  2016-01-04  1:13       ` John Yates
  2016-01-04  0:55     ` John Wiegley
       [not found]     ` <CAJnXXogonsWpqadNpX0BijzoiztorYP1d=b31seBfvGVBwwT_Q@mail.gmail.com>
  2 siblings, 2 replies; 13+ messages in thread
From: Stefan Monnier @ 2016-01-04  0:51 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

With things like Foo:Bar (or Foo.bar or what have you), there are indeed
conflicting definitions of "symbol" and there's usually not a single one
that works everywhere.  IOW you can't expect Emacs's notion of "symbol"
to cover all the use cases.  More specifically, Emacs's notion of symbol
can only be used as a stepping stone on which to construct the things
you need, on a case by case basis.

        Stefan


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  0:46   ` Dmitry Gutov
  2016-01-04  0:51     ` Stefan Monnier
@ 2016-01-04  0:55     ` John Wiegley
  2016-01-04  1:14       ` Dmitry Gutov
       [not found]     ` <CAJnXXogonsWpqadNpX0BijzoiztorYP1d=b31seBfvGVBwwT_Q@mail.gmail.com>
  2 siblings, 1 reply; 13+ messages in thread
From: John Wiegley @ 2016-01-04  0:55 UTC
  To: Dmitry Gutov; +Cc: Stefan Monnier, emacs-devel

>>>>> Dmitry Gutov <dgutov@yandex.ru> writes:

> That's why I'm asking if there are any existing guidelines, formal or
> informal, that I can take into consideration. That might also inform changes
> to xref-collect-references and company-dabbrev-code; not just ruby-mode.

> Would anyone care for another wall of text?

I suppose my informal guideline is to implement a strategy that works best for
the mode you want to derive information from, and to not expect syntax classes
to be a capable enough interface. I'd expect Ruby symbols to include ":",
personally. A::B is the qualified name of a symbol -- although "B" is
technically an unqualified symbol in its own right within that qualified name.

Better yet, define a more general API that all modes can use, since many modes
struggle with these same issues (imenu, thing-at-pt, dabbrev, etc). This
echoes back to our long IDE thread. Perhaps we need layered, semantically-
defined classes, such that a given text position might occur within many such
layers (for example, selection might choose B, A::B, or A::B.foo, depending on
how many times I smash the "select current" key).

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  0:51     ` Stefan Monnier
@ 2016-01-04  0:58       ` Dmitry Gutov
  2016-01-04  1:13       ` John Yates
  1 sibling, 0 replies; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  0:58 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 01/04/2016 02:51 AM, Stefan Monnier wrote:
> With things like Foo:Bar (or Foo.bar or what have you), there are indeed
> conflicting definitions of "symbol" and there's usually not a single one
> that works everywhere.

But as a major mode author, I *do* have to pick one. That's the main 
choice I'm asking about.

Splitting Foo::Bar into two symbols is a done decision (`::' is a scope 
resolution operator anyway). Still undecided whether the symbol-at-point 
at an instance variable (@var) should include the @ sign, and whether the
symbol-at-point at a Symbol literal (:symbol) should include the colon. 
And what to do about syntax-sugared Symbol literals (symbol: value).

> IOW you can't expect Emacs's notion of "symbol"
> to cover all the use cases.  More specifically, Emacs's notion of symbol
> can only be used as a stepping stone on which to construct the things
> you need, on a case by case basis.

Naturally, the use cases left suboptimal by the eventual choice would 
need to be handled specially.


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  0:51     ` Stefan Monnier
  2016-01-04  0:58       ` Dmitry Gutov
@ 2016-01-04  1:13       ` John Yates
  2016-01-04  1:18         ` Dmitry Gutov
  1 sibling, 1 reply; 13+ messages in thread
From: John Yates @ 2016-01-04  1:13 UTC
  To: Stefan Monnier; +Cc: emacs-devel, Dmitry Gutov

On Sun, Jan 3, 2016 at 7:51 PM, Stefan Monnier <monnier@iro.umontreal.ca>
wrote:

> IOW you can't expect Emacs's notion of "symbol"
> to cover all the use cases.  More specifically, Emacs's notion of symbol
> can only be used as a stepping stone on which to construct the things
> you need, on a case by case basis.
>

I interpret this as "Emacs supplies only a basic notion of symbol".

Since xref inches closer to understanding the semantics of the user's
programming language it might want to introduce some new abstraction
for a qualified name.  These come in two flavors:

- object qualified
- namespace or package qualified

Trying to jigger emacs' symbol notion to cover qualified names as provided
in contemporary languages is likely to be a source of continuing complaints
and frustration.

/john


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  0:55     ` John Wiegley
@ 2016-01-04  1:14       ` Dmitry Gutov
  2016-01-04  2:56         ` Stefan Monnier
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  1:14 UTC
  To: emacs-devel, Stefan Monnier

On 01/04/2016 02:55 AM, John Wiegley wrote:

> I suppose my informal guideline is to implement a strategy that works best for
> the mode you want to derive information from, and to not expect syntax classes
> to be a capable enough interface. I'd expect Ruby symbols to include ":",
> personally.

In c++-mode, `std::cout' is two separate symbols, so I'm going to follow 
that model.

> A::B is the qualified name of a symbol

Yes, and methods have qualified names like A::B#foo or A::B.bar, but we 
don't make `#' or `.' symbol constituents.

> -- although "B" is
> technically an unqualified symbol in its own right within that qualified name.

Yup. "B" is the name of a constant set on the module/class A. This is 
relevant because we can reference A::B from code lexically inside A (or 
even inside A::C) by its base name (B). And it's impossible to know 
whether the referenced constant (classes are constants, BTW) is B, A::B 
or A::C::B without runtime information, or parsing the whole project and 
its dependencies.
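A minimal Ruby sketch of why the bare name is ambiguous. The modules and values are invented for illustration; the lookup rule it relies on is Ruby's lexical constant resolution, where the nesting at the point of reference decides which constant a bare `B` refers to.

```ruby
B = :top_level_b

module A
  B = :a_b

  module C
    # Lexical scope here is [A::C, A], so a bare B finds A::B.
    def self.nested_lookup
      B
    end
  end
end

# Compact form: the lexical scope is only [A::C], so A is not searched
# and a bare B falls through to the top-level constant.
module A::C
  def self.compact_lookup
    B
  end
end

p A::C.nested_lookup   # :a_b
p A::C.compact_lookup  # :top_level_b
```

The same token `B` at point thus names different constants depending on how the enclosing modules were opened, and defining `A::C::B` anywhere in the project would change both results.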

> Better yet, define a more general API that all modes can use, since many modes
> struggle with these same issues (imenu, thing-at-pt, dabbrev, etc). This
> echoes back to our long IDE thread.

Ouch. We do need to release 25.1 sometime. And I want 
xref-find-references to work okay-ish in ruby-mode by then.

There *are* some variables already in Emacs that I might have to use, 
and maybe I'm missing some of them. E.g. 
dabbrev-abbrev-skip-leading-regexp and find-tag-default-function (should 
xref-collect-references use find-tag-default-function?).

> Perhaps we need layered, semantically-
> defined classes, such that a given text position might occur within many such
> layers (for example, selection might choose B, A::B, or A::B.foo, depending on
> how many times I smash the "select current" key).

easy-kill defines a hierarchy of things (though a simplistic one), which 
works like you describe. How to apply that idea to dabbrev-expand and 
xref-find-references is not immediately obvious to me.


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  1:13       ` John Yates
@ 2016-01-04  1:18         ` Dmitry Gutov
       [not found]           ` <CAJnXXog5fO_h5UNnVR67EJtT+u7+G-BVMFV3FnJgK=weGj0m_w@mail.gmail.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  1:18 UTC
  To: John Yates, Stefan Monnier; +Cc: emacs-devel

On 01/04/2016 03:13 AM, John Yates wrote:

> Trying to jigger emacs' symbol notion to cover qualified names as provided
> in contemporary languages is likely to be a source of continuing complaints
> and frustration.

I'm not trying to cover qualified names here. In many languages, it's 
impossible to find out the qualified name of the type or method at point 
without parsing the whole application with its dependencies (and 
sometimes you have to run it anyway).

As far as xref is concerned, qualified symbol names are currently an 
implementation detail: some backends might operate on them under the
covers, but the API stays ignorant.


* Re: Guidelines for the "symbol" syntax class
       [not found]           ` <CAJnXXog5fO_h5UNnVR67EJtT+u7+G-BVMFV3FnJgK=weGj0m_w@mail.gmail.com>
@ 2016-01-04  2:01             ` Dmitry Gutov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  2:01 UTC
  To: John Yates; +Cc: emacs-devel

(Cc-ing emacs-devel)

On 01/04/2016 03:46 AM, John Yates wrote:

> I think that you are confusing issues of syntax and symbol resolution.

I'd put the question this way: should the symbol correspond more to an
atomic expression in a given language, or should it be the "name" of the
identifier or atom denoted by the expression?

To give a distant example: in Perl and PHP, you usually declare and use a
variable by prefixing its name with $. Should $ be a symbol constituent? 
Both perl-mode and cperl-mode say no.

>   emacs' notion of symbol is purely syntactic.

And that's the model I'm trying to work in. Again, I'm not trying to 
determine qualified names.

> Starting with the
> current symbol collection framework you could build a purely syntactic
> model of qualified names that should cover a very large set of
> contemporary languages.

I'm not sure how you think I could do that.


* Re: Guidelines for the "symbol" syntax class
       [not found]         ` <CAJnXXojy1b6LUdXcC+cDVPYT-OJMXCE8m8yqObE9oUYwU_PGbg@mail.gmail.com>
@ 2016-01-04  2:34           ` Dmitry Gutov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  2:34 UTC
  To: John Yates; +Cc: emacs-devel

On 01/04/2016 04:21 AM, John Yates wrote:
> On Sun, Jan 3, 2016 at 8:35 PM, Dmitry Gutov <dgutov@yandex.ru
> <mailto:dgutov@yandex.ru>> wrote:
>  >
>  > => [:<=>, :==, :===, :eql?, :hash, :casecmp, :+, :*, :%, :[], :[]=,
> :insert, :length, :size, :bytesize, :empty?, ...]
>
> It seems similar to C++'s operator keyword.  Am I getting closer?  Is
> the colon required to abut the subsequent characters or can one write :
> <=> (note intervening space)?

No space allowed. See this usage example:

irb(main):003:0> "abc" == "def"
=> false
irb(main):004:0> String.instance_method(:==)
=> #<UnboundMethod: String#==>
irb(main):005:0> "abc".size
=> 3
irb(main):006:0> String.instance_method(:size)
=> #<UnboundMethod: String#size>
irb(main):007:0> "abc".method(:size)
=> #<Method: String#size>
irb(main):008:0> "abc".method(:size).call
=> 3

You can have Symbols with any name, though. So they are not tied to 
methods, variables or anything.

Here's a good explanation: 
http://www.randomhacks.net/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/, 
in particular, comparison #6 rings true: Ruby Symbols are similar to 
Lisp symbols. In Lisp, one can reference a dynamically bound variable, 
or call a function, using a symbol:

(let ((sym 'car)) (funcall sym nil))

or even

(let ((sym 'car)) (funcall (symbol-function sym) nil))

which is similar to the "abc".method(:size).call that I tried above.




* Re: Guidelines for the "symbol" syntax class
  2016-01-04  1:14       ` Dmitry Gutov
@ 2016-01-04  2:56         ` Stefan Monnier
  2016-01-04  3:47           ` Dmitry Gutov
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Monnier @ 2016-01-04  2:56 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> In c++-mode, `std::cout' is two separate symbols, so I'm going to follow
> that model.

>> A::B is the qualified name of a symbol

> Yes, and methods have qualified names like A::B#foo or A::B.bar, but we
> don't make `#' or `.' symbol constituents.

Right, I think in general you'll be better off to err on the side of
having Emacs symbols be "too short" (and hence having to grow them by
combining Emacs symbols with surrounding chars or surrounding Emacs
symbols) rather than having Emacs symbols be "too long" (and hence
having to parse the inside of symbols rather than treat them as "atomic"
identifiers).
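One way to picture the "too short, then grow" approach is the following Ruby sketch. It is an illustration only: `grow_symbol` is a hypothetical helper working on string offsets, not an Emacs or ruby-mode function, and it handles just the `::`, `:` and `@` cases discussed in this thread.

```ruby
# Hypothetical helper: given the offsets [start, stop) of a "short" symbol
# (identifier characters only), grow it to the surrounding Ruby-level name.
def grow_symbol(text, start, stop)
  # Absorb `::`-qualified neighbours on the left, e.g. the A:: before B.
  while start >= 2 && text[start - 2, 2] == "::" &&
        (m = text[0...start - 2].match(/[A-Za-z_][A-Za-z0-9_]*\z/))
    start = m.begin(0)
  end
  # Absorb a Symbol-literal colon or an instance-variable sigil, but never
  # half of a `::` operator.
  if start > 0 && [":", "@"].include?(text[start - 1]) &&
     !(start > 1 && text[start - 2] == ":")
    start -= 1
  end
  text[start...stop]
end

p grow_symbol("A::B.foo", 3, 4)    # "A::B"
p grow_symbol("send(:foo)", 6, 9)  # ":foo"
p grow_symbol("@foo = foo", 1, 4)  # "@foo"
p grow_symbol("x = foo", 4, 7)     # "foo"
```

The short symbol stays atomic for searching and completion, and each consumer decides how far to grow it.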

        Stefan


* Re: Guidelines for the "symbol" syntax class
  2016-01-04  2:56         ` Stefan Monnier
@ 2016-01-04  3:47           ` Dmitry Gutov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Gutov @ 2016-01-04  3:47 UTC
  To: Stefan Monnier; +Cc: emacs-devel

On 01/04/2016 04:56 AM, Stefan Monnier wrote:

> Right, I think in general you'll be better off to err on the side of
> having Emacs symbols be "too short" (and hence having to grow them by
> combining Emacs symbols with surrounding chars or surrounding Emacs
> symbols) rather than having Emacs symbols be "too long" (and hence
> having to parse the inside of symbols rather than treat them as "atomic"
> identifiers).

Thanks, Stefan. I guess I'll try doing that for both constants and 
Symbols, and will handle ':' in Symbol literals like perl-mode does with 
'$' or '@' (not sure of the reason for the difference between the two, yet).

And easy-kill works fine with it, because the default "thing" it 
interacts with is actually sexps, not symbols.
