bug#22241: 25.0.50; etags Ruby parser problems


From: Dmitry Gutov <dgutov@yandex.ru>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 22241@debbugs.gnu.org
Subject: bug#22241: 25.0.50; etags Ruby parser problems
Date: Sat, 23 Jan 2016 21:23:57 +0300
Message-ID: <56A3C53D.1050408@yandex.ru>
In-Reply-To: <83si1o45g1.fsf@gnu.org>

On 01/23/2016 07:38 PM, Eli Zaretskii wrote:

> I don't speak Ruby.  So please give a more detailed spec for the
> features you want added.  I wrote some questions below, but I'm quite
> sure there are more questions I should ask, but don't know about.  So
> please provide as complete specification for each feature as you
> possibly can, TIA.

There's no actual up-to-date language spec, and when in doubt, I fire up 
the REPL and try things out (and forget many of the results afterwards). 
So there's no "detailed spec" in my head. Let me just try my best 
answering your questions, for now.

>> - Constants are not indexed.
>
> What is the full syntax of a "constant"?  Is it just
>
>    IDENTIFIER "=" INTEGER-NUMBER

Pretty much. IDENTIFIER should be ALL_CAPS, or CamelCase, with 
underscores allowed.

INTEGER-NUMBER should be just EXPRESSION, because it can be any 
expression, possibly a multiline one.

CamelCase constants usually are assigned some "anonymous class" value, 
like in the following example:

SpecialError = Class.new(StandardError)

(Which is a metaprogramming-y way to define the class SpecialError).

But you probably shouldn't worry about ALL_CAPS vs CamelCase distinction 
here, and just treat them the same.

> ?  Is whitespace significant?  What about newlines?

Having no spaces around "=" is fine, and spaces can also be replaced by 
tabs. A newline before "=" is not allowed.
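
A rough sketch of the constant rule described above, written in Ruby 
purely for illustration (etags itself is C). The names CONSTANT_ASSIGNMENT 
and constant_name are made up here, and the pattern only looks at the 
first line of the assignment, so a multiline EXPRESSION is not followed:

```ruby
# Hypothetical helper, not etags code: recognize "IDENTIFIER = EXPRESSION"
# where IDENTIFIER starts with a capital letter (ALL_CAPS or CamelCase).
# Spaces or tabs around "=" are optional; "==", "=~" and "=>" are rejected.
CONSTANT_ASSIGNMENT = /\A[ \t]*([A-Z][A-Za-z0-9_]*)[ \t]*=(?![=~>])/

def constant_name(line)
  m = CONSTANT_ASSIGNMENT.match(line)
  m && m[1]
end
```

With this, "SpecialError = Class.new(StandardError)" and "MAX=10" yield 
a name, while "FOO == 1" and "foo = 1" do not.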

>> - Class methods (def self.foo) are given the wrong name ("self."
>>    shouldn't be included).
>
> Is it enough to remove a single "self.", case-sensitive, at the
> beginning of an identifier?  Can there be more than one, like
> "self.self.SOMETHING"?

Only one "self." is allowed. When you remove it, you should record that 
SOMETHING is a method defined on the current class (or module). In Java 
terms, say, it would be like a "static" method.

The upshot is, it can be called on the class itself, but not on its 
instance:

irb(main):001:0> class C
irb(main):002:1> def self.foo
irb(main):003:2> 3
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> C.foo
=> 3
irb(main):007:0> C.new.foo
NoMethodError: undefined method `foo' for #<C:0x000000020141e8>

So the qualified name of that method should be "C.foo", as opposed to 
"C#foo" for an instance method.

> Your other example, i.e.
>
>    def ModuleExample.singleton_module_method
>
> indicates that anything up to and including the period should be
> removed, is that correct?

More or less. This is an "explicit syntax", which is equivalent to using 
"self.". These two declarations are equivalent:

module ModuleExample
   def ModuleExample.foo
   end
end

module ModuleExample
   def self.foo
   end
end

> Is there only one, or can there be many?

There can be only one dot there. There could be a scope resolution 
operator (::) in there, I suppose, but I'm not sure if you want to add 
support for that right now, or ever.

> Should they all be removed for an unqualified name?

Yes.

>> - "class << self" blocks are given a separate entry.
>
> What should be done instead?  Can't a class be named "<<"?

A class cannot be named "<<". You should not add that line to the index, 
but record that the method definitions inside the following scope are 
defined on the current class or module. These are equivalent:

class C
   def self.foo
   end
end

class C
   class << self
     def foo
     end
   end
end
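
To make the "don't index the line, but remember the scope" idea concrete, 
here is a minimal sketch in Ruby. It is not how etags works internally; 
index_methods and the symbols it returns are invented for this example. It 
assumes every "end" sits on its own line and that method bodies contain no 
other block openers (if, while, do, and so on):

```ruby
# Hypothetical sketch: record scope names and method names, noting which
# methods are defined on the class itself.
def index_methods(source)
  stack = []   # open scopes: :scope, :singleton or :def
  tags = []
  source.each_line do |line|
    case line
    when /\A\s*class\s*<<\s*self\b/
      stack.push(:singleton)          # no tag is recorded for this line
    when /\A\s*(?:class|module)\s+([A-Z]\w*)/
      tags << [$1, :scope]
      stack.push(:scope)
    when /\A\s*def\s+(self\.)?([A-Za-z_]\w*[?!=]?)/
      # A method is a class method if it has the "self." prefix or if we
      # are anywhere inside a "class << self" block.
      on_class = !$1.nil? || stack.include?(:singleton)
      tags << [$2, on_class ? :class_method : :instance_method]
      stack.push(:def)
    when /\A\s*end\b/
      stack.pop
    end
  end
  tags
end
```

Under those assumptions, both definitions of C above produce the same 
two entries: the scope "C" and the class method "foo".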

>> - Qualified tag names are never generated.
>
> (Etags never promised qualified names except for C and derived
> languages, and also in Java.)

OK, that would be a nice bonus, but we can live without it. ctags 
doesn't define qualified names either.

Without qualified names, I suppose you should treat

def self.foo
end

and

def foo
end

and

def Class.foo
end

the same. Only record those as "foo".
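
For illustration, a small Ruby sketch of that rule. The helper name 
unqualified_method_name is hypothetical, and operator methods such as 
"def ==" are deliberately not handled:

```ruby
# Hypothetical helper: return the bare method name from a "def" line,
# dropping at most one "receiver." prefix (self, a class or a module).
def unqualified_method_name(line)
  line[/\A\s*def\s+(?:[A-Za-z_]\w*\.)?([A-Za-z_]\w*[?!=]?)/, 1]
end
```

All three of "def self.foo", "def foo" and "def Class.foo" give "foo"; 
a line that is not a method definition gives nil.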

> How to know when a module's or a class's scope ends?  Is it enough to
> count "end" lines?

Hmm, maybe? I'm guessing etags doesn't really handle heredoc syntax, or 
multiline strings defined with percent literals (examples here: 
https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#.22Here_document.22_notation)

The result shouldn't be too bad if you do that, anyway. Except:

> Can I assume that "end" will always appear by
> itself on a line?

Unfortunately, no. It can also be on the same line, after a semicolon 
(or on any other line, I suppose, but nobody writes Ruby like that). 
Examples:

class SpecialError < StandardError; end

or

class MyStruct < Struct.new(:a, :b, :c); end

(One could also stick a method definition inside that, but I haven't 
seen that in practice yet). So, either:

- 'end' is on a separate line (after ^[ \t]*).
- class/module Name[< ]...; end$

'end' can also be followed by "# some comment" in both cases.
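
The two cases could be written as patterns roughly like this (a Ruby 
sketch only; the constant and method names are invented here, and 
heredocs or multiline strings containing "end" would still fool it):

```ruby
# Hypothetical patterns for the two accepted placements of "end".
# Both allow a trailing "# some comment".
END_ALONE = /\A[ \t]*end\b[ \t]*(?:#.*)?\z/
ONE_LINER = /\A[ \t]*(?:class|module)[ \t]+[A-Z]\w*.*;[ \t]*end[ \t]*(?:#.*)?\z/

# True when the line consists of nothing but "end" (plus optional comment).
def closes_scope?(line)
  END_ALONE.match?(line.chomp)
end

# True when a class or module is opened and closed on the same line.
def one_line_definition?(line)
  ONE_LINER.match?(line.chomp)
end
```

A one-line definition opens and closes its scope at once, so it should 
produce a tag without changing the nesting depth.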

> Can I disregard indentation of "end" (and of
> everything else) when I determine where a scope begins and ends?

Probably, yes.

Indentation is not significant in Ruby, but heredocs can mess up the 
detection of 'end' keywords, so we could use indentation as a way to 
detect where each scope ends. But if etags doesn't normally do that, 
let's not go there now.

>> A
>> A::B
>> A::B::ABC
>> A::B#foo!
>> A::B.bar?
>> A::B.qux=
>
> Why did 'foo!' get a '#' instead of a '.', as for '_bar'?

It's common to use '#' in the qualified names of instance methods, in 
Java, Ruby and JS docstrings. '.' is used for class methods (static 
methods, in Java), or methods defined on other singleton objects.

Examples:

http://usejsdoc.org/tags-inline-link.html (search for '#' there)
http://stackoverflow.com/questions/5915992/javadoc-writing-links-to-methods
http://docs.ruby-lang.org/en/2.1.0/RDoc/Markup.html#class-RDoc::Markup-label-Links 
(the documentation also says to use ":: for class methods", but let's 
not do that)
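
As a sketch of the naming convention (Ruby, for illustration only; 
qualified_name and its arguments are hypothetical and not something 
etags provides):

```ruby
# Hypothetical helper: join the enclosing scopes with "::", then use "#"
# for an instance method and "." for a class (singleton) method.
def qualified_name(scope, name, class_method)
  return name if scope.empty?
  separator = class_method ? "." : "#"
  "#{scope.join('::')}#{separator}#{name}"
end
```

For example, qualified_name(%w[A B], "foo!", false) builds "A::B#foo!", 
and passing true for a class method such as "bar?" builds "A::B.bar?".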

> Why doesn't
> "class << self" count as a class scope, and add something to qualified
> names?

It just served to turn 'qux=' into a class (static) method.

>> should become (the unqualified version):
>>
>> A
>> foo
>> bar=
>> tee
>> tee=
>> qux
>>
>> All attr_* methods can take a variable number of arguments. The parser
>> should take each argument, check that it's a symbol and not a variable
>> (starts with :), and if so, record the corresponding method name.
>
> Why did 'bar' and 'tee' get a '=' appended?

Because 'attr_writer :bar' effectively expands to

def bar=(val)
   @bar = val
end

and 'attr_accessor :tee' expands into

def tee
   @tee
end

def tee=(val)
   @tee = val
end
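
A possible way to express these expansion rules, sketched in Ruby. The 
helper name attr_method_names is made up, and the sketch assumes all 
arguments are on one line, without parentheses or a trailing comment:

```ruby
# Hypothetical helper: list the method names that an attr_* line defines.
# Arguments that are not plain symbols (no leading ":") are skipped.
def attr_method_names(line)
  m = /\A\s*(attr_reader|attr_writer|attr_accessor)\s+(.*)/.match(line)
  return [] unless m
  kind, args = m[1], m[2]
  args.split(",").flat_map do |arg|
    sym = arg.strip[/\A:([A-Za-z_]\w*)\z/, 1]
    next [] unless sym
    case kind
    when "attr_reader" then [sym]
    when "attr_writer" then ["#{sym}="]
    else [sym, "#{sym}="]          # attr_accessor defines both
    end
  end
end
```

So "attr_writer :bar" gives only "bar=", while "attr_accessor :tee" 
gives both "tee" and "tee=".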

> Are there any other such "append rules"?

There are other macros (any code can define a macro), but let's not 
worry about them now.
