unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
@ 2023-01-02  4:52 Mohammed Sadiq
  2023-01-02 12:15 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Mohammed Sadiq @ 2023-01-02  4:52 UTC (permalink / raw)
  To: 60484

Short tokens are not identified as type_identifier in GNU Emacs
c-ts-mode, but they are in the tree-sitter playground[0].

For example, 'a_type' in an otherwise empty buffer is identified as a
type_identifier in the tree-sitter playground but not in c-ts-mode,
whereas longer tokens such as 'window_type' are identified as
type_identifier.


[0] https://tree-sitter.github.io/tree-sitter/playground


In GNU Emacs 29.0.60 (build 5, x86_64-pc-linux-gnu, GTK+ Version
  3.24.35, cairo version 1.16.0) of 2023-01-02 built on purism
Repository revision: 2569ede9c496bb060e0b88428cb541088aaba1f9
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Debian GNU/Linux bookworm/sid






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02  4:52 bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier Mohammed Sadiq
@ 2023-01-02 12:15 ` Eli Zaretskii
  2023-01-02 12:43   ` Mohammed Sadiq
  2023-01-02 22:41 ` Yuan Fu
  2023-01-08  0:57 ` Yuan Fu
  2 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-01-02 12:15 UTC (permalink / raw)
  To: Mohammed Sadiq; +Cc: 60484

> Date: Mon, 02 Jan 2023 10:22:09 +0530
> From: Mohammed Sadiq <sadiq@sadiqpk.org>
> 
> Short tokens are not identified as type_identifier in GNU Emacs
> c-ts-mode, but does work fine with tree-sitter playground[0].
> 
> Say for example, 'a_type' in an empty buffer is identified as a
> type_identifier in tree-sitter playground, but not in c-ts-mode,
> while say, some longer tokens like 'window_type' is identified as
> type_identifier.

Where is it written that FOO_type is a type identifier?  Is this
something new in some recent C Standard?  Or is it just a popular
convention?






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02 12:15 ` Eli Zaretskii
@ 2023-01-02 12:43   ` Mohammed Sadiq
  2023-01-02 12:45     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Mohammed Sadiq @ 2023-01-02 12:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60484

On 2023-01-02 17:45, Eli Zaretskii wrote:
>> Date: Mon, 02 Jan 2023 10:22:09 +0530
>> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>> 
>> Short tokens are not identified as type_identifier in GNU Emacs
>> c-ts-mode, but does work fine with tree-sitter playground[0].
>> 
>> Say for example, 'a_type' in an empty buffer is identified as a
>> type_identifier in tree-sitter playground, but not in c-ts-mode,
>> while say, some longer tokens like 'window_type' is identified as
>> type_identifier.
> 
> Where is it written that FOO_type is a type identifier?  is this
> something new in some recent C Standard?  Or is it just a popular
> convention?

'a_type' was just a made-up example; it can be any valid token, say
'g_file', or whatever.  I was pointing out a disparity between
c-ts-mode and tree-sitter in how they handle the same token: the
tree-sitter playground identifies a 6-byte token as a type identifier,
but c-ts-mode requires it to be at least 11 bytes long for custom types.






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02 12:43   ` Mohammed Sadiq
@ 2023-01-02 12:45     ` Eli Zaretskii
  2023-01-02 13:20       ` Mohammed Sadiq
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2023-01-02 12:45 UTC (permalink / raw)
  To: Mohammed Sadiq; +Cc: 60484

> Date: Mon, 02 Jan 2023 18:13:34 +0530
> From: Mohammed Sadiq <sadiq@sadiqpk.org>
> Cc: 60484@debbugs.gnu.org
> 
> On 2023-01-02 17:45, Eli Zaretskii wrote:
> >> Date: Mon, 02 Jan 2023 10:22:09 +0530
> >> From: Mohammed Sadiq <sadiq@sadiqpk.org>
> >> 
> >> Short tokens are not identified as type_identifier in GNU Emacs
> >> c-ts-mode, but does work fine with tree-sitter playground[0].
> >> 
> >> Say for example, 'a_type' in an empty buffer is identified as a
> >> type_identifier in tree-sitter playground, but not in c-ts-mode,
> >> while say, some longer tokens like 'window_type' is identified as
> >> type_identifier.
> > 
> > Where is it written that FOO_type is a type identifier?  is this
> > something new in some recent C Standard?  Or is it just a popular
> > convention?
> 
> 'a_type' was just a made up example, it can be any valid token, say
> 'g_file', or whatever.  I was pointing out a disparity in handling of
> some token in c-ts-mode and tree-sitter: tree-sitter identifiers a 6
> byte length token as an identifier, but c-ts-mode requires it to be
> at least 11 byte sized for custom types.

I'm not sure I see a problem here.  It sounds like different
heuristics to me.  Nothing says that g_file is a type; only parsing
it can tell.






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02 12:45     ` Eli Zaretskii
@ 2023-01-02 13:20       ` Mohammed Sadiq
  0 siblings, 0 replies; 7+ messages in thread
From: Mohammed Sadiq @ 2023-01-02 13:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60484

On 2023-01-02 18:15, Eli Zaretskii wrote:
>> Date: Mon, 02 Jan 2023 18:13:34 +0530
>> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>> Cc: 60484@debbugs.gnu.org
>> 
>> On 2023-01-02 17:45, Eli Zaretskii wrote:
>> >> Date: Mon, 02 Jan 2023 10:22:09 +0530
>> >> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>> >>
>> >> Short tokens are not identified as type_identifier in GNU Emacs
>> >> c-ts-mode, but does work fine with tree-sitter playground[0].
>> >>
>> >> Say for example, 'a_type' in an empty buffer is identified as a
>> >> type_identifier in tree-sitter playground, but not in c-ts-mode,
>> >> while say, some longer tokens like 'window_type' is identified as
>> >> type_identifier.
>> >
>> > Where is it written that FOO_type is a type identifier?  is this
>> > something new in some recent C Standard?  Or is it just a popular
>> > convention?
>> 
>> 'a_type' was just a made up example, it can be any valid token, say
>> 'g_file', or whatever.  I was pointing out a disparity in handling of
>> some token in c-ts-mode and tree-sitter: tree-sitter identifiers a 6
>> byte length token as an identifier, but c-ts-mode requires it to be
>> at least 11 byte sized for custom types.
> 
> I'm not sure I see a problem here.  It sounds like different
> heuristics to me.  Nothing says that g_file is a type, only its
> parsing can tell.

Well, c-ts-mode uses tree-sitter-c under the hood, so it's not supposed
to behave differently.

Anyway, my use case is that I type ';' after a token to convert the
preceding token (if it is a type) to camel case and insert a '*'.

So typing 'g_file;' should be converted to 'GFile *', but that works
only if g_file is identified as a type identifier, which is not the
case now.  I already insert '_' on SPC, and S-SPC inserts a real
space.  Sorry, my use case may be a really weird one. ;)
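
The conversion described above can be sketched in a few lines.  This is
only an illustration in Python, not the actual Emacs Lisp setup: the
function name snake_to_pointer_type is hypothetical, and the check that
the token really is a type_identifier (which c-ts-mode would have to
supply) is deliberately left out.

```python
def snake_to_pointer_type(token: str) -> str:
    """Turn a snake_case token such as 'g_file' into 'GFile *'."""
    # Split on underscores and drop empty pieces (e.g. from a leading '_').
    parts = [p for p in token.split("_") if p]
    # Upper-case the first letter of each piece; leave the rest untouched.
    camel = "".join(p[0].upper() + p[1:] for p in parts)
    return camel + " *"


if __name__ == "__main__":
    print(snake_to_pointer_type("g_file"))
```

The string transformation itself does not depend on the token length;
only the (omitted) type check does, which is where the reported
behaviour matters.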


cheers,
Mohammed Sadiq






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02  4:52 bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier Mohammed Sadiq
  2023-01-02 12:15 ` Eli Zaretskii
@ 2023-01-02 22:41 ` Yuan Fu
  2023-01-08  0:57 ` Yuan Fu
  2 siblings, 0 replies; 7+ messages in thread
From: Yuan Fu @ 2023-01-02 22:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 60484, Mohammed Sadiq


Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Mon, 02 Jan 2023 18:13:34 +0530
>> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>> Cc: 60484@debbugs.gnu.org
>> 
>> On 2023-01-02 17:45, Eli Zaretskii wrote:
>> >> Date: Mon, 02 Jan 2023 10:22:09 +0530
>> >> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>> >> 
>> >> Short tokens are not identified as type_identifier in GNU Emacs
>> >> c-ts-mode, but does work fine with tree-sitter playground[0].
>> >> 
>> >> Say for example, 'a_type' in an empty buffer is identified as a
>> >> type_identifier in tree-sitter playground, but not in c-ts-mode,
>> >> while say, some longer tokens like 'window_type' is identified as
>> >> type_identifier.
>> > 
>> > Where is it written that FOO_type is a type identifier?  is this
>> > something new in some recent C Standard?  Or is it just a popular
>> > convention?
>> 
>> 'a_type' was just a made up example, it can be any valid token, say
>> 'g_file', or whatever.  I was pointing out a disparity in handling of
>> some token in c-ts-mode and tree-sitter: tree-sitter identifiers a 6
>> byte length token as an identifier, but c-ts-mode requires it to be
>> at least 11 byte sized for custom types.
>
> I'm not sure I see a problem here.  It sounds like different
> heuristics to me.  Nothing says that g_file is a type, only its
> parsing can tell.

The parse tree of a buffer with only a_type in it is this:

(translation_unit (ERROR (identifier)))

So tree-sitter-c parses it as a parse error rather than as a type.  I
suppose the difference is due to different versions of tree-sitter-c
being used by Emacs (the latest) and by the tree-sitter playground
website?  Maybe the playground is using an older version.  The "cutoff"
point for the playground version seems to be 5 bytes: a_typ is
considered an error, but a_type is considered a type.
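
As a rough illustration of reading such a dump, the Python sketch below
works on the printed S-expression text only.  It does not call any
tree-sitter API, and the helper names node_types and
is_type_identifier_parse are hypothetical.  It assumes that a node type
is simply the word following an opening parenthesis.

```python
import re


def node_types(sexp: str) -> list:
    """Return the node type names in an S-expression dump, in order."""
    # A node type is the word that directly follows an opening parenthesis.
    return re.findall(r"\(\s*([A-Za-z_]+)", sexp)


def is_type_identifier_parse(sexp: str) -> bool:
    """True if the dump contains a type_identifier and no ERROR node."""
    types = node_types(sexp)
    return "type_identifier" in types and "ERROR" not in types


if __name__ == "__main__":
    dump = "(translation_unit (ERROR (identifier)))"
    print(node_types(dump))
    print(is_type_identifier_parse(dump))
```

For the tree shown above, the check reports that no type_identifier was
produced, which matches what c-ts-mode sees.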

Yuan






* bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier
  2023-01-02  4:52 bug#60484: 29.0.60; c-ts-mode: short tokens are not identified as type_identifier Mohammed Sadiq
  2023-01-02 12:15 ` Eli Zaretskii
  2023-01-02 22:41 ` Yuan Fu
@ 2023-01-08  0:57 ` Yuan Fu
  2 siblings, 0 replies; 7+ messages in thread
From: Yuan Fu @ 2023-01-08  0:57 UTC (permalink / raw)
  To: 60484-done; +Cc: sadiq


Yuan Fu <casouri@gmail.com> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> Date: Mon, 02 Jan 2023 18:13:34 +0530
>>> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>>> Cc: 60484@debbugs.gnu.org
>>> 
>>> On 2023-01-02 17:45, Eli Zaretskii wrote:
>>> >> Date: Mon, 02 Jan 2023 10:22:09 +0530
>>> >> From: Mohammed Sadiq <sadiq@sadiqpk.org>
>>> >> 
>>> >> Short tokens are not identified as type_identifier in GNU Emacs
>>> >> c-ts-mode, but does work fine with tree-sitter playground[0].
>>> >> 
>>> >> Say for example, 'a_type' in an empty buffer is identified as a
>>> >> type_identifier in tree-sitter playground, but not in c-ts-mode,
>>> >> while say, some longer tokens like 'window_type' is identified as
>>> >> type_identifier.
>>> > 
>>> > Where is it written that FOO_type is a type identifier?  is this
>>> > something new in some recent C Standard?  Or is it just a popular
>>> > convention?
>>> 
>>> 'a_type' was just a made up example, it can be any valid token, say
>>> 'g_file', or whatever.  I was pointing out a disparity in handling of
>>> some token in c-ts-mode and tree-sitter: tree-sitter identifiers a 6
>>> byte length token as an identifier, but c-ts-mode requires it to be
>>> at least 11 byte sized for custom types.
>>
>> I'm not sure I see a problem here.  It sounds like different
>> heuristics to me.  Nothing says that g_file is a type, only its
>> parsing can tell.
>
> The parse tree of a buffer with only a_type in it is this:
>
> (translation_unit (ERROR (identifier)))
>
> So tree-sitter-c parses it as a parse error instead of a type. I suppose
> the difference is due to different version of tree-sitter-c used by
> Emacs (the latest) and the tree-sitter playground website? Maybe the
> playground is using an older version. The "cutoff" point for the
> playground version seems to be 5 bytes: a_typ is considered an error but
> a_type a type.
>
> Yuan

Since this is a problem in the parser rather than in Emacs, I'm closing this.

Yuan





