* bug#75154: 31.0.50; java-ts-mode. Issues with Indentation
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:29 UTC (permalink / raw)
To: Stefan Kangas, Eli Zaretskii; +Cc: 75154, Artem
Hello, Artem!
>
> I've identified several issues with indentation in java-ts-mode where it
> doesn't work as expected. Let me first clarify
> what I consider "expected" behavior - there are things that are "GOOD",
> "BAD but allowed", and "Opinion-based.
>
Thanks for taking the time for this.
> Here are the specific problems:
>
> 1) Chaining Methods in the Stream API and Lambda Expressions
> Example 1:
> class Foo {
> void Foo() {
> List<Customer> customers = customer
> | <-- Actual (BAD)
> | <-- Expected (GOOD) (8 spaces)
> }
> }
> When continuing the statement
> List<Customer> customers = customer
> .stream
> and then adding closing parentheses
> List<Customer> customers = customer
> .stream() <-- The method filter will move to 4 spaces automatically
> .filter <-- without parentheses
> .filter() <-- closing bracket and method moving to 4 spaces
>
> This behavior is problematic. In java-mode, this does not occur. IntelliJ
> IDEA uses 8 spaces for method chains, which makes the code more readable.
> While some examples use 2 spaces (e.g., for web snippets), in production
> environments, 8 spaces are more common. This should ideally be customizable for end users.
>
Yes, while I do agree, this is kind of expected. Let me explain:
class Foo {
void Foo() {
List<Customer> customers = customer
| <-- Actual (BAD)
| <-- Expected (GOOD) (8 spaces)
}
}
This case boils down to two things. Personally I disagree with IntelliJ
and the 8-space indent, but that is a stylistic matter and not very
interesting compared to the other one. IIRC, when I implemented this
there were some ambiguities with these unfinished statements.
Given your snippet here, the parse tree is:
```
(program
 (class_declaration class name: (identifier)
  body:
   (class_body {
    (method_declaration type: (void_type) name: (identifier)
     parameters: (formal_parameters ( ))
     body:
      (block {
       (local_variable_declaration
        type:
         (generic_type (type_identifier)
          (type_arguments < (type_identifier) >))
        declarator: (variable_declarator name: (identifier) = value: (identifier))
        type: ;)
       }))
    })))
```
The interesting thing here is the `declarator` line, which ends in an
identifier. That identifier is the symbol `customer`. Knowing that we
have to indent here isn't really easy, at least not without interfering
with other constructs. The reason is that while you know that you want
to continue the chain, the parser actually ERRORs.
Given this code,
class Foo {
void Foo() {
List<Customer> customers = customer
.
}
}
the parse tree is
(program
 (ERROR class (identifier) { type: (void_type) name: (identifier)
  parameters: (formal_parameters ( ))
  {
  (generic_type (type_identifier)
   (type_arguments < (type_identifier) >))
  name: (identifier) = (identifier) . } }))
While this is probably catchable, the "workaround" I started using is to
end the statement with a semicolon, then RET:
```
class Foo {
void Foo() {
List<Customer> customers = customer
|; // Point is now at |
}
}
```
Now we have a valid parse tree, indented correctly, and can continue
chaining. I agree that this is very much a hack, but frankly I just
started doing that and forgot about it.
We could try to implement some acrobatics to fix this, but I'm too
short on time right now to prioritize this one in particular. Feel
free to explore a patch that I can review!
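To make the semicolon workaround concrete, here is a rough sketch of what the buffer ends up looking like once the chain has been filled in ahead of the terminator. The class, method, and element type are made up for illustration, and the continuation column is only what I would expect with a 4-space offset, not something guaranteed by the mode:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example; names are placeholders, not from the bug report.
class SemicolonFirst {
    static List<String> activeNames(List<String> customer) {
        // Step 1: type the terminator first so the statement parses.
        // Step 2: with point before the ';', press RET and fill in the chain.
        List<String> customers = customer
            .stream()
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toList())
            ; // the terminator typed in step 1
        return customers;
    }
}
```

The trailing `;` on its own line is ugly, so I usually join it back onto the last call once the chain is done.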
> Moreover, the current indentation is hardcoded and doesn’t allow
> flexibility. For instance, in python-ts-mode, pressing TAB allows you to
> adjust constructions more freely.
> This level of flexibility would be beneficial in this context
>
> This inflexibility prevents writing common patterns like:
> @Override
> protected void configure(HttpSecurity http) throws Exception {
> http
> .authorizeRequests()
> .antMatchers("/admin/**").hasAuthority("ADMIN")
> .antMatchers("/user").hasAnyAuthority("USER", "ADMIN")
> .antMatchers("/", "/index").permitAll()
> Example 2:
>
> The following looks correct -
>
> public class FloodFill {
> public static void main(String[] args) {
> List<Foo> stream = students.stream()
> .filter(item -> {
> return item.getValue() > 100 &&
> item.isActive();
> })
> .map()
> .collect();
> }
> }
>
> But java-ts-mode produces:
>
> public class FloodFill {
> public static void main(String[] args) {
> List<Foo> stream = students.stream()
> .filter(item -> {
> return item.getValue() > 100 &&
> item.isActive();
> })
Yes, this really annoys me too, so I should try to fix this. I don't
remember off the top of my head what the problem was, but it forced me
either to omit the curlies, or to extract the body into a function and
pass it. But we should absolutely fix this. I believe there were several
difficulties here, some related to incomplete statements, and
some related to treesit.el and the Java parser's idiosyncrasies. But
this is a valid bug in and of itself. If not too much of a hassle, could
you create a separate bug report just for this case?
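For reference, the two ways of sidestepping the block lambda look roughly like this. This is only a sketch: `Item`, `isLargeAndActive`, and the class name are hypothetical stand-ins for whatever the real code uses, and the `map` and `collect` arguments are filled in just so it compiles:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical names throughout; only the shape of the chain matters.
class ChainWorkaround {
    // Minimal stand-in for the element type in the quoted example.
    static final class Item {
        private final int value;
        private final boolean active;

        Item(int value, boolean active) {
            this.value = value;
            this.active = active;
        }

        int getValue() { return value; }
        boolean isActive() { return active; }
    }

    // Workaround 1: omit the curlies and use an expression lambda.
    static List<Integer> withExpressionLambda(List<Item> items) {
        return items.stream()
            .filter(item -> item.getValue() > 100 && item.isActive())
            .map(Item::getValue)
            .collect(Collectors.toList());
    }

    // Workaround 2: extract the body into a method and pass it.
    static boolean isLargeAndActive(Item item) {
        return item.getValue() > 100 &&
            item.isActive();
    }

    static List<Integer> withExtractedMethod(List<Item> items) {
        return items.stream()
            .filter(ChainWorkaround::isLargeAndActive)
            .map(Item::getValue)
            .collect(Collectors.toList());
    }
}
```

Neither form puts a `{ ... }` block inside the chain, which is the construct that trips up the indentation.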
> .map()
> .collect();
> }
> }
>
> 2) Inner Classes
> Example 1:
> public class Outer {
> class Inner {| <-- cursor here moves Inner class unexpectedly
>
> }
> }
> Example 2:
> public class Outer {
> class Inner { // ???
> | <-- cursor here.
> }
> }
>
> Why does this happen? I did not request this behavior. While Example 1
> demonstrates bad code style, it is technically valid. Such "magical"
> formatting should be handled by a Java formatter, not by Emacs or
> Tree-sitter rules.
> IntelliJ IDEA does not apply such formatting; it leaves this task to the formatter.
>
This is due to electric indent (`electric-indent-mode`). You could
disable it yourself in your config or just live with it. Emacs is by
convention quite heavy handed in trying to keep a consistent indent
style, almost to the point of acting like a formatter rather than a
simple indent offset calculator. I also was confused when I started
using Emacs, but now I actually like this, as it feels more
deterministic than other editors, for better or worse.
> Example 3:
> public class Outer {
> class Inner{
> void foo(){
> }|<--start position. RET
> |<-- expected position
> |<-- actual
> }
> }
>
> If Inner class has incorrect indentation, subsequent code will also be incorrectly indented.
>
I would rather invert the statement and say that the subsequent code is
correctly indented, and that the indentation of the preceding code is
ignored, since the correct code is
```
public class Outer {
    class Inner{
        void foo(){
        }
    }
}
```
Indentation at column 8 is expected here, IMO. I'm not sure we should
try to work around clearly "incorrectly" indented code.
> 3) for, if, else if, while, do-while without braces
> public class While {
> {
> while ()
> | <-- Expected
> | <-- Actual
> }
> }
> Although this is bad coding style, it’s allowed and compiles
> correctly.
This looks like several issues to me:
1. The block isn't handled correctly, as in a rule is missing. Line two
should be indented 4 spaces, but is indented 0 spaces on my system,
right?
2. The parse tree returns an ERROR, so I think this is likely a parser
bug rather than an Emacs bug to begin with. Though I believe the case
should be handled once the parser supports this.
```
(program
 (class_declaration
  (modifiers public)
  class name: (identifier)
  body:
   (class_body {
    (block {
     (ERROR while ( ))
     })
    })))
```
Could you supply a separate bug report for this one?
>
> 4) Java 15 text blocks
> Text blocks are not properly handled
>
> public class TextBlocks {
> System.out.println(ctx.fetch("""
> SELECT table_schema, count(*)
> FROM information_schema.tables
> """));
> }
This isn't valid Java, is it? The statement has to live inside a method, so I changed it to
```
public class TextBlocks {
public void foo() {
System.out.println(ctx.fetch("""
SELECT table_schema, count(*)
FROM information_schema.tables
"""));
}
}
```
>
> - Triple quotes handling (should electric-pair-mode be enhanced?)
Yes, I agree.
> - Text block alignment is opinion-based and should be adjustable with
> TAB
I usually indent stuff like this by marking the region, then using
C-x TAB (`indent-rigidly`), then <left> or <right>.
> - New SQL expressions should be sticky
What do you mean here? Align the new lines with the previous line?
> It seems such multiline strings also do not work well, for example:
> "'The time has come,' the Walrus said,\n" +
Not sure I understand what you mean here.
>
>
> 5) Broken Syntax Highlighting
>
> public class Outer {
> HELLO EMACS <-- Write something here
> class Inner{ <-- This class will not be highlighted
> void foo(){
> }
>
> }
> }
> Tree-sitter should ignore such uncommon cases to maintain syntax highlighting
This I believe is out of scope, but could be an issue for tree-sitter
upstream. We have to deal with the parse tree we get. How do Neovim or
other editors handle this?
>
> 6) Multiple Parameters in Methods
> Example 1
> public record StudentRecord(
> String firstName,
> String lastName,
> Long studentId,
> String email)
What is wrong here?
> Example 2
> public String filterData(@RequestParam(required = false) String name,
> @RequestParam(required = false) String name,
> @RequestParam(required = false) Integer age
> )
> java-ts-mode fails to handle these cases correctly.
>
How so?
> Desired Fontification (Out of the Box):
> - Annotations (@Annotations)
> - Diamond Brackets (<>)
> - Constants, Static Variables, Enum Variables should be highlighted with
> distinct colors and optionally italic font
> - Unused Variables or Classes (Grayed Out)
These feel more like opinions than errors, but feature requests
are always welcome, of course. Could you try to add patches for some of
these that I can review?
> - Unused variables, unused classes, etc., highlighted in gray. Not sure if this can be achieved
> with Tree-sitter. Anyway, with Flymake + Eglot, it currently works in a somewhat clunky manner.
>
This isn't possible with tree-sitter. How is Eglot clunky here? I'm
fairly satisfied, at least when the LSP server actually marks these.
> Example
> public class TextBlocks {
> enum AnEnum { CONST1, CONST2 } <-- No effect for unused AnEnum.
> public static final String HELLO ="HELLO";
>
> public static void main(String[] args) {
> int i = 0; <-- Flymake identifies as unused but looks unpolished.
> System.out.println(HELLO);
> }
> }
>
This isn't treesit related, but could be a report for the Flymake
maintainers. We need to get the information from the server, though.
> Overall:
>
> I may have missed some aspects, but as it stands, Emacs is not
> comfortable for Java development with these issues.
>
Can you provide some sort of priority here? Most feel cosmetic, but
there are some real bugs here. One thing I also want, but am not quite
sure how to fix yet, is the inline SQL. We could try to do some
multi-mode SQL syntax highlighting here, possibly.
As another data point: I use Emacs for Java development as the sole
Emacs user on my team in a sea of IntelliJ users, and I don't really
share the view that it isn't comfortable for Java development. There is
a quirk here and there, sure, but I'm just as productive as everyone
else there. For me the biggest issue is the one case you mentioned here
with nested blocks inside of a chain of methods.
> Some information that might be helpful:
>
> - https://github.com/dakrone/eos/blob/dd8aa3a25b496397dd0162d229de571989668619/eos-java.org?plain=1#L30
> .Not sure why this Elasticsearch developer created so many custom rules for indentation.
> - https://github.com/Michael-Allan/Java_Mode_Tamed A major Java mode
> with sensible fontification.
This is c-mode related, so not easy to relate to the treesit mode. But I
wasn't aware of this.
> - JetBrains Intellij IDEA Community Edition
> - File>Settings>Editor>Color Scheme>Java Java for understanding which colors are needed and what is missing.
> - Editor>Color Scheme>Java>Code Style>Java for indentation settings.
>
I'm not sure we should treat what IntelliJ does as the source of
truth. Though it is for now the canonical Java editor, who knows for
how long, and it feels time consuming to chase a moving target.
Hope some of these comments are helpful, feel free to either provide
patches or disagree :-)
Theo