unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#38104: 27.0.50; elixir-mode fontification is very slow
@ 2019-11-07 15:40 Dmitry Gutov
  2019-11-26 16:26 ` Dmitry Gutov
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Gutov @ 2019-11-07 15:40 UTC (permalink / raw)
  To: 38104

[-- Attachment #1: Type: text/plain, Size: 983 bytes --]

I haven't been able to track this to a particular component (e.g. a
regexp) for now, but font-lock-fontify-region is now considerably slower
than it was in Emacs 26 (at least at revision cb8fb597e5bf4f14).

To reproduce: install elixir-mode (e.g. from MELPA Stable):

(add-to-list 'package-archives
              '("melpa-stable" . "https://stable.melpa.org/packages/") t)

M-x list-packages, install elixir-mode.

Savet the attached tiny.__ex__ as tiny.ex.

Visit tiny.ex.

Eval: (benchmark 1 '(font-lock-fontify-region (point-min) (point-max))).

"Elapsed time: 0.158824s"

With larger files, the times are much longer.

I had a break from Elixir, so I noticed this only now.

In GNU Emacs 27.0.50 (build 11, x86_64-pc-linux-gnu, GTK+ Version 3.24.8)
  of 2019-11-05 built on potemkin
Repository revision: dd19cc3aa16ccc441a8a2bfcdeb3005a6eef2543
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
System Description: Ubuntu 19.04

[-- Attachment #2: tiny.__ex__ --]
[-- Type: text/plain, Size: 162 bytes --]

# module_name.ex
defmodule ModuleName do
  def message({to, %IncidentMessage{title: _t, body: msg}}) do
    msg
    |> message_body
    |> send_message
  end
end

^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-07 15:40 bug#38104: 27.0.50; elixir-mode fontification is very slow Dmitry Gutov
@ 2019-11-26 16:26 ` Dmitry Gutov
  2019-11-26 16:30   ` Dmitry Gutov
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dmitry Gutov @ 2019-11-26 16:26 UTC (permalink / raw)
  To: 38104; +Cc: Mattias Engdegård

I did a 'git bisect', and it came down to:

   commit 2ed71227c626c6cfdc684948644ccf3d9eaeb15b
   Author: Mattias Engdegård <mattiase@acm.org>
   Date:   Wed Sep 25 14:29:50 2019 -0700

       New rx implementation

Mattias, could you look into it?

elixir-mode does use rx, heavily. Albeit with a thin wrapper.

To be clear, elixir-mode is quite unusable now.

On 07.11.2019 17:40, Dmitry Gutov wrote:
> I haven't been able to track this to a particular component (e.g. a
> regexp) for now, but font-lock-fontify-region is now considerably slower
> than it was in Emacs 26 (at least at revision cb8fb597e5bf4f14).
> 
> To reproduce: install elixir-mode (e.g. from MELPA Stable):
> 
> (add-to-list 'package-archives
>               '("melpa-stable" . "https://stable.melpa.org/packages/") t)
> 
> M-x list-packages, install elixir-mode.
> 
> Savet the attached tiny.__ex__ as tiny.ex.
> 
> Visit tiny.ex.
> 
> Eval: (benchmark 1 '(font-lock-fontify-region (point-min) (point-max))).
> 
> "Elapsed time: 0.158824s"
> 
> With larger files, the times are much longer.
> 
> I had a break from Elixir, so I noticed this only now.
> 
> In GNU Emacs 27.0.50 (build 11, x86_64-pc-linux-gnu, GTK+ Version 3.24.8)
>   of 2019-11-05 built on potemkin
> Repository revision: dd19cc3aa16ccc441a8a2bfcdeb3005a6eef2543
> Repository branch: master
> Windowing system distributor 'The X.Org Foundation', version 11.0.12004000
> System Description: Ubuntu 19.04






^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-26 16:26 ` Dmitry Gutov
@ 2019-11-26 16:30   ` Dmitry Gutov
  2019-11-26 16:59   ` Mattias Engdegård
  2019-11-26 19:32   ` Mattias Engdegård
  2 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2019-11-26 16:30 UTC (permalink / raw)
  To: 38104; +Cc: Mattias Engdegård

On 26.11.2019 18:26, Dmitry Gutov wrote:
> elixir-mode does use rx, heavily. Albeit with a thin wrapper.

And one more thing: this wrapper makes use of rx-constituents.





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-26 16:26 ` Dmitry Gutov
  2019-11-26 16:30   ` Dmitry Gutov
@ 2019-11-26 16:59   ` Mattias Engdegård
  2019-11-26 17:03     ` Dmitry Gutov
  2019-11-26 19:32   ` Mattias Engdegård
  2 siblings, 1 reply; 7+ messages in thread
From: Mattias Engdegård @ 2019-11-26 16:59 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 38104

26 nov. 2019 kl. 17.26 skrev Dmitry Gutov <dgutov@yandex.ru>:

> Mattias, could you look into it?

Thanks, will have a look. By the way, when byte-compiling I get complaints about 'looking-back' being called with too few arguments in elixir-smie.el; may want to do something about that.






^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-26 16:59   ` Mattias Engdegård
@ 2019-11-26 17:03     ` Dmitry Gutov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2019-11-26 17:03 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 38104

On 26.11.2019 18:59, Mattias Engdegård wrote:
> By the way, when byte-compiling I get complaints about 'looking-back' being called with too few arguments in elixir-smie.el; may want to do something about that.

Yes, I have a few other pending patches for that project as well.





^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-26 16:26 ` Dmitry Gutov
  2019-11-26 16:30   ` Dmitry Gutov
  2019-11-26 16:59   ` Mattias Engdegård
@ 2019-11-26 19:32   ` Mattias Engdegård
  2019-11-27 21:58     ` Dmitry Gutov
  2 siblings, 1 reply; 7+ messages in thread
From: Mattias Engdegård @ 2019-11-26 19:32 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 38104

26 nov. 2019 kl. 17.26 skrev Dmitry Gutov <dgutov@yandex.ru>:

> elixir-mode does use rx, heavily. Albeit with a thin wrapper.

As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el, we have

      (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
                          (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                          (optional (or "?" "!"))))

First, this regex is suboptimal: the first character of an identifier should occur exactly once, or you get bad backtracking behaviour. Just remove the one-or-more construct:

      (identifiers . ,(rx (any "A-Z" "a-z" "_")
                          (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
                          (optional (or "?" "!"))))

This definition is then used in several places, but two in particular are of interest to us:

    ;; Module attributes
    (,(elixir-rx (and "@" (1+ identifiers)))

The construct (1+ identifiers) was perhaps meant to match multiple identifiers, but it doesn't (no separator); it just matches an identifier in several ways, which again leads to bad backtracking behaviour.
The same problem here:

    ;; Map keys
    (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)

Remove the 1+ and one-or-more and it's fast again.

Why did this "work" with the old rx implementation? Because that code had a nasty bug: it does not bracket definitions in rx-constituents properly. Example:

(let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
  (rx-to-string '(1+ hello) t))
=> "HELLO+"

The new rx implementation does not suffer from this bug.

The result in your case is that the old rx, when translating (1+ identifiers), only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in

"[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"

which is a lot faster, since only the final [!?] is repeated twice (and it probably doesn't match very often).







^ permalink raw reply	[flat|nested] 7+ messages in thread

* bug#38104: 27.0.50; elixir-mode fontification is very slow
  2019-11-26 19:32   ` Mattias Engdegård
@ 2019-11-27 21:58     ` Dmitry Gutov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Gutov @ 2019-11-27 21:58 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 38104-done

Hi Mattias,

On 26.11.2019 21:32, Mattias Engdegård wrote:

> As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el, we have
> 
>        (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
>                            (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                            (optional (or "?" "!"))))
> 
> First, this regex is suboptimal: the first character of an identifier should occur exactly once, or you get bad backtracking behaviour. Just remove the one-or-more construct:
> 
>        (identifiers . ,(rx (any "A-Z" "a-z" "_")
>                            (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                            (optional (or "?" "!"))))
> 
> This definition is then used in several places, but two in particular are of interest to us:
> 
>      ;; Module attributes
>      (,(elixir-rx (and "@" (1+ identifiers)))
> 
> The construct (1+ identifiers) was perhaps meant to match multiple identifiers, but it doesn't (no separator); it just matches an identifier in several ways, which again leads to bad backtracking behaviour.
> The same problem here:
> 
>      ;; Map keys
>      (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)
> 
> Remove the 1+ and one-or-more and it's fast again.

That makes a lot of sense. I removed these one-or-more's and 1+ (and a 
few others), and it became fast again.

I'll send a patch upstream. Thanks for your help!

(Looking at the tracker, they have a minor version of this change 
submitted already).

> Why did this "work" with the old rx implementation? Because that code had a nasty bug: it does not bracket definitions in rx-constituents properly. Example:
> 
> (let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
>    (rx-to-string '(1+ hello) t))
> => "HELLO+"
> 
> The new rx implementation does not suffer from this bug.
> 
> The result in your case is that the old rx, when translating (1+ identifiers), only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in
> 
> "[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"
> 
> which is a lot faster, since only the final [!?] is repeated twice (and it probably doesn't match very often).

It's funny to think how someone probably beaten the current code into 
submission by trial and error.





^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-11-27 21:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-07 15:40 bug#38104: 27.0.50; elixir-mode fontification is very slow Dmitry Gutov
2019-11-26 16:26 ` Dmitry Gutov
2019-11-26 16:30   ` Dmitry Gutov
2019-11-26 16:59   ` Mattias Engdegård
2019-11-26 17:03     ` Dmitry Gutov
2019-11-26 19:32   ` Mattias Engdegård
2019-11-27 21:58     ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).