unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
@ 2022-08-16 14:33 Dmitry Gutov
  2022-08-16 16:54 ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 14:33 UTC (permalink / raw)
  To: 57245

Branching this off from the discussion in bug#56682.

Prerequisite: Have an XML file that is 20 MB in size, and doesn't have
long lines.

Or follow steps 1-3 to create one.

1. wget -O large-file.xml 
https://updates.drupal.org/release-history/drupal/current
2. M-% /> RET ^J/> RET (to break up the long line into smaller pieces)
3. Select the contents of the file and paste a copy of them 99 times.
Alternatively, copy them 9 times, then select the result, and
copy it 9 times as well. Save the buffer.

(To try to keep the XML valid -- not sure if necessary -- you can perform
the copying operation only on the contents of the <releases> tag. But
that's probably not important. I did that, though.)

4. Kill the buffer and re-visit it again. Press M->.
5. Note the delay.
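
For anyone who would rather script steps 2 and 3 than do them by hand,
here is a rough Python sketch. It is not part of the original recipe: the
function names are made up for illustration, and it assumes the downloaded
file has already been read into a string and that the element is written
as a plain <releases> tag with no attributes.

```python
def break_long_lines(xml_text: str) -> str:
    """Mimic step 2 (M-% /> RET ^J/> RET): start a new line before each '/>'."""
    return xml_text.replace("/>", "\n/>")


def repeat_element_contents(xml_text: str, tag: str = "releases",
                            copies: int = 100) -> str:
    """Mimic step 3: repeat what sits between <tag> and </tag> `copies` times.

    If the tag is not found, repeat the whole text instead, which matches
    the simpler variant of step 3 (validity of the XML is not preserved).
    """
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = xml_text.find(open_tag)
    end = xml_text.rfind(close_tag)
    if start == -1 or end == -1 or end < start:
        return xml_text * copies
    start += len(open_tag)
    return xml_text[:start] + xml_text[start:end] * copies + xml_text[end:]
```

Reading large-file.xml into a string, passing it through both functions
and writing the result back should give a file of roughly the size
described above, assuming the download is on the order of 200 KB.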

Here's the profiler output:

         1397  95% - command-execute
         1397  95%  - call-interactively
         1338  91%   - funcall-interactively
         1331  90%    - end-of-buffer
         1327  90%     - recenter
         1327  90%      - jit-lock-function
         1327  90%       - jit-lock-fontify-now
         1327  90%        - jit-lock--run-functions
         1327  90%         - run-hook-wrapped
         1327  90%          - #<compiled -0x14ecf3ff276f01c3>
         1327  90%           - font-lock-fontify-region
         1327  90%            - font-lock-default-fontify-region
         1327  90%             - nxml-extend-region
          845  57%              - skip-syntax-forward
          845  57%               - internal--syntax-propertize
          845  57%                - syntax-propertize
          845  57%                 - nxml-syntax-propertize
          845  57%                  - sgml-syntax-propertize
          842  57%                   - #<compiled 0x1894bdc3ad4ca90>
          479  32%                      sgml--syntax-propertize-ppss
            3   0%                     syntax-ppss
          482  32%              - nxml-move-outside-backwards
          482  32%               - nxml-inside-start
          482  32%                  syntax-ppss
            7   0%    + execute-extended-command
           59   4%   + byte-code
           59   4% + ...
           10   0% + timer-event-handler


In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 
3.24.20, cairo version 1.16.0)
  of 2022-08-16 built on potemkin
Repository revision: 81ff64d3ca8d6e43e976f209399d2a0e9b4a7dd8
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12013000
System Description: Ubuntu 20.04.4 LTS





^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 14:33 bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow Dmitry Gutov
@ 2022-08-16 16:54 ` Eli Zaretskii
  2022-08-16 18:40   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-08-16 19:32   ` Dmitry Gutov
  0 siblings, 2 replies; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-16 16:54 UTC (permalink / raw)
  To: Dmitry Gutov, Stefan Monnier; +Cc: 57245

> Date: Tue, 16 Aug 2022 17:33:58 +0300
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> Prerequisite: Have an XML file that is 20 MB in size, and doesn't have
> long lines.
> 
> Or follow steps 1-3 to create one.
> 
> 1. wget -O large-file.xml 
> https://updates.drupal.org/release-history/drupal/current
> 2. M-% /> RET ^J/> RET (to break up the long line into smaller pieces)
> 3. Select the contents of the file and copy them over and over for 99
> times. Alternatively, copy them 9 times, then select the result, and
> copy it 9 times as well. Save the buffer.
> 
> (To try to keep XML valid -- not sure if necessary -- you can only
> perform the copying operation on the contents of the <releases> tag. But
> that's probably not important. I did that, though.)
> 
> 4. Kill the buffer and re-visit it again. Press M->.
> 5. Note the delay.
> 
> Here's the profiler output:
> 
>          1397  95% - command-execute
>          1397  95%  - call-interactively
>          1338  91%   - funcall-interactively
>          1331  90%    - end-of-buffer
>          1327  90%     - recenter
>          1327  90%      - jit-lock-function
>          1327  90%       - jit-lock-fontify-now
>          1327  90%        - jit-lock--run-functions
>          1327  90%         - run-hook-wrapped
>          1327  90%          - #<compiled -0x14ecf3ff276f01c3>
>          1327  90%           - font-lock-fontify-region
>          1327  90%            - font-lock-default-fontify-region
>          1327  90%             - nxml-extend-region
>           845  57%              - skip-syntax-forward
>           845  57%               - internal--syntax-propertize
>           845  57%                - syntax-propertize
>           845  57%                 - nxml-syntax-propertize
>           845  57%                  - sgml-syntax-propertize
>           842  57%                   - #<compiled 0x1894bdc3ad4ca90>
>           479  32%                      sgml--syntax-propertize-ppss
>             3   0%                     syntax-ppss
>           482  32%              - nxml-move-outside-backwards
>           482  32%               - nxml-inside-start
>           482  32%                  syntax-ppss
>             7   0%    + execute-extended-command
>            59   4%   + byte-code
>            59   4% + ...
>            10   0% + timer-event-handler
> 

Thanks.  It looks like some problem in nXML mode or in syntax.c or in
how nXML uses the syntax stuff.  Maybe the code there is simply not
scalable.

Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
here?  What the above profile doesn't show is that this code creates
tons of garbage, so GC is called a lot, and adds its share of
slowdown.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 16:54 ` Eli Zaretskii
@ 2022-08-16 18:40   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-08-16 18:59     ` Eli Zaretskii
  2022-08-16 19:32   ` Dmitry Gutov
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 18:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, Dmitry Gutov

> Thanks.  It looks like some problem in nXML mode or in syntax.c or in
> how nXML uses the syntax stuff.  Maybe the code there is simply not
> scalable.
>
> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> here?

IIRC the sgml/xml/nxml code for syntax-propertize is fairly
costly, indeed.  It can probably be reimplemented in a more efficient
way, but someone will/would have to sit down and think about how to
do that.

[ Some `syntax-propertize-function`s (like cperl-mode's) have been
  "hacked" in an unsatisfactory way by taking
  existing-but-not-fully-understood code and wrapping it so as to make
  it usable for `syntax-propertize-function`.  The result works but it
  could suffer from inefficiencies due to the fact that it's used in
  a different context from the one for which it was designed.
  Some of nXML's code suffers from similar issues, so I thought maybe
  that would be part of the problem, but AFAICT those don't have any
  impact in this case.  ]

IOW, I think the issue is that our syntax tables aren't a good fit for
SGML's syntax, so we just have to work harder than for other modes.

> What the above profile doesn't show is that this code creates
> tons of garbage, so GC is called a lot, and adds its share of
> slowdown.

Hmm... I wonder where this might come from.  As you say, the profile
doesn't show it: 95% in command-execute suggests less than 6% spent in
the GC.


        Stefan







* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 18:40   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 18:59     ` Eli Zaretskii
  0 siblings, 0 replies; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-16 18:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 57245, dgutov

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Dmitry Gutov <dgutov@yandex.ru>,  57245@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 14:40:33 -0400
> 
> > What the above profile doesn't show is that this code creates
> > tons of garbage, so GC is called a lot, and adds its share of
> > slowdown.
> 
> Hmm... I wonder where this might come from.

Run the recipe with a watchpoint on consing_until_gc, and you will
see.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 16:54 ` Eli Zaretskii
  2022-08-16 18:40   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 19:32   ` Dmitry Gutov
  2022-08-16 20:22     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-08-17 11:24     ` Eli Zaretskii
  1 sibling, 2 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 19:32 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: 57245

On 16.08.2022 19:54, Eli Zaretskii wrote:
> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> here?

nxml-syntax-propertize might well be heavier than average, but the delay 
scales linearly with the size of the file.

Which seems to be exactly the behavior the "font-lock narrowing" was 
supposed to guard from?






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 19:32   ` Dmitry Gutov
@ 2022-08-16 20:22     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-08-16 20:49       ` Dmitry Gutov
  2022-08-17 11:24     ` Eli Zaretskii
  1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 20:22 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, 57245

Dmitry Gutov [2022-08-16 22:32:23] wrote:
> On 16.08.2022 19:54, Eli Zaretskii wrote:
>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>> here?
> nxml-syntax-propertize might well be heavier than average, but the delay
> scales linearly with the size of the file.

Indeed, it should be linear.

> Which seems to be exactly the behavior the "font-lock narrowing" was
> supposed to guard from?

Not sure which narrowing you're referring to.
The "locked narrowing" introduced by Gregory is only installed in the
presence of long lines.  It's (currently) not used for large files
(unless they contain long lines, that is).


        Stefan







* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 20:22     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 20:49       ` Dmitry Gutov
  2022-08-16 21:45         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 20:49 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 57245

On 16.08.2022 23:22, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> Dmitry Gutov [2022-08-16 22:32:23] wrote:
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
> Indeed, it should be linear.
> 
>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
> Not sure which narrowing you're referring to.
> The "locked narrowing" introduced by Gregory is only installed in the
> presence of long lines.  It's (currently) not used for large files
> (unless they contain long lines, that is).

I guess that's the problem here.

The font-lock narrowing (if it's indeed the method we're going to use to 
speed up its performance) shouldn't be conditioned on the presence of 
long lines.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 20:49       ` Dmitry Gutov
@ 2022-08-16 21:45         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-08-16 22:20           ` Dmitry Gutov
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 21:45 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, 57245

> The font-lock narrowing (if it's indeed the method we're going to use to
> speed up its performance) shouldn't be conditioned on the presence of
> long lines.

font-lock does suffer from long lines, so the current code's handling of
font-lock makes some sense.  But indeed, we all agree it's not
sufficient because it only handles the long lines problem, and we still
need to tackle the case of large buffers, which is related but different.


        Stefan







* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 21:45         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 22:20           ` Dmitry Gutov
  2022-08-17 11:36             ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 22:20 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, 57245

On 17.08.2022 00:45, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>> The font-lock narrowing (if it's indeed the method we're going to use to
>> speed up its performance) shouldn't be conditioned on the presence of
>> long lines.
> font-lock does suffer from long lines

Perhaps when some specific rules are used? Like MATCH-ANCHORED, one 
instance of which I deleted from js-mode a few days ago.

Otherwise, syntax-wholeline-max seems to be doing its job fine: if I 
comment out the narrowing code in handle_fontified_prop (or switch to 
the branch I posted previously), two XML files -- one with long lines 
and one without (the files differ only by addition of newlines) -- show 
approximately the same delay on M->.







* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 19:32   ` Dmitry Gutov
  2022-08-16 20:22     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-17 11:24     ` Eli Zaretskii
  2022-08-17 12:14       ` Dmitry Gutov
  2022-08-17 13:20       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 11:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 57245, monnier

> Date: Tue, 16 Aug 2022 22:32:23 +0300
> Cc: 57245@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> On 16.08.2022 19:54, Eli Zaretskii wrote:
> > Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> > here?
> 
> nxml-syntax-propertize might well be heavier than average, but the delay 
> scales linearly with the size of the file.

Which is generally not a good scaling factor, especially if the
coefficient is quite large (as it seems to be in this case).

> Which seems to be exactly the behavior the "font-lock narrowing" was 
> supposed to guard from?

No.  It wasn't supposed to fix modes that foolishly scan the buffer
from BOB to point.  It was supposed to fix modes which scan from the
beginning of line, and that is (a) only a problem when lines are very
long, and (b) much harder to solve in the mode itself, because
font-lock very frequently uses anchored regexps and otherwise likes to
start from BOL, and syntax processing also likes starting from BOL.

Btw, does nXML and/or sgml-mode use libxml for their analysis?  If
not, why not? wouldn't that be faster (and possibly more accurate)?






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-16 22:20           ` Dmitry Gutov
@ 2022-08-17 11:36             ` Eli Zaretskii
  2022-08-17 11:46               ` Dmitry Gutov
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 11:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 57245, monnier

> Date: Wed, 17 Aug 2022 01:20:38 +0300
> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I 
> comment out the narrowing code in handle_fontified_prop (or switch to 
> the branch I posted previously), two XML files -- one with long lines 
> and one without (the files differ only by addition of newlines) -- show 
> approximately the same delay on M->.

Doesn't syntax-wholeline-max only affect long lines?  Because I don't
think I see its effect in the XML file where lines were broken by
newlines, and then the file was duplicated 100 times.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 11:36             ` Eli Zaretskii
@ 2022-08-17 11:46               ` Dmitry Gutov
  2022-08-17 12:16                 ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 11:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, monnier

On 17.08.2022 14:36, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 01:20:38 +0300
>> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
>> comment out the narrowing code in handle_fontified_prop (or switch to
>> the branch I posted previously), two XML files -- one with long lines
>> and one without (the files differ only by addition of newlines) -- show
>> approximately the same delay on M->.
> 
> Doesn't syntax-wholeline-max only affect long lines?  Because I don't
> think I see its effect in the XML file where lines were broken by
> newlines, and then the file was duplicated 100 times.

Its purpose is to handle the slowdown which occurred specifically on 
long lines because of 
font-lock-extend-region-functions/syntax-propertize-extend-region-functions. 
Now that it works -- I don't see any particular slowdowns on long lines, 
even with narrowing disabled.

And the performance of M-> depends solely on the size of a file. In my 
XML test files, at least.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 11:24     ` Eli Zaretskii
@ 2022-08-17 12:14       ` Dmitry Gutov
  2022-08-17 12:20         ` Eli Zaretskii
  2022-08-17 13:20       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, monnier

On 17.08.2022 14:24, Eli Zaretskii wrote:
>> Date: Tue, 16 Aug 2022 22:32:23 +0300
>> Cc: 57245@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>>
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
> 
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).

Someone can work on the coefficient, but any accurate parser has to scan 
the buffer from the beginning. At least once.

Migration to tree-sitter might give us a better coefficient later, but 
the principle will remain.
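
To make the "at least once" part concrete, here is a toy Python model.
It is an illustration only, not how syntax.el or nxml-mode is written:
the class and its names are hypothetical, and the "parse state" is
reduced to a single flag saying whether a position is inside a <...>
tag. The state at a position is found by scanning forward from the
nearest remembered checkpoint, loosely in the spirit of the caching that
syntax-ppss does.

```python
import bisect


class CachedScanner:
    """Toy parse-state cache: is position `pos` inside a <...> tag?"""

    def __init__(self, text, stride=1000):
        self.text = text
        self.stride = stride
        self.positions = [0]     # checkpoint positions, kept sorted
        self.states = [False]    # parse state recorded at each checkpoint
        self.chars_scanned = 0   # work counter, for illustration only

    def state_at(self, pos):
        # Restart from the closest checkpoint at or before pos.
        i = bisect.bisect_right(self.positions, pos) - 1
        p, inside = self.positions[i], self.states[i]
        while p < pos:
            ch = self.text[p]
            if ch == "<":
                inside = True
            elif ch == ">":
                inside = False
            p += 1
            self.chars_scanned += 1
            # Remember the state every `stride` characters.
            if p % self.stride == 0 and p > self.positions[-1]:
                self.positions.append(p)
                self.states.append(inside)
        return inside
```

In this toy, the first query at the end of the text scans every
character, which is the linear cost seen on M-> in a freshly visited
buffer. A later query nearby restarts from the closest checkpoint and
scans fewer than `stride` characters.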

>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
> 
> No.  It wasn't supposed to fix modes that foolishly scan the buffer
> from BOB to point.

You might want to choose words better.

> It was supposed to fix modes which scan from the
> beginning of line, and that is (a) only a problem when lines are very
> long, and (b) much harder to solve in the mode itself, because
> font-lock very frequently uses anchored regexps and otherwise likes to
> start from BOL, and syntax processing also likes starting from BOL.

syntax-wholeline-max handles that problem.

Though it might depend on what you mean by "anchored regexps".

> Btw, does nXML and/or sgml-mode use libxml for their analysis?  If
> not, why not? wouldn't that be faster (and possibly more accurate)?

Might be "a simple matter of coding".

But we do need syntax-propertize to run, so that the user commands can 
rely on proper syntax information in the buffer. It remains to be seen 
whether xml-parse-region is a good base for nxml-syntax-propertize, and 
how much of a performance improvement it can bring (with all the string 
marshaling around).

nxml also probably handles invalid documents better, which might or 
might not be important.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 11:46               ` Dmitry Gutov
@ 2022-08-17 12:16                 ` Eli Zaretskii
  2022-08-17 12:30                   ` Dmitry Gutov
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:16 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 57245, monnier

> Date: Wed, 17 Aug 2022 14:46:46 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> On 17.08.2022 14:36, Eli Zaretskii wrote:
> >> Date: Wed, 17 Aug 2022 01:20:38 +0300
> >> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >>
> >> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
> >> comment out the narrowing code in handle_fontified_prop (or switch to
> >> the branch I posted previously), two XML files -- one with long lines
> >> and one without (the files differ only by addition of newlines) -- show
> >> approximately the same delay on M->.
> > 
> > Doesn't syntax-wholeline-max only affect long lines?
> 
> Its purpose is to handle the slowdown which occurred specifically on 
> long lines because of 
> font-lock-extend-region-functions/syntax-propertize-extend-region-functions. 
> Now that it works -- I don't see any particular slowdowns on long lines, 
> even with narrowing disabled.
> 
> And the performance of M-> depends solely on the size of a file. In my 
> XML test files, at least.

Is that a yes?  Because if it is, then what does this have to do with
the issue of nXML not being scalable enough?







* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 12:14       ` Dmitry Gutov
@ 2022-08-17 12:20         ` Eli Zaretskii
  2022-08-17 12:40           ` Dmitry Gutov
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:20 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 57245, monnier

> Date: Wed, 17 Aug 2022 15:14:07 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> >> Which seems to be exactly the behavior the "font-lock narrowing" was
> >> supposed to guard from?
> > 
> > No.  It wasn't supposed to fix modes that foolishly scan the buffer
> > from BOB to point.
> 
> You might want to choose words better.

I did.

> > It was supposed to fix modes which scan from the
> > beginning of line, and that is (a) only a problem when lines are very
> > long, and (b) much harder to solve in the mode itself, because
> > font-lock very frequently uses anchored regexps and otherwise likes to
> > start from BOL, and syntax processing also likes starting from BOL.
> 
> > syntax-wholeline-max handles that problem.

Only for syntax-related stuff.  And we have yet to see whether it's a
good enough job: that feature is too young to be sure.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 12:16                 ` Eli Zaretskii
@ 2022-08-17 12:30                   ` Dmitry Gutov
  2022-08-17 12:33                     ` Eli Zaretskii
  0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, monnier

On 17.08.2022 15:16, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 14:46:46 +0300
>> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 17.08.2022 14:36, Eli Zaretskii wrote:
>>>> Date: Wed, 17 Aug 2022 01:20:38 +0300
>>>> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
>>>> From: Dmitry Gutov <dgutov@yandex.ru>
>>>>
>>>> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
>>>> comment out the narrowing code in handle_fontified_prop (or switch to
>>>> the branch I posted previously), two XML files -- one with long lines
>>>> and one without (the files differ only by addition of newlines) -- show
>>>> approximately the same delay on M->.
>>> Doesn't syntax-wholeline-max only affect long lines?
>> Its purpose is to handle the slowdown which occurred specifically on
>> long lines because of
>> font-lock-extend-region-functions/syntax-propertize-extend-region-functions.
>> Now that it works -- I don't see any particular slowdowns on long lines,
>> even with narrowing disabled.
>>
>> And the performance of M-> depends solely on the size of a file. In my
>> XML test files, at least.
> Is that a yes?

Yes, it's a "yes".

Stefan said:

 > font-lock does suffer from long lines, so the current code's handling
 > of font-lock makes some sense

Meaning that narrowing around font-lock on long lines makes sense.

And I replied that no, syntax-wholeline-max should be dealing with long 
lines issues in font-lock already.

 >  Because if it is, then what does this have to do with
 > the issue of nXML not being scalable enough?

Narrowing around font-lock shouldn't be conditioned on the presence of 
long lines. It either should be done unconditionally (with larger 
radius, I guess), or not at all.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 12:30                   ` Dmitry Gutov
@ 2022-08-17 12:33                     ` Eli Zaretskii
  2022-08-17 12:46                       ` Dmitry Gutov
  0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:33 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: 57245, monnier

> Date: Wed, 17 Aug 2022 15:30:22 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
>  >  Because if it is, then what does this have to do with
>  > the issue of nXML not being scalable enough?
> 
> Narrowing around font-lock shouldn't be conditioned on the presence of 
> long lines. It either should be done unconditionally (with larger 
> radius, I guess), or not at all.

Yes, you already said that, and I don't agree (and explained why).
Now, can we please agree to disagree and move on?






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 12:20         ` Eli Zaretskii
@ 2022-08-17 12:40           ` Dmitry Gutov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, monnier

On 17.08.2022 15:20, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 15:14:07 +0300
>> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>>> Which seems to be exactly the behavior the "font-lock narrowing" was
>>>> supposed to guard from?
>>> No.  It wasn't supposed to fix modes that foolishly scan the buffer
>>> from BOB to point.
>> You might want to choose words better.
> I did.
> 
>>> It was supposed to fix modes which scan from the
>>> beginning of line, and that is (a) only a problem when lines are very
>>> long, and (b) much harder to solve in the mode itself, because
>>> font-lock very frequently uses anchored regexps and otherwise likes to
>>> start from BOL, and syntax processing also likes starting from BOL.
>> syntax-wholeline-max handles that problem.
> Only for syntax-related stuff.

font-lock-extend-region-wholelines uses that variable too.

> And we have yet to see whether it's a
> good enough job: that feature is too young to be sure.

Same goes for the long-line-narrowing business.

And for us to be sure, people would need to be able to try it and report 
problems. But as long as handle_fontified_prop creates a narrowing with 
~5000 char radius, syntax-wholeline-max isn't even given a chance to do 
its job.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 12:33                     ` Eli Zaretskii
@ 2022-08-17 12:46                       ` Dmitry Gutov
  0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, monnier

On 17.08.2022 15:33, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 15:30:22 +0300
>> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>   >  Because if it is, then what does this have to do with
>>   > the issue of nXML not being scalable enough?
>>
>> Narrowing around font-lock shouldn't be conditioned on the presence of
>> long lines. It either should be done unconditionally (with larger
>> radius, I guess), or not at all.
> Yes, you already said that, and I don't agree (and explained why).
> Now, can we please agree to disagree and move on?

I don't think you explained that, no.

If you're referring to the previous discussions, this [bug report] is 
the first time I have put forward this particular suggestion.

So you couldn't have addressed it before that.






* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
  2022-08-17 11:24     ` Eli Zaretskii
  2022-08-17 12:14       ` Dmitry Gutov
@ 2022-08-17 13:20       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-17 13:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 57245, Dmitry Gutov

>> nxml-syntax-propertize might well be heavier than average, but the delay 
>> scales linearly with the size of the file.
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).

For most languages, this is the minimum scaling factor that allows the
result to be correct in all cases.  So, as a general rule, it should be
considered as a good scaling factor, I think (when seen as a judgment
on the implementation quality of a major mode).

Obviously, that won't work well in really large buffers, but to a large
extent that should be blamed on the language rather than its major mode.

For this reason, we need to add hacks/heuristics (e.g. not highlighting,
accepting occasional broken highlighting, delaying highlighting,
younameit) if we want to be able to handle such large buffers in
a timely fashion.
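
As a rough illustration of that trade-off, here is a small Python sketch.
Everything in it is hypothetical: the function names are invented, the
radius of 5000 is simply the figure mentioned earlier in this thread, and
the "parser" only tracks whether a position is inside an XML comment. It
shows that scanning only a window around point is cheap, but has to guess
the state at the start of the window, and that guess can be wrong.

```python
def narrowing_bounds(pos, buffer_size, radius=5000):
    """Clamp the window [pos - radius, pos + radius] to the buffer."""
    return max(0, pos - radius), min(buffer_size, pos + radius)


def inside_comment(text, pos, start=0):
    """Scan text[start:pos], guessing that `start` is outside any comment."""
    inside = False
    i = start
    while i < pos:
        if not inside and text.startswith("<!--", i):
            inside = True
            i += 4
        elif inside and text.startswith("-->", i):
            inside = False
            i += 3
        else:
            i += 1
    return inside


# A comment longer than the window: a full scan sees the opener, while a
# scan restricted to the window around `pos` never does.
text = "<!--" + "x" * 20000 + "-->" + "<a/>"
pos = 10000
beg, end = narrowing_bounds(pos, len(text))
full_answer = inside_comment(text, pos)               # scans from 0
windowed_answer = inside_comment(text, pos, start=beg)  # scans from beg
```

Here the full scan reports that pos is inside the comment, while the
windowed scan reports that it is not. That is the kind of occasional
broken highlighting one accepts in exchange for bounded work.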


        Stefan






