* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
@ 2022-08-16 14:33 Dmitry Gutov
2022-08-16 16:54 ` Eli Zaretskii
0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 14:33 UTC (permalink / raw)
To: 57245
Branching this off from the discussion in bug#56682.
Prerequisite: Have an XML file that is 20 MB in size, and doesn't have
long lines.
Or follow steps 1-3 to create one.
1. wget -O large-file.xml
https://updates.drupal.org/release-history/drupal/current
2. M-% /> RET ^J/> RET (to break up the long line into smaller pieces)
3. Select the contents of the file and duplicate them 99 times (for 100
copies in total). Alternatively, copy them 9 times, then select the
result, and copy it 9 times as well. Save the buffer.
(To try to keep XML valid -- not sure if necessary -- you can only
perform the copying operation on the contents of the <releases> tag. But
that's probably not important. I did that, though.)
4. Kill the buffer and re-visit it again. Press M->.
5. Note the delay.
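Steps 2 and 3 can also be scripted outside Emacs. The following is a minimal shell sketch, assuming the file from step 1 is already on disk; the function name `make_large_xml` and its arguments are hypothetical and not part of the original report. It replicates the whole file rather than only the contents of the <releases> tag, so the result is not well-formed XML, which the report suggests is probably not important.

```shell
#!/bin/sh
# make_large_xml INPUT OUTPUT [COPIES]   (hypothetical helper)
# Breaks long lines at "/>" and writes COPIES repetitions to OUTPUT.
make_large_xml() {
    input=$1
    output=$2
    copies=${3:-100}
    tmp=$(mktemp) || return 1

    # Step 2: put every "/>" at the start of a new line,
    # mirroring M-% /> RET ^J/> RET.
    awk '{ gsub(/\/>/, "\n/>"); print }' "$input" > "$tmp"

    # Step 3: the original plus 99 copies is 100 repetitions.
    : > "$output"
    i=0
    while [ "$i" -lt "$copies" ]; do
        cat "$tmp" >> "$output"
        i=$((i + 1))
    done
    rm -f "$tmp"
}
```

For example, `make_large_xml downloaded.xml large-file.xml 100`, where `downloaded.xml` stands for whatever name the download was saved under.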
Here's the profiler output:
1397 95% - command-execute
1397 95% - call-interactively
1338 91% - funcall-interactively
1331 90% - end-of-buffer
1327 90% - recenter
1327 90% - jit-lock-function
1327 90% - jit-lock-fontify-now
1327 90% - jit-lock--run-functions
1327 90% - run-hook-wrapped
1327 90% - #<compiled -0x14ecf3ff276f01c3>
1327 90% - font-lock-fontify-region
1327 90% - font-lock-default-fontify-region
1327 90% - nxml-extend-region
845 57% - skip-syntax-forward
845 57% - internal--syntax-propertize
845 57% - syntax-propertize
845 57% - nxml-syntax-propertize
845 57% - sgml-syntax-propertize
842 57% - #<compiled 0x1894bdc3ad4ca90>
479 32% sgml--syntax-propertize-ppss
3 0% syntax-ppss
482 32% - nxml-move-outside-backwards
482 32% - nxml-inside-start
482 32% syntax-ppss
7 0% + execute-extended-command
59 4% + byte-code
59 4% + ...
10 0% + timer-event-handler
In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version
3.24.20, cairo version 1.16.0)
of 2022-08-16 built on potemkin
Repository revision: 81ff64d3ca8d6e43e976f209399d2a0e9b4a7dd8
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12013000
System Description: Ubuntu 20.04.4 LTS
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 14:33 bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow Dmitry Gutov
@ 2022-08-16 16:54 ` Eli Zaretskii
2022-08-16 18:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-08-16 19:32 ` Dmitry Gutov
0 siblings, 2 replies; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-16 16:54 UTC (permalink / raw)
To: Dmitry Gutov, Stefan Monnier; +Cc: 57245
> Date: Tue, 16 Aug 2022 17:33:58 +0300
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> Prerequisite: Have an XML file that is 20 MB in size, and doesn't have
> long lines.
>
> Or follow steps 1-3 to create one.
>
> 1. wget -O large-file.xml
> https://updates.drupal.org/release-history/drupal/current
> 2. M-% /> RET ^J/> RET (to break up the long line into smaller pieces)
> 3. Select the contents of the file and duplicate them 99 times (for 100
> copies in total). Alternatively, copy them 9 times, then select the
> result, and copy it 9 times as well. Save the buffer.
>
> (To try to keep XML valid -- not sure if necessary -- you can only
> perform the copying operation on the contents of the <releases> tag. But
> that's probably not important. I did that, though.)
>
> 4. Kill the buffer and re-visit it again. Press M->.
> 5. Note the delay.
>
> Here's the profiler output:
>
> 1397 95% - command-execute
> 1397 95% - call-interactively
> 1338 91% - funcall-interactively
> 1331 90% - end-of-buffer
> 1327 90% - recenter
> 1327 90% - jit-lock-function
> 1327 90% - jit-lock-fontify-now
> 1327 90% - jit-lock--run-functions
> 1327 90% - run-hook-wrapped
> 1327 90% - #<compiled -0x14ecf3ff276f01c3>
> 1327 90% - font-lock-fontify-region
> 1327 90% - font-lock-default-fontify-region
> 1327 90% - nxml-extend-region
> 845 57% - skip-syntax-forward
> 845 57% - internal--syntax-propertize
> 845 57% - syntax-propertize
> 845 57% - nxml-syntax-propertize
> 845 57% - sgml-syntax-propertize
> 842 57% - #<compiled 0x1894bdc3ad4ca90>
> 479 32% sgml--syntax-propertize-ppss
> 3 0% syntax-ppss
> 482 32% - nxml-move-outside-backwards
> 482 32% - nxml-inside-start
> 482 32% syntax-ppss
> 7 0% + execute-extended-command
> 59 4% + byte-code
> 59 4% + ...
> 10 0% + timer-event-handler
>
Thanks. It looks like some problem in nXML mode or in syntax.c or in
how nXML uses the syntax stuff. Maybe the code there is simply not
scalable.
Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
here? What the above profile doesn't show is that this code creates
tons of garbage, so GC is called a lot, and adds its share of
slowdown.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 16:54 ` Eli Zaretskii
@ 2022-08-16 18:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-08-16 18:59 ` Eli Zaretskii
2022-08-16 19:32 ` Dmitry Gutov
1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 18:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, Dmitry Gutov
> Thanks. It looks like some problem in nXML mode or in syntax.c or in
> how nXML uses the syntax stuff. Maybe the code there is simply not
> scalable.
>
> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> here?
IIRC the sgml/xml/nxml code for syntax-propertize is fairly
costly, indeed. It can probably be reimplemented in a more efficient
way, but someone will/would have to sit down and think about how to
do that.
[ Some `syntax-propertize-function`s (like cperl-mode's) have been
"hacked" in an unsatisfactory way by taking
existing-but-not-fully-understood code and wrapping it so as to make
it usable for `syntax-propertize-function`. The result works but it
could suffer from inefficiencies due to the fact that it's used in
a different context from the one for which it was designed.
Some of nXML's code suffers from similar issues, so I thought maybe
that would be part of the problem, but AFAICT those don't have any
impact in this case. ]
IOW, I think the issue is that our syntax tables aren't a good fit for
SGML's syntax, so we just have to work harder than for other modes.
> What the above profile doesn't show is that this code creates
> tons of garbage, so GC is called a lot, and adds its share of
> slowdown.
Hmm... I wonder where this might come from. As you say, the profile
doesn't show it: 95% in command-execute suggests less than 6% spent in
the GC.
Stefan
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 16:54 ` Eli Zaretskii
2022-08-16 18:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 19:32 ` Dmitry Gutov
2022-08-16 20:22 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-08-17 11:24 ` Eli Zaretskii
1 sibling, 2 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 19:32 UTC (permalink / raw)
To: Eli Zaretskii, Stefan Monnier; +Cc: 57245
On 16.08.2022 19:54, Eli Zaretskii wrote:
> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> here?
nxml-syntax-propertize might well be heavier than average, but the delay
scales linearly with the size of the file.
Which seems to be exactly the behavior the "font-lock narrowing" was
supposed to guard from?
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 19:32 ` Dmitry Gutov
@ 2022-08-16 20:22 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-08-16 20:49 ` Dmitry Gutov
2022-08-17 11:24 ` Eli Zaretskii
1 sibling, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 20:22 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, 57245
Dmitry Gutov [2022-08-16 22:32:23] wrote:
> On 16.08.2022 19:54, Eli Zaretskii wrote:
>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>> here?
> nxml-syntax-propertize might well be heavier than average, but the delay
> scales linearly with the size of the file.
Indeed, it should be linear.
> Which seems to be exactly the behavior the "font-lock narrowing" was
> supposed to guard from?
Not sure which narrowing you're referring to.
The "locked narrowing" introduced by Gregory is only installed in the
presence of long lines. It's (currently) not used for large files
(unless they contain long lines, that is).
Stefan
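Since the locked narrowing discussed above is installed only when long lines are present, it can help to confirm that a test file really has none. A minimal shell sketch follows; the helper names `longest_line` and `has_long_lines` are hypothetical, and the threshold is passed in explicitly rather than assumed, since the value Emacs uses (the `long-line-threshold` variable) may differ between builds. Note that awk's `length` counts characters or bytes depending on the implementation and locale.

```shell
#!/bin/sh
# longest_line FILE: print the length of the longest line in FILE.
longest_line() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# has_long_lines FILE THRESHOLD: succeed if any line is longer than THRESHOLD.
has_long_lines() {
    [ "$(longest_line "$1")" -gt "$2" ]
}
```

For example, `has_long_lines large-file.xml 10000 && echo "long lines present"`, where 10000 is only an illustrative threshold.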
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 20:22 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 20:49 ` Dmitry Gutov
2022-08-16 21:45 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 20:49 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, 57245
On 16.08.2022 23:22, Stefan Monnier via Bug reports for GNU Emacs, the
Swiss army knife of text editors wrote:
> Dmitry Gutov [2022-08-16 22:32:23] wrote:
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
> Indeed, it should be linear.
>
>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
> Not sure which narrowing you're referring to.
> The "locked narrowing" introduced by Gregory is only installed in the
> presence of long lines. It's (currently) not used for large files
> (unless they contain long lines, that is).
I guess that's the problem here.
The font-lock narrowing (if it's indeed the method we're going to use to
speed up its performance) shouldn't be conditioned on the presence of
long lines.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 20:49 ` Dmitry Gutov
@ 2022-08-16 21:45 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-08-16 22:20 ` Dmitry Gutov
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-16 21:45 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Eli Zaretskii, 57245
> The font-lock narrowing (if it's indeed the method we're going to use to
> speed up its performance) shouldn't be conditioned on the presence of
> long lines.
font-lock does suffer from long lines, so the current code's handling of
font-lock makes some sense. But indeed, we all agree it's not
sufficient because it only handles the long lines problem, and we still
need to tackle the case of large buffers, which is related but different.
Stefan
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 21:45 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-16 22:20 ` Dmitry Gutov
2022-08-17 11:36 ` Eli Zaretskii
0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-16 22:20 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Eli Zaretskii, 57245
On 17.08.2022 00:45, Stefan Monnier via Bug reports for GNU Emacs, the
Swiss army knife of text editors wrote:
>> The font-lock narrowing (if it's indeed the method we're going to use to
>> speed up its performance) shouldn't be conditioned on the presence of
>> long lines.
> font-lock does suffer from long lines
Perhaps when some specific rules are used? Like MATCH-ANCHORED, one
instance of which I deleted from js-mode a few days ago.
Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
comment out the narrowing code in handle_fontified_prop (or switch to
the branch I posted previously), two XML files -- one with long lines
and one without (the files differ only by addition of newlines) -- show
approximately the same delay on M->.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 22:20 ` Dmitry Gutov
@ 2022-08-17 11:36 ` Eli Zaretskii
2022-08-17 11:46 ` Dmitry Gutov
0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 11:36 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 57245, monnier
> Date: Wed, 17 Aug 2022 01:20:38 +0300
> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
> comment out the narrowing code in handle_fontified_prop (or switch to
> the branch I posted previously), two XML files -- one with long lines
> and one without (the files differ only by addition of newlines) -- show
> approximately the same delay on M->.
Doesn't syntax-wholeline-max only affect long lines? Because I don't
think I see its effect in the XML file where lines were broken by
newlines, and then the file was duplicated 100 times.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 11:36 ` Eli Zaretskii
@ 2022-08-17 11:46 ` Dmitry Gutov
2022-08-17 12:16 ` Eli Zaretskii
0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 11:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, monnier
On 17.08.2022 14:36, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 01:20:38 +0300
>> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
>> comment out the narrowing code in handle_fontified_prop (or switch to
>> the branch I posted previously), two XML files -- one with long lines
>> and one without (the files differ only by addition of newlines) -- show
>> approximately the same delay on M->.
>
> Doesn't syntax-wholeline-max only affect long lines? Because I don't
> think I see its effect in the XML file where lines were broken by
> newlines, and then the file was duplicated 100 times.
Its purpose is to handle the slowdown which occurred specifically on
long lines because of
font-lock-extend-region-functions/syntax-propertize-extend-region-functions.
Now that it works -- I don't see any particular slowdowns on long lines,
even with narrowing disabled.
And the performance of M-> depends solely on the size of a file. In my
XML test files, at least.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 11:46 ` Dmitry Gutov
@ 2022-08-17 12:16 ` Eli Zaretskii
2022-08-17 12:30 ` Dmitry Gutov
0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:16 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 57245, monnier
> Date: Wed, 17 Aug 2022 14:46:46 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 17.08.2022 14:36, Eli Zaretskii wrote:
> >> Date: Wed, 17 Aug 2022 01:20:38 +0300
> >> Cc: Eli Zaretskii <eliz@gnu.org>, 57245@debbugs.gnu.org
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >>
> >> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
> >> comment out the narrowing code in handle_fontified_prop (or switch to
> >> the branch I posted previously), two XML files -- one with long lines
> >> and one without (the files differ only by addition of newlines) -- show
> >> approximately the same delay on M->.
> >
> > Doesn't syntax-wholeline-max only affect long lines?
>
> Its purpose is to handle the slowdown which occurred specifically on
> long lines because of
> font-lock-extend-region-functions/syntax-propertize-extend-region-functions.
> Now that it works -- I don't see any particular slowdowns on long lines,
> even with narrowing disabled.
>
> And the performance of M-> depends solely on the size of a file. In my
> XML test files, at least.
Is that a yes? Because if it is, then what does this have to do with
the issue of nXML not being scalable enough?
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 12:16 ` Eli Zaretskii
@ 2022-08-17 12:30 ` Dmitry Gutov
2022-08-17 12:33 ` Eli Zaretskii
0 siblings, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:30 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, monnier
On 17.08.2022 15:16, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 14:46:46 +0300
>> Cc:57245@debbugs.gnu.org,monnier@iro.umontreal.ca
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>> On 17.08.2022 14:36, Eli Zaretskii wrote:
>>>> Date: Wed, 17 Aug 2022 01:20:38 +0300
>>>> Cc: Eli Zaretskii<eliz@gnu.org>,57245@debbugs.gnu.org
>>>> From: Dmitry Gutov<dgutov@yandex.ru>
>>>>
>>>> Otherwise, syntax-wholeline-max seems to be doing its job fine: if I
>>>> comment out the narrowing code in handle_fontified_prop (or switch to
>>>> the branch I posted previously), two XML files -- one with long lines
>>>> and one without (the files differ only by addition of newlines) -- show
>>>> approximately the same delay on M->.
>>> Doesn't syntax-wholeline-max only affect long lines?
>> Its purpose is to handle the slowdown which occurred specifically on
>> long lines because of
>> font-lock-extend-region-functions/syntax-propertize-extend-region-functions.
>> Now that it works -- I don't see any particular slowdowns on long lines,
>> even with narrowing disabled.
>>
>> And the performance of M-> depends solely on the size of a file. In my
>> XML test files, at least.
> Is that a yes?
Yes, it's a "yes".
Stefan said:
> font-lock does suffer from long lines, so the current code's handling
> of font-lock makes some sense
Meaning that narrowing around font-lock on long lines makes sense.
And I replied that no, syntax-wholeline-max should be dealing with
long-line issues in font-lock already.
> Because if it is, then what does this have to do with
> the issue of nXML not being scalable enough?
Narrowing around font-lock shouldn't be conditioned on the presence of
long lines. It either should be done unconditionally (with larger
radius, I guess), or not at all.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 12:30 ` Dmitry Gutov
@ 2022-08-17 12:33 ` Eli Zaretskii
2022-08-17 12:46 ` Dmitry Gutov
0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:33 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 57245, monnier
> Date: Wed, 17 Aug 2022 15:30:22 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> > Because if it is, then what does this have to do with
> > the issue of nXML not being scalable enough?
>
> Narrowing around font-lock shouldn't be conditioned on the presence of
> long lines. It either should be done unconditionally (with larger
> radius, I guess), or not at all.
Yes, you already said that, and I don't agree (and explained why).
Now, can we please agree to disagree and move on?
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 12:33 ` Eli Zaretskii
@ 2022-08-17 12:46 ` Dmitry Gutov
0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, monnier
On 17.08.2022 15:33, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 15:30:22 +0300
>> Cc:57245@debbugs.gnu.org,monnier@iro.umontreal.ca
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>> > Because if it is, then what does this have to do with
>> > the issue of nXML not being scalable enough?
>>
>> Narrowing around font-lock shouldn't be conditioned on the presence of
>> long lines. It either should be done unconditionally (with larger
>> radius, I guess), or not at all.
> Yes, you already said that, and I don't agree (and explained why).
> Now, can we please agree to disagree and move on?
I don't think you explained that, no.
If you're referring to the previous discussions, this [bug report] is
the first time I have put forward this particular suggestion.
So you couldn't have addressed it before that.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-16 19:32 ` Dmitry Gutov
2022-08-16 20:22 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-08-17 11:24 ` Eli Zaretskii
2022-08-17 12:14 ` Dmitry Gutov
2022-08-17 13:20 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 2 replies; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 11:24 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 57245, monnier
> Date: Tue, 16 Aug 2022 22:32:23 +0300
> Cc: 57245@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 16.08.2022 19:54, Eli Zaretskii wrote:
> > Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
> > here?
>
> nxml-syntax-propertize might well be heavier than average, but the delay
> scales linearly with the size of the file.
Which is generally not a good scaling factor, especially if the
coefficient is quite large (as it seems to be in this case).
> Which seems to be exactly the behavior the "font-lock narrowing" was
> supposed to guard from?
No. It wasn't supposed to fix modes that foolishly scan the buffer
from BOB to point. It was supposed to fix modes which scan from the
beginning of line, and that is (a) only a problem when lines are very
long, and (b) much harder to solve in the mode itself, because
font-lock very frequently uses anchored regexps and otherwise likes to
start from BOL, and syntax processing also likes starting from BOL.
Btw, do nXML and/or sgml-mode use libxml for their analysis? If
not, why not? Wouldn't that be faster (and possibly more accurate)?
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 11:24 ` Eli Zaretskii
@ 2022-08-17 12:14 ` Dmitry Gutov
2022-08-17 12:20 ` Eli Zaretskii
2022-08-17 13:20 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, monnier
On 17.08.2022 14:24, Eli Zaretskii wrote:
>> Date: Tue, 16 Aug 2022 22:32:23 +0300
>> Cc: 57245@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>>
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
>
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).
Someone can work on the coefficient, but any accurate parser has to scan
the buffer from the beginning. At least once.
Migration to tree-sitter might give us a better coefficient later, but
the principle will remain.
>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
>
> No. It wasn't supposed to fix modes that foolishly scan the buffer
> from BOB to point.
You might want to choose words better.
> It was supposed to fix modes which scan from the
> beginning of line, and that is (a) only a problem when lines are very
> long, and (b) much harder to solve in the mode itself, because
> font-lock very frequently uses anchored regexps and otherwise likes to
> start from BOL, and syntax processing also likes starting from BOL.
syntax-wholeline-max handles that problem.
Though it might depend on what you mean by "anchored regexps".
> Btw, do nXML and/or sgml-mode use libxml for their analysis? If
> not, why not? Wouldn't that be faster (and possibly more accurate)?
Might be "a simple matter of coding".
But we do need syntax-propertize to run, so that the user commands can
rely on proper syntax information in the buffer. It remains to be seen
whether xml-parse-region is a good base for nxml-syntax-propertize, and
how much of a performance improvement it can bring (with all the string
marshaling around).
nxml also probably handles invalid documents better, which might or
might not be important.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 12:14 ` Dmitry Gutov
@ 2022-08-17 12:20 ` Eli Zaretskii
2022-08-17 12:40 ` Dmitry Gutov
0 siblings, 1 reply; 20+ messages in thread
From: Eli Zaretskii @ 2022-08-17 12:20 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 57245, monnier
> Date: Wed, 17 Aug 2022 15:14:07 +0300
> Cc: 57245@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> >> Which seems to be exactly the behavior the "font-lock narrowing" was
> >> supposed to guard from?
> >
> > No. It wasn't supposed to fix modes that foolishly scan the buffer
> > from BOB to point.
>
> You might want to choose words better.
I did.
> > It was supposed to fix modes which scan from the
> > beginning of line, and that is (a) only a problem when lines are very
> > long, and (b) much harder to solve in the mode itself, because
> > font-lock very frequently uses anchored regexps and otherwise likes to
> > start from BOL, and syntax processing also likes starting from BOL.
>
> syntax-wholeline-max handles that problem.
Only for syntax-related stuff. And we have yet to see whether it's a
good enough job: that feature is too young to be sure.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 12:20 ` Eli Zaretskii
@ 2022-08-17 12:40 ` Dmitry Gutov
0 siblings, 0 replies; 20+ messages in thread
From: Dmitry Gutov @ 2022-08-17 12:40 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, monnier
On 17.08.2022 15:20, Eli Zaretskii wrote:
>> Date: Wed, 17 Aug 2022 15:14:07 +0300
>> Cc:57245@debbugs.gnu.org,monnier@iro.umontreal.ca
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>>>> Which seems to be exactly the behavior the "font-lock narrowing" was
>>>> supposed to guard from?
>>> No. It wasn't supposed to fix modes that foolishly scan the buffer
>>> from BOB to point.
>> You might want to choose words better.
> I did.
>
>>> It was supposed to fix modes which scan from the
>>> beginning of line, and that is (a) only a problem when lines are very
>>> long, and (b) much harder to solve in the mode itself, because
>>> font-lock very frequently uses anchored regexps and otherwise likes to
>>> start from BOL, and syntax processing also likes starting from BOL.
>> syntax-wholeline-max handles that problem.
> Only for syntax-related stuff.
font-lock-extend-region-wholelines uses that variable too.
> And we have yet to see whether it's a
> good enough job: that feature is too young to be sure.
Same goes for the long-line-narrowing business.
And for us to be sure, people would need to be able to try it and report
problems. But as long as handle_fontified_prop creates a narrowing with
~5000 char radius, syntax-wholeline-max isn't even given a chance to do
its job.
* bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
2022-08-17 11:24 ` Eli Zaretskii
2022-08-17 12:14 ` Dmitry Gutov
@ 2022-08-17 13:20 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 20+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-08-17 13:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57245, Dmitry Gutov
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).
For most languages, this is the minimum scaling factor that allows the
result to be correct in all cases. So, as a general rule, it should be
considered as a good scaling factor, I think (when seen as a judgment
on the implementation quality of a major mode).
Obviously, that won't work well in really large buffers, but to a large
extent that should be blamed on the language rather than its major mode.
For this reason, we need to add hacks/heuristics (e.g. not highlighting,
accepting occasional broken highlighting, delaying highlighting,
younameit) if we want to be able to handle such large buffers in
a timely fashion.
Stefan