all messages for Emacs-related lists mirrored at yhetil.org
From: Dmitry Gutov <dmitry@gutov.dev>
To: Eli Zaretskii <eliz@gnu.org>
Cc: pot@gnu.org, 73484@debbugs.gnu.org, spwhitton@spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Mon, 7 Oct 2024 20:36:47 +0300	[thread overview]
Message-ID: <021c625b-adc9-4e19-819c-fe929583e503@gutov.dev> (raw)
In-Reply-To: <86ldyzucdd.fsf@gnu.org>

On 07/10/2024 19:05, Eli Zaretskii wrote:

> So you are comparing the speed of scanning ~60K files with the speed
> of scanning ~375K files?  I'm not generally surprised that the
> latter takes much longer, only that the slowdown is not proportional
> to the number of scanned files.  But see below.

I forgot to mention one thing: all .js files are set to be ignored there.
And my tree is a little old, so it's 200K files in total. Otherwise, yes.

Note, however, that the time really is not proportional: 30 s vs 15 min
is a 30x difference, while the number of files grows only about 3.3x
(60K vs 200K).

And I've been assuming that the "other" files would mostly fall into the
non-recognized category, and that for most of them only the first 2
characters would be read (after seeing that those are not '#!', etags
would skip the file).
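The cheap rejection assumed above can be sketched in C. This is a minimal sketch under that assumption and is not etags's actual code. The helper names `starts_with_shebang` and `file_starts_with_shebang` are hypothetical, and the real program may read more of a file before deciding. The point is that a file failing the check should cost little more than an open, a 2-byte read, and a close.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the stream begins with "#!",
   0 otherwise.  Only the first two bytes are requested, mirroring
   the cheap check assumed above for files whose language is not
   recognized by name. */
int starts_with_shebang(FILE *fp)
{
    char buf[2];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return 0;               /* shorter than 2 bytes: no shebang */
    return buf[0] == '#' && buf[1] == '!';
}

/* Hypothetical wrapper: open PATH and apply the check.
   Returns -1 if the file cannot be opened. */
int file_starts_with_shebang(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int result;

    if (fp == NULL)
        return -1;
    result = starts_with_shebang(fp);
    fclose(fp);
    return result;
}
```

For example, `file_starts_with_shebang` would return 1 for a script that begins with `#!/bin/sh`, 0 for an ordinary C source file, and -1 for a path that cannot be opened.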

> Btw, did you exclude the .git/* files from the list submitted to
> etags?

Yes, it's excluded. And the files matching the .gitignore entries are 
excluded as well.

> Here, scanning, with the unmodified etags from Emacs 30, of only those
> files with extensions in etags-regen-file-extensions takes 16.7 sec
> and produces an 80.5MB tags table, whereas scanning all the files with
> the same etags takes almost 16 min and produces a 304MB tags table, of
> which more than 200MB are from files whose language is not recognized.

My result in the latter case was only 88 MB. Maybe the many .js files
make the difference: I put them into the "ignored" category long ago
because most of them are used for tests, there are a lot of them, and
some are generated files consisting of a single long line.

> From my testing, it seems like the elapsed time depends non-linearly
> on the length of the list of files submitted to etags.  For example,
> if I break the list of files in two, I get 3 min 20 sec and 1 min 40
> sec, together 5 min.  But if I submit a single list with all the files
> in those two lists, I get 14 min 30 sec.  I guess some internal
> processing etags does depends non-linearly on the number of files it
> scans.  The various loops in etags that scan all of the known files
> and/or the tags it previously found seem to confirm this hypothesis.

Makes sense! It sounds like some N^2 complexity somewhere.
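Eli's hypothesis can be illustrated with a toy model in C. This is a hypothetical sketch and does not reproduce etags's actual data structures. The names `struct registry`, `registry_add` and `comparisons_for` are invented for illustration. If every newly added file is compared against all files already seen, N files cost N*(N-1)/2 comparisons, so doubling the list roughly quadruples the work.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry: names live in a flat array, and each new
   name is checked against every name already registered.  Not
   etags's real structure; it only shows how one such loop makes the
   total work grow as N*(N-1)/2. */
struct registry {
    const char **names;
    size_t count;
    size_t capacity;
    unsigned long long comparisons;  /* strcmp calls made so far */
};

/* Returns 1 if NAME was added, 0 if it was already present,
   -1 on allocation failure. */
int registry_add(struct registry *r, const char *name)
{
    for (size_t i = 0; i < r->count; i++) {
        r->comparisons++;
        if (strcmp(r->names[i], name) == 0)
            return 0;
    }
    if (r->count == r->capacity) {
        size_t cap = r->capacity ? 2 * r->capacity : 16;
        const char **p = realloc(r->names, cap * sizeof *p);
        if (p == NULL)
            return -1;
        r->names = p;
        r->capacity = cap;
    }
    r->names[r->count++] = name;
    return 1;
}

/* Register N distinct synthetic names ("file0", "file1", ...) and
   return how many comparisons that took. */
unsigned long long comparisons_for(size_t n)
{
    struct registry r = { NULL, 0, 0, 0 };
    char *buf = malloc(n * 24 + 1);   /* 24 bytes per name slot */
    unsigned long long result;

    if (buf == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        char *name = buf + i * 24;
        snprintf(name, 24, "file%zu", i);
        if (registry_add(&r, name) < 0)
            break;
    }
    result = r.comparisons;
    free(r.names);
    free(buf);
    return result;
}
```

With this model, `comparisons_for(1000)` gives 499,500 comparisons, while two separate runs of 500 names give 2 × 124,750 = 249,500, about half. That matches the direction of the split-list timings quoted above. Real runs also pay linear costs (reading and parsing each file), so their timings need not scale exactly quadratically.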

> So what is the conclusion from this?  Are you saying that the long
> scan times in this large tree basically make this new no-fallbacks
> option not very useful, since we still need to carefully include or
> exclude certain files from the scan?  Or should I go ahead and install
> these changes?

I think that option will be useful, but both for more reliable benchmarks
and for end-user usability, we need the N^2 issue fixed too, perhaps
before the rest of the changes.