From: Eli Zaretskii <eliz@gnu.org>
To: Dmitry Gutov <dmitry@gutov.dev>
Cc: pot@gnu.org, 73484@debbugs.gnu.org, spwhitton@spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Mon, 07 Oct 2024 19:05:50 +0300 [thread overview]
Message-ID: <86ldyzucdd.fsf@gnu.org> (raw)
In-Reply-To: <b0192173-3d8a-49da-8792-521b9b486568@gutov.dev> (message from Dmitry Gutov on Mon, 7 Oct 2024 10:11:08 +0300)
> Date: Mon, 7 Oct 2024 10:11:08 +0300
> Cc: pot@gnu.org, spwhitton@spwhitton.name, 73484@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
>
> > Can you please show the etags command line in each of these two cases
> > that you are comparing?
>
> Both commands end with a '-' (scanning the list of files passed from stdin).
>
> >>> And if they don't have extensions, the code you
> >>> removed would have caused etags to scan these files anyway, looking
> >>> for Fortran or C tags. So how come the change slowed down etags so
> >>> much? What am I missing?
> >> I think it would also concern "unknown" extensions, right? Like .txt,
> >> .png and so on.
> > I have difficulty reasoning about this without knowing the command
> > lines you used. E.g., I don't understand why in one case it would
> > scan files with unknown extensions that were not scanned in the other.
>
> In one case the list is pre-filtered with etags-regen-file-extensions
> (see 'etags-regen--all-files'), in the other - it is not, and all files
> in project are passed.
So you are comparing the speed of scanning ~60K files with the speed
of scanning ~375K of files? I'm not generally surprised that the
latter takes much longer, only that the slowdown is not proportional
to the number of scanned files. But see below.
Btw, did you exclude the .git/* files from the list submitted to
etags?
Here, scanning, with the unmodified etags from Emacs 30, of only those
files with extensions in etags-regen-file-extensions takes 16.7 sec
and produces a 80.5MB tags table, whereas scanning all the files with
the same etags takes almost 16 min and produces 304MB tags table, of
which more than 200MB are from files whose language is not recognized.
From my testing, it seems like the elapsed time depends non-linearly
on the length of the list of files submitted to etags. For example,
if I break the list of files in two, I get 3 min 20 sec and 1 min 40
sec, together 5 min. But if I submit a single list with all the files
in those two lists, I get 14 min 30 sec. I guess some internal
processing etags does depends non-linearly on the number of files it
scans. The various loops in etags that scan all of the known files
and/or the tags it previously found seem to confirm this hypothesis.
So what is the conclusion from this? Are you saying that the long
scan times in this large tree basically make this new no-fallbacks
option not very useful, since we still need to carefully include or
exclude certain files from the scan? Or should I go ahead and install
these changes?
next prev parent reply other threads:[~2024-10-07 16:05 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <87tteaznog.fsf@zephyr.silentflame.com>
[not found] ` <edab570c-b2fa-4162-9383-df5c8aaff251@yandex.ru>
[not found] ` <8734lrrj4e.fsf@zephyr.silentflame.com>
[not found] ` <ea10f340-9b46-4199-93fc-274c5e81ace4@yandex.ru>
[not found] ` <87o74c1ce1.fsf@zephyr.silentflame.com>
[not found] ` <b8001a72-8fc9-4e4e-a2d7-5da94a92f250@yandex.ru>
2024-09-25 19:27 ` bug#73484: 31.0.50; Abolishing etags-regen-file-extensions Sean Whitton
2024-09-25 22:30 ` Dmitry Gutov
2024-09-26 7:43 ` Francesco Potortì
2024-09-26 12:18 ` Dmitry Gutov
2024-09-29 8:25 ` Eli Zaretskii
2024-09-29 10:56 ` Eli Zaretskii
2024-09-29 17:15 ` Francesco Potortì
2024-09-30 23:19 ` Dmitry Gutov
2024-10-01 15:00 ` Eli Zaretskii
2024-10-01 22:01 ` Dmitry Gutov
2024-10-02 11:28 ` Eli Zaretskii
2024-10-02 18:00 ` Dmitry Gutov
2024-10-02 18:56 ` Eli Zaretskii
2024-10-02 22:03 ` Dmitry Gutov
2024-10-03 6:27 ` Eli Zaretskii
2024-10-04 1:25 ` Dmitry Gutov
2024-10-04 6:45 ` Eli Zaretskii
2024-10-04 23:01 ` Dmitry Gutov
2024-10-05 7:02 ` Eli Zaretskii
2024-10-05 14:29 ` Dmitry Gutov
2024-10-05 15:27 ` Eli Zaretskii
2024-10-05 20:27 ` Dmitry Gutov
2024-10-05 16:38 ` Francesco Potortì
2024-10-05 17:12 ` Eli Zaretskii
2024-10-06 0:56 ` Dmitry Gutov
2024-10-06 6:22 ` Eli Zaretskii
2024-10-06 19:14 ` Dmitry Gutov
2024-10-07 2:33 ` Eli Zaretskii
2024-10-07 7:11 ` Dmitry Gutov
2024-10-07 16:05 ` Eli Zaretskii [this message]
2024-10-07 17:36 ` Dmitry Gutov
2024-10-07 19:05 ` Eli Zaretskii
2024-10-07 22:08 ` Dmitry Gutov
2024-10-08 13:04 ` Eli Zaretskii
2024-10-09 18:23 ` Dmitry Gutov
2024-10-09 19:11 ` Eli Zaretskii
2024-10-09 22:22 ` Dmitry Gutov
2024-10-10 5:13 ` Eli Zaretskii
2024-10-10 1:07 ` Francesco Potortì
2024-10-10 5:41 ` Eli Zaretskii
2024-10-10 8:27 ` Francesco Potortì
2024-10-10 8:35 ` Eli Zaretskii
2024-10-10 14:25 ` Francesco Potortì
2024-10-10 16:28 ` Eli Zaretskii
2024-10-11 10:37 ` Francesco Potortì
2024-10-10 10:17 ` Dmitry Gutov
2024-10-10 1:39 ` Francesco Potortì
2024-10-10 5:45 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86ldyzucdd.fsf@gnu.org \
--to=eliz@gnu.org \
--cc=73484@debbugs.gnu.org \
--cc=dmitry@gutov.dev \
--cc=pot@gnu.org \
--cc=spwhitton@spwhitton.name \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).