From: "Francesco Potortì" <pot@gnu.org>
To: Dmitry Gutov <dmitry@gutov.dev>
Cc: Eli Zaretskii <eliz@gnu.org>,
73484@debbugs.gnu.org, spwhitton@spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Thu, 10 Oct 2024 03:07:31 +0200
Message-ID: <878quwix4c.fsf@tucano.isti.cnr.it>
In-Reply-To: <3e63f532-c6af-4923-880b-01a32cc667ec@gutov.dev> (dmitry@gutov.dev)
>Here is the nested loop, which if I comment out, makes the parse finish
>in ~20 seconds, with all the extra files (except *.js), or in 15s when
>using with new flags.
>
>diff --git a/lib-src/etags.c b/lib-src/etags.c
>index a822a823a90..331e3ffe816 100644
>--- a/lib-src/etags.c
>+++ b/lib-src/etags.c
>@@ -1697,14 +1697,14 @@ process_file_name (char *file, language *lang)
> uncompressed_name = file;
> }
>
>- /* If the canonicalized uncompressed name
>- has already been dealt with, skip it silently. */
>- for (fdp = fdhead; fdp != NULL; fdp = fdp->next)
>- {
>- assert (fdp->infname != NULL);
>- if (streq (uncompressed_name, fdp->infname))
>- goto cleanup;
>- }
>+ /* /\* If the canonicalized uncompressed name */
>+ /* has already been dealt with, skip it silently. *\/ */
>+ /* for (fdp = fdhead; fdp != NULL; fdp = fdp->next) */
>+ /* { */
>+ /* assert (fdp->infname != NULL); */
>+ /* if (streq (uncompressed_name, fdp->infname)) */
>+ /* goto cleanup; */
>+ /* } */
>
> inf = fopen (file, "r" FOPEN_BINARY);
> if (inf)
>
>This is basically a "uniqueness" operation using linear search, O(N^2).
This check only handles the case where the same file exists in both compressed and uncompressed form and we are now processing the second of the two; in that case, we should skip it. Yes, this is a uniqueness test, and yes, it is O(N^2) in the number of file names, but I doubt that it can explain a serious slowdown.
>Is there a hash table we could use?
No, we have a hash table for C keywords, and that's all. It is useful because there are 34 keywords against which most strings in a C/C++ file are compared. It makes sense to build hash tables for other languages where a similar situation arises.
I do not think it makes sense to build a hash table for the file names given on the command line, because the number of comparisons made on those names is generally far smaller than the number of comparisons made while searching for tags.
>> . Some files have their language identified by means other than their
>> names or extensions: those are the languages that have
>> "interpreters" defined in etags.c
The interpreter is the token that comes after "#!", with the possible exception of "env", in which case the interpreter is the second token after "#!".
There are two O(N^2) tests, in the number of tags in C/C++ files, which depend on the two options "no-line-directive" and "no-duplicates". Both options can be used to disable those checks, and both are off by default, because the checks help produce a saner tags file and have no practical impact in most cases. The options exist because, in principle, the checks can cause a significant slowdown with huge tags files.