On Thu, Oct 10, 2024, at 3:07 AM, Francesco Potortì wrote:
>>Here is the nested loop which, if I comment it out, makes the parse
>>finish in ~20 seconds with all the extra files (except *.js), or in
>>15s when using the new flags.
>>
>>diff --git a/lib-src/etags.c b/lib-src/etags.c
>>index a822a823a90..331e3ffe816 100644
>>--- a/lib-src/etags.c
>>+++ b/lib-src/etags.c
>>@@ -1697,14 +1697,14 @@ process_file_name (char *file, language *lang)
>>        uncompressed_name = file;
>>      }
>>
>>-  /* If the canonicalized uncompressed name
>>-     has already been dealt with, skip it silently. */
>>-  for (fdp = fdhead; fdp != NULL; fdp = fdp->next)
>>-    {
>>-      assert (fdp->infname != NULL);
>>-      if (streq (uncompressed_name, fdp->infname))
>>- goto cleanup;
>>-    }
>>+  /* /\* If the canonicalized uncompressed name */
>>+  /*    has already been dealt with, skip it silently. *\/ */
>>+  /* for (fdp = fdhead; fdp != NULL; fdp = fdp->next) */
>>+  /*   { */
>>+  /*     assert (fdp->infname != NULL); */
>>+  /*     if (streq (uncompressed_name, fdp->infname)) */
>>+  /* goto cleanup; */
>>+  /*   } */
>>
>>    inf = fopen (file, "r" FOPEN_BINARY);
>>    if (inf)
>>
>>This is basically a "uniqueness" operation using linear search, O(N^2).

>This is only for dealing with the case when the same file exists in both compressed and uncompressed form, and we are currently hitting the second one.  In that case, we should skip it.  Yes, this is a uniqueness test and yes, it is O(N^2) in the number of file names, but I doubt that this can explain a serious slowdown.
As mentioned in a previous email, I did recompile with that step removed, and the slowdown was gone.

The whole scan went down to ~20 seconds.
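
To put a number on it: with N input files that loop does on the order of
N^2/2 streq calls, so at, say, 10^5 files that is already a few billion
string comparisons, which would explain why the scan speeds up so much
once the loop is gone.

If the duplicate check is worth keeping, here is a rough, untested sketch
(not a patch against etags.c; the table size, hash function and helper
name are all made up) of how it could be done in amortized O(1) per file
by hashing the canonicalized name instead of walking the whole fdhead
list:

/* Untested sketch: a tiny chained hash set of file names.
   The idea is that process_file_name would ask this set whether the
   canonicalized uncompressed_name was already handled, instead of
   walking the whole fdhead list with streq.  All names and sizes
   here are illustrative, not taken from etags.c.  */

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUCKETS 4096   /* power of two, illustrative */

struct name_node
{
  char *name;
  struct name_node *next;
};

static struct name_node *name_set[NAME_BUCKETS];

/* FNV-1a string hash, reduced to a bucket index.  */
static size_t
name_hash (const char *s)
{
  size_t h = 2166136261u;
  for (; *s != '\0'; s++)
    h = (h ^ (unsigned char) *s) * 16777619u;
  return h & (NAME_BUCKETS - 1);
}

/* Return true if NAME was seen before; otherwise remember it.
   Amortized O(1) instead of O(files seen so far).  */
static bool
seen_before (const char *name)
{
  size_t b = name_hash (name);
  for (struct name_node *p = name_set[b]; p != NULL; p = p->next)
    if (strcmp (p->name, name) == 0)
      return true;

  struct name_node *n = malloc (sizeof *n);
  if (n == NULL || (n->name = strdup (name)) == NULL)
    exit (EXIT_FAILURE);   /* real code would report the error */
  n->next = name_set[b];
  name_set[b] = n;
  return false;
}

int
main (void)
{
  const char *files[] = { "foo.c", "bar.c", "foo.c" };
  for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
    printf ("%s -> %s\n", files[i],
            seen_before (files[i]) ? "skip (duplicate)" : "process");
  return 0;
}

The demo processes foo.c and bar.c and skips the repeated foo.c.  In
etags the lookup key would be uncompressed_name just before the fopen
call, and the sketch leaks the table on purpose: a real patch would have
to decide how its lifetime relates to the existing cleanup path in
process_file_name.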