Navigating an enormous code base

unofficial mirror of help-gnu-emacs@gnu.org

* Navigating an enormous code base
@ 2022-04-26  2:31 John Yates
  2022-04-26  6:06 ` Daniel Fleischer
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: John Yates @ 2022-04-26  2:31 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list

At my day job I work on a 35-year-old product with literally
a few hundred million lines of code. It is essentially a
mono-repo.

There are thousands of directory trees with Makefiles. To be
consistent with Emacs concepts, let's call these 'projects'.
Luckily projects are rarely nested, and not in the areas I work
on. There is no discipline requiring file names be unique across
projects, nor even within projects. Furthermore, within any
project there can be multiple unittest/ and/or pkgtest/
directories. Each such directory must contain a
suite_registration.cpp.

Most often I work within a single project and want to navigate to
files in the same project. Less frequently I need to find a file
in a sibling project. Unfortunately, there are ~400 siblings,
though I very rarely visit more than 10.

Now that I am trying to work within Emacs' built-in tools, I
wonder how best to find files in this code base. Or, if Emacs lacks
adequate built-in tooling, what package(s) might help?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Navigating an enormous code base
  2022-04-26  2:31 Navigating an enormous code base John Yates
@ 2022-04-26  6:06 ` Daniel Fleischer
  2022-04-26 11:03 ` Eli Zaretskii
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Daniel Fleischer @ 2022-04-26  6:06 UTC (permalink / raw)
  To: John Yates; +Cc: Help Gnu Emacs mailing list

John Yates [2022-04-25 Mon 22:31] wrote:

> Now that I am trying to work within Emacs' built-in tools, I
> wonder how best to find files in this code base. Or, if Emacs lacks
> adequate built-in tooling, what package(s) might help?

There's projectile, which is very good:

    https://github.com/bbatsov/projectile

-- 

Daniel Fleischer




* Re: Navigating an enormous code base
  2022-04-26  2:31 Navigating an enormous code base John Yates
  2022-04-26  6:06 ` Daniel Fleischer
@ 2022-04-26 11:03 ` Eli Zaretskii
  2022-04-26 12:53   ` John Yates
  2022-04-26 12:49 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-04-27  7:59 ` Marcus Harnisch
  3 siblings, 1 reply; 24+ messages in thread
From: Eli Zaretskii @ 2022-04-26 11:03 UTC (permalink / raw)
  To: help-gnu-emacs

> From: John Yates <john@yates-sheets.org>
> Date: Mon, 25 Apr 2022 22:31:44 -0400
> 
> At my day job I work on a 35-year-old product with literally
> a few hundred million lines of code. It is essentially a
> mono-repo.
> 
> There are thousands of directory trees with Makefiles. To be
> consistent with Emacs concepts, let's call these 'projects'.
> Luckily projects are rarely nested, and not in the areas I work
> on. There is no discipline requiring file names be unique across
> projects, nor even within projects. Furthermore, within any
> project there can be multiple unittest/ and/or pkgtest/
> directories. Each such directory must contain a
> suite_registration.cpp.
> 
> Most often I work within a single project and want to navigate to
> files in the same project. Less frequently I need to find a file
> in a sibling project. Unfortunately, there are ~400 siblings,
> though I very rarely visit more than 10.
> 
> Now that I am trying to work within Emacs' built-in tools, I
> wonder how best to find files in this code base. Or, if Emacs lacks
> adequate built-in tooling, what package(s) might help?

In what language(s) is this written?

Can you tell more about what you mean by "how I find files"?  Like
show an example or two of use cases where you need to "find files"?

And what built-in tools did you try to solve those problems?




* Re: Navigating an enormous code base
  2022-04-26 11:03 ` Eli Zaretskii
@ 2022-04-26 12:53   ` John Yates
  2022-04-26 14:06     ` Emanuel Berg via Users list for the GNU Emacs text editor
  0 siblings, 1 reply; 24+ messages in thread
From: John Yates @ 2022-04-26 12:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Help Gnu Emacs mailing list

On Tue, Apr 26, 2022 at 7:03 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> In what language(s) is this written?

Primarily C++.  But there are Makefiles, bash scripts, ad hoc text
files, xml, json, etc.

> Can you tell more about what you mean by "how I find files"?  Like
> show an example or two of use cases where you need to "find files"?

Within the current project I want to open a specific
suite_registration.cpp in one of multiple unittest/
or pkgtest/ directories.

In some project, could be the current project, could be a sibling
project, I want to open a specific file.

My goal is not so much navigating by symbols (I have lsp for
that). Rather it is more of a UI question. I want to use a modern
completion interface to open files by name. Supporting completion
requires that the space of possible file names be indexed and
supplied to the completion function.

As mentioned in my first post, an issue is how should the
completion UI present files with duplicate names.
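One possible answer to the duplicate-name question, borrowed from the way `uniquify' names buffers by default (`post-forward-angle-brackets'), is to show a bare base name when it is unique and otherwise append just enough trailing parent directories to tell the copies apart. A minimal Emacs Lisp sketch, assuming the candidates are already available as absolute paths; `my-disambiguate-paths' and `my--path-tail' are hypothetical helpers, not part of any package:

```elisp
(require 'cl-lib)
(require 'subr-x)                       ; `string-join'

(defun my--path-tail (components k)
  "Join the last K+1 elements of COMPONENTS with \"/\"."
  (string-join (last components (1+ k)) "/"))

(defun my-disambiguate-paths (paths)
  "Return an alist (DISPLAY . PATH) for each absolute file name in PATHS.
A unique base name is shown bare; a duplicated one gets the fewest
trailing parent directories needed to tell it apart, e.g.
\"suite_registration.cpp<unittest>\".  Hypothetical helper."
  (let ((split (mapcar (lambda (p) (cons p (split-string p "/" t))) paths)))
    (mapcar
     (lambda (entry)
       (let* ((comps (cdr entry))
              (others (remq entry split))
              (k 0))
         ;; Widen the tail one directory at a time while some other
         ;; path still shares it; stop when we run out of components.
         (while (and (< k (1- (length comps)))
                     (cl-some (lambda (other)
                                (equal (my--path-tail (cdr other) k)
                                       (my--path-tail comps k)))
                              others))
           (setq k (1+ k)))
         (cons (if (= k 0)
                   (car (last comps))
                 (format "%s<%s>" (car (last comps))
                         (string-join (butlast (last comps (1+ k))) "/")))
               (car entry))))
     split)))
```

The DISPLAY strings can be passed to `completing-read' and mapped back to paths with `assoc'. Comparing every path with every other is quadratic, so a real index would first group candidates by base name.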

Another issue is what should be the scope of file name (paths)
fed to the completion UI. Here I can imagine the following
possibilities:
* The current project
* An explicitly specified sibling project
* Within the current workspace, all projects in which I have made
  changes (perhaps via after-save-hook)
* One of various pre-specified canned sets of projects

I could imagine splitting this UI into:
* Find in current project
* Find within a menu of wider contexts
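The "menu of wider contexts" could start as simply as completing over the files under an explicit list of project roots. A rough sketch with hypothetical function names; it walks the trees with `directory-files-recursively' on every call, which is plausible for one ~2k-file project but would need a cached index for a scope of hundreds of siblings:

```elisp
(require 'cl-lib)
(require 'project)

(defun my-scope-candidates (roots)
  "Return an alist (DISPLAY . FILE) for every file under ROOTS.
DISPLAY is \"ROOTNAME/relative/path\", so files with the same name
in different projects stay distinguishable."
  (cl-loop for root in roots
           for name = (file-name-nondirectory (directory-file-name root))
           nconc (mapcar (lambda (file)
                           (cons (concat name "/" (file-relative-name file root))
                                 file))
                         (directory-files-recursively root ""))))

(defun my-find-file-in-scope (roots)
  "Complete over the files under ROOTS and visit the chosen one.
Interactively, ROOTS is just the current project's root; a wider
scope (a sibling, a canned set of projects) would be passed from Lisp."
  (interactive (list (list (project-root (project-current t)))))
  (let* ((candidates (my-scope-candidates roots))
         (choice (completing-read "Find file: " candidates nil t)))
    (find-file (cdr (assoc choice candidates)))))
```

For example, `(my-find-file-in-scope '("~/ws/projA/" "~/ws/projB/"))' would offer both trees at once (the paths are placeholders). Two roots with the same directory name would still collide in the display strings.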

> And what built-in tools did you try to solve those problems?

I have not tried any built-in tools yet.  I do have a private
wsf.el (WorkSpace Files) package that indexes as much of a
workspace as I am ever likely to visit:

    https://github.com/jsyjr/wsf/blob/main/wsf.el

It is single threaded and rather slow (10 minutes to index a
workspace on a local SSD).  Currently I use ivy to browse the
index and open files.  Loading and caching the index in memory
takes 5 or 10 seconds.  (There can only be one workspace active
at any time.)  Once loaded, ivy interactivity is not great.

I am now using the whole vertico / marginalia / corfu / etc stuff
and want to ditch using ivy with my wsf.  Ideally I would like to
ditch wsf as well.  Hence this posting.


* Re: Navigating an enormous code base
  2022-04-26 12:53   ` John Yates
@ 2022-04-26 14:06     ` Emanuel Berg via Users list for the GNU Emacs text editor
  0 siblings, 0 replies; 24+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2022-04-26 14:06 UTC (permalink / raw)
  To: help-gnu-emacs

John Yates wrote:

>> In what language(s) is this written?
>
> Primarily C++. But there are Makefiles, bash scripts, ad hoc
> text files, xml, json, etc.

Yeah, the exact traditional fail case for Emacs and Eli can
tell you all about it...

-- 
underground experts united
https://dataswamp.org/~incal





* Re: Navigating an enormous code base
  2022-04-26  2:31 Navigating an enormous code base John Yates
  2022-04-26  6:06 ` Daniel Fleischer
  2022-04-26 11:03 ` Eli Zaretskii
@ 2022-04-26 12:49 ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-12-14  3:47   ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-04-27  7:59 ` Marcus Harnisch
  3 siblings, 1 reply; 24+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-04-26 12:49 UTC (permalink / raw)
  To: help-gnu-emacs

> Most often I work within a single project and want to navigate to
> files in the same project.

I think we have tools for that, such as `project-find-file` (and I'm
pretty sure there are several alternatives in third party packages).
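With the built-in tools, `project-find-file' mainly needs project.el to know where each project starts. A minimal sketch, assuming the nearest ancestor directory containing a Makefile is the project root (nested Makefiles inside a project would break that assumption); it returns one of project.el's `transient' projects so no VCS root is needed. `my-makefile-project' is a hypothetical name:

```elisp
(require 'project)

(defun my-makefile-project (dir)
  "Treat the nearest ancestor of DIR holding a Makefile as a project.
Return a `transient' project, or nil so other finders can try."
  (let ((root (locate-dominating-file dir "Makefile")))
    (and root (cons 'transient root))))

;; `add-hook' prepends by default, so this runs before the VC-based
;; finder and a Makefile tree wins over the enclosing mono-repo root.
(add-hook 'project-find-functions #'my-makefile-project)
```

With this in place, `project-find-file' in a buffer under such a tree should offer that tree's files rather than the whole mono-repo's.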

> Less frequently I need to find a file in a sibling project.
> Unfortunately, there are ~400 siblings, though I very rarely visit
> more than 10.

I don't know of a tool for that, but I'd be interested.  I never have
400 siblings (more like less than 10), but I jump between siblings very
often.

There could be several notions of "sibling":
- jump from /foo/bar/main/src/hello.c to /foo/bar/stable/src/hello.c.
- jump from /foo/bar/main/src/hello.c to /foo/bar/main/test/src/hello-tests.el.
- jump from /foo/bar/main/src/hello.c to /foo/bar/main/src/hello.h.

Not sure if we'd want to treat them together or separately (we already
have code to switch between .h and .c).
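The existing .h/.c switcher mentioned here is presumably `ff-find-other-file' from find-file.el. A small configuration fragment for binding it in C and C++ buffers; the key `C-c o' is an arbitrary choice:

```elisp
(require 'find-file)

;; Bind the built-in header/source switcher in C-like buffers.
;; `ff-search-directories' controls where the counterpart is looked for.
(add-hook 'c-mode-common-hook
          (lambda () (local-set-key (kbd "C-c o") #'ff-find-other-file)))
```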


        Stefan





* Re: Navigating an enormous code base
  2022-04-26 12:49 ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-12-14  3:47   ` Stefan Monnier via Users list for the GNU Emacs text editor
  2022-12-14 17:55     ` Emanuel Berg
  0 siblings, 1 reply; 24+ messages in thread
From: Stefan Monnier via Users list for the GNU Emacs text editor @ 2022-12-14  3:47 UTC (permalink / raw)
  To: help-gnu-emacs

> There could be several notions of "sibling":
> - jump from /foo/bar/main/src/hello.c to /foo/bar/stable/src/hello.c.
> - jump from /foo/bar/main/src/hello.c to /foo/bar/main/test/src/hello-tests.el.
> - jump from /foo/bar/main/src/hello.c to /foo/bar/main/src/hello.h.

The function below can do all three given the appropriate hint:
- "stable" for the first.
- "-tests" for the second.
- ".h" for the third.

It can also find "/foo/bar/stable/src/hello.h" when given the hint
"stable/.h".


        Stefan


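;;; -*- lexical-binding: t; -*-
;; Requires Emacs 28+ (`named-let', `file-name-concat').
(require 'cl-lib)                       ; `cl-assert'
(require 'subr-x)                       ; `named-let'

(defun my-other-files (hint file)
  "Return files related to the absolute file name FILE, guided by HINT.
The result is a list of (PREFIX . REST) conses; a candidate's file
name is PREFIX followed by REST, where REST may be nil."
  (cl-assert (file-name-absolute-p file))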
  (named-let loop ((fullname file)
                   (hint hint)
                   (rest nil))
    (if (file-name-directory hint)
        (let* ((hintbase (directory-file-name (file-name-directory hint)))
               (hintend (file-name-nondirectory hint)))
          (apply #'append
                 (mapcar (lambda (x)
                           (let ((default-directory
                                  (file-name-as-directory (car x)))
                                 (tail (cdr x)))
                             (mapcar
                              (lambda (x)
                                (cons (file-name-concat default-directory
                                                        (car x))
                                      (cdr x)))
                              (loop tail hintend rest))))
                         (loop fullname hintbase rest))))
      (let ((dir (file-name-directory fullname))
            (file (file-name-nondirectory fullname))
            (names '())
            (re (regexp-quote hint)))
        (when dir
          (dolist (candidate (directory-files dir nil re))
            (when (string-match re candidate)
              (let ((prefix (substring candidate 0 (match-beginning 0)))
                    (suffix (substring candidate (match-end 0))))
                (when (and (>= (length file)
                               (+ (length prefix) (length suffix)))
                           (string-prefix-p prefix file)
                           (string-suffix-p suffix file))
                  (push candidate names)))))
          (nconc
           (mapcar (lambda (name) (cons (file-name-concat dir name) rest)) names)
           (when (> (length dir)
                    (length (setq dir (directory-file-name dir))))
             (loop dir hint (if rest (file-name-concat file rest) file)))))))))
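Judging from the code, the function returns (PREFIX . REST) conses rather than plain file names, and for a directory-level match such as "stable" it does not check that REST exists under the new prefix. A small post-processing sketch under that reading; `my-other-files-existing' and `my-find-other-file' are hypothetical, and the latter assumes `my-other-files' above is loaded (Emacs 28+ for `file-name-concat'):

```elisp
(defun my-other-files-existing (pairs)
  "Turn (PREFIX . REST) conses into file names, keeping existing ones."
  (delq nil
        (mapcar (lambda (pair)
                  (let ((file (if (cdr pair)
                                  (file-name-concat (car pair) (cdr pair))
                                (car pair))))
                    (and (file-exists-p file) file)))
                pairs)))

(defun my-find-other-file (hint)
  "Visit a file related to the current one according to HINT."
  (interactive "sHint: ")
  (let* ((current (or buffer-file-name
                      (user-error "Buffer is not visiting a file")))
         (files (my-other-files-existing
                 (my-other-files hint (expand-file-name current)))))
    (if files
        (find-file (completing-read "Other file: " files nil t))
      (message "No other file matches %S" hint))))
```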





* Re: Navigating an enormous code base
  2022-12-14  3:47   ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-12-14 17:55     ` Emanuel Berg
  0 siblings, 0 replies; 24+ messages in thread
From: Emanuel Berg @ 2022-12-14 17:55 UTC (permalink / raw)
  To: help-gnu-emacs

Stefan Monnier via Users list for the GNU Emacs text editor wrote:

>> There could be several notions of "sibling":
>> - jump from /foo/bar/main/src/hello.c to /foo/bar/stable/src/hello.c.
>> - jump from /foo/bar/main/src/hello.c to
>> /foo/bar/main/test/src/hello-tests.el.
>> - jump from /foo/bar/main/src/hello.c to /foo/bar/main/src/hello.h.
>
> The function below can do all three given the appropriate hint:
> - "stable" for the first.
> - "-tests" for the second.
> - ".h" for the third.
>
> It can also find "/foo/bar/stable/src/hello.h" when given the hint
> "stable/.h".

So this is the `find-file-dwim' at last?

Good ...

I would do it with several functions that all search for the
file according to their method or algorithm, then they produce
a score and the one with the highest score wins or is
tried first.

After that the file is displayed or a message "is this this
file?" and if it is, obviously it gets displayed but also that
algorithm gets a bonus since it got it right.

That way the "algorithm of algorithms" will be self-learning
and it will also be interesting to the developers, "hey, this
algorithm _never_ gets it right, what's wrong with it?"

However one should also store the exact situation (state,
hint, and correct file) in a database, so if that exact
situation appears again we already know what to do and the
desired file will just be found (a reference to it) in the
database, with no need for any algorithm to execute ...

So the order would be

1. database
2. run algorithm of algorithms
3. algorithms produce scores
4. algorithm of algorithms adds bonus
5. suggestion presented to user
6. algorithm of algorithms adjusts bonus
7. correct answer inserted into database

But it's up to the developer to do it any way they like
of course ...
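The seven steps above might be sketched like this. All names are hypothetical: a "finder" is any function called with HINT and FILE that returns (SCORE . TARGET) or nil, the "database" is just an in-memory hash table keyed on (HINT . FILE), and the learned bonus is a plain +1/-1 per confirmation. Step 5, asking the user, is left to the caller (for instance with `y-or-n-p').

```elisp
(defvar my-ff-db (make-hash-table :test #'equal)
  "Confirmed answers: (HINT . FILE) -> target file name.")

(defvar my-ff-bonus (make-hash-table :test #'eq)
  "Learned bonus per finder function.")

(defun my-ff-suggest (finders hint file)
  "Return (SOURCE . TARGET) for the best guess, or nil.
SOURCE is `database' for a remembered answer, else the winning finder."
  (let ((known (gethash (cons hint file) my-ff-db)))
    (if known
        (cons 'database known)          ; step 1: seen this exact situation
      (let (best best-score)
        (dolist (finder finders)         ; steps 2-4: score plus bonus
          (let ((result (funcall finder hint file)))
            (when result
              (let ((score (+ (car result) (gethash finder my-ff-bonus 0))))
                (when (or (null best-score) (> score best-score))
                  (setq best (cons finder (cdr result))
                        best-score score))))))
        best))))

(defun my-ff-feedback (suggestion hint file accepted)
  "Record whether the user ACCEPTED SUGGESTION (steps 6-7)."
  (let ((source (car suggestion)))
    (unless (eq source 'database)
      (puthash source (+ (gethash source my-ff-bonus 0) (if accepted 1 -1))
               my-ff-bonus))
    (when accepted
      (puthash (cons hint file) (cdr suggestion) my-ff-db))))
```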

-- 
underground experts united
https://dataswamp.org/~incal


* Re: Navigating an enormous code base
  2022-04-26  2:31 Navigating an enormous code base John Yates
                   ` (2 preceding siblings ...)
  2022-04-26 12:49 ` Stefan Monnier via Users list for the GNU Emacs text editor
@ 2022-04-27  7:59 ` Marcus Harnisch
  2022-04-27  8:36   ` mrf
  3 siblings, 1 reply; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-27  7:59 UTC (permalink / raw)
  To: help-gnu-emacs

GNU Global with the ggtags package (https://github.com/leoliu/ggtags)
helps me wrangle my project with roughly 10k source files. Global has an
option for specifying “nearness” sorting order, taking the starting
point of the query into account. Completion/narrowing interfaces would
still present all candidates, but the nearer ones come out first.

I am quite happily using ggtags with Helm in this setup.
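To make "nearness" concrete: Global's documentation describes it, roughly, as how many parent directories you must climb from the starting point to reach the target. The following is an illustrative re-implementation of that ordering over a plain list of paths, not Global's or ggtags' code; the function names are hypothetical:

```elisp
(defun my--dir-components (file)
  "Directory components of the absolute FILE's directory, as a list."
  (split-string (directory-file-name (file-name-directory file)) "/" t))

(defun my-nearness (start file)
  "Parent directories to climb from directory START to reach FILE's branch."
  (let ((from (split-string (directory-file-name start) "/" t))
        (to (my--dir-components file)))
    ;; Drop the shared leading components; what is left of FROM is the climb.
    (while (and from to (equal (car from) (car to)))
      (setq from (cdr from)
            to (cdr to)))
    (length from)))

(defun my-sort-by-nearness (start files)
  "Return FILES sorted nearest-first relative to START, then by name."
  (sort (copy-sequence files)
        (lambda (a b)
          (let ((da (my-nearness start a))
                (db (my-nearness start b)))
            (if (= da db) (string< a b) (< da db))))))
```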


* Re: Navigating an enormous code base
  2022-04-27  7:59 ` Marcus Harnisch
@ 2022-04-27  8:36   ` mrf
  2022-04-27 16:35     ` Emanuel Berg via Users list for the GNU Emacs text editor
  0 siblings, 1 reply; 24+ messages in thread
From: mrf @ 2022-04-27  8:36 UTC (permalink / raw)
  To: Marcus Harnisch; +Cc: help-gnu-emacs


Marcus Harnisch writes:

> GNU Global with the ggtags package (https://github.com/leoliu/ggtags) 
> helps me wrangle my project with roughly 10k source files. Global has an
> option for specifying “nearness” sorting order, taking the starting
> point of the query into account. Completion/narrowing interfaces would
> still present all candidates, but the nearer ones come out first.
>
> I am quite happily using ggtags with Helm in this setup.

I prefer GNU Global, but there are also other options that have Emacs
integration, like Ctags and Cscope:

http://ctags.sourceforge.net/
http://cscope.sourceforge.net/




* Re: Navigating an enormous code base
  2022-04-27  8:36   ` mrf
@ 2022-04-27 16:35     ` Emanuel Berg via Users list for the GNU Emacs text editor
  2022-04-27 22:59       ` John Yates
  0 siblings, 1 reply; 24+ messages in thread
From: Emanuel Berg via Users list for the GNU Emacs text editor @ 2022-04-27 16:35 UTC (permalink / raw)
  To: help-gnu-emacs

mrf wrote:

>> GNU Global with the ggtags package
>> (https://github.com/leoliu/ggtags) helps me wrangle my
>> project with roughly 10k source files. Global has an option
>> for specifying “nearness” sorting order, taking the
>> starting point of the query into account.
>> Completion/narrowing interfaces would still present all
>> candidates, but the nearer ones come out first.
>>
>> I am quite happily using ggtags with Helm in this setup.
>
> I prefer GNU Global, but there are also other options that
> have Emacs integration, like Ctags and Cscope

What problem(s) in particular are we encountering when the code
base gets enormous?

-- 
underground experts united
https://dataswamp.org/~incal





* Re: Navigating an enormous code base
  2022-04-27 16:35     ` Emanuel Berg via Users list for the GNU Emacs text editor
@ 2022-04-27 22:59       ` John Yates
  2022-04-28  0:46         ` Emanuel Berg
  2022-04-28  6:42         ` Marcus Harnisch
  0 siblings, 2 replies; 24+ messages in thread
From: John Yates @ 2022-04-27 22:59 UTC (permalink / raw)
  To: Emanuel Berg, Help Gnu Emacs mailing list

Start with feeding a completion interface:

* Do you index on demand?  If no, then when?  And how do you
  keep the index up to date?

* Indexing the entire workspace will result in an intractably
  large index.  So how do you partition it?

* Is the user restricted to querying a single partition of the
  index?  If yes, then that feels painfully restrictive.  If no,
  how does that user indicate which partition indices to combine?
  How is combining accomplished?

* How are duplicate filenames handled?

Need I say more?


* Re: Navigating an enormous code base
  2022-04-27 22:59       ` John Yates
@ 2022-04-28  0:46         ` Emanuel Berg
  2022-04-28  6:42         ` Marcus Harnisch
  1 sibling, 0 replies; 24+ messages in thread
From: Emanuel Berg @ 2022-04-28  0:46 UTC (permalink / raw)
  To: help-gnu-emacs

John Yates wrote:

> Do you index on demand? If no, then when? And how do you
> keep the index up to date?
>
> Indexing the entire workspace [...]

What kind of index are we talking?

> How are duplicate filenames handled?

Non-duplicate paths?

-- 
underground experts united
https://dataswamp.org/~incal





* Re: Navigating an enormous code base
  2022-04-27 22:59       ` John Yates
  2022-04-28  0:46         ` Emanuel Berg
@ 2022-04-28  6:42         ` Marcus Harnisch
  2022-04-28  7:39           ` Leo Liu
                             ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-28  6:42 UTC (permalink / raw)
  To: help-gnu-emacs

On 28/04/2022 00.59, John Yates wrote:
> Start with feeding a completion interface:
> 
> * Do you index on demand?  If no, then when?  And how do you
>    keep the index up to date?

As far as my work pattern with Global/ggtags is concerned, I create a 
full index rarely. Usually only when the repository can be expected to 
have changed significantly (pull, merge with upstream, etc).

Global can do single-file updates, which I think ggtags executes in 
‘after-save-hook’ or something, so your own changes will be tracked. 
Creating the database with the sqlite3 backend is supposed to perform 
much better with these partial updates.

Since I spend most of my time in my little niche this is sufficient for 
my purposes and even if there have been small updates outside, the 
location I will be taken to is close enough most of the time (and a 
little reminder for creating a full index).

> * Indexing the entire workspace will result in an intractably
>    large index.  So how do you partition it?

I don't bother and don't partition the project (comprising of 10k files, 
and 50+ nested subprojects)

The total size of the DBs is about 60MB (files: 2, tags: 20, references: 
38).

> * Is the user restricted to querying a single partition of the
>    index?  If yes, then that feels painfully restrictive.  If no,
>    how does that user indicate which partition indices to combine?
>    How is combining accomplished?

See above.

> * How are duplicate filenames handled?

By storing path names.

Why don't you give it a whirl and see whether this suits you?


* Re: Navigating an enormous code base
  2022-04-28  6:42         ` Marcus Harnisch
@ 2022-04-28  7:39           ` Leo Liu
  2022-04-28  8:38             ` Marcus Harnisch
  2022-04-28 14:30           ` John Yates
  2022-04-28 14:50           ` John Yates
  2 siblings, 1 reply; 24+ messages in thread
From: Leo Liu @ 2022-04-28  7:39 UTC (permalink / raw)
  To: help-gnu-emacs

On 2022-04-28 08:42 +0200, Marcus Harnisch wrote:
> As far as my work pattern with Global/ggtags is concerned, I create a
> full index rarely. Usually only when the repository can be expected to 
> have changed significantly (pull, merge with upstream, etc).
>
> Global can do single-file updates, which I think ggtags executes in
> ‘after-save-hook’ or something, so your own changes will be tracked. 
> Creating the database with the sqlite3 backend is supposed to perform
> much better with these partial updates.

This is true. There is a customisable threshold ggtags-oversize-limit. I
used to test ggtags regularly on the Linux kernel source and it was very
fast, IOW I was not disappointed.

If source files are changed outside emacs for example by git pull,
ggtags will notice and reindex. For example if M-. jumps to a location
that is off I often M-, and then M-. again to trigger reindex
immediately.
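For reference, a common ggtags setup, adapted from the ggtags README. The 50 MB value is an arbitrary example, not a recommendation, and the comment paraphrases the variable's docstring:

```elisp
;; Enable ggtags in C-like buffers (as suggested in the ggtags README).
(add-hook 'c-mode-common-hook
          (lambda ()
            (when (derived-mode-p 'c-mode 'c++-mode 'java-mode)
              (ggtags-mode 1))))

;; Below this GTAGS size ggtags keeps the whole tree current with
;; `global -u'; above it, only files edited in ggtags-mode are updated.
(setq ggtags-oversize-limit (* 50 1024 1024))
```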

HTH,
Leo





* Re: Navigating an enormous code base
  2022-04-28  7:39           ` Leo Liu
@ 2022-04-28  8:38             ` Marcus Harnisch
  2022-04-28 10:45               ` Leo Liu
  2022-04-28 14:34               ` John Yates
  0 siblings, 2 replies; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-28  8:38 UTC (permalink / raw)
  To: help-gnu-emacs

On 28/04/2022 09.39, Leo Liu wrote:
> If source files are changed outside emacs for example by git pull,
> ggtags will notice and reindex. For example if M-. jumps to a location
> that is off I often M-, and then M-. again to trigger reindex
> immediately.

Thanks, I didn't even know that. However in my setup I use an explicit 
file list (created by a gtags-hook script) rather than having gtags go 
through all files, else a lot more files would be scanned  (unnecessarily).

Ggtags won't be able to tell if files have been renamed or added, I am 
afraid. (it might notice deletions, though)
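The hook script itself isn't shown, but an explicit-list setup might look roughly like this sketch. It assumes a Git checkout and GNU Global's `gtags -f FILE' option (take the list of target files from FILE); the extension filter and both function names are hypothetical. With a fixed list, newly added or renamed files stay invisible to Global until the list is regenerated.

```elisp
(require 'seq)

(defun my-filter-by-extension (files extensions)
  "Keep the FILES whose extension is a member of EXTENSIONS."
  (seq-filter (lambda (f) (member (file-name-extension f) extensions)) files))

(defun my-gtags-rebuild (root extensions)
  "Index only Git-tracked files under ROOT with one of EXTENSIONS.
Write the list to ROOT/gtags.files and run `gtags -f' on it."
  (let* ((default-directory (file-name-as-directory root))
         (list-file (expand-file-name "gtags.files"))
         (files (my-filter-by-extension (process-lines "git" "ls-files")
                                        extensions)))
    (with-temp-file list-file
      (insert (mapconcat #'identity files "\n") "\n"))
    (call-process "gtags" nil nil nil "-f" list-file)))
```

A call might look like `(my-gtags-rebuild "~/ws/proj/" '("c" "cpp" "h"))', with the path as a placeholder.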


* Re: Navigating an enormous code base
  2022-04-28  8:38             ` Marcus Harnisch
@ 2022-04-28 10:45               ` Leo Liu
  2022-04-28 14:34               ` John Yates
  1 sibling, 0 replies; 24+ messages in thread
From: Leo Liu @ 2022-04-28 10:45 UTC (permalink / raw)
  To: help-gnu-emacs

On 2022-04-28 10:38 +0200, Marcus Harnisch wrote:
> Thanks, I didn't even know that. However in my setup I use an explicit
> file list (created by a gtags-hook script) rather than having gtags go 
> through all files, else a lot more files would be scanned  (unnecessarily).

Yes this works too.

> Ggtags won't be able to tell if files have been renamed or added, I am
> afraid. (it might notice deletions, though)

ggtags notices outside changes by comparing mtimes of GTAGS and source
files that it visits. So yes if something is deleted only 'global -u'
can sync correctly.
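The mtime comparison described here can be illustrated in a few lines. This is a sketch of the idea, not ggtags' actual code, and `my-gtags-stale-p' is hypothetical. It also shows why deletions slip through: a check like this only runs for files that still exist and are being visited.

```elisp
(defun my-gtags-stale-p (file root)
  "Non-nil if FILE was modified after ROOT's GTAGS database.
Also non-nil when GTAGS does not exist yet."
  (file-newer-than-file-p file (expand-file-name "GTAGS" root)))
```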

Leo





* Re: Navigating an enormous code base
  2022-04-28  8:38             ` Marcus Harnisch
  2022-04-28 10:45               ` Leo Liu
@ 2022-04-28 14:34               ` John Yates
  2022-04-28 14:45                 ` Marcus Harnisch
  1 sibling, 1 reply; 24+ messages in thread
From: John Yates @ 2022-04-28 14:34 UTC (permalink / raw)
  To: Marcus Harnisch; +Cc: Help Gnu Emacs mailing list

On Thu, Apr 28, 2022 at 4:38 AM Marcus Harnisch <mh-gmane@online.de> wrote:
>
> Ggtags won't be able to tell if files have been renamed or added, I am
> afraid. (it might notice deletions, though)

That is crucial.  A very large part of my work is untangling
decades of cruft (including migration from C to C++).
I do a tremendous amount of refactoring: moving and
renaming files.  I need a scheme that can handle such
churn gracefully.




* Re: Navigating an enormous code base
  2022-04-28 14:34               ` John Yates
@ 2022-04-28 14:45                 ` Marcus Harnisch
  0 siblings, 0 replies; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-28 14:45 UTC (permalink / raw)
  To: help-gnu-emacs

On 28/04/2022 16.34, John Yates wrote:
> On Thu, Apr 28, 2022 at 4:38 AM Marcus Harnisch <mh-gmane@online.de> wrote:
>>
>> Ggtags won't be able to tell if files have been renamed or added, I am
>> afraid. (it might notice deletions, though)
> 
> That is crucial.  A very large part of my work is untangling
> decades of cruft (including migration from C to C++).
> I do a tremendous amount of refactoring: moving and
> renaming files.  I need a scheme that can handle such
> churn gracefully.

Actually, as Leo has pointed out, the gtags program should be able to 
notice *unless* you are using a fixed list of files like me.





* Re: Navigating an enormous code base
  2022-04-28  6:42         ` Marcus Harnisch
  2022-04-28  7:39           ` Leo Liu
@ 2022-04-28 14:30           ` John Yates
  2022-04-28 14:40             ` Marcus Harnisch
  2022-04-28 14:50           ` John Yates
  2 siblings, 1 reply; 24+ messages in thread
From: John Yates @ 2022-04-28 14:30 UTC (permalink / raw)
  To: Marcus Harnisch; +Cc: Help Gnu Emacs mailing list

On Thu, Apr 28, 2022 at 3:14 AM Marcus Harnisch <mh-gmane@online.de> wrote:
>
> I don't bother and don't partition the project (comprising of 10k files,
> and 50+ nested subprojects)

That's puny.  In my code base a typical 'project' (a tree rooted
at a Makefile) has ~2k files.  I work on foundational elements
so my workspaces include those components that can be
affected by my changes.  A workspace is ~220GB.  There are
400+ direct siblings of the handful of projects where my efforts are
focused.


* Re: Navigating an enormous code base
  2022-04-28 14:30           ` John Yates
@ 2022-04-28 14:40             ` Marcus Harnisch
  0 siblings, 0 replies; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-28 14:40 UTC (permalink / raw)
  To: help-gnu-emacs

On 28/04/2022 16.30, John Yates wrote:
> On Thu, Apr 28, 2022 at 3:14 AM Marcus Harnisch <mh-gmane@online.de> wrote:
>>
>> I don't bother and don't partition the project (comprising of 10k files,
>> and 50+ nested subprojects)
> 
> That's puny.  In my code base a typical 'project' (a tree rooted
> at a Makefile) has ~2k files.  I work on foundational elements
> so my workspaces include those components that can be
> affected by my changes.  A workspace is ~220GB.  There are
> 400+ direct siblings of the handful of projects where my efforts are
> focused.

<shrug> All I can tell from this is that creating the full index will 
probably take a bit longer. Why don't you give it a try and then decide?





* Re: Navigating an enormous code base
  2022-04-28  6:42         ` Marcus Harnisch
  2022-04-28  7:39           ` Leo Liu
  2022-04-28 14:30           ` John Yates
@ 2022-04-28 14:50           ` John Yates
  2022-04-28 16:10             ` Óscar Fuentes
  2022-04-28 16:15             ` Marcus Harnisch
  2 siblings, 2 replies; 24+ messages in thread
From: John Yates @ 2022-04-28 14:50 UTC (permalink / raw)
  To: Marcus Harnisch; +Cc: Help Gnu Emacs mailing list

On Thu, Apr 28, 2022 at 3:14 AM Marcus Harnisch <mh-gmane@online.de> wrote:
>
> I don't bother and don't partition the project (comprising of 10k files,
> and 50+ nested subprojects)

In another reply I describe the size of my code base.

> > * Is the user restricted to querying a single partition of the
> >    index?  If yes, then that feels painfully restrictive.  If no,
> >    how does that user indicate which partition indices to combine?
> >    How is combining accomplished?
>
> See above.

I take that as "My project is not large enough
to prompt me to consider partitioning, therefore
why should you?".  The flawed assumption is
that your code base is comparable to mine.

> > * How are duplicate filenames handled?
>
> By storing path names.

Yes.  I have written indexing software.  I do
index the entire workspace.  I do store paths
and identify duplicate filenames in the index.
That does not make the tool pleasant:
* indexing takes minutes
* file moves and renames require reindexing
* loading the index into emacs takes ~15s
* firing up the ivy completion takes ~3s
* UI presentation of duplicates is ugly
* UI is _very_ sluggish

> Why don't you give it a whirl and see whether this suits you?

Because it misses the point.  I am not looking
to navigate by tags, but rather by filename.


* Re: Navigating an enormous code base
  2022-04-28 14:50           ` John Yates
@ 2022-04-28 16:10             ` Óscar Fuentes
  2022-04-28 16:15             ` Marcus Harnisch
  1 sibling, 0 replies; 24+ messages in thread
From: Óscar Fuentes @ 2022-04-28 16:10 UTC (permalink / raw)
  To: help-gnu-emacs

John Yates <john@yates-sheets.org> writes:

>> Why don't you give it a whirl and see whether this suits you?
>
> Because it misses the point.  I am not looking
> to navigate by tags, but rather by filename.

In your original message you mention millions of lines of code but not
how many files.

Here, finding a file on the LLVM repo (> 100k files) works well enough
(for me) with ido+flx. flx is quite CPU-intensive and the UI feels a bit
sluggish, especially the first time the command is invoked, but combining
ido with some other filtering method (flex, for instance) should improve
that facet while at the same time providing a reasonably effective
method of finding the file of interest.

The list of candidate paths is generated by `git ls-files' (which takes about
two seconds here, but you could use a finer method that concentrates on
your areas of interest.) Then, to find something like

llvm/include/llvm/Support/X86TargetParser.h

typing `isxtph' or `x8tph' brings up that file as the first candidate.
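A built-in approximation of this workflow, assuming Emacs 27+ (for the `flex' completion style) and a Git checkout. Both function names are hypothetical, the file list is recomputed on every call rather than cached, and `my-flex-match-p' only models flex's in-order character matching, not its scoring:

```elisp
(defun my-find-git-file ()
  "Complete over `git ls-files' output with flex matching, then visit."
  (interactive)
  (let* ((files (process-lines "git" "ls-files")) ; relative to default-directory
         (completion-styles '(flex))
         (completion-ignore-case t)
         (file (completing-read "Git file: " files nil t)))
    (find-file file)))

(defun my-flex-match-p (query candidate)
  "Non-nil if QUERY's characters occur in order in CANDIDATE, ignoring case."
  (let ((case-fold-search t)
        (pos 0))
    (catch 'fail
      (dolist (char (string-to-list query) t)
        (let ((hit (string-match (regexp-quote (string char)) candidate pos)))
          (unless hit (throw 'fail nil))
          (setq pos (1+ hit)))))))
```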

An effective usage requires knowledge about the directory structure,
understanding the filtering algorithm and some practice.


* Re: Navigating an enormous code base
  2022-04-28 14:50           ` John Yates
  2022-04-28 16:10             ` Óscar Fuentes
@ 2022-04-28 16:15             ` Marcus Harnisch
  1 sibling, 0 replies; 24+ messages in thread
From: Marcus Harnisch @ 2022-04-28 16:15 UTC (permalink / raw)
  To: help-gnu-emacs

On 28/04/2022 16.50, John Yates wrote:
> On Thu, Apr 28, 2022 at 3:14 AM Marcus Harnisch <mh-gmane@online.de> wrote:
> 
> I take that as "My project is not large enough
> to prompt me to consider partitioning, therefore
> why should you?".  The flawed assumption is
> that your code base is comparable to mine.

No assumption being made. All I can offer is sharing my experience from 
projects I work on. Nobody can save you the work of trying for yourself 
whether any suggested tool satisfies your needs in your environment.

> Because it misses the point.  I am not looking
> to navigate by tags, but rather by filename.

Arguably, the main purpose of Global is navigation by tags and 
references. However, one of the database tables stores the path names of 
all indexed files.

Turns out that my primary use pattern actually involves navigation by 
file name (‘ggtags-find-file’). In *my* project the combination of 
ggtags/Helm shows no noticeable delay.

Leo will be able to tell how this works behind the scenes, interacts 
with Emacs completion frameworks and how it might scale to larger 
projects such as yours.


end of thread, other threads:[~2022-12-14 17:55 UTC | newest]

Thread overview: 24+ messages
2022-04-26  2:31 Navigating an enormous code base John Yates
2022-04-26  6:06 ` Daniel Fleischer
2022-04-26 11:03 ` Eli Zaretskii
2022-04-26 12:53   ` John Yates
2022-04-26 14:06     ` Emanuel Berg via Users list for the GNU Emacs text editor
2022-04-26 12:49 ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-12-14  3:47   ` Stefan Monnier via Users list for the GNU Emacs text editor
2022-12-14 17:55     ` Emanuel Berg
2022-04-27  7:59 ` Marcus Harnisch
2022-04-27  8:36   ` mrf
2022-04-27 16:35     ` Emanuel Berg via Users list for the GNU Emacs text editor
2022-04-27 22:59       ` John Yates
2022-04-28  0:46         ` Emanuel Berg
2022-04-28  6:42         ` Marcus Harnisch
2022-04-28  7:39           ` Leo Liu
2022-04-28  8:38             ` Marcus Harnisch
2022-04-28 10:45               ` Leo Liu
2022-04-28 14:34               ` John Yates
2022-04-28 14:45                 ` Marcus Harnisch
2022-04-28 14:30           ` John Yates
2022-04-28 14:40             ` Marcus Harnisch
2022-04-28 14:50           ` John Yates
2022-04-28 16:10             ` Óscar Fuentes
2022-04-28 16:15             ` Marcus Harnisch
