unofficial mirror of emacs-devel@gnu.org 
* Generation of tags for the current project on the fly
@ 2018-01-12  1:02 Dmitry Gutov
  2018-01-12  9:01 ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-12  1:02 UTC
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

Here's an idea I've been working on. We generate tags for all files the 
current project contains (except the ignored ones) when the user calls 
one of the xref commands, but hasn't explicitly visited any tags table.

The result is used until they make a change in a file somewhere and save 
the buffer, then the generated table is discarded.

I think it will be helpful for new users (who don't really know how to 
generate tags), as well as people who are used to certain other editors 
performing the indexing automatically, in small-to-medium sized 
projects. With some effort, we could implement re-indexing and 
invalidation on a more granular level (so it's usable in bigger projects 
too), but transitioning to GNU Global would probably be better.

For reference, indexing the Emacs sources takes ~1.1sec here.

What do people think?

See the attached patch.

[-- Attachment #2: project-auto-tags.diff --]
[-- Type: text/x-patch, Size: 2796 bytes --]

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index a31668e1ba..b0c398ec8e 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -2109,7 +2109,9 @@ etags-xref-find-definitions-tag-order
   "Tag order used in `xref-backend-definitions' to look for definitions.")
 
 ;;;###autoload
-(defun etags--xref-backend () 'etags)
+(defun etags--xref-backend ()
+  (etags--maybe-use-project-tags)
+  'etags)
 
 (cl-defmethod xref-backend-identifier-at-point ((_backend (eql etags)))
   (find-tag--default))
@@ -2180,6 +2182,53 @@ xref-make-etags-location
     (nth 1 tag-info)))
 
 \f
+;;; Simple tags generation, with automatic invalidation
+
+(defvar etags--project-tags-file nil)
+
+(defun etags--maybe-use-project-tags ()
+  (let (proj)
+    (when (and (not (or tags-file-name
+                        tags-table-list))
+               (setq proj (project-current)))
+      (etags--project-tags-generate proj)
+      ;; Invalidate the scanned tags after any change is written to disk.
+      (add-hook 'after-save-hook #'etags--project-tags-cleanup)
+      (visit-tags-table etags--project-tags-file)
+      ;; Explicit return, this is no backend function.
+      nil)))
+
+(defun etags--project-tags-generate (proj)
+  (let* ((root (cl-find default-directory
+                        (project-roots proj)
+                        :test #'file-in-directory-p))
+         (default-directory root)
+         (files (all-completions "" (project-file-completion-table proj (list root))))
+         (etags-command (executable-find "etags"))
+         ;; FIXME: List all extensions, or wait for etags fix.
+         ;; http://lists.gnu.org/archive/html/emacs-devel/2018-01/msg00323.html
+         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
+                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
+                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
+         (file-regexp (format "\\.%s\\'" (regexp-opt extensions))))
+    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-"))
+    (with-temp-buffer
+      (mapc (lambda (f)
+              (when (string-match-p file-regexp f)
+                (insert f "\n")))
+            files)
+      (shell-command-on-region (point-min) (point-max)
+                               (format "%s - -o %s" etags-command etags--project-tags-file)
+                               nil nil "*etags-project-tags-errors*" t))))
+
+(defun etags--project-tags-cleanup ()
+  (when etags--project-tags-file
+    (delete-file etags--project-tags-file)
+    (setq tags-file-name nil
+          tags-table-list nil
+          etags--project-tags-file nil))
+  (remove-hook 'after-save-hook #'etags--project-tags-cleanup))
+
 (provide 'etags)
 
 ;;; etags.el ends here
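For anyone who wants to try the idea outside Emacs, here is a rough shell analogue of what `etags--project-tags-generate` does. It is a sketch under stated assumptions. The function names are hypothetical. "Project files" is approximated as every regular file outside `.git`, whereas the patch asks the project API, which also honors ignore lists. The extension list is a subset of the patch's.

```sh
# Hypothetical helper names; the extension list is a subset of the patch's.
TAG_EXTENSIONS='rb js py pl el c cpp cc h hh hpp java go lisp php erl hrl'

# Print files under the root $1 (relative to it) whose extension is listed,
# mirroring the patch's "\\.EXT\\'" regexp filter.  Skips .git directories;
# the real code relies on the project API's ignore rules instead.
project_tag_sources() {
    ( cd "$1" && find . -name .git -prune -o -type f -print ) |
    sed 's|^\./||' |
    while IFS= read -r f; do
        base=${f##*/}
        case $base in
            *.*) ;;            # has a dot-suffix: check it below
            *) continue ;;     # no extension at all: never matches
        esac
        case " $TAG_EXTENSIONS " in
            *" ${base##*.} "*) printf '%s\n' "$f" ;;
        esac
    done | sort
}

# Feed that list to etags on stdin and write the table to a fresh temporary
# file, as the patch does with (make-temp-file "emacs-project-tags-").
# Prints the temporary file's name on success.
project_tags_generate() {
    tags=$(mktemp "${TMPDIR:-/tmp}/emacs-project-tags-XXXXXX") || return 1
    project_tag_sources "$1" | ( cd "$1" && etags -o "$tags" - ) || return 1
    printf '%s\n' "$tags"
}
```

For example, `project_tags_generate ~/src/myproject` prints the path of the generated table. You can then load that table with M-x visit-tags-table.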

* Re: Generation of tags for the current project on the fly
  2018-01-12  1:02 Generation of tags for the current project on the fly Dmitry Gutov
@ 2018-01-12  9:01 ` Eli Zaretskii
  2018-01-12 13:52   ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-12  9:01 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 12 Jan 2018 04:02:06 +0300
> 
> Here's an idea I've been working on. We generate tags for all files the 
> current project contains (except the ignored ones) when the user calls 
> one of the xref commands, but hasn't explicitly visited any tags table.
> 
> The result is used until they make a change in a file somewhere and save 
> the buffer, then the generated table is discarded.

Why discard it after the first save?  The tags table is probably still
very much valid.  I'd not discard it until any of the following
happens:

  . we fail to find a tag
  . the user visits a tags table explicitly
  . the user switches to a different project(?)
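Instead of discarding the generated table on every save, a cheap way to tell whether it is still valid is to compare modification times. This is the same test make itself uses. The sketch below (function name and extension list hypothetical) reports TAGS as stale only if it is missing or some source file under the tree is newer than it. It cannot notice deleted files, so it complements the conditions above rather than replacing them.

```sh
# Exit status 0 means "regenerate": TAGS is missing, or some source file
# under $1 was modified after TAGS was last written.
tags_stale() {
    [ -f "$1/TAGS" ] || return 0
    newer=$(find "$1" -type f \( -name '*.c' -o -name '*.h' -o -name '*.el' \) \
                 -newer "$1/TAGS" -print | head -n 1)
    [ -n "$newer" ]
}
```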

> I think it will be helpful for new users (who don't really know how to 
> generate tags), as well as people who are used to certain other editors 
> performing the indexing automatically, in small-to-medium sized 
> projects. With some effort, we could implement re-indexing and 
> invalidation on a more granular level (so it's usable in bigger projects 
> too), but transitioning to GNU Global would probably be better.

We could offer generating a tags table if we don't find one in the
tree, instead of generating it automatically.  I think this would be a
better UI and UX, especially given the time it could take to generate
TAGS (see below).

> For reference, indexing the Emacs sources takes ~1.1sec here.

Was that with cold cache or warm cache?

"make TAGS" takes about 9 sec here with a warm cache, and this is an
SSD disk.  On fencepost.gnu.org, a (somewhat slow) GNU/Linux system,
it took 12 sec with a cold cache and 4 sec with a warm cache.  And
Emacs is not a large project; I wonder what would happen in larger
ones, like GCC or glibc.

IOW, I don't think this is so fast that we could do that without user
approval.

> +         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
> +                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
> +                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
> +         (file-regexp (format "\\.%s\\'" (regexp-opt extensions))))
> +    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-"))
> +    (with-temp-buffer
> +      (mapc (lambda (f)
> +              (when (string-match-p file-regexp f)
> +                (insert f "\n")))
> +            files)
> +      (shell-command-on-region (point-min) (point-max)
> +                               (format "%s - -o %s" etags-command etags--project-tags-file)
> +                               nil nil "*etags-project-tags-errors*" t))))

I don't understand why you didn't use the commonly used form:

   find . -name "*.rb" -o -name "*.js" ... | etags -o- -

Doing things the way you did raises issues with encoding of file
names, which could cause subtle problems in rare use cases.  I think
using 'find' is also faster.
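The pipeline above can be made runnable roughly as follows. Two details matter once `-type f` is added:

- The `-name` alternatives must be grouped with `\( ... \)`, because `-o` binds more loosely than the implicit "and".
- `etags -o FILE -` reads the file names from standard input while `find` is still producing them, so the two programs run concurrently.

The function name and the `ETAGS` override variable are hypothetical. The override exists only so the indexer can be swapped out.

```sh
# Run find and etags concurrently, connected by a pipe.  OUT ($2) is taken
# relative to DIR ($1) because of the cd.  ETAGS (hypothetical) defaults to etags.
tags_via_find() {
    ( cd "$1" &&
      find . -type f \( -name '*.rb' -o -name '*.js' -o -name '*.py' \
                        -o -name '*.el' -o -name '*.c' -o -name '*.h' \) -print |
      "${ETAGS:-etags}" -o "$2" - )
}
```

For example, `tags_via_find ~/src/myproject TAGS` writes `~/src/myproject/TAGS`.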

More generally, I think doing this that way is not TRT, at least not
by default.  "make TAGS" in Emacs will produce a much richer tags
table than your method, because our Makefiles use regexps to augment
the automatic tagging in etags.  So I think we should first try to
invoke the TAGS target of a Makefile in the tree, if one exists, and
only use the naïve command as fallback.  And perhaps we should also
provide some customization for the command to be used (but that will
obviously not help newbies who didn't yet customize the project they
are working on).
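Here is a minimal sketch of that ordering. It looks for a TAGS rule in the top-level Makefile and runs it if present; otherwise it falls back to the naive find/etags pipeline. The function names and extension list are illustrative. The `^TAGS:` test only catches the simplest way of writing the rule.

```sh
# Decide which strategy to use for the tree rooted at $1.
tags_strategy() {
    if [ -f "$1/Makefile" ] && grep -q '^TAGS:' "$1/Makefile"; then
        echo make     # the project knows how to tag itself (regexps etc.)
    else
        echo find     # naive fallback
    fi
}

# Regenerate TAGS in $1 using the chosen strategy.
regenerate_tags() {
    case $(tags_strategy "$1") in
        make) make -C "$1" TAGS ;;
        find) ( cd "$1" &&
                find . -type f \( -name '*.c' -o -name '*.h' -o -name '*.el' \) -print |
                etags -o TAGS - ) ;;
    esac
}
```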

Thanks.



* Re: Generation of tags for the current project on the fly
  2018-01-12  9:01 ` Eli Zaretskii
@ 2018-01-12 13:52   ` Dmitry Gutov
  2018-01-12 18:52     ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-12 13:52 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/12/18 12:01 PM, Eli Zaretskii wrote:

> Why discard it after the first save?  The tags table is probably still
> very much valid.

Indeed, it's a rough heuristic. I'm aiming for correctness here, not for 
performance.

On the other hand, code navigation and editing are often fairly distinct 
activities, you don't switch between the two too frequently. So waiting 
a second or two when going from the latter to the former shouldn't be 
too terrible.

> I'd not discard it until any of the following
> happens:
> 
>    . we fail to find a tag

Not sure about this one. We can make this customizable, of course 
(although the implementation might end up a bit convoluted), but IMO 
it's not good for the default behavior.

Failing to find a tag is a valid result (some identifiers can be absent, 
or defined somewhere else, e.g. in the libraries), and doing a rescan 
each time that happens might be more annoying.

Further, some users will call C-u xref-find-definitions, look for the 
new tag in the completion table, fail to find it there, and simply abort 
without trying the search.

>    . the user visits a tags table explicitly

That, of course, works already.

>    . the user switches to a different project(?)

It's an omission currently, but yes, I fully intend to add this.

> We could offer generating a tags table if we don't find one in the
> tree, instead of generating it automatically.

And then what? Visit it? And make the user rescan manually every
time? I'm fine with this as an optional behavior (and it will also be an 
improvement, of course, since generating tags is not exactly trivial for 
new users, and even many not-so-new ones), but I don't want this for the 
default.

> I think this would be a
> better UI and UX, especially given the time it could take to generate
> TAGS (see below).

Sublime Text, Atom and VS Code simply index the project code, AFAIK, 
without extra prompts. I think we should try to provide a similar
experience, even if it's not great for big projects. There are several
directions in which we can improve on it, but showing the user that "yes, we can
find-definition right away" is a good thing.

>> For reference, indexing the Emacs sources takes ~1.1sec here.
> 
> Was that with cold cache or warm cache?

Warm, probably. But that's the relevant time, isn't it? What we're
mostly wondering is how long *reindexing* will take (because we're
discussing when to do it). The first indexing will take place anyway.

> "make TAGS" takes about 9 sec here with a warm cache, and this is an
> SSD disk.

'make tags' takes 1 second on my machine, with an NVMe disk.

> On fencepost.gnu.org, a (somewhat slow) GNU/Linux system,
> it took 12 sec with a cold cache and 4 sec with a warm cache.  And
> Emacs is not a large project; I wonder what would happen in larger
> ones, like GCC or glibc.

We can try to somehow detect very large projects, and helpfully offer to 
visit a tags table instead. Anyway, M-x visit-tags-table still works.

> IOW, I don't think this is so fast that we could do that without user
> approval.

The argument here is that if the user called xref-find-definitions, it's 
better to do a (long-ish) scan and show something, instead of failing. 
They always have an option of C-g (we could also catch it and show 
helpful instructions if the process took too long).

> I don't understand why you didn't use the commonly used form:
> 
>     find . -name "*.rb" -o -name "*.js" ... | etags -o- -

Because the project API doesn't make this easy. Anyway, generating the 
full list of files is relatively fast in comparison. At most, it took 
like 30% of the whole time (and less in other cases). And we can speed 
it up further independently (e.g. using git ls-files).

> Doing things the way you did raises issues with encoding of file
> names, which could cause subtle problem in rare use cases.

Well, I haven't seen them yet, and don't really understand how they're 
going to happen. But we'll probably fix them, one way or another.

> I think
> using 'find' is also faster.

find is used under the covers. The difference is just that the 
invocations of etags are only happening later.

> More generally, I think doing this that way is not TRT, at least not
> by default.  "make TAGS" in Emacs will produce a much richer tags
> table than your method, because our Makefiles use regexps to augment
> the automatic tagging in etags.  So I think we should first try to
> invoke the TAGS target of a Makefile in the tree, if one exists, and
> only use the naïve command as fallback.

'make tags' is very much specific to Emacs. We can introduce some kind 
of protocol, of course, but my primary goal here is to improve the 
out-of-the-box behavior.

Further, the task will have to write tags to stdout: the current code 
saves the temporary tags file to /tmp, and there are reasons to do that. 
Anyway, that part shouldn't be too hard.

A possible avenue for improvement is to somehow derive a multi-TAGS-files
structure (with their dependencies) from the project information. Still 
thinking about it.



* Re: Generation of tags for the current project on the fly
  2018-01-12 13:52   ` Dmitry Gutov
@ 2018-01-12 18:52     ` Eli Zaretskii
  2018-01-14  2:05       ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-12 18:52 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Fri, 12 Jan 2018 16:52:21 +0300
> 
> On the other hand, code navigation and editing are often fairly distinct 
> activities, you don't switch between the two too frequently.

In my workflows, I do that all the time, because I don't always
remember the details of the functions I need to call in the code I'm
writing.

> >    . we fail to find a tag
> 
> Not sure about this one. We can make this customizable, of course 
> (although the implementation might end up a bit convoluted), but IMO 
> it's not good for the default behavior.
> 
> Failing to find a tag is a valid result (some identifiers can be absent, 
> or defined somewhere else, e.g. in the libraries), and doing a rescan 
> each time that happens might be more annoying.

If you maintain that scanning is fast, then the annoyance should be
minimal.

> > We could offer generating a tags table if we don't find one in the
> > tree, instead of generating it automatically.
> 
> And then what? Visit it?

No, just do what you intended, but only after an approval.  It could
be that the user thought she already visited a tags table, or some
other mistake.

> >> For reference, indexing the Emacs sources takes ~1.1sec here.
> > 
> > Was that with cold cache or warm cache?
> 
> Warm, probably. But that's the relevant time, isn't it?

Not necessarily.  The first time a tree is scanned could well be
shortly after you start working on a project.

> What we're mostly wondering is how long *reindexing* will take
> (because we're discussing when to do it). The first indexing will
> take place anyway.
> 
> > "make TAGS" takes about 9 sec here with a warm cache, and this is an
> > SSD disk.
> 
> 'make tags' takes 1 second on my machine, with an NVMe disk.

I bet it will be even faster with a RAM disk.  But we shouldn't base
our decisions on such configurations, as that isn't the norm yet, I
think.

> > IOW, I don't think this is so fast that we could do that without user
> > approval.
> 
> The argument here is that if the user called xref-find-definitions, it's 
> better to do a (long-ish) scan and show something, instead of failing. 

It could be a mistake, or the user could reconsider given the
question.  We do that with visiting large files, for example.

> > I don't understand why you didn't use the commonly used form:
> > 
> >     find . -name "*.rb" -o -name "*.js" ... | etags -o- -
> 
> Because the project API doesn't make this easy. Anyway, generating the 
> full list of files is relatively fast in comparison.

Invoking 'find' will always be faster, as it's optimized for
traversing directory trees.

> > I think
> > using 'find' is also faster.
> 
> find is used under the covers. The difference is just that the 
> invocations of etags are only happening later.

No, the difference is also that in my example etags runs in parallel
with 'find', not in sequence.

> > More generally, I think doing this that way is not TRT, at least not
> > by default.  "make TAGS" in Emacs will produce a much richer tags
> > table than your method, because our Makefiles use regexps to augment
> > the automatic tagging in etags.  So I think we should first try to
> > invoke the TAGS target of a Makefile in the tree, if one exists, and
> > only use the naïve command as fallback.
> 
> 'make tags' is very much specific to Emacs.

No, TAGS is a standard target in GNU Makefiles.



* Re: Generation of tags for the current project on the fly
  2018-01-12 18:52     ` Eli Zaretskii
@ 2018-01-14  2:05       ` Dmitry Gutov
  2018-01-14 16:21         ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-14  2:05 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/12/18 9:52 PM, Eli Zaretskii wrote:

> In my workflows, I do that all the time, because I don't always
> remember the details of the functions I need to call in the code I'm
> writing.

Sure, but not as often as you use completion-at-point, probably. Anyway, 
what I said was an approximation/simplification. People's workflows are 
bound to be different.

>> Failing to find a tag is a valid result (some identifiers can be absent,
>> or defined somewhere else, e.g. in the libraries), and doing a rescan
>> each time that happens might be more annoying.
> 
> If you maintain that scanning is fast, then the annoyance should be
> minimal.

If scanning is fast, invalidate-on-save should be good enough. And it's 
easier to implement (already is).

>>> We could offer generating a tags table if we don't find one in the
>>> tree, instead of generating it automatically.
>>
>> And then what? Visit it?
> 
> No, just do what you intended, but only after an approval.  It could
> be that the user thought she already visited a tags table, or some
> other mistake.

OK, so if the user says yes, we "temporarily visit" the auto-generated
tags table. Then the user saves a file and that table gets invalidated
(or via some other mechanism), and we want to index it again. Ask again?

>>>> For reference, indexing the Emacs sources takes ~1.1sec here.
>>>
>>> Was that with cold cache or warm cache?
>>
>> Warm, probably. But that's the relevant time, isn't it?
> 
> Not necessarily.  The first time a tree is scanned could well be
> shortly after you start working on a project.

Not sure what you mean. The tree has to be scanned *sometime* at least 
once, doesn't it?

>> 'make tags' takes 1 second on my machine, with an NVMe disk.
> 
> I bet it will be even faster with a RAM disk.  But we shouldn't base
> our decisions on such configurations, as that isn't the norm yet, I
> think.

NVMe is an interface to an actual storage device, though. Anyway, 1 second and
4 seconds are different, but not hugely different. And we haven't 
optimized everything we could, yet.

For instance, could you try to see how long generating the file list
alone takes? And populating the buffer with it, but without passing
it to etags.

>>> IOW, I don't think this is so fast that we could do that without user
>>> approval.
>>
>> The argument here is that if the user called xref-find-definitions, it's
>> better to do a (long-ish) scan and show something, instead of failing.
> 
> It could be a mistake, or the user could reconsider given the
> question.  We do that with visiting large files, for example.

That's a valid argument. On the other hand, they might not know how long 
the indexing will take anyway.

>>> I don't understand why you didn't use the commonly used form:
>>>
>>>      find . -name "*.rb" -o -name "*.js" ... | etags -o- -
>>
>> Because the project API doesn't make this easy. Anyway, generating the
>> full list of files is relatively fast in comparison.
> 
> Invoking 'find' will always be faster, as it's optimized for
> traversing directory trees.

'git ls-files' will probably be faster still.

>>> I think
>>> using 'find' is also faster.
>>
>> find is used under the covers. The difference is just that the
>> invocations of etags are only happening later.
> 
> No, the difference is also that in my example etags runs in parallel
> with 'find', not in sequence.

That's what I was trying to say.

>> 'make tags' is very much specific to Emacs.
> 
> No, TAGS is a standard target in GNU Makefiles.

OK, good to know. Two questions, then:

- Can we make it output the tags to stdout?
- Can we detect that a given Makefile has a proper TAGS target (that can
output to stdout)?

Not sure yet how to handle the TAGS files inclusions, though.



* Re: Generation of tags for the current project on the fly
  2018-01-14  2:05       ` Dmitry Gutov
@ 2018-01-14 16:21         ` Eli Zaretskii
  2018-01-15  1:44           ` Dmitry Gutov
  2018-01-15  1:50           ` John Yates
  0 siblings, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-14 16:21 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sun, 14 Jan 2018 05:05:04 +0300
> 
> >>> We could offer generating a tags table if we don't find one in the
> >>> tree, instead of generating it automatically.
> >>
> >> And then what? Visit it?
> > 
> > No, just do what you intended, but only after an approval.  It could
> > be that the user thought she already visited a tags table, or some
> > other mistake.
> 
> OK, so if the user says yes, we "temporarily visit" the auto-generated
> tags table. Then the user saves a file and that table gets invalidated
> (or via some other mechanism), and we want to index it again. Ask again?

No, I think asking once per project should be enough.

> >> Warm, probably. But that's the relevant time, isn't it?
> > 
> > Not necessarily.  The first time a tree is scanned could well be
> > shortly after you start working on a project.
> 
> Not sure what you mean. The tree has to be scanned *sometime* at least 
> once, doesn't it?

I mean the first time the tags table is required might very well be at
the beginning of working on a project, at which time the project
source tree is not yet in the cache.

> For instance, could you try to see how long generating the file list
> alone takes? And populating the buffer with it, but without passing
> it to etags.

What Lisp shall I use for that?

> > Invoking 'find' will always be faster, as it's optimized for
> > traversing directory trees.
> 
> 'git ls-files' will probably be faster still.

Yes, but that only works in Git repositories.
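One way to get both is to ask Git when the directory is inside a Git work tree and fall back to find(1) otherwise. `git ls-files --cached --others --exclude-standard` lists tracked files plus untracked files that are not ignored. The function name is hypothetical. Other VCSes would need their own branches.

```sh
# Print project files relative to $1, one per line.
list_project_files() {
    if git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        # Tracked files plus untracked, non-ignored ones, straight from Git.
        git -C "$1" ls-files --cached --others --exclude-standard
    else
        # No Git (or not a work tree): walk the directory ourselves.
        ( cd "$1" && find . -name .git -prune -o -type f -print ) |
            sed 's|^\./||' | sort
    fi
}
```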

> > No, TAGS is a standard target in GNU Makefiles.
> 
> OK, good to know. Two questions, then:
> 
> - Can we make it output the tags to stdout?

Not likely.  But you could just visit the TAGS file(s), no?

> - Can we detect that a given Makefile has a proper TAGS target (that can
> output to stdout)?

Maybe CEDET has something, but if not, searching for ^TAGS: should be
easy.
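Searching for ^TAGS: finds the simplest form of the rule. A slightly more tolerant sketch (hypothetical function name) also accepts:

- TAGS listed among several targets on one rule line, such as `TAGS tags: lib`.
- Double-colon rules.

It rejects variable assignments like `TAGS := ...` and tab-indented recipe lines that merely mention TAGS. It remains a heuristic: targets defined through variables or included makefiles will be missed.

```sh
# Succeed if $1/Makefile appears to contain a rule whose targets include TAGS.
has_tags_target() {
    [ -f "$1/Makefile" ] || return 1
    # Optional "other targets " (not starting with a tab/space), then TAGS,
    # optional " more targets", then ":" or "::" not followed by ":" or "="
    # (which would make the line a variable assignment instead).
    grep -Eq '^([^:#=[:space:]][^:#=]*[[:space:]])?TAGS([[:space:]][^:#=]*)?::?([^:=]|$)' \
        "$1/Makefile"
}
```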

> Not sure yet how to handle the TAGS files inclusions, though.

"make TAGS" should handle it, as it does in Emacs.



* Re: Generation of tags for the current project on the fly
  2018-01-14 16:21         ` Eli Zaretskii
@ 2018-01-15  1:44           ` Dmitry Gutov
  2018-01-15  5:37             ` Eli Zaretskii
  2018-01-15  1:50           ` John Yates
  1 sibling, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-15  1:44 UTC
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/14/18 7:21 PM, Eli Zaretskii wrote:

>> OK, so if the user says yes, we "temporarily visit" the auto-generated
>> tags table. Then the user saves a file and that table gets invalidated
>> (or via some other mechanism), and we want to index it again. Ask again?
> 
> No, I think asking once per project should be enough.

Until the end of the current Emacs session? And ask again after restart?

What about if the user switches to a different project and then back?

> I mean the first time the tags table is required might very well be at
> the beginning of working on a project, at which time the project
> source tree is not yet in the cache.

Yes, and? The user will need it to be indexed either way, right?

There's also another optimization opportunity: performing reindexing in 
an asynchronous fashion, in the background (maybe after a timeout, too), 
after any file is changed and saved. This one comes with its own 
tradeoffs, though.

>> For instance, could you try to see how long generating the file list
>> alone takes? And populating the buffer with it, but without passing
>> it to etags.
> 
> What Lisp shall I use for that?

To measure the full time:

(benchmark 1 '(progn (etags--project-tags-cleanup)
                     (etags--maybe-use-project-tags)))

To measure the time to generate the list of files only:

(benchmark 1 '(all-completions "" (project-file-completion-table
                                   (project-current)
                                   (list default-directory))))

>>> Invoking 'find' will always be faster, as it's optimized for
>>> traversing directory trees.
>>
>> 'git ls-files' will probably be faster still.
> 
> Yes, but that only works in Git repositories.

We can probably optimize for that use case these days. Git or some other 
VCS is usually in place, especially in non-toy projects.

>>> No, TAGS is a standard target in GNU Makefiles.
>>
>> OK, good to know. Two questions, then:
>>
>> - Can we make it output the tags to stdout?
> 
> Not likely.  But you could just visit the TAGS file(s), no?

Hmm, there are reasons not to do that in general, but if the way we 
generate the files is known to be "right", they mostly disappear (except 
for the implementation complexity: doing it this way and using temporary 
files in the other case will require more code).

How do we figure which files to visit? Do we just visit src/TAGS and 
expect the rest to be 'include'-d?

>> - Can we detect that a given Makefile has a proper TAGS target (that can
>> output to stdout)?
> 
> Maybe CEDET has something, but if not, searching for ^TAGS: should be
> easy.
> 
>> Not sure yet how to handle the TAGS files inclusions, though.
> 
> "make TAGS" should handle it, as it does in Emacs.

So these questions have answers, good.

Here's another one: considering the reindexing costs are not always 
negligible and depend on the size of a project, will there be actual 
benefit to using the proposed scheme in GNU projects like Emacs, GCC and 
others (those are the ones that use 'make TAGS')? Or is there a subset 
of them, at least, which we expect to benefit?



* Re: Generation of tags for the current project on the fly
  2018-01-14 16:21         ` Eli Zaretskii
  2018-01-15  1:44           ` Dmitry Gutov
@ 2018-01-15  1:50           ` John Yates
  2018-01-15  5:42             ` Eli Zaretskii
  1 sibling, 1 reply; 49+ messages in thread
From: John Yates @ 2018-01-15  1:50 UTC
  To: Eli Zaretskii; +Cc: Emacs developers, Dmitry Gutov

> > > Invoking 'find' will always be faster, as it's optimized for
> > > traversing directory trees.
> >
> > 'git ls-files' will probably be faster still.
>
> Yes, but that only works in Git repositories.

The context of this discussion is _large_ projects.  My sense is
that git's efficiency relative to other SCM technologies means
that the larger the project, the higher the likelihood of using git.

Also, when talking speed ripgrep has been a revelation:

  https://github.com/BurntSushi/ripgrep

Admittedly, ripgrep is written in Rust, but a scanner exploiting
similar ideas could change what we imagine to be a big project.

/john



* Re: Generation of tags for the current project on the fly
  2018-01-15  1:44           ` Dmitry Gutov
@ 2018-01-15  5:37             ` Eli Zaretskii
  2018-01-15 18:50               ` Dmitry Gutov
  2018-01-17 11:08               ` Dmitry Gutov
  0 siblings, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-15  5:37 UTC
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 15 Jan 2018 04:44:58 +0300
> 
> On 1/14/18 7:21 PM, Eli Zaretskii wrote:
> 
> >> OK, so if the user says yes, we "temporarily visit" the auto-generated
> >> tags table. Then the user saves a file and that table gets invalidated
> >> (or via some other mechanism), and we want to index it again. Ask again?
> > 
> > No, I think asking once per project should be enough.
> 
> Until the end of the current Emacs session? And ask again after restart?

Yes.

> What about if the user switches to a different project and then back?

Ideally, don't ask anymore about that project.

> > I mean the first time the tags table is required might very well be at
> > the beginning of working on a project, at which time the project
> > source tree is not yet in the cache.
> 
> Yes, and? The user will need it to be indexed either way, right?

Yes, but my point was that cold-cache times cannot be ignored.

> There's also another optimization opportunity: performing reindexing in 
> an asynchronous fashion, in the background (maybe after a timeout, too), 
> after any file is changed and saved. This one comes with its own 
> tradeoffs, though.

Doing that asynchronously could be an automatic action, not in need
of any user confirmation.  It complicates the implementation a bit,
but perhaps not too much, so this could be a good design choice.
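Here is a sketch of what such background regeneration could look like at the shell level. It builds the new table in a temporary file next to TAGS and renames it into place only when etags succeeds. That way a reader (Emacs included) never sees a half-written file, and a failed run leaves the previous table intact. Within one file system, mv is a rename, which replaces the target atomically. The function name, extension list and `ETAGS` override are hypothetical. In Emacs itself, one would presumably start the process asynchronously (e.g. with `make-process`) rather than shell out like this.

```sh
# Rebuild $1/TAGS in a background subshell; returns immediately.
regenerate_tags_async() {
    (
        cd "$1" || exit 1
        tmp=$(mktemp ./TAGS.tmp.XXXXXX) || exit 1
        if find . -type f \( -name '*.c' -o -name '*.h' -o -name '*.el' \) -print |
           "${ETAGS:-etags}" -o "$tmp" -; then
            mv -f "$tmp" TAGS      # atomic replace on the same file system
        else
            rm -f "$tmp"           # keep the old TAGS if indexing failed
        fi
    ) &
}
```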

> To measure the full time:
> 
> (benchmark 1 '(progn (etags--project-tags-cleanup)
>                      (etags--maybe-use-project-tags)))

5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
most of the time is spent by external programs, so perhaps this
doesn't matter.

> To measure the time to generate the list of files only:
> 
> (benchmark 1 '(all-completions "" (project-file-completion-table
>                                    (project-current)
>                                    (list default-directory))))

0.95 sec with cold cache, 0.23 with warm cache.

> >> 'git ls-files' will probably be faster still.
> > 
> > Yes, but that only works in Git repositories.
> 
> We can probably optimize for that use case these days. Git or some other 
> VCS is usually in place, especially in non-toy projects.

For the projects using Git, yes.

> How do we figure which files to visit? Do we just visit src/TAGS and 
> expect the rest to be 'include'-d?

I think just visit TAGS in the directory of the source whose symbol is
requested, or maybe use locate-dominating-file to look higher in the
tree if not found in the current directory.
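For illustration, the lookup described above checks the current directory first, then moves upward, as `locate-dominating-file` does in Emacs. It could be sketched in shell like this; the function name is hypothetical.

```sh
# Print the path of the nearest TAGS file at or above directory $1,
# or fail if there is none up to the file system root.
find_dominating_tags() {
    dir=$(cd "$1" && pwd -P) || return 1
    while :; do
        if [ -f "$dir/TAGS" ]; then
            printf '%s\n' "$dir/TAGS"
            return 0
        fi
        [ "$dir" = / ] && return 1
        dir=$(dirname "$dir")
    done
}
```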

> Here's another one: considering the reindexing costs are not always 
> negligible and depend on the size of a project, will there be actual 
> benefit to using the proposed scheme in GNU projects like Emacs, GCC and 
> others (those are the ones that use 'make TAGS')? Or is there a subset 
> of them, at least, which we expect to benefit?

That's a good question.  But if the tags table is automatically
produced in the background, the time this takes is much less
important, and having TAGS always up to date would be a valuable
feature.  FWIW, I do "make TAGS" in every large project I start
working on seriously, so at least for me this is important.


* Re: Generation of tags for the current project on the fly
  2018-01-15  1:50           ` John Yates
@ 2018-01-15  5:42             ` Eli Zaretskii
  2018-01-15 15:01               ` Dmitry Gutov
  2018-01-15 16:33               ` John Yates
  0 siblings, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-15  5:42 UTC (permalink / raw)
  To: John Yates; +Cc: emacs-devel, dgutov

> From: John Yates <john@yates-sheets.org>
> Date: Sun, 14 Jan 2018 20:50:02 -0500
> Cc: Dmitry Gutov <dgutov@yandex.ru>, Emacs developers <emacs-devel@gnu.org>
> 
> > > > Invoking 'find' will always be faster, as it's optimized for
> > > > traversing directory trees.
> > >
> > > 'git ls-files' will probably be faster still.
> >
> > Yes, but that only works in Git repositories.
> 
> The context of this discussion is _large_ projects.  My sense is
> that git's efficiency relative to other SCM technologies means
> that the larger the project, the higher the likelihood of using git.

That's definitely true for personal and FLOSS environments, but not
elsewhere.  Where I earn my paycheck, they use TFS and even
ClearCase(!).


* Re: Generation of tags for the current project on the fly
  2018-01-15  5:42             ` Eli Zaretskii
@ 2018-01-15 15:01               ` Dmitry Gutov
  2018-01-15 17:21                 ` Eli Zaretskii
  2018-01-15 16:33               ` John Yates
  1 sibling, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-15 15:01 UTC (permalink / raw)
  To: Eli Zaretskii, John Yates; +Cc: emacs-devel

On 1/15/18 8:42 AM, Eli Zaretskii wrote:

> That's definitely true for personal and FLOSS environments, but not
> elsewhere.  Where I earn my paycheck, they use TFS and even
> ClearCase(!).

No counterparts to 'git ls-files' in those VC systems?


* Re: Generation of tags for the current project on the fly
  2018-01-15  5:42             ` Eli Zaretskii
  2018-01-15 15:01               ` Dmitry Gutov
@ 2018-01-15 16:33               ` John Yates
  1 sibling, 0 replies; 49+ messages in thread
From: John Yates @ 2018-01-15 16:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Dmitry Gutov, Emacs developers

On Mon, Jan 15, 2018 at 12:42 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>  Where I earn my paycheck, they use TFS and even ClearCase(!).

My condolences :-)

Small aside: At Apollo Computer I contributed to DSEE - ClearCase's
progenitor - by designing SML, the System Model Language.  After our
acquisition by HP I opted not to join Atria, ending up instead in
the Alpha chip group at DEC (my second stint with that company).

/john


* Re: Generation of tags for the current project on the fly
  2018-01-15 15:01               ` Dmitry Gutov
@ 2018-01-15 17:21                 ` Eli Zaretskii
  2018-01-15 17:45                   ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-15 17:21 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel, john

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 15 Jan 2018 18:01:00 +0300
> 
> On 1/15/18 8:42 AM, Eli Zaretskii wrote:
> 
> > That's definitely true for personal and FLOSS environments, but not
> > elsewhere.  Where I earn my paycheck, they use TFS and even
> > ClearCase(!).
> 
> No counterparts to 'git ls-files' in those VC systems?

Some, but they are not faster than 'find' running locally, AFAIR.


* Re: Generation of tags for the current project on the fly
  2018-01-15 17:21                 ` Eli Zaretskii
@ 2018-01-15 17:45                   ` Dmitry Gutov
  2018-01-15 20:56                     ` Matthias Meulien
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-15 17:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, john

On 1/15/18 8:21 PM, Eli Zaretskii wrote:

>> No counterparts to 'git ls-files' in those VC systems?
> 
> Some, but they are not faster than 'find' running locally, AFAIR.

Anyway, I don't think we support those projects via the VC project 
backend because there are no VC backends for these AFAIK.

When someone creates a project backend for them, they can implement the 
file listing speedup one way or another. Maybe by keeping the list of 
files in memory and listening for file events, as one option.


* Re: Generation of tags for the current project on the fly
  2018-01-15  5:37             ` Eli Zaretskii
@ 2018-01-15 18:50               ` Dmitry Gutov
  2018-01-16 17:50                 ` Eli Zaretskii
  2018-01-17 22:02                 ` Tom Tromey
  2018-01-17 11:08               ` Dmitry Gutov
  1 sibling, 2 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-15 18:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/15/18 8:37 AM, Eli Zaretskii wrote:

>>> No, I think asking once per project should be enough.
>>
>> Until the end of the current Emacs session? And ask again after restart?
> 
> Yes.
> 
>> What about if the user switches to a different project and then back?
> 
> Ideally, don't ask anymore about that project.

This is doable, ok.

>> There's also another optimization opportunity: performing reindexing in
>> an asynchronous fashion, in the background (maybe after a timeout, too),
>> after any file is changed and saved. This one comes with its own
>> tradeoffs, though.
> 
> Doing that asynchronously could be an automatic action, not in need
> of any user confirmation.  It complicates the implementation a bit,
> but perhaps not too much, so this could be a good design choice.

It would shorten the waits, but do nothing for the CPU and disk usage. 
Those I'm more worried about, actually, as a laptop user with a 
not-so-great battery life on GNU/Linux.

If we just invalidate on save, the rescan doesn't happen until you 
intend to use it again. With the asynchronous approach, rescans will occur 
again and again, just as you edit and save files. With large projects, 
one CPU core might always be busy this way (or does etags parallelize? 
more cores then).

So maybe someone would prefer this approach, but I'd only go for it 
as a quality-of-life improvement when scans are already pretty short.

>> To measure the full time:
>>
>> (benchmark 1 '(progn (etags--project-tags-cleanup)
>> (etags--maybe-use-project-tags)))
> 
> 5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
> most of the time is spent by external programs, so perhaps this
> doesn't matter.

Probably doesn't, indeed.

>> To measure the time to generate the list of files only:
>>
>> (benchmark 1 '(all-completions "" (project-file-completion-table
>> (project-current) (list default-directory))))
> 
> 0.95 sec with cold cache, 0.23 with warm cache.

Thanks, so about 1 second for file listing and 4.5 seconds for etags. Even if 
we allowed etags to execute in parallel with find, it could only shave 
it down to 4.5 seconds (and probably not even that).
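Here is the arithmetic behind that bound, using the rounded figures above (about 1 s for listing and about 4.5 s for etags, out of 5.5 s total). Running the two stages one after the other costs their sum. Fully overlapping them can at best reduce the cost to the longer of the two:

```latex
T_{\text{serial}} = t_{\text{list}} + t_{\text{etags}} \approx 1 + 4.5 = 5.5\ \text{s},
\qquad
T_{\text{overlapped}} \ge \max(t_{\text{list}},\, t_{\text{etags}}) \approx 4.5\ \text{s}.
```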

>> How do we figure which files to visit? Do we just visit src/TAGS and
>> expect the rest to be 'include'-d?
> 
> I think just visit TAGS in the directory of the source whose symbol is
> requested, or maybe use locate-dominating-file to look higher in the
> tree if not found in the current directory.

That option is not as easy to code as what I suggested.

Further, if we just visit lisp/TAGS when in lisp/, and xref-etags-mode 
is enabled, we won't be able to find the definition of 'car'.

>> Here's another one: considering the reindexing costs are not always
>> negligible and depend on the size of a project, will there be actual
>> benefit to using the proposed scheme in GNU projects like Emacs, GCC and
>> others (those are the ones that use 'make TAGS')? Or is there a subset
>> of them, at least, which we expect to benefit?
> 
> That's a good question.  But if the tags table is automatically
> produced in the background, the time this takes is much less
> important, and having TAGS always up to date would be a valuable
> feature.  FWIW, I do "make TAGS" in every large project I start
> working on seriously, so at least for me this is important.

You probably do it just once, though, and update very rarely. The old 
way of operation is still going to work.

Can we improve the "warm" reindex times? In the first message of this 
thread I mentioned GNU Global because it reportedly supports incremental 
updates. Can we get such feature in etags, too?

I more or less imagine how I'd implement such a feature using Lisp and 
'etags --append', but that would do nothing to help when the tags are 
generated by make.


* Re: Generation of tags for the current project on the fly
  2018-01-15 17:45                   ` Dmitry Gutov
@ 2018-01-15 20:56                     ` Matthias Meulien
  2018-01-15 21:44                       ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Matthias Meulien @ 2018-01-15 20:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 229 bytes --]

> Anyway, I don't think we support those projects via the VC project 
> backend because there are no VC backends for these AFAIK.

A few years ago, I was forced to use TFS... https://marmalade-repo.org/packages/vc-tfs
-- 
Matthias

[-- Attachment #2: Type: text/html, Size: 511 bytes --]

* Re: Generation of tags for the current project on the fly
  2018-01-15 20:56                     ` Matthias Meulien
@ 2018-01-15 21:44                       ` Dmitry Gutov
  0 siblings, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-15 21:44 UTC (permalink / raw)
  To: Matthias Meulien; +Cc: emacs-devel

On 1/15/18 11:56 PM, Matthias Meulien wrote:
>> Anyway, I don't think we support those projects via the VC project
>> backend because there are no VC backends for these AFAIK.
> 
> A few years ago, I was forced to use TFS... 
> https://marmalade-repo.org/packages/vc-tfs

That's pretty cool. But the second paragraph remains true.

We should add a VC backend action like 'ls-files', and all backends, 
including yours, can provide their own implementation. The default will 
use 'find'.
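A rough sketch of that dispatch, written in Python for illustration. It uses a backend-specific listing when one is available and otherwise falls back to a plain directory walk that stands in for 'find'. Assumptions: only Git is handled, it is detected by a `.git` directory at the root, and the function names are hypothetical. No such action exists in vc.el today:

```python
import os
import shutil
import subprocess

def _git_ls_files(root):
    # Git backend: tracked files plus untracked files that are not ignored.
    out = subprocess.run(
        ["git", "ls-files", "-co", "--exclude-standard"],
        cwd=root, capture_output=True, text=True, check=True,
    ).stdout
    return sorted(line for line in out.splitlines() if line)

def _walk_files(root):
    # Default "find"-like fallback: every regular file under ROOT,
    # skipping common VCS metadata directories.
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in (".git", ".hg", ".svn")]
        for name in filenames:
            result.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(result)

def project_files(root):
    """Hypothetical 'ls-files' action: prefer the VCS listing, else walk."""
    if os.path.isdir(os.path.join(root, ".git")) and shutil.which("git"):
        return _git_ls_files(root)
    return _walk_files(root)
```

Each backend would supply its own counterpart to `_git_ls_files`, and only projects without one would pay for the full directory walk.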


* Re: Generation of tags for the current project on the fly
  2018-01-15 18:50               ` Dmitry Gutov
@ 2018-01-16 17:50                 ` Eli Zaretskii
  2018-01-16 21:56                   ` Dmitry Gutov
  2018-01-17 22:02                 ` Tom Tromey
  1 sibling, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-16 17:50 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 15 Jan 2018 21:50:33 +0300
> 
> Can we improve the "warm" reindex times? In the first message of this 
> thread I mentioned GNU Global because it reportedly supports incremental 
> updates. Can we get such feature in etags, too?

Incremental tagging needs to leave the record about what was tagged
somewhere, right?  Since there's no such feature in etags now, this
sounds like a project for which I won't have time any time soon.  Any
volunteers?

> I more or less imagine how I'd implement such a feature using Lisp and 
> 'etags --append', but that would do nothing to help when the tags are 
> generated by make.

It will also not help if Emacs is restarted, right?


* Re: Generation of tags for the current project on the fly
  2018-01-16 17:50                 ` Eli Zaretskii
@ 2018-01-16 21:56                   ` Dmitry Gutov
  2018-01-17 15:40                     ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-16 21:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/16/18 20:50, Eli Zaretskii wrote:

> Incremental tagging needs to leave the record about what was tagged
> somewhere, right?  

The information is inside the TAGS file, isn't it? Even though it's in a 
flat list, unsorted, spread throughout the file.

I was thinking that maybe we can add this feature simply using some 
clever engineering, without changing the format of the file.

And I think it should be fairly easy (in terms of the algorithm, at 
least) to implement incremental update for one-to-few files: you scan 
through the file, remove the corresponding entries, and then scan the 
files (ones that still exist) and add those entries at the end.
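The update described above can be sketched in Python over the text of a TAGS file. The sketch relies on the etags format: each file's section starts with a form feed and a newline, followed by a `FILE,SIZE` (or `FILE,include`) header line. `tag_file` is a hypothetical callback that returns a fresh section for one file, for example by running etags on it. It is an assumption, not an existing API:

```python
import os

FORM_FEED = "\x0c\n"  # starts every section of an etags TAGS file

def split_sections(text):
    """Split TAGS text into (file_name, section_text) pairs, in order."""
    sections = []
    for chunk in text.split(FORM_FEED):
        if not chunk:
            continue  # the text before the first form feed is empty
        header = chunk.split("\n", 1)[0]        # e.g. "src/foo.c,1234"
        file_name = header.rpartition(",")[0]   # drop the size/"include" field
        sections.append((file_name, FORM_FEED + chunk))
    return sections

def update_tags(text, changed, tag_file, exists=os.path.exists):
    """Return TAGS text with the sections for CHANGED files replaced.

    All sections for files in CHANGED are removed; the files that still
    exist are re-tagged via the hypothetical TAG_FILE callback, and their
    new sections are appended at the end.
    """
    changed = list(changed)
    drop = set(changed)
    kept = [section for name, section in split_sections(text) if name not in drop]
    fresh = [tag_file(name) for name in changed if exists(name)]
    return "".join(kept + fresh)
```

The pass over TAGS is linear in its size, so for one-to-few files this should be cheap compared to re-parsing every source file.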

Might be slower to incrementally update when passed (almost) the same 
list of files, like 'make tags' does. It has higher complexity on paper 
(looking for/matching file names), but maybe it would still yield a 
measurable improvement over a full reindex.

> Since there's no such feature in etags now, this
> sounds like a project for which I won't have time any time soon.  Any
> volunteers?

Not volunteering yet. Could be something I might have time for a few 
months from now, depending on whether we have a solid plan and you'll 
want to provide some hand-holding.

>> I more or less imagine how I'd implement such a feature using Lisp and
>> 'etags --append', but that would do nothing to help when the tags are
>> generated by make.
> 
> It will also not help if Emacs is restarted, right?

Right, but it will do a full scan after the restart. Spending a longer 
amount of time just once per project per restart is more or less fine, I 
think. Especially after an explicit prompt (I've added one now, you can 
see it on the branch).


* Re: Generation of tags for the current project on the fly
  2018-01-15  5:37             ` Eli Zaretskii
  2018-01-15 18:50               ` Dmitry Gutov
@ 2018-01-17 11:08               ` Dmitry Gutov
  1 sibling, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-17 11:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/15/18 08:37, Eli Zaretskii wrote:
>> To measure the full time:
>>
>> (benchmark 1 '(progn (etags--project-tags-cleanup)
>> (etags--maybe-use-project-tags)))
> 5.5 sec with warm cache.  This is with an unoptimized Emacs, btw, but
> most of the time is spent by external programs, so perhaps this
> doesn't matter.

BTW, I've just measured it on my older laptop with an SSD, bought around 
2012. Just 2 seconds here.

The CPU is i7-3630QM, 4-core and more or less top-of-the-line for a 
laptop back then, but not desktop grade anyway.


* Re: Generation of tags for the current project on the fly
  2018-01-16 21:56                   ` Dmitry Gutov
@ 2018-01-17 15:40                     ` Eli Zaretskii
  2018-01-17 19:43                       ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-17 15:40 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 17 Jan 2018 00:56:01 +0300
> 
> On 1/16/18 20:50, Eli Zaretskii wrote:
> 
> > Incremental tagging needs to leave the record about what was tagged
> > somewhere, right?  
> 
> The information is inside the TAGS file, isn't it?

Not really: the signature of each file at the moment of scan is
missing, and maybe also other things.

> And I think it should be fairly easy (in terms of the algorithm, at 
> least) to implement incremental update for one-to-few files: you scan 
> through the file, remove the corresponding entries, and then scan the 
> files (ones that still exist) and add those entries at the end.

This should be relatively easy.  Is that what you meant by
"incremental"?


* Re: Generation of tags for the current project on the fly
  2018-01-17 15:40                     ` Eli Zaretskii
@ 2018-01-17 19:43                       ` Dmitry Gutov
  2018-01-17 20:12                         ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-17 19:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/17/18 18:40, Eli Zaretskii wrote:

>> The information is inside the TAGS file, isn't it?
> 
> Not really: the signature of each file at the moment of scan is
> missing, and maybe also other things.

Meaning it's not known at etags runtime? That's true.

>> And I think it should be fairly easy (in terms of the algorithm, at
>> least) to implement incremental update for one-to-few files: you scan
>> through the file, remove the corresponding entries, and then scan the
>> files (ones that still exist) and add those entries at the end.
> 
> This should be relatively easy.  Is that what you meant by
> "incremental"?

Incremental is the goal, and this is just one approach. Not sure whether 
it will help with 'make tags', like I said.

Maybe if the Makefile somehow tracked the files that need to be reindexed? 
Using their mtimes, for instance.

Make tracks which files need to be recompiled, right? This could reuse 
the same mechanism, if possible.
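Make's check boils down to comparing modification times against the target. Below is a minimal sketch of the same test for a TAGS file. It assumes the only record kept is the timestamp of TAGS itself, and `stale_files` is a hypothetical helper:

```python
import os

def stale_files(tags_path, files):
    """Hypothetical helper: FILES modified after TAGS_PATH was last written.

    Like Make, this only compares modification times.  Files that no
    longer exist are skipped: a deletion leaves no mtime to compare,
    so it has to be detected some other way.
    """
    try:
        tags_mtime = os.path.getmtime(tags_path)
    except FileNotFoundError:
        # No TAGS yet: every existing file needs to be indexed.
        return [f for f in files if os.path.exists(f)]
    return [f for f in files
            if os.path.exists(f) and os.path.getmtime(f) > tags_mtime]
```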


* Re: Generation of tags for the current project on the fly
  2018-01-17 19:43                       ` Dmitry Gutov
@ 2018-01-17 20:12                         ` Eli Zaretskii
  2018-01-17 22:19                           ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-01-17 20:12 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 17 Jan 2018 22:43:17 +0300
> 
> On 1/17/18 18:40, Eli Zaretskii wrote:
> 
> >> The information is inside the TAGS file, isn't it?
> > 
> > Not really: the signature of each file at the moment of scan is
> > missing, and maybe also other things.
> 
> Meaning it's not known at etags runtime? That's true.

It's also not recorded after the run, so that the next run could
compare against it.

> Make tracks which files need to be recompiled, right? This could reuse 
> the same mechanism, if possible.

Make doesn't support incremental builds, except on file granularity.
Doing that with etags would be easy, but you say that is not the
"incremental" tagging you had in mind, you want the granularity to be
below file, i.e. on tag level.  For that, a single time stamp of the
TAGS file is insufficient, because all it can tell you is which files
need to be re-tagged, whereas you want to know that on a per-tag
basis.


* Re: Generation of tags for the current project on the fly
  2018-01-15 18:50               ` Dmitry Gutov
  2018-01-16 17:50                 ` Eli Zaretskii
@ 2018-01-17 22:02                 ` Tom Tromey
  2018-01-17 22:44                   ` Dmitry Gutov
  1 sibling, 1 reply; 49+ messages in thread
From: Tom Tromey @ 2018-01-17 22:02 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

Dmitry> Can we improve the "warm" reindex times? In the first message of this
Dmitry> thread I mentioned GNU Global because it reportedly supports
Dmitry> incremental updates. Can we get such feature in etags, too?

A while ago I wrote to this list about some work I'd done in etags.
This was one of the features -- that is, I made "etags -u" work in etags
mode, not just ctags mode.

The overall thrust of the project was to combine the use of a config
file in the source tree with auto-updating.  The former I considered
necessary to (1) avoid dealing with Makefiles and whatnot (and, building
tags in the source tree is much nicer when doing out-of-tree builds...),
and (2) make it so that Emacs could run "etags -u" on save without
needing to teach it about the correct command line arguments -- they are
in the config file (ages ago I did this same project but stored the
command line arguments in the TAGS file; but this patch was never
accepted).

Anyway, this all works and I am happy to send the patches if you want
them.  I personally switched to GNU Global in the meantime; it has these
features and a few more, though it too is not without its warts.

I suppose I could push my branch to git if that would be convenient.
Just let me know.

Tom


* Re: Generation of tags for the current project on the fly
  2018-01-17 20:12                         ` Eli Zaretskii
@ 2018-01-17 22:19                           ` Dmitry Gutov
  2018-01-17 22:28                             ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-17 22:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/17/18 23:12, Eli Zaretskii wrote:

>>>> The information is inside the TAGS file, isn't it?
>>>
>>> Not really: the signature of each file at the moment of scan is
>>> missing, and maybe also other things.
>>
>> Meaning it's not known at etags runtime? That's true.
> 
> It's also not recorded after the run, so that the next run could
> compare against it.

What's a signature? The file names are recorded, of course. But not 
mtimes, if you mean them.

>> Make tracks which files need to be recompiled, right? This could reuse
>> the same mechanism, if possible.
> 
> Make doesn't support incremental builds, except on file granularity.
> Doing that with etags would be easy, but you say that is not the
> "incremental" tagging you had in mind, you want the granularity to be
> below file, i.e. on tag level.  For that, a single time stamp of the
> TAGS file is insufficient, because all it can tell you is which files
> need to be re-tagged, whereas you want to know that on a per-tag
> basis.

On the contrary: file granularity should be fine. We just need *some* 
accounting somewhere to detect which files to reindex.

It's hard for me to imagine how a granularity lower than file would even 
work.

Another question is how to deal with file deletions: when a file is 
absent, there's nothing for Make to take into account to trigger an 
incremental rebuild (or actually a deletion of this file's entry). Maybe 
this is a rare enough case, though, that 'make tags' users will deal 
with it manually in some fashion or other.
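One possible way to catch deletions relies on the file names already recorded in the TAGS section headers. It assumes the current file list is available, for example from 'git ls-files', 'find', or the list of sources the Makefile passes to etags. Any name recorded in TAGS but absent from that list was deleted, and its section can be dropped. `deleted_files` is a hypothetical helper:

```python
def deleted_files(tags_text, current_files):
    """Hypothetical helper: names recorded in TAGS but missing from CURRENT_FILES.

    Each etags section starts with a form feed and a newline, followed by a
    "FILE,SIZE" header.  "FILE,include" headers name included tag tables
    rather than source files, so they are ignored here.
    """
    recorded = set()
    for chunk in tags_text.split("\x0c\n"):
        if not chunk:
            continue
        header = chunk.split("\n", 1)[0]
        name, _, size = header.rpartition(",")
        if size != "include":
            recorded.add(name)
    return sorted(recorded - set(current_files))
```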


* Re: Generation of tags for the current project on the fly
  2018-01-17 22:19                           ` Dmitry Gutov
@ 2018-01-17 22:28                             ` Dmitry Gutov
  0 siblings, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-17 22:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 1/18/18 01:19, Dmitry Gutov wrote:

> On the contrary: file granularity should be fine. We just need *some* 
> accounting somewhere to detect which files to reindex.
> 
> It's hard for me to imagine how a granularity lower than file would even 
> work.

Looking at 
https://www.gnu.org/software/global/manual/global.html#Incremental-updating, 
Global also seems to use file granularity.


* Re: Generation of tags for the current project on the fly
  2018-01-17 22:02                 ` Tom Tromey
@ 2018-01-17 22:44                   ` Dmitry Gutov
  2018-01-17 23:20                     ` Tom Tromey
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-17 22:44 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

Hi Tom,

On 1/18/18 01:02, Tom Tromey wrote:

> A while ago I wrote to this list about some work I'd done in etags.
> This was one of the features -- that is, I made "etags -u" work in etags
> mode, not just ctags mode.

ctags supports incremental updates?

> The overall thrust of the project was to combine the use of a config
> file in the source tree with auto-updating.  The former I considered
> necessary to (1) avoid dealing with Makefiles and whatnot (and, building
> tags in the source tree is much nicer when doing out-of-tree builds...),
> and (2) make it so that Emacs could run "etags -u" on save without
> needing to teach it about the correct command line arguments -- they are
> in the config file (ages ago I did this same project but stored the
> command line arguments in the TAGS file; but this patch was never
> accepted).

The config file duplicates some info in .gitignore and 
project-vc-ignores, though. Which is unfortunate.

I also doubt that 'make tags' will go away anytime soon.

Any chance your code supports something analogous to 'global 
--single-update' (except with multiple file arguments, hopefully)? 
That's what I've been trying to describe.

> I suppose I could push my branch to git if that would be convenient.
> Just let me know.

I'm sure it would be helpful, even if only to study the approach.


* Re: Generation of tags for the current project on the fly
  2018-01-17 22:44                   ` Dmitry Gutov
@ 2018-01-17 23:20                     ` Tom Tromey
  2018-01-18  0:14                       ` Dmitry Gutov
  2020-12-08 22:26                       ` Dmitry Gutov
  0 siblings, 2 replies; 49+ messages in thread
From: Tom Tromey @ 2018-01-17 23:20 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

>> A while ago I wrote to this list about some work I'd done in etags.
>> This was one of the features -- that is, I made "etags -u" work in etags
>> mode, not just ctags mode.

Dmitry> ctags supports incremental updates?

Yeah, see "-u" in "ctags --help":

-u, --update
        Update the tag entries for the given files, leaving tag
        entries for other files in place.  Currently, this is
        implemented by deleting the existing entries for the given
        files and then rewriting the new entries at the end of the
        tags file.  It is often faster to simply rebuild the entire
        tag file than to use this.

Dmitry> The config file duplicates some info in .gitignore and
Dmitry> project-vc-ignores, though. Which is unfortunate.

True, but there are two reasons for this.  One, there are still several
version control systems in use, and I didn't want to try to parse every
config file.  Second, sometimes a generated file will be checked in, but
you might still want to omit it from TAGS -- so some mechanism like this
is needed.

Dmitry> I also doubt that 'make tags' will go away anytime soon.

Sure, me too, but this approach is taken by more tools nowadays, and it
can live alongside "make tags".

Dmitry> Any chance your code supports something analogous to 'global
Dmitry> --single-update' (except with multiple file arguments, hopefully)?
Dmitry> That's what I've been trying to describe.

It could certainly be done without much effort.

>> I suppose I could push my branch to git if that would be convenient.
>> Just let me know.

Dmitry> I'm sure it would be helpful, even if only to study the approach.

I pushed it to feature/etags-update.

Tom


* Re: Generation of tags for the current project on the fly
  2018-01-17 23:20                     ` Tom Tromey
@ 2018-01-18  0:14                       ` Dmitry Gutov
  2018-01-18  1:30                         ` Dmitry Gutov
  2018-01-19  1:21                         ` Dmitry Gutov
  2020-12-08 22:26                       ` Dmitry Gutov
  1 sibling, 2 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-18  0:14 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/18/18 02:20, Tom Tromey wrote:

> Dmitry> ctags supports incremental updates?
> 
> Yeah, see "-u" in "ctags --help":

Cool. This is probably from Universal Ctags. Still not in Ubuntu (17.04, 
at least).

> -u, --update
>          Update the tag entries for the given files, leaving tag
>          entries for other files in place.  Currently, this is
>          implemented by deleting the existing entries for the given
>          files and then rewriting the new entries at the end of the
>          tags file.  It is often faster to simply rebuild the entire
>          tag file than to use this.

Seems like it updates the file in place. And your implementation copies 
the entries and allocates a new file. Any difference in performance that 
you noticed?

> Dmitry> The config file duplicates some info in .gitignore and
> Dmitry> project-vc-ignores, though. Which is unfortunate.
> 
> True, but there are two reasons for this.  One, there are still several
> version control systems in use, and I didn't want to try to parse every
> config file.

I think it should work like:

git ls-files | etags -

Replace 'git ls-files' with an appropriate incantation for each VCS.

> Second, sometimes a generated file will be checked in, but
> you might still want to omit it from TAGS -- so some mechanism like this
> is needed.

Yup. This seems to work (excluding .clang-format; there's also an option 
to read additional ignores from a file):

git ls-files -cdmo --exclude-standard -x .clang-format | etags -

> Dmitry> I also doubt that 'make tags' will go away anytime soon.
> 
> Sure, me too, but this approach is taken by more tools nowadays, and it
> can live alongside "make tags".

Is that true? The modern tools have taken the approach of duplicating 
the ignore lists?

> Dmitry> Any chance your code supports something analogous to 'global
> Dmitry> --single-update' (except with multiple file arguments, hopefully)?
> Dmitry> That's what I've been trying to describe.
> 
> It could certainly be done without much effort.

Very good. Does it handle file deletions as well?

>>> I suppose I could push my branch to git if that would be convenient.
>>> Just let me know.
> 
> Dmitry> I'm sure it would be helpful, even if only to study the approach.
> 
> I pushed it to feature/etags-update.

Thanks! I'll do some benchmarking if nobody beats me to it.


* Re: Generation of tags for the current project on the fly
  2018-01-18  0:14                       ` Dmitry Gutov
@ 2018-01-18  1:30                         ` Dmitry Gutov
  2018-01-19  1:21                         ` Dmitry Gutov
  1 sibling, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-18  1:30 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/18/18 03:14, Dmitry Gutov wrote:
> On 1/18/18 02:20, Tom Tromey wrote:
> 
>> Dmitry> ctags supports incremental updates?
>>
>> Yeah, see "-u" in "ctags --help":
> 
> Cool. This is probably from Universal Ctags.
Or not. Weird to see a feature in Emacs's ctags, but not in the commonly 
distributed one.


* Re: Generation of tags for the current project on the fly
  2018-01-18  0:14                       ` Dmitry Gutov
  2018-01-18  1:30                         ` Dmitry Gutov
@ 2018-01-19  1:21                         ` Dmitry Gutov
  2018-01-20 22:15                           ` Tom Tromey
  2018-01-20 23:57                           ` Tom Tromey
  1 sibling, 2 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-19  1:21 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/18/18 03:14, Dmitry Gutov wrote:

>> Dmitry> I'm sure it would be helpful, even if only to study the approach.
>>
>> I pushed it to feature/etags-update.
> 
> Thanks! I'll do some benchmarking if nobody beats me to it.

The first result: 'make tags' never finishes with that version. Here's 
the last output before it's stuck:

../lib-src/etags --include=../lisp/TAGS --include=../lwlib/TAGS \
   --regex='{c}/[ 	]*DEFVAR_[A-Z_ 	(]+"\([^"]+\)"/\1/' \
   --regex='{c}/[ 	]*DEFVAR_[A-Z_ 	(]+"[^"]+",[ 	]\([A-Za-z0-9_]+\)/\1/' \
   ./character.c <...many lines here...> ./systime.h \
   --regex='{objc}/[ 	]*DEFVAR_[A-Z_ 	(]+"\([^"]+\)"/\1/' \
   --regex='{objc}/[ 	]*DEFVAR_[A-Z_ 	(]+"[^"]+",[ 	]\([A-Za-z0-9_]+\)/\1/' \
   ./macfont.m ./nsfont.m ./nsmenu.m ./nsterm.m ./nsselect.m ./nsfns.m ./nsimage.m

Tried it several times, including after 'make bootstrap'.

And here's another way to make it freeze:

find . -name "*.[ch]" | lib-src/etags - -o TAGS

(But both *.c and *.h finish okay separately).

Other than that:

1. '-u' doesn't seem to be statistically significantly slower than
running without it (good), but maybe I haven't tested large enough repos.

2. Not sure I understand how .etags works. How do I tell it to only 
index *.c files? Any presence of wildcard ignore (!*) makes 'etags 
--find' index nothing.

3. 'lib-src/etags src/category.c -u -o TAGS' results in having only that 
file's tags inside TAGS (before, it contained the entries for all .c 
files). This behavior contradicts the help and stops the rest of my 
testing in its tracks.




* Re: Generation of tags for the current project on the fly
  2018-01-19  1:21                         ` Dmitry Gutov
@ 2018-01-20 22:15                           ` Tom Tromey
  2018-01-20 23:57                           ` Tom Tromey
  1 sibling, 0 replies; 49+ messages in thread
From: Tom Tromey @ 2018-01-20 22:15 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

Dmitry> The first result: 'make tags' never finishes with that version.

Thanks, there was a bug in the file name hash table.
I've pushed a fix.

Dmitry> And here's another way to make it freeze:
Dmitry> find . -name "*.[ch]" | lib-src/etags - -o TAGS

This was the same problem.

Dmitry> 2. Not sure I understand how .etags works. How do I tell it to only
Dmitry> index *.c files? Any presence of wildcard ignore (!*) makes 'etags
Dmitry> --find' index nothing.

Each file that is found is run through all the matchers, and the last
one that matches wins.  So, this worked for me to ignore everything, but
allow .c files:

!*
*.c

The first line ignores everything, but then the second line accepts .c
files.
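
Here is a minimal C sketch of that last-match-wins rule. It assumes POSIX fnmatch(3) does the glob matching and that a leading '!' marks an ignore pattern. The function name etags_file_wanted is hypothetical. Accepting a file that no pattern matches is also an assumption and may not match what the branch does:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: should FILE be indexed, given the N lines of a
   .etags file in PATTERNS?  Every pattern is tried in order and the last
   one that matches decides; a leading '!' turns a pattern into an ignore
   rule.  A file that no pattern matches is accepted (an assumption).  */
bool
etags_file_wanted (const char *file, const char *const *patterns, size_t n)
{
  bool wanted = true;

  assert (file != NULL && (n == 0 || patterns != NULL));
  for (size_t i = 0; i < n; i++)
    {
      const char *pat = patterns[i];
      bool ignore = pat[0] == '!';

      if (ignore)
        pat++;
      /* Flags 0: '*' also matches '/', so "*.c" matches "src/doc.c".  */
      if (fnmatch (pat, file, 0) == 0)
        wanted = !ignore;
    }
  return wanted;
}
```

With the two-line .etags above, src/category.c is accepted and src/category.h is rejected. Swapping the two lines rejects everything, because "!*" would then be the last match for every file.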

Dmitry> 3. 'lib-src/etags src/category.c -u -o TAGS' results in having only
Dmitry> that file's tags inside TAGS (before, it contained the entries for all
Dmitry> .c files). This behavior contradicts the help and stops the rest of my
Dmitry> testing in its tracks.

I didn't look into this yet.

Tom




* Re: Generation of tags for the current project on the fly
  2018-01-19  1:21                         ` Dmitry Gutov
  2018-01-20 22:15                           ` Tom Tromey
@ 2018-01-20 23:57                           ` Tom Tromey
  2018-01-21 12:26                             ` Dmitry Gutov
  1 sibling, 1 reply; 49+ messages in thread
From: Tom Tromey @ 2018-01-20 23:57 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, emacs-devel

Dmitry> 3. 'lib-src/etags src/category.c -u -o TAGS' results in having only
Dmitry> that file's tags inside TAGS (before, it contained the entries for all
Dmitry> .c files). This behavior contradicts the help and stops the rest of my
Dmitry> testing in its tracks.

Another bug on my part, I think I've fixed it now.

The next question for you is how you want it to work.

Right now there are two modes:

* The old mode (but with the possibility of -u)
* --find mode -- which implies -u.  The .etags file is only read in find
  mode.

My idea for this code was to use locate-dominating-file to find the
.etags file and just run "etags --find" from there on save.

However I know you'd like to use git ls-files, and maybe other things --
you mentioned correctly handling file deletions.

Tom




* Re: Generation of tags for the current project on the fly
  2018-01-20 23:57                           ` Tom Tromey
@ 2018-01-21 12:26                             ` Dmitry Gutov
  2018-01-30  4:45                               ` Tom Tromey
  2018-01-30  5:05                               ` Tom Tromey
  0 siblings, 2 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-01-21 12:26 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/21/18 2:57 AM, Tom Tromey wrote:
> Dmitry> 3. 'lib-src/etags src/category.c -u -o TAGS' results in having only
> Dmitry> that file's tags inside TAGS (before, it contained the entries for all
> Dmitry> .c files). This behavior contradicts the help and stops the rest of my
> Dmitry> testing in its tracks.
> 
> Another bug on my part, I think I've fixed it now.

Thanks! It's working better now, but here's another experiment:

~/v/emacs-master (feature/etags-update|…) $ make tags
<completes okay>
~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
lib-src/etags: invalid TAGS file

Also getting the same result after a third call to 'etags --find'.

> The next question for you is how you want it to work.
> 
> Right now there are two modes:
> 
> * The old mode (but with the possibility of -u)

That's how I'm going to use it for now. BTW, '--find' looks consistently 
slower here than 'find -name "*.rb" |':

$ bash -c "time find . -name \"*.rb\" | ~/vc/emacs-master/lib-src/etags - -o TAGS"

real	0m0,215s
user	0m0,200s
sys	0m0,116s
$ rm TAGS
$ bash -c "time ~/vc/emacs-master/lib-src/etags --find"

real	0m0,329s
user	0m0,180s
sys	0m0,136s
$ cat .etags
!*
*.rb

> * --find mode -- which implies -u.  The .etags file is only read in find
>    mode.

Not sure --find should always imply -u, actually (or '-o TAGS'). Might 
be harder for debugging. Also, '--find' doesn't deal with deleted files, 
so you'll delete one, call '--find', and the entry will remain in TAGS.

Anyway, my intention for now is to cherry-pick the '-u' feature as soon 
as it's working well. Nothing against '--find', but it should probably 
be a separate discussion.

> My idea for this code was to use locate-dominating-file to find the
> .etags file and just run "etags --find" from there on save.

That would complicate things somewhat, especially when .etags is not a 
VC root of the project. You could create a .etags based project type, 
though.

> However I know you'd like to use git ls-files, and maybe other things --
> you mentioned correctly handling file deletions.

Speaking of deletions, this is not strictly necessary (we could just 
delete the whole TAGS and recreate), but it would be helpful if, when 
passed a non-existing file that is in TAGS, 'etags -u' deleted it from 
the index. Right now, it only complains "No such file or directory", but 
keeps it in the index.




* Re: Generation of tags for the current project on the fly
  2018-01-21 12:26                             ` Dmitry Gutov
@ 2018-01-30  4:45                               ` Tom Tromey
  2018-02-04 23:32                                 ` Dmitry Gutov
  2018-01-30  5:05                               ` Tom Tromey
  1 sibling, 1 reply; 49+ messages in thread
From: Tom Tromey @ 2018-01-30  4:45 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

Dmitry> Thanks! It's working better now, but here's another experiment:
Dmitry> ~/v/emacs-master (feature/etags-update|…) $ make tags
Dmitry> <completes okay>
Dmitry> ~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
Dmitry> ~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
Dmitry> lib-src/etags: invalid TAGS file

Ok, I think I've fixed this now.

Dmitry> Anyway, my intention for now is to cherry-pick the '-u' feature as
Dmitry> soon as it's working well. Nothing against '--find', but it should
Dmitry> probably be a separate discussion.

It's fine by me.  There are 3 patches so far to get, but also I haven't
addressed this yet:

Dmitry> Speaking of deletions, this is not strictly necessary (we could just
Dmitry> delete the whole TAGS and recreate), but it would be helpful if, when
Dmitry> passed a non-existing file that is in TAGS, 'etags -u' deleted it from
Dmitry> the index. Right now, it only complains "No such file or directory",
Dmitry> but keeps it in the index.


Tom




* Re: Generation of tags for the current project on the fly
  2018-01-21 12:26                             ` Dmitry Gutov
  2018-01-30  4:45                               ` Tom Tromey
@ 2018-01-30  5:05                               ` Tom Tromey
  2018-02-04 23:40                                 ` Dmitry Gutov
  1 sibling, 1 reply; 49+ messages in thread
From: Tom Tromey @ 2018-01-30  5:05 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

Dmitry> Speaking of deletions, this is not strictly necessary (we could just
Dmitry> delete the whole TAGS and recreate), but it would be helpful if, when
Dmitry> passed a non-existing file that is in TAGS, 'etags -u' deleted it from
Dmitry> the index. Right now, it only complains "No such file or directory",
Dmitry> but keeps it in the index.

Could you try the appended?  It worked locally for me.

Tom

diff --git a/lib-src/etags.c b/lib-src/etags.c
index 925355f9da..d6eda19be6 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -1942,7 +1942,15 @@ process_file_name (char *file, language *lang)
       if (! inf)
 	{
 	  errno = file_errno;
-	  perror (file);
+
+	  if (update && errno == ENOENT)
+	    {
+	      /* Add the filename to the hash but don't do anything else.  This
+		 will result in the entry being removed from the tags file.  */
+	      add_filename_to_hash (file);
+	    }
+	  else
+	    perror (file);
 	  goto cleanup;
 	}
     }




* Re: Generation of tags for the current project on the fly
  2018-01-30  4:45                               ` Tom Tromey
@ 2018-02-04 23:32                                 ` Dmitry Gutov
  0 siblings, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-04 23:32 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/30/18 07:45, Tom Tromey wrote:
>>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:
> 
> Dmitry> Thanks! It's working better now, but here's another experiment:
> Dmitry> ~/v/emacs-master (feature/etags-update|…) $ make tags
> Dmitry> <completes okay>
> Dmitry> ~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
> Dmitry> ~/v/emacs-master (feature/etags-update|…) $ echo src/doc.c | lib-src/etags - -u -o src/TAGS
> Dmitry> lib-src/etags: invalid TAGS file
> 
> Ok, I think I've fixed this now.

Confirmed, thank you!




* Re: Generation of tags for the current project on the fly
  2018-01-30  5:05                               ` Tom Tromey
@ 2018-02-04 23:40                                 ` Dmitry Gutov
  2018-02-05 17:06                                   ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-04 23:40 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 1/30/18 08:05, Tom Tromey wrote:
>>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:
> 
> Dmitry> Speaking of deletions, this is not strictly necessary (we could just
> Dmitry> delete the whole TAGS and recreate), but it would be helpful if, when
> Dmitry> passed a non-existing file that is in TAGS, 'etags -u' deleted it from
> Dmitry> the index. Right now, it only complains "No such file or directory",
> Dmitry> but keeps it in the index.
> 
> Could you try the appended?  It worked locally for me.

Seems to work here too, thank you. But we should probably keep the "file 
not found" reporting (-u means etags's output is unused anyway). It will 
help when the file name is actually a typo.

I wonder if 'make tags' would be able to make use of it somehow. If not, 
I'll probably have to reimplement it in Elisp anyway. Eli, any thoughts?




* Re: Generation of tags for the current project on the fly
  2018-02-04 23:40                                 ` Dmitry Gutov
@ 2018-02-05 17:06                                   ` Eli Zaretskii
  2018-02-05 20:10                                     ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-02-05 17:06 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: tom, emacs-devel

> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 5 Feb 2018 02:40:59 +0300
> 
> I wonder if 'make tags' would be able to make use of it somehow. If not, 
> I'll probably have to reimplement it in Elisp anyway. Eli, any thoughts?

Not sure I understand the question.  Are you asking if "make TAGS"
could run "etags -u"?  I think it could, but we'd need some Make
wizardry to generate a list of files modified after TAGS was last
written.  Currently, we just generate a list of all the relevant
source files.




* Re: Generation of tags for the current project on the fly
  2018-02-05 17:06                                   ` Eli Zaretskii
@ 2018-02-05 20:10                                     ` Dmitry Gutov
  2018-02-06 19:36                                       ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-05 20:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tom, emacs-devel

On 2/5/18 20:06, Eli Zaretskii wrote:
>> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Mon, 5 Feb 2018 02:40:59 +0300
>>
>> I wonder if 'make tags' would be able to make use of it somehow. If not,
>> I'll probably have to reimplement it in Elisp anyway. Eli, any thoughts?
> 
> Not sure I understand the question.  Are you asking if "make TAGS"
> could run "etags -u"?  I think it could, but we'd need some Make
> wizardry to generate a list of files modified after TAGS was last
> written.  Currently, we just generate a list of all the relevant
> source files.

For modified files, it could check modification times, similarly to how 
it can recompile only files it needs to. I think.

But I'm asking if you have any idea for how to deal with _deleted_ 
files? I.e. files that have been deleted since 'make tags' was last run.




* Re: Generation of tags for the current project on the fly
  2018-02-05 20:10                                     ` Dmitry Gutov
@ 2018-02-06 19:36                                       ` Eli Zaretskii
  2018-02-06 20:41                                         ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-02-06 19:36 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: tom, emacs-devel

> Cc: tom@tromey.com, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 5 Feb 2018 23:10:34 +0300
> 
> But I'm asking if you have any idea for how to deal with _deleted_ 
> files? I.e. files that have been deleted since 'make tags' was last run.

Is that important?  We almost never delete files, certainly not C
files.  Lisp files get moved into obsolete/, so maybe the same rule
that detects and deletes "orphan" *.elc files could update TAGS on
behalf of deleted/moved Lisp files.




* Re: Generation of tags for the current project on the fly
  2018-02-06 19:36                                       ` Eli Zaretskii
@ 2018-02-06 20:41                                         ` Dmitry Gutov
  2018-02-07  3:26                                           ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-06 20:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tom, emacs-devel

On 2/6/18 22:36, Eli Zaretskii wrote:

> Is that important?  We almost never delete files, certainly not C
> files.  Lisp files get moved into obsolete/, so maybe the same rule
> that detects and deletes "orphan" *.elc files could update TAGS on
> behalf of deleted/moved Lisp files.

Well... It's a normal occurrence in most projects. Maybe not in Emacs, 
but what about GCC and the rest, which also use 'make tags'?

Also, the developer might be developing some new feature on a branch 
(adding new files), and switch back and forth between branches, 
effectively deleting and recreating said files.

It would be nice to keep the index up-to-date automatically during all 
that. Many users would expect that, I think.




* Re: Generation of tags for the current project on the fly
  2018-02-06 20:41                                         ` Dmitry Gutov
@ 2018-02-07  3:26                                           ` Eli Zaretskii
  2018-02-07  9:47                                             ` Dmitry Gutov
  0 siblings, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2018-02-07  3:26 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: tom, emacs-devel

> Cc: tom@tromey.com, emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 6 Feb 2018 23:41:29 +0300
> 
> Well... It's a normal occurrence in most projects. Maybe not in Emacs, 
> but what about GCC and the rest, which also use 'make tags'?
> 
> Also, the developer might be developing some new feature on a branch 
> (adding new files), and switch back and forth between branches, 
> effectively deleting and recreating said files.
> 
> It would be nice to keep the index up-to-date automatically during all 
> that. Many users would expect that, I think.

Then I guess we should ask "etags -u" to do that automatically,
i.e. for every file name it finds in TAGS to check whether that file
exists, and if not, remove all its tags from the file.
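
Here is a minimal C sketch of such pruning. It relies on the TAGS layout: each file's section starts with a line holding a single form feed, followed by a "FILE,SIZE" header. SIZE only counts bytes within its own section, so whole sections can be dropped without rewriting the others. The names tags_prune and path_exists are hypothetical. Relative file names are assumed to be resolved from the directory containing TAGS, taken here to be the current directory:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Existence check; relative names resolve against the current directory.  */
bool
path_exists (const char *name)
{
  struct stat st;
  return stat (name, &st) == 0;
}

/* Return a malloc'd copy of TAGS (the contents of a TAGS file) without
   the sections whose file EXISTS rejects, or NULL if out of memory.  */
char *
tags_prune (const char *tags, bool (*exists) (const char *))
{
  size_t n = 0;
  const char *p = tags;
  char *out;

  assert (tags != NULL && exists != NULL);
  out = malloc (strlen (tags) + 1);
  if (out == NULL)
    return NULL;

  while (*p != '\0')
    {
      /* The current section runs up to the next line holding just "\f".  */
      const char *next = strstr (p + 1, "\n\f\n");
      const char *end = next != NULL ? next + 1 : p + strlen (p);
      bool keep = true;

      if (p[0] == '\f' && p[1] == '\n')
        {
          /* Header "FILE,SIZE" (or "FILE,include"): FILE ends at the
             last comma on the line, since file names may contain commas.  */
          const char *hdr = p + 2;
          const char *eol = memchr (hdr, '\n', (size_t) (end - hdr));
          const char *stop = eol != NULL ? eol : end;
          const char *comma = NULL;

          for (const char *q = hdr; q < stop; q++)
            if (*q == ',')
              comma = q;

          if (comma != NULL)
            {
              size_t len = (size_t) (comma - hdr);
              char *name = malloc (len + 1);

              if (name == NULL)
                {
                  free (out);
                  return NULL;
                }
              memcpy (name, hdr, len);
              name[len] = '\0';
              keep = exists (name);
              free (name);
            }
        }

      /* Copy kept sections verbatim; anything before the first "\f" too.  */
      if (keep)
        {
          memcpy (out + n, p, (size_t) (end - p));
          n += (size_t) (end - p);
        }
      p = end;
    }

  out[n] = '\0';
  return out;
}
```

The extra cost is one stat(2) call per file section in TAGS.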





* Re: Generation of tags for the current project on the fly
  2018-02-07  3:26                                           ` Eli Zaretskii
@ 2018-02-07  9:47                                             ` Dmitry Gutov
  2018-02-07 21:30                                               ` Tom Tromey
  2018-02-08 20:31                                               ` John Yates
  0 siblings, 2 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-07  9:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: tom, emacs-devel

On 2/7/18 06:26, Eli Zaretskii wrote:

>> It would be nice to keep the index up-to-date automatically during all
>> that. Many users would expect that, I think.
> 
> Then I guess we should ask "etags -u" to do that automatically,
> i.e. for every file name it finds in TAGS to check whether that file
> exists, and if not, remove all its tags from the file.

Maybe with a new option (e.g. 'etags -u --prune'), because it'll likely 
take some time. I wonder how much the overhead is going to be.




* Re: Generation of tags for the current project on the fly
  2018-02-07  9:47                                             ` Dmitry Gutov
@ 2018-02-07 21:30                                               ` Tom Tromey
  2018-02-09  9:41                                                 ` Dmitry Gutov
  2018-02-08 20:31                                               ` John Yates
  1 sibling, 1 reply; 49+ messages in thread
From: Tom Tromey @ 2018-02-07 21:30 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, tom, emacs-devel

>>>>> "Dmitry" == Dmitry Gutov <dgutov@yandex.ru> writes:

>> Then I guess we should ask "etags -u" to do that automatically,
>> i.e. for every file name it finds in TAGS to check whether that file
>> exists, and if not, remove all its tags from the file.

Dmitry> Maybe with a new option (e.g. 'etags -u --prune'), because it'll
Dmitry> likely take some time. I wonder how much the overhead is going to be.

One idea would be to detect this situation at M-. time -- that is, when
TAGS tells us about a file that doesn't exist, ignore the bad result and
re-run etags --prune or whatever.

The reason I added --find was to circumvent both this problem (though as
you saw, I didn't actually write this part) and also to deal with a
couple other problems: the need to avoid Makefiles (gcc and gdb have to
be built out-of-tree, which is a pain for running "make tags"), and
consequently the need to have the config be accessible to etags itself.

Tom




* Re: Generation of tags for the current project on the fly
  2018-02-07  9:47                                             ` Dmitry Gutov
  2018-02-07 21:30                                               ` Tom Tromey
@ 2018-02-08 20:31                                               ` John Yates
  2018-02-09  0:22                                                 ` Dmitry Gutov
  1 sibling, 1 reply; 49+ messages in thread
From: John Yates @ 2018-02-08 20:31 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, tom, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

Git seems to be able to compute new/modified/dropped with quite tolerable
efficiency even for large projects. Are there lessons to be learned there?

/john

On Feb 7, 2018 3:47 AM, "Dmitry Gutov" <dgutov@yandex.ru> wrote:

> On 2/7/18 06:26, Eli Zaretskii wrote:
>
>>> It would be nice to keep the index up-to-date automatically during all
>>> that. Many users would expect that, I think.
>>
>> Then I guess we should ask "etags -u" to do that automatically,
>> i.e. for every file name it finds in TAGS to check whether that file
>> exists, and if not, remove all its tags from the file.
>
> Maybe with a new option (e.g. 'etags -u --prune'), because it'll likely
> take some time. I wonder how much the overhead is going to be.

[-- Attachment #2: Type: text/html, Size: 1341 bytes --]


* Re: Generation of tags for the current project on the fly
  2018-02-08 20:31                                               ` John Yates
@ 2018-02-09  0:22                                                 ` Dmitry Gutov
  0 siblings, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-09  0:22 UTC (permalink / raw)
  To: John Yates; +Cc: Eli Zaretskii, tom, emacs-devel

On 2/8/18 23:31, John Yates wrote:
> Git seems to be able to compute new/modified/dropped with quite
> tolerable efficiency even for large projects. Are there lessons to be
> learned there?

One way to interpret that is that checking for the presence of even a
large number of files may be a fast enough operation. Would you like to
give it a test?

Alternatively, Git uses some smart caching somewhere.
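
Here is a minimal C sketch for such a test, assuming POSIX stat(2) and clock_gettime(2). The function name time_existence_checks is hypothetical. It counts how many of the given names are missing and reports the wall-clock time spent:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical benchmark helper: stat(2) each of the N names in FILES,
   store the number of missing ones in *MISSING, and return the elapsed
   wall-clock time in seconds.  */
double
time_existence_checks (const char *const *files, size_t n, size_t *missing)
{
  struct timespec start, stop;
  struct stat st;

  assert (missing != NULL && (n == 0 || files != NULL));
  *missing = 0;
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (size_t i = 0; i < n; i++)
    if (stat (files[i], &st) != 0)
      (*missing)++;
  clock_gettime (CLOCK_MONOTONIC, &stop);

  return (double) (stop.tv_sec - start.tv_sec)
         + (double) (stop.tv_nsec - start.tv_nsec) / 1e9;
}
```

The names could come from 'git ls-files' or from the section headers in TAGS. Running it twice shows the difference between a cold and a warm file system cache, which can be large.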




* Re: Generation of tags for the current project on the fly
  2018-02-07 21:30                                               ` Tom Tromey
@ 2018-02-09  9:41                                                 ` Dmitry Gutov
  0 siblings, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2018-02-09  9:41 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

On 2/8/18 00:30, Tom Tromey wrote:

> Dmitry> Maybe with a new option (e.g. 'etags -u --prune'), because it'll
> Dmitry> likely take some time. I wonder how much the overhead is going to be.
> 
> One idea would be to detect this situation at M-. time -- that is, when
> TAGS tells us about a file that doesn't exist, ignore the bad result and
> re-run etags --prune or whatever.

You mean call file-exists-p on all files in TAGS before each lookup? 
That would work. We also have the false positives returned from 
tags-completion-at-point-function to deal with.

I was thinking of using e.g. inotify when it's available, but have not 
decided on the fallbacks.

> The reason I added --find was to circumvent both this problem (though as
> you saw, I didn't actually write this part)

But which problem exactly? As we recall, 'etags --find' turned out to be 
somewhat slower than 'find ... | etags'.

> and also to deal with a
> couple other problems: the need to avoid Makefiles (gcc and gdb have to
> be built out-of-tree, which is a pain for running "make tags"), and
> consequently the need to have the config be accessible to etags itself.

I think, to a certain extent, we can get the same separation of
information using the '--regex=@regexfile' option and standardizing on a
file name for that.

But this won't help recreate the multiple TAGS files structure with 
inclusions. As a smaller step forward, maybe 'make tags' will call 
'etags --prune' when a certain environment var is set.




* Re: Generation of tags for the current project on the fly
  2018-01-17 23:20                     ` Tom Tromey
  2018-01-18  0:14                       ` Dmitry Gutov
@ 2020-12-08 22:26                       ` Dmitry Gutov
  1 sibling, 0 replies; 49+ messages in thread
From: Dmitry Gutov @ 2020-12-08 22:26 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel

Hi Tom,

On 18.01.2018 01:20, Tom Tromey wrote:
>>> I suppose I could push my branch to git if that would be convenient.
>>> Just let me know.
> Dmitry> I'm sure it would be helpful, even if only to study the approach.
> 
> I pushed it to feature/etags-update.

I finally (sigh) went back to this problem and now rebased your branch 
on top of master (see feature/etags_update_v2). And it actually compiles.

Also tried to add checking of file modification times (so that etags
would skip input files whose mtime is older than TAGS's mtime), but
that doesn't seem to have any positive effect on performance in my
benchmarking. Perhaps you could take a look at bb625ea36d sometime and
see whether I missed something?

I'll post the update on the overall endeavor in a separate thread.




end of thread, other threads:[~2020-12-08 22:26 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-12  1:02 Generation of tags for the current project on the fly Dmitry Gutov
2018-01-12  9:01 ` Eli Zaretskii
2018-01-12 13:52   ` Dmitry Gutov
2018-01-12 18:52     ` Eli Zaretskii
2018-01-14  2:05       ` Dmitry Gutov
2018-01-14 16:21         ` Eli Zaretskii
2018-01-15  1:44           ` Dmitry Gutov
2018-01-15  5:37             ` Eli Zaretskii
2018-01-15 18:50               ` Dmitry Gutov
2018-01-16 17:50                 ` Eli Zaretskii
2018-01-16 21:56                   ` Dmitry Gutov
2018-01-17 15:40                     ` Eli Zaretskii
2018-01-17 19:43                       ` Dmitry Gutov
2018-01-17 20:12                         ` Eli Zaretskii
2018-01-17 22:19                           ` Dmitry Gutov
2018-01-17 22:28                             ` Dmitry Gutov
2018-01-17 22:02                 ` Tom Tromey
2018-01-17 22:44                   ` Dmitry Gutov
2018-01-17 23:20                     ` Tom Tromey
2018-01-18  0:14                       ` Dmitry Gutov
2018-01-18  1:30                         ` Dmitry Gutov
2018-01-19  1:21                         ` Dmitry Gutov
2018-01-20 22:15                           ` Tom Tromey
2018-01-20 23:57                           ` Tom Tromey
2018-01-21 12:26                             ` Dmitry Gutov
2018-01-30  4:45                               ` Tom Tromey
2018-02-04 23:32                                 ` Dmitry Gutov
2018-01-30  5:05                               ` Tom Tromey
2018-02-04 23:40                                 ` Dmitry Gutov
2018-02-05 17:06                                   ` Eli Zaretskii
2018-02-05 20:10                                     ` Dmitry Gutov
2018-02-06 19:36                                       ` Eli Zaretskii
2018-02-06 20:41                                         ` Dmitry Gutov
2018-02-07  3:26                                           ` Eli Zaretskii
2018-02-07  9:47                                             ` Dmitry Gutov
2018-02-07 21:30                                               ` Tom Tromey
2018-02-09  9:41                                                 ` Dmitry Gutov
2018-02-08 20:31                                               ` John Yates
2018-02-09  0:22                                                 ` Dmitry Gutov
2020-12-08 22:26                       ` Dmitry Gutov
2018-01-17 11:08               ` Dmitry Gutov
2018-01-15  1:50           ` John Yates
2018-01-15  5:42             ` Eli Zaretskii
2018-01-15 15:01               ` Dmitry Gutov
2018-01-15 17:21                 ` Eli Zaretskii
2018-01-15 17:45                   ` Dmitry Gutov
2018-01-15 20:56                     ` Matthias Meulien
2018-01-15 21:44                       ` Dmitry Gutov
2018-01-15 16:33               ` John Yates

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).