unofficial mirror of emacs-devel@gnu.org 
* Automatic (e)tags generation and incremental updates
@ 2020-12-14  3:36 Dmitry Gutov
From: Dmitry Gutov @ 2020-12-14  3:36 UTC
  To: emacs-devel; +Cc: Philip K., Tom Tromey, John Yates


Hi all!

I went back to an old thread from 2018 and updated the patch. Together
with 'project-files' being faster than before and a few other
tweaks, the current state feels surprisingly usable for small to
medium projects, up to tags file sizes at which code completion starts
showing unpleasant latency.

The code lives in the branch scratch/etags-regen, and currently it is
triggered only when you try to use the etags xref backend without
having a tags table visited. I'll also attach a patch against master to
this email.

This time I took the approach of implementing the meat of incremental 
updates inside Emacs, instead of relying on an external tool. And it 
works better when Emacs knows which exact things it needs to update, and 
which it doesn't need to. E.g., it doesn't need to re-visit the whole 
45MB tags file if it just needs a file re-indexed; it directs etags's 
output to the buffer and then appends it to the file. Completion table 
updates could be made faster this way too, although I think we'd need 
some new data structure for them.
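
To illustrate why a targeted update can avoid re-reading the whole table: a TAGS file is a sequence of per-file sections, each starting with a form feed followed by a "FILE,SIZE" header line, so one file's entries can be cut out in place before fresh ones are appended. Below is a minimal sketch under that assumption. The name `my-tags-delete-file-section' is hypothetical and not part of the attached patch, which does the equivalent inline.

```elisp
;; Illustrative sketch only; `my-tags-delete-file-section' is a
;; hypothetical helper, not a function from etags.el or the patch.
(defun my-tags-delete-file-section (file)
  "Delete the section for FILE from the TAGS data in the current buffer.
FILE must be spelled exactly as it appears in the section header.
Return non-nil if a section was found and removed."
  (save-excursion
    (goto-char (point-min))
    ;; A section header is: form feed, newline, then "FILE,SIZE".
    (when (re-search-forward
           (concat "^\f\n" (regexp-quote file) ",[0-9]*\n") nil t)
      (let ((start (match-beginning 0)))
        ;; The section extends to the next form feed, or to the end
        ;; of the buffer when it is the last one.
        (delete-region start
                       (if (search-forward "\f" nil t)
                           (match-beginning 0)
                         (point-max)))
        t))))
```

After the old section is gone, the output of running etags on that single file can be appended at the end of the buffer, since the order of sections does not matter for lookups.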

The main question remains how to update information for files that have 
been deleted, or edited from outside Emacs (including by 'git 
checkout'). Two main approaches that I'm thinking of:

- On a timer, re-create the list of project files together with their 
mtimes (for instance, by piping through 'stat -c %Y') and compare with 
the previous saved list. Given a big enough project, it will create 
intermittent stalls in the UI, though, which could be unpleasant. But it 
can be the first approach to be implemented anyway.

- filenotify. I have already been warned here that it's unreliable and
prone to overflowing due to excessive notifications or file watching
limits. The current API doesn't allow watching a directory recursively
either (which would be required to know about new files). There is a
project called Watchman, however (https://github.com/facebook/watchman),
which I have read good things about, and it must use some sort of file
notification API under the covers. Perhaps if someone here is familiar
with its architecture, they could advise how to build a better
abstraction on top of inotify in Emacs as well.
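
For the first (timer-based) approach, the comparison step could look roughly like the sketch below. It reads mtimes with `file-attributes' from inside Emacs instead of piping through 'stat', which is a swapped-in choice made only to keep the example self-contained. Both function names are hypothetical, and nothing here is taken from the attached patch.

```elisp
;; Illustrative sketch only; both function names are hypothetical.
(defun my-tags-snapshot (files)
  "Return a hash table mapping each existing file in FILES to its mtime."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (file files table)
      (let ((attrs (file-attributes file)))
        ;; `file-attributes' returns nil for files that do not exist.
        (when attrs
          (puthash file
                   (file-attribute-modification-time attrs)
                   table))))))

(defun my-tags-snapshot-diff (old new)
  "Compare snapshots OLD and NEW, as made by `my-tags-snapshot'.
Return a list (CHANGED DELETED).  CHANGED holds files that are new or
whose mtime differs; DELETED holds files present only in OLD."
  (let (changed deleted)
    (maphash (lambda (file mtime)
               ;; A file missing from OLD yields nil here, so new
               ;; files are reported as changed too.
               (unless (equal mtime (gethash file old))
                 (push file changed)))
             new)
    (maphash (lambda (file _mtime)
               (unless (gethash file new)
                 (push file deleted)))
             old)
    (list changed deleted)))
```

A timer function would then call something like (my-tags-snapshot (project-files (project-current))) and diff the result against the previous snapshot, rescanning the CHANGED files and dropping the entries of the DELETED ones. The cost of building the snapshot is what would cause the stalls mentioned above for big projects.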

Of course, we can give the users a manual knob as well (it could come in
the form of enabling/disabling an associated minor mode), but first we
should try to make it work automatically, at least for projects up to
a certain size.
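
As a rough idea of what such a knob might look like, here is a minimal sketch of a global minor mode that installs and removes a save hook. The mode name and the placeholder hook function are hypothetical. The attached patch does not define a mode and instead installs its hooks from the xref backend function.

```elisp
;; Illustrative sketch only; the names below are hypothetical and do
;; not exist in etags.el or in the attached patch.
(defun my-tags-regen--after-save ()
  "Placeholder for the per-file reindexing step."
  ;; A real implementation would rescan `buffer-file-name' here.
  nil)

(define-minor-mode my-tags-regen-mode
  "Toggle automatic tags regeneration (illustrative sketch)."
  :global t
  :lighter " TagsRegen"
  (if my-tags-regen-mode
      (add-hook 'after-save-hook #'my-tags-regen--after-save)
    (remove-hook 'after-save-hook #'my-tags-regen--after-save)))
```

With this shape, (my-tags-regen-mode 1) turns the behavior on and (my-tags-regen-mode -1) turns it off, and users can also toggle it through Customize like any other global mode.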

Another question I'd like to ask is where the maintainers want to see 
this code: inside etags.el, in a new file near it (etags-regen.el, 
perhaps?), or just in GNU ELPA? It can be a minor mode, or a value in 
some user option like proposed in bug#43086.

Normally I would go the ELPA route straight away, but this kind of 
feature (automatic code indexing) is what Emacs sorely needs OOtB, IMHO. 
It loses here even to Sublime Text 3 released 7 years ago, which is not 
a very "smart" editor. And "democratizing" etags this way should result 
in better adoption, bug reports, feature requests, etc.

Please give it a try and comment.

(Cc'd some folks who went near previous discussions.)

[-- Attachment #2: etags-regen.diff --]
[-- Type: text/x-patch, Size: 5920 bytes --]

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 104d889b8b..00723608da 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -2069,7 +2069,9 @@ etags-xref-find-definitions-tag-order
 file name, add `tag-partial-file-name-match-p' to the list value.")
 
 ;;;###autoload
-(defun etags--xref-backend () 'etags)
+(defun etags--xref-backend ()
+  (etags--maybe-use-project-tags)
+  'etags)
 
 (cl-defmethod xref-backend-identifier-at-point ((_backend (eql etags)))
   (find-tag--default))
@@ -2144,6 +2146,132 @@ xref-location-line
     (nth 1 tag-info)))
 
 \f
+;;; Simple tags generation, with automatic invalidation
+
+(defvar etags--project-tags-file nil)
+(defvar etags--project-tags-root nil)
+(defvar etags--project-new-file nil)
+
+(defvar etags--command (executable-find "etags")
+  ;; How do we get the correct etags here?
+  ;; E.g. "~/vc/emacs-master/lib-src/etags"
+  ;;
+  ;; ctags's etags requires '-L -' for stdin input.
+  ;; It also looks broken here (indexes only some of the input files).
+  ;;
+  ;; If our etags supported '-L', we could use any version of etags.
+  )
+
+(defun etags--maybe-use-project-tags ()
+  (let (proj)
+    (when (and etags--project-tags-root
+               (not (file-in-directory-p default-directory
+                                         etags--project-tags-root)))
+      (etags--project-tags-cleanup))
+    (when (and (not (or tags-file-name
+                        tags-table-list))
+               (setq proj (project-current)))
+      (message "Generating new tags table...")
+      (let ((start (time-to-seconds)))
+        (etags--project-tags-generate proj)
+        (message "...done (%.2f s)" (- (time-to-seconds) start)))
+      ;; Invalidate the scanned tags after any change is written to disk.
+      (add-hook 'after-save-hook #'etags--project-update-file)
+      (add-hook 'before-save-hook #'etags--project-mark-as-new)
+      (visit-tags-table etags--project-tags-file))))
+
+(defun etags--project-tags-generate (proj)
+  (let* ((root (project-root proj))
+         (default-directory root)
+         (files (project-files proj))
+         ;; FIXME: List all extensions, or wait for etags fix.
+         ;; http://lists.gnu.org/archive/html/emacs-devel/2018-01/msg00323.html
+         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
+                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
+                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
+         (file-regexp (format "\\.%s\\'" (regexp-opt extensions t))))
+    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-")
+          etags--project-tags-root root)
+    (with-temp-buffer
+      (mapc (lambda (f)
+              (when (string-match-p file-regexp f)
+                (insert f "\n")))
+            files)
+      (shell-command-on-region
+       (point-min) (point-max)
+       (format "%s - -o %s" etags--command etags--project-tags-file)
+       nil nil "*etags-project-tags-errors*" t))))
+
+(defun etags--project-update-file ()
+  ;; TODO: Maybe only do this when Emacs is idle for a bit.
+  (let ((file-name buffer-file-name)
+        (tags-file-buf (get-file-buffer etags--project-tags-file))
+        pr should-scan)
+    (save-excursion
+      (when tags-file-buf
+        (cond
+         ((and etags--project-new-file
+               (kill-local-variable 'etags--project-new-file)
+               (setq pr (project-current))
+               (equal (project-root pr) etags--project-tags-root)
+               (member file-name (project-files pr)))
+          (set-buffer tags-file-buf)
+          (setq should-scan t))
+         ((progn (set-buffer tags-file-buf)
+                 (goto-char (point-min))
+                 (re-search-forward (format "^%s," (regexp-quote file-name)) nil t))
+          (let ((start (line-beginning-position)))
+            (re-search-forward "\f\n" nil 'move)
+            (let ((inhibit-read-only t)
+                  (save-silently t))
+              (delete-region (- start 2)
+                             (if (eobp)
+                                 (point)
+                               (- (point) 2)))
+              (write-region (point-min) (point-max) buffer-file-name nil 'silent)
+              (set-visited-file-modtime)))
+          (setq should-scan t))))
+      (when should-scan
+        (goto-char (point-max))
+        (let ((inhibit-read-only t)
+              (current-end (point)))
+          (call-process
+           etags--command
+           nil
+           '(t "*etags-project-tags-errors*")
+           nil
+           file-name
+           "--append"
+           "-o"
+           "-")
+          ;; XXX: When the project is big (tags file in 10s of megabytes),
+          ;; this is much faster than revert-buffer.  Or even using
+          ;; write-region without APPEND.
+          ;; We could also keep TAGS strictly as a buffer, with no
+          ;; backing on disk.
+          (write-region current-end (point-max) etags--project-tags-file t))
+        (set-visited-file-modtime)
+        (set-buffer-modified-p nil)
+        ;; FIXME: Is there a better way to do this?
+        ;; Completion table is the only remaining place where the
+        ;; update is not incremental.
+        (setq-default tags-completion-table nil)
+        ))))
+
+(defun etags--project-mark-as-new ()
+  (unless buffer-file-number
+    (setq-local etags--project-new-file t)))
+
+(defun etags--project-tags-cleanup ()
+  (when etags--project-tags-file
+    (delete-file etags--project-tags-file)
+    (setq tags-file-name nil
+          tags-table-list nil
+          etags--project-tags-file nil
+          etags--project-tags-root nil))
+  (remove-hook 'after-save-hook #'etags--project-update-file)
+  (remove-hook 'before-save-hook #'etags--project-mark-as-new))
+
 (provide 'etags)
 
 ;;; etags.el ends here


