From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.devel
Subject: Automatic (e)tags generation and incremental updates
Date: Mon, 14 Dec 2020 05:36:32 +0200
Message-ID: <779a6328-9ca5-202a-25a2-b270c66fe6dd@yandex.ru>
Mime-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------BE9DAFC3C4C3E29286721824"
Cc: "Philip K.", Tom Tromey, John Yates
To: emacs-devel

This is a multi-part message in MIME format.
--------------BE9DAFC3C4C3E29286721824
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi all!

I went back to an old thread from 2018 and updated the patch. Together with 'project-files' being faster than before and a few other tweaks, the current state feels surprisingly usable for small to medium projects, up to the tags file sizes where code completion starts to show unpleasant latency.

The code lives in the branch scratch/etags-regen, and currently it's triggered automatically when you try to use the etags xref backend without having a tags table visited. I'll also attach a patch against master to this email.

This time I took the approach of implementing the meat of incremental updates inside Emacs, instead of relying on an external tool. That works better because Emacs knows exactly which things it needs to update and which it doesn't. E.g., it doesn't need to re-visit the whole 45MB tags file if only one file needs to be re-indexed; it directs etags's output into the tags buffer and then appends it to the file. Completion table updates could be made faster this way too, although I think we'd need some new data structure for them.

The main question remains how to update information for files that have been deleted, or edited from outside Emacs (including by 'git checkout').
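Handling deleted files starts with knowing which files the current TAGS table covers. A minimal sketch, assuming the standard etags TAGS layout (each file's section starts with a form feed line followed by a "FILENAME,SIZE" header line). The names etags-regen--indexed-files and etags-regen--deleted-files are hypothetical and not part of the attached patch:

```elisp
;; Hypothetical helpers (not part of the attached patch).  They assume the
;; standard etags TAGS layout, where each file's section starts with a
;; form feed line followed by a "FILENAME,SIZE" header line.
(require 'seq)

(defun etags-regen--indexed-files (tags-buffer)
  "Return the names of all files that have a section in TAGS-BUFFER."
  (with-current-buffer tags-buffer
    (save-excursion
      (goto-char (point-min))
      (let (files)
        ;; The greedy [^\n]+ backtracks to the last comma that is followed
        ;; by digits and a newline, so file names may contain commas.
        ;; Sections for included tables ("FILE,include") do not match.
        (while (re-search-forward "\f\n\\([^\n]+\\),[0-9]*\n" nil t)
          (push (match-string-no-properties 1) files))
        (nreverse files)))))

(defun etags-regen--deleted-files (tags-buffer project-files)
  "Return files indexed in TAGS-BUFFER that are missing from PROJECT-FILES."
  ;; A hash table keeps the membership test cheap for large projects.
  (let ((present (make-hash-table :test #'equal)))
    (dolist (file project-files)
      (puthash file t present))
    (seq-remove (lambda (file) (gethash file present))
                (etags-regen--indexed-files tags-buffer))))
```

Each returned file's section could then be dropped the same way etags--project-update-file in the patch does it, by deleting the region between the surrounding form feeds.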
There are two main approaches I'm considering:

- On a timer, re-create the list of project files together with their mtimes (for instance, by piping through 'stat -c %Y') and compare it with the previously saved list. For a big enough project this will cause intermittent stalls in the UI, though, which could be unpleasant. But it can still be the first approach to be implemented.

- filenotify. I have already been warned here that it's unreliable and prone to overflowing, due to excessive notifications or file watching limits. The current API doesn't allow watching a directory recursively either (which would be required to know about new files). There is a project called Watchman, however (https://github.com/facebook/watchman), which I have read good things about, and it must use some sort of file notification API under the covers. Perhaps if someone here is familiar with its architecture, they could advise how to build a better abstraction on top of inotify in Emacs as well.

Of course, we can give users a manual knob as well (it could come in the form of enabling/disabling an associated minor mode), but first we should try to make it work automatically, at least for projects up to a certain size.

Another question I'd like to ask is where the maintainers want to see this code: inside etags.el, in a new file near it (etags-regen.el, perhaps?), or just in GNU ELPA? It can be a minor mode, or a value in some user option as proposed in bug#43086.

Normally I would go the ELPA route straight away, but this kind of feature (automatic code indexing) is what Emacs sorely needs out of the box, IMHO. It loses here even to Sublime Text 3, released 7 years ago, which is not a very "smart" editor. And "democratizing" etags this way should result in better adoption, bug reports, feature requests, etc.

Please give it a try and comment. (Cc'd some folks who took part in previous discussions.)
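The timer-based approach (the first option above) could look roughly like the sketch below. It assumes the file list comes from project-files and swaps the external 'stat -c %Y' pipe for Emacs's own file-attributes; etags-regen--snapshot and etags-regen--compare-snapshots are hypothetical names, not part of the attached patch:

```elisp
;; Hypothetical sketch (not part of the attached patch) of the timer-based
;; approach: snapshot file mtimes, then diff two snapshots.
(defun etags-regen--snapshot (files)
  "Return a hash table mapping each of FILES to its modification time."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (file files)
      (let ((attrs (file-attributes file)))
        ;; Skip files that disappeared between listing and stat'ing them.
        (when attrs
          (puthash file (file-attribute-modification-time attrs) table))))
    table))

(defun etags-regen--compare-snapshots (old new)
  "Compare snapshots OLD and NEW made by `etags-regen--snapshot'.
Return a plist: :changed lists new or modified files, :deleted lists
files present in OLD but not in NEW."
  (let (changed deleted)
    (maphash (lambda (file mtime)
               ;; Both timestamps come from `file-attributes', so `equal'
               ;; is enough; a file missing from OLD compares as nil.
               (unless (equal (gethash file old) mtime)
                 (push file changed)))
             new)
    (maphash (lambda (file _mtime)
               (unless (gethash file new)
                 (push file deleted)))
             old)
    (list :changed (nreverse changed) :deleted (nreverse deleted))))
```

A hypothetical poll function driven by run-with-idle-timer could take a fresh snapshot, re-index the :changed files and drop the :deleted ones. Running on idle rather than on a plain timer might soften, though not remove, the stalls mentioned above for big projects.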
--------------BE9DAFC3C4C3E29286721824
Content-Type: text/x-patch; charset=UTF-8;
 name="etags-regen.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="etags-regen.diff"

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 104d889b8b..00723608da 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -2069,7 +2069,9 @@ etags-xref-find-definitions-tag-order
 file name, add `tag-partial-file-name-match-p' to the list value.")
 
 ;;;###autoload
-(defun etags--xref-backend () 'etags)
+(defun etags--xref-backend ()
+  (etags--maybe-use-project-tags)
+  'etags)
 
 (cl-defmethod xref-backend-identifier-at-point ((_backend (eql etags)))
   (find-tag--default))
@@ -2144,6 +2146,132 @@ xref-location-line
     (nth 1 tag-info)))
 
 
+;;; Simple tags generation, with automatic invalidation
+
+(defvar etags--project-tags-file nil)
+(defvar etags--project-tags-root nil)
+(defvar etags--project-new-file nil)
+
+(defvar etags--command (executable-find "etags")
+  ;; How do we get the correct etags here?
+  ;; E.g. "~/vc/emacs-master/lib-src/etags"
+  ;;
+  ;; ctags's etags requires '-L -' for stdin input.
+  ;; It also looks broken here (indexes only some of the input files).
+  ;;
+  ;; If our etags supported '-L', we could use any version of etags.
+  )
+
+(defun etags--maybe-use-project-tags ()
+  (let (proj)
+    (when (and etags--project-tags-root
+               (not (file-in-directory-p default-directory
+                                         etags--project-tags-root)))
+      (etags--project-tags-cleanup))
+    (when (and (not (or tags-file-name
+                        tags-table-list))
+               (setq proj (project-current)))
+      (message "Generating new tags table...")
+      (let ((start (time-to-seconds)))
+        (etags--project-tags-generate proj)
+        (message "...done (%.2f s)" (- (time-to-seconds) start)))
+      ;; Invalidate the scanned tags after any change is written to disk.
+      (add-hook 'after-save-hook #'etags--project-update-file)
+      (add-hook 'before-save-hook #'etags--project-mark-as-new)
+      (visit-tags-table etags--project-tags-file))))
+
+(defun etags--project-tags-generate (proj)
+  (let* ((root (project-root proj))
+         (default-directory root)
+         (files (project-files proj))
+         ;; FIXME: List all extensions, or wait for etags fix.
+         ;; http://lists.gnu.org/archive/html/emacs-devel/2018-01/msg00323.html
+         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
+                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
+                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
+         (file-regexp (format "\\.%s\\'" (regexp-opt extensions t))))
+    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-")
+          etags--project-tags-root root)
+    (with-temp-buffer
+      (mapc (lambda (f)
+              (when (string-match-p file-regexp f)
+                (insert f "\n")))
+            files)
+      (shell-command-on-region
+       (point-min) (point-max)
+       (format "%s - -o %s" etags--command etags--project-tags-file)
+       nil nil "*etags-project-tags-errors*" t))))
+
+(defun etags--project-update-file ()
+  ;; TODO: Maybe only do this when Emacs is idle for a bit.
+  (let ((file-name buffer-file-name)
+        (tags-file-buf (get-file-buffer etags--project-tags-file))
+        pr should-scan)
+    (save-excursion
+      (when tags-file-buf
+        (cond
+         ((and etags--project-new-file
+               (kill-local-variable 'etags--project-new-file)
+               (setq pr (project-current))
+               (equal (project-root pr) etags--project-tags-root)
+               (member file-name (project-files pr)))
+          (set-buffer tags-file-buf)
+          (setq should-scan t))
+         ((progn (set-buffer tags-file-buf)
+                 (goto-char (point-min))
+                 (re-search-forward (format "^%s," (regexp-quote file-name)) nil t))
+          (let ((start (line-beginning-position)))
+            (re-search-forward "\f\n" nil 'move)
+            (let ((inhibit-read-only t)
+                  (save-silently t))
+              (delete-region (- start 2)
+                             (if (eobp)
+                                 (point)
+                               (- (point) 2)))
+              (write-region (point-min) (point-max) buffer-file-name nil 'silent)
+              (set-visited-file-modtime)))
+          (setq should-scan t))))
+      (when should-scan
+        (goto-char (point-max))
+        (let ((inhibit-read-only t)
+              (current-end (point)))
+          (call-process
+           etags--command
+           nil
+           '(t "*etags-project-tags-errors*")
+           nil
+           file-name
+           "--append"
+           "-o"
+           "-")
+          ;; XXX: When the project is big (tags file in 10s of megabytes),
+          ;; this is much faster than revert-buffer. Or even using
+          ;; write-region without APPEND.
+          ;; We could also keep TAGS strictly as a buffer, with no
+          ;; backing on disk.
+          (write-region current-end (point-max) etags--project-tags-file t))
+        (set-visited-file-modtime)
+        (set-buffer-modified-p nil)
+        ;; FIXME: Is there a better way to do this?
+        ;; Completion table is the only remaining place where the
+        ;; update is not incremental.
+        (setq-default tags-completion-table nil)
+        ))))
+
+(defun etags--project-mark-as-new ()
+  (unless buffer-file-number
+    (setq-local etags--project-new-file t)))
+
+(defun etags--project-tags-cleanup ()
+  (when etags--project-tags-file
+    (delete-file etags--project-tags-file)
+    (setq tags-file-name nil
+          tags-table-list nil
+          etags--project-tags-file nil
+          etags--project-tags-root nil))
+  (remove-hook 'after-save-hook #'etags--project-update-file)
+  (remove-hook 'before-save-hook #'etags--project-mark-as-new))
+
 (provide 'etags)
 
 ;;; etags.el ends here

--------------BE9DAFC3C4C3E29286721824--