unofficial mirror of emacs-devel@gnu.org 
* Automatic (e)tags generation and incremental updates
@ 2020-12-14  3:36 Dmitry Gutov
  2021-01-07  3:46 ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2020-12-14  3:36 UTC (permalink / raw)
  To: emacs-devel; +Cc: Philip K., Tom Tromey, John Yates

[-- Attachment #1: Type: text/plain, Size: 3149 bytes --]

Hi all!

I went back to an old thread from 2018 and updated the patch. Together 
with 'project-files' being faster than previously and a few other 
tweaks, the current state feels surprisingly usable for small to 
medium projects, up until tags file sizes where code completion starts 
to show unpleasant latency.

The code lives in the branch scratch/etags-regen and currently it's just 
triggered by etags when you try using the etags xref backend without 
having a tags table visited. I'll also attach a patch against master to 
this email.

This time I took the approach of implementing the meat of incremental 
updates inside Emacs, instead of relying on an external tool. And it 
works better when Emacs knows which exact things it needs to update, and 
which it doesn't need to. E.g., it doesn't need to re-visit the whole 
45MB tags file if it just needs a file re-indexed; it directs etags's 
output to the buffer and then appends it to the file. Completion table 
updates could be made faster this way too, although I think we'd need 
some new data structure for them.
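
A minimal sketch of the "drop one file's section" half of such an
incremental update, assuming the usual TAGS layout in which every file
section starts with a form feed line followed by a "NAME,SIZE" line. The
function name is hypothetical and is not part of the attached patch,
which does the equivalent inline:

```elisp
(defun sketch-tags-delete-file-section (file)
  "Delete FILE's section from the tags table in the current buffer.
Return non-nil if a section was found and removed."
  (save-excursion
    (goto-char (point-min))
    ;; A section header is a form feed line followed by "NAME,SIZE".
    (when (re-search-forward
           (concat "^\f\n" (regexp-quote file) ",[0-9]*\n") nil t)
      (let ((start (match-beginning 0)))
        ;; The section extends to the next form feed line, or to the
        ;; end of the buffer when it is the last one.
        (delete-region start
                       (if (re-search-forward "^\f\n" nil t)
                           (match-beginning 0)
                         (point-max)))
        t))))
```

After the stale section is gone, the fresh etags output for that one
file can be appended at the end of the buffer and of the file on disk.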

The main question remains how to update information for files that have 
been deleted, or edited from outside Emacs (including by 'git 
checkout'). Two main approaches that I'm thinking of:

- On a timer, re-create the list of project files together with their 
mtimes (for instance, by piping through 'stat -c %Y') and compare with 
the previous saved list. Given a big enough project, it will create 
intermittent stalls in the UI, though, which could be unpleasant. But it 
can be the first approach to be implemented anyway.

- filenotify. I have already been warned here that it's unreliable, 
prone to overflowing due to excessive notifications or file watching 
limits. The current API doesn't allow watching a directory recursively 
either (which would be required to know about new files). There is a 
project called Watchman, however (https://github.com/facebook/watchman) 
which I have read good things about, and it must use some sort of file 
notification API under the covers. Perhaps if someone here is familiar 
with its architecture, they could advise how to build a better 
abstraction on top of inotify in Emacs as well.
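
The timer-based approach could be sketched roughly as follows. This is
only an illustration under stated assumptions: it collects mtimes with
`file-attributes' inside Emacs rather than by piping through 'stat', and
both function names are hypothetical:

```elisp
(defun sketch-tags-snapshot (files)
  "Return a hash table mapping each existing file in FILES to its mtime."
  (let ((table (make-hash-table :test #'equal)))
    (dolist (file files table)
      (let ((attrs (file-attributes file)))
        ;; `file-attributes' returns nil for files that do not exist.
        (when attrs
          (puthash file (file-attribute-modification-time attrs) table))))))

(defun sketch-tags-diff (old new)
  "Compare the snapshots OLD and NEW.
Return a list (CHANGED ADDED DELETED) of file name lists."
  (let (changed added deleted)
    (maphash (lambda (file mtime)
               (let ((prev (gethash file old 'missing)))
                 (cond ((eq prev 'missing) (push file added))
                       ;; Both values come from `file-attributes', so
                       ;; structural equality is enough here.
                       ((not (equal prev mtime)) (push file changed)))))
             new)
    (maphash (lambda (file _mtime)
               (when (eq (gethash file new 'missing) 'missing)
                 (push file deleted)))
             old)
    (list changed added deleted)))
```

A timer would then re-index the CHANGED and ADDED files and drop the
sections of the DELETED ones. The cost of taking the snapshot itself is
what would cause the stalls mentioned above on big projects.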

Of course, we can give the users a manual knob as well (it could come in 
the form of enabling/disabling an associated minor mode), but first we 
should try to make it work automatically, at least for projects of up to 
certain size.
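
As an illustration of the "manual knob", a global minor mode could simply
install and remove the save hook. The names below are hypothetical
placeholders, not the ones used on the branch, and the hook function is
left empty on purpose:

```elisp
(defun sketch-tags-after-save ()
  "Placeholder for the function that re-indexes the saved file."
  nil)

(define-minor-mode sketch-tags-regen-mode
  "Toggle automatic tags updates on save (illustrative sketch only)."
  :global t
  (if sketch-tags-regen-mode
      (add-hook 'after-save-hook #'sketch-tags-after-save)
    (remove-hook 'after-save-hook #'sketch-tags-after-save)))
```

Turning the mode off leaves no hook behind, so users who prefer to manage
their tags tables by hand are not affected.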

Another question I'd like to ask is where the maintainers want to see 
this code: inside etags.el, in a new file near it (etags-regen.el, 
perhaps?), or just in GNU ELPA? It can be a minor mode, or a value in 
some user option like proposed in bug#43086.

Normally I would go the ELPA route straight away, but this kind of 
feature (automatic code indexing) is what Emacs sorely needs OOtB, IMHO. 
It loses here even to Sublime Text 3 released 7 years ago, which is not 
a very "smart" editor. And "democratizing" etags this way should result 
in better adoption, bug reports, feature requests, etc.

Please give it a try and comment.

(Cc'd some folks who went near previous discussions.)

[-- Attachment #2: etags-regen.diff --]
[-- Type: text/x-patch, Size: 5920 bytes --]

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 104d889b8b..00723608da 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -2069,7 +2069,9 @@ etags-xref-find-definitions-tag-order
 file name, add `tag-partial-file-name-match-p' to the list value.")
 
 ;;;###autoload
-(defun etags--xref-backend () 'etags)
+(defun etags--xref-backend ()
+  (etags--maybe-use-project-tags)
+  'etags)
 
 (cl-defmethod xref-backend-identifier-at-point ((_backend (eql etags)))
   (find-tag--default))
@@ -2144,6 +2146,132 @@ xref-location-line
     (nth 1 tag-info)))
 
 \f
+;;; Simple tags generation, with automatic invalidation
+
+(defvar etags--project-tags-file nil)
+(defvar etags--project-tags-root nil)
+(defvar etags--project-new-file nil)
+
+(defvar etags--command (executable-find "etags")
+  ;; How do we get the correct etags here?
+  ;; E.g. "~/vc/emacs-master/lib-src/etags"
+  ;;
+  ;; ctags's etags requires '-L -' for stdin input.
+  ;; It also looks broken here (indexes only some of the input files).
+  ;;
+  ;; If our etags supported '-L', we could use any version of etags.
+  )
+
+(defun etags--maybe-use-project-tags ()
+  (let (proj)
+    (when (and etags--project-tags-root
+               (not (file-in-directory-p default-directory
+                                         etags--project-tags-root)))
+      (etags--project-tags-cleanup))
+    (when (and (not (or tags-file-name
+                        tags-table-list))
+               (setq proj (project-current)))
+      (message "Generating new tags table...")
+      (let ((start (time-to-seconds)))
+        (etags--project-tags-generate proj)
+        (message "...done (%.2f s)" (- (time-to-seconds) start)))
+      ;; Invalidate the scanned tags after any change is written to disk.
+      (add-hook 'after-save-hook #'etags--project-update-file)
+      (add-hook 'before-save-hook #'etags--project-mark-as-new)
+      (visit-tags-table etags--project-tags-file))))
+
+(defun etags--project-tags-generate (proj)
+  (let* ((root (project-root proj))
+         (default-directory root)
+         (files (project-files proj))
+         ;; FIXME: List all extensions, or wait for etags fix.
+         ;; http://lists.gnu.org/archive/html/emacs-devel/2018-01/msg00323.html
+         (extensions '("rb" "js" "py" "pl" "el" "c" "cpp" "cc" "h" "hh" "hpp"
+                       "java" "go" "cl" "lisp" "prolog" "php" "erl" "hrl"
+                       "F" "f" "f90" "for" "cs" "a" "asm" "ads" "adb" "ada"))
+         (file-regexp (format "\\.%s\\'" (regexp-opt extensions t))))
+    (setq etags--project-tags-file (make-temp-file "emacs-project-tags-")
+          etags--project-tags-root root)
+    (with-temp-buffer
+      (mapc (lambda (f)
+              (when (string-match-p file-regexp f)
+                (insert f "\n")))
+            files)
+      (shell-command-on-region
+       (point-min) (point-max)
+       (format "%s - -o %s" etags--command etags--project-tags-file)
+       nil nil "*etags-project-tags-errors*" t))))
+
+(defun etags--project-update-file ()
+  ;; TODO: Maybe only do this when Emacs is idle for a bit.
+  (let ((file-name buffer-file-name)
+        (tags-file-buf (get-file-buffer etags--project-tags-file))
+        pr should-scan)
+    (save-excursion
+      (when tags-file-buf
+        (cond
+         ((and etags--project-new-file
+               (kill-local-variable 'etags--project-new-file)
+               (setq pr (project-current))
+               (equal (project-root pr) etags--project-tags-root)
+               (member file-name (project-files pr)))
+          (set-buffer tags-file-buf)
+          (setq should-scan t))
+         ((progn (set-buffer tags-file-buf)
+                 (goto-char (point-min))
+                 (re-search-forward (format "^%s," (regexp-quote file-name)) nil t))
+          (let ((start (line-beginning-position)))
+            (re-search-forward "\f\n" nil 'move)
+            (let ((inhibit-read-only t)
+                  (save-silently t))
+              (delete-region (- start 2)
+                             (if (eobp)
+                                 (point)
+                               (- (point) 2)))
+              (write-region (point-min) (point-max) buffer-file-name nil 'silent)
+              (set-visited-file-modtime)))
+          (setq should-scan t))))
+      (when should-scan
+        (goto-char (point-max))
+        (let ((inhibit-read-only t)
+              (current-end (point)))
+          (call-process
+           etags--command
+           nil
+           '(t "*etags-project-tags-errors*")
+           nil
+           file-name
+           "--append"
+           "-o"
+           "-")
+          ;; XXX: When the project is big (tags file in 10s of megabytes),
+          ;; this is much faster than revert-buffer.  Or even using
+          ;; write-region without APPEND.
+          ;; We could also keep TAGS strictly as a buffer, with no
+          ;; backing on disk.
+          (write-region current-end (point-max) etags--project-tags-file t))
+        (set-visited-file-modtime)
+        (set-buffer-modified-p nil)
+        ;; FIXME: Is there a better way to do this?
+        ;; Completion table is the only remaining place where the
+        ;; update is not incremental.
+        (setq-default tags-completion-table nil)
+        ))))
+
+(defun etags--project-mark-as-new ()
+  (unless buffer-file-number
+    (setq-local etags--project-new-file t)))
+
+(defun etags--project-tags-cleanup ()
+  (when etags--project-tags-file
+    (delete-file etags--project-tags-file)
+    (setq tags-file-name nil
+          tags-table-list nil
+          etags--project-tags-file nil
+          etags--project-tags-root nil))
+  (remove-hook 'after-save-hook #'etags--project-update-file)
+  (remove-hook 'before-save-hook #'etags--project-mark-as-new))
+
 (provide 'etags)
 
 ;;; etags.el ends here

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2020-12-14  3:36 Automatic (e)tags generation and incremental updates Dmitry Gutov
@ 2021-01-07  3:46 ` Dmitry Gutov
  2021-01-07 14:15   ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-07  3:46 UTC (permalink / raw)
  To: emacs-devel; +Cc: Philip K., Tom Tromey, John Yates

On 14.12.2020 05:36, Dmitry Gutov wrote:
> Another question I'd like to ask is where the maintainers want to see 
> this code: inside etags.el, in a new file near it (etags-regen.el, 
> perhaps?), or just in GNU ELPA? It can be a minor mode, or a value in 
> some user option like proposed in bug#43086.

In the absence of any feedback, I've moved it to a separate file, with 
its own minor mode.

Also added the user option to allow specifying additional regexps for 
parsed languages, and set up the appropriate value in Emacs's 
.dir-locals.el to make 'make tags' unnecessary.

Check it out.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-07  3:46 ` Dmitry Gutov
@ 2021-01-07 14:15   ` Eli Zaretskii
  2021-01-07 15:56     ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-07 14:15 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, john, emacs-devel

> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Thu, 7 Jan 2021 05:46:31 +0200
> Cc: "Philip K." <philipk@posteo.net>, Tom Tromey <tom@tromey.com>,
>  John Yates <john@yates-sheets.org>
> 
> Check it out.

Thanks for working on this branch.

However, it is hard to check it out without investing an inordinate
amount of time, without having some guidance: what features are added,
how they work, and what are the relevant commands to invoke them and
variables to control them.  Would it be possible to post a short
summary of that?  The changes are large, and when I looked at them, it
was hard for me to figure out what exactly are we gaining and how to
test that.

TIA




* Re: Automatic (e)tags generation and incremental updates
  2021-01-07 14:15   ` Eli Zaretskii
@ 2021-01-07 15:56     ` Dmitry Gutov
  2021-01-07 16:17       ` Stefan Kangas
  2021-01-09 21:49       ` Tom Tromey
  0 siblings, 2 replies; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-07 15:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, john, emacs-devel

Hi Eli,

On 07.01.2021 16:15, Eli Zaretskii wrote:
> Thanks for working on this branch.
> 
> However, it is hard to check it out without investing an inordinate
> amount of time, without having some guidance: what features are added,
> how they work, and what are the relevant commands to invoke them and
> variables to control them.  Would it be possible to post a short
> summary of that?  The changes are large, and when I looked at them, it
> was hard for me to figure out what exactly are we gaining and how to
> test that.

The changes are now fairly small and non-invasive, you can review the 
diff like this: git diff master...scratch/etags-regen

They include:

- New file and new minor mode etags-regen-mode with 3 defcustom's (see 
the docstrings).

- A change to .dir-locals.el which sets the value of one of those 
options, so that the resulting tags table for the Emacs project itself 
looks close enough to what 'make tags' generates.

- A change to lib-src/etags.c which implements handling of the '-L' flag, for 
compatibility with ctags.

To try it out:

- Start Emacs built from this branch and open, say, src/xdisp.c.

- Call 'M-x etags-regen-mode' to enable it.

- DO NOT call 'M-x visit-tags-table'.

- Call some xref command like 'M-.'. See message "Generating new tags 
table..." (it's saved in /tmp/...), then see navigation happen.

- Pressing C-M-i instead should also trigger tags table generation.

- Rename some function, try navigating to the new name. That should 
work. Navigation to the previous name should fail (unless there are 
other definitions for that name).

What doesn't work:

- When you switch between projects, the previously generated tags tables 
are discarded. It's not too hard to improve, but that would involve some 
choices/tradeoffs.

- When files are deleted, or otherwise changed outside of Emacs (perhaps 
with 'git checkout'), nothing is updated. I have a few new ideas, 
haven't started on them yet. Workaround: toggle etags-regen-mode off and 
on, which will result in full rescan when you use 'M-.'.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-07 15:56     ` Dmitry Gutov
@ 2021-01-07 16:17       ` Stefan Kangas
  2021-01-09 21:49       ` Tom Tromey
  1 sibling, 0 replies; 36+ messages in thread
From: Stefan Kangas @ 2021-01-07 16:17 UTC (permalink / raw)
  To: Dmitry Gutov, Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

Dmitry Gutov <dgutov@yandex.ru> writes:

> - Start Emacs built from this branch and open, say, src/xdisp.c.
>
> - Call 'M-x etags-regen-mode' to enable it.
>
> - DO NOT call 'M-x visit-tags-table'.
>
> - Call some xref command like 'M-.'. See message "Generating new tags
> table..." (it's saved in /tmp/...), then see navigation happen.
>
> - Pressing C-M-i instead should also trigger tags table generation.
>
> - Rename some function, try navigating to the new name. That should
> work. Navigation to the previous name should fail (unless there are
> other definitions for that name).

Sounds really promising.  Thanks for working on this.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-07 15:56     ` Dmitry Gutov
  2021-01-07 16:17       ` Stefan Kangas
@ 2021-01-09 21:49       ` Tom Tromey
  2021-01-10 13:53         ` Dmitry Gutov
  2021-01-10 16:49         ` Eli Zaretskii
  1 sibling, 2 replies; 36+ messages in thread
From: Tom Tromey @ 2021-01-09 21:49 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, tom, philipk, john, emacs-devel

Thank you for CCing me on this.

Dmitry> - Call some xref command like 'M-.'. See message "Generating new tags
Dmitry>   table..." (it's saved in /tmp/...), then see navigation happen.

On "xdg"-ish systems, it would make sense to use the XDG cache directory
here.
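
A sketch of what that could look like with the xdg.el library that ships
with Emacs. The "etags-regen" subdirectory name, the ".tags" suffix and
the function name are assumptions made for this example:

```elisp
(require 'xdg)

(defun sketch-tags-cache-file (project-root)
  "Return a cache file name for the tags table of PROJECT-ROOT.
The \"etags-regen\" subdirectory and the \".tags\" suffix are
hypothetical choices made for this sketch."
  (let ((dir (expand-file-name "etags-regen" (xdg-cache-home))))
    ;; Hash the root so that every project gets a stable, unique name.
    (expand-file-name (concat (md5 (expand-file-name project-root)) ".tags")
                      dir)))
```

The caller would still have to create the directory, for instance with
`make-directory' and a non-nil PARENTS argument, before writing there.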

Dmitry> - Pressing C-M-i instead should also trigger tags table generation.

Dmitry> - When you switch between projects, the previously generated tags
Dmitry>   tables are discarded. It's not too hard to improve, but that would
Dmitry>  involve some choices/tradeoffs.

What are the tradeoffs?

I tend to think that conceptually each buffer should point to its
corresponding tags table.  Then some separate logic could be used to
decide when to kill some tag file buffer.

Dmitry> - When files are deleted, or otherwise changed outside of Emacs
Dmitry>   (perhaps with 'git checkout'), nothing is updated. I have a few new
Dmitry>  ideas, haven't started on them yet. Workaround: toggle
Dmitry> etags-regen-mode off and on, which will result in full rescan when you
Dmitry> use 'M-.'.

It seems to me that the default ought to be to update the tags table on M-.
One nice way to do this would be to run etags in the background, so that
the work of updating would be done in parallel with the user typing,
since presumably you'd want to ensure that etags has finished before
jumping to the result (or fulfilling a completion request).

Tom




* Re: Automatic (e)tags generation and incremental updates
  2021-01-09 21:49       ` Tom Tromey
@ 2021-01-10 13:53         ` Dmitry Gutov
  2021-01-10 16:56           ` Tom Tromey
  2021-01-10 16:49         ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 13:53 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, john, philipk, emacs-devel

Hi Tom,

On 09.01.2021 23:49, Tom Tromey wrote:

> Dmitry> - Call some xref command like 'M-.'. See message "Generating new tags
> Dmitry>   table..." (it's saved in /tmp/...), then see navigation happen.
> 
> On "xdg"-ish systems, it would make sense to use the XDG cache directory
> here.

I should look into this later. If we end up putting TAGS files 
into project directories (see below), that will be moot.

> Dmitry> - Pressing C-M-i instead should also trigger tags table generation.
> 
> Dmitry> - When you switch between projects, the previously generated tags
> Dmitry>   tables are discarded. It's not too hard to improve, but that would
> Dmitry>  involve some choices/tradeoffs.
> 
> What are the tradeoffs?

There are several questions.

- Do we want to store the generated files openly in the root directories 
of each project? I.e. save them as TAGS. That might look more 
familiar/comfortable to the old-timers, and some users might even 
pre-generate such files, if the process takes a long time. And the 
contents will be able to reliably survive for a long time. Storing them 
with "garbled" names somewhere in /tmp or the XDG cache risks having to 
fully regenerate the indexes at least every time the machine reboots. 
The downside to creating TAGS files is it's unfamiliar to newcomers who 
usually expect index cache to be hidden, and they'll have to either 
update .gitignore or risk checking them in.

I don't see a lot of the old timers in this discussion now, and most 
existing users of etags are likely satisfied with the current workflow, 
so perhaps that kind of familiarity is not important. So it probably 
comes down to being able to generate such files only once per project, 
and only update them later. I'm not sure at what project size that will 
become a significant advantage, but it's likely that at that point 
etags.el's other performance limitations will come into play. We'll need 
some real feedback on that. Until then, the generated files will stay in 
/tmp. Might even keep them off disk entirely, actually, though that 
would require some changes to etags.el (help welcome).

- Do we keep such files in memory after the user has switched 
to a different project (and, say, maintain a {project -> file} alist in 
memory), or close and reopen upon switching? If the files are stored on 
disk, implementing the latter is plainly easier with the current 
etags.el code. Is eliminating the delay worth the code complexity and 
increased memory usage?

- Being able to pick up an old TAGS depends on our ability to compare 
the current project contents against a list of files and one timestamp 
(TAGS modification datetime) quickly, much quicker than simply 
regenerating such file would take. And if we can't, there's no point in 
keeping them around.
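
Two of the points above can be sketched in a few lines: the
{project -> file} alist, and the "is anything newer than TAGS" check.
All names here are hypothetical, and the staleness check relies only on
`file-newer-than-file-p', so it cannot notice deleted files:

```elisp
(defvar sketch-tags-table-alist nil
  "Alist of (PROJECT-ROOT . TAGS-FILE) pairs for this session.")

(defun sketch-tags-remember (root tags-file)
  "Record TAGS-FILE as the tags table of the project at ROOT."
  (let ((cell (assoc root sketch-tags-table-alist)))
    (if cell
        (setcdr cell tags-file)
      (push (cons root tags-file) sketch-tags-table-alist))
    tags-file))

(defun sketch-tags-lookup (root)
  "Return the tags file recorded for ROOT, or nil."
  (cdr (assoc root sketch-tags-table-alist)))

(defun sketch-tags-stale-files (tags-file files)
  "Return the members of FILES modified more recently than TAGS-FILE."
  (let (stale)
    (dolist (file files (nreverse stale))
      (when (file-newer-than-file-p file tags-file)
        (push file stale)))))
```

Whether this is "much quicker than simply regenerating" depends on how
long listing and stat'ing the project files takes, which is exactly the
open question.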

> I tend to think that conceptually each buffer should point to its
> corresponding tags table.  Then some separate logic could be used to
> decide when to kill some tag file buffer.

Ideally, perhaps, etags.el would provide an interface for polling a 
specific tags table (for completions or locations) by simply binding one 
or two local variables. At the moment, though, the route there seems to 
be through file-local variables and through calling visit-tags-table 
with non-nil second argument.

> Dmitry> - When files are deleted, or otherwise changed outside of Emacs
> Dmitry>   (perhaps with 'git checkout'), nothing is updated. I have a few new
> Dmitry>  ideas, haven't started on them yet. Workaround: toggle
> Dmitry> etags-regen-mode off and on, which will result in full rescan when you
> Dmitry> use 'M-.'.
> 
> It seems to me that the default ought to be to update the tags table on M-.
> One nice way to do this would be to run etags in the background, so that
> the work of updating would be done in parallel with the user typing,
> since presumably you'd want to ensure that etags has finished before
> jumping to the result (or fulfilling a completion request).

It sounds clever, but IIUC that would only benefit users who call M-. 
with C-u or who have customized xref-prompt-for-identifier to t. Even 
among those, it could help (without sacrificing correctness) only those 
who don't use something like icomplete-mode for tag input, because the 
completion table already depends on the tags index (which is out of 
date). On the flip side, even when the tags generation is synchronous, 
you can start typing right away.

In general, doing updates when Emacs is idle and/or asynchronously are 
quality-of-life changes that can come later after we improve correctness 
(i.e. make sure the index is up to date even after external changes).

Doing that will require some more processing implemented in Elisp which 
can still create annoying stutters in Emacs, whether the process calls 
are asynchronous or not. Debugging those and working on algorithmic 
complexity there is easier when work happens at predictable points in 
time. Asynchronous calls also make error handling more difficult (and 
the current Emacs threads -- even more so).
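
For reference, a background invocation along the lines discussed here
might be sketched like this. It is not what the branch does. The function
names and the process property are hypothetical, and the caller supplies
the program and its arguments:

```elisp
(defun sketch-tags--sentinel (proc _event)
  "Hand PROC's exit status and output to its stored callback."
  ;; Act only once the process has really finished.
  (when (memq (process-status proc) '(exit signal))
    (let* ((buffer (process-buffer proc))
           (output (with-current-buffer buffer (buffer-string))))
      (kill-buffer buffer)
      (funcall (process-get proc 'sketch-tags-callback)
               (process-exit-status proc) output))))

(defun sketch-tags-run-async (program args callback)
  "Run PROGRAM with ARGS in the background.
When it finishes, call CALLBACK with the exit status and the output."
  (let ((proc (make-process
               :name "sketch-tags"
               :buffer (generate-new-buffer " *sketch-tags*")
               :command (cons program args)
               :noquery t
               :sentinel #'sketch-tags--sentinel)))
    ;; Stored on the process, so no closure is needed in the sentinel.
    (process-put proc 'sketch-tags-callback callback)
    proc))
```

The callback has to check the exit status itself before touching the tags
buffer, which is one instance of the extra error handling mentioned
above.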




* Re: Automatic (e)tags generation and incremental updates
  2021-01-09 21:49       ` Tom Tromey
  2021-01-10 13:53         ` Dmitry Gutov
@ 2021-01-10 16:49         ` Eli Zaretskii
  2021-01-10 16:58           ` Tom Tromey
  2021-01-10 17:56           ` Dmitry Gutov
  1 sibling, 2 replies; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-10 16:49 UTC (permalink / raw)
  To: Tom Tromey; +Cc: philipk, emacs-devel, john, dgutov

> From: Tom Tromey <tom@tromey.com>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org,  philipk@posteo.net,
>   tom@tromey.com,  john@yates-sheets.org
> Date: Sat, 09 Jan 2021 14:49:36 -0700
> 
> I tend to think that conceptually each buffer should point to its
> corresponding tags table.

What about buffers that don't belong to any project?  M-. generally
works (or should work) in any buffer.

> It seems to me that the default ought to be to update the tags table on M-.
> One nice way to do this would be to run etags in the background, so that
> the work of updating would be done in parallel with the user typing,
> since presumably you'd want to ensure that etags has finished before
> jumping to the result (or fulfilling a completion request).

The common use case is that you place point on a symbol and type M-.,
in which case you don't need (and don't expect) to type anything else.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 13:53         ` Dmitry Gutov
@ 2021-01-10 16:56           ` Tom Tromey
  2021-01-10 19:39             ` Tom Tromey
  2021-01-10 23:36             ` Dmitry Gutov
  0 siblings, 2 replies; 36+ messages in thread
From: Tom Tromey @ 2021-01-10 16:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, Tom Tromey, philipk, john, emacs-devel

Dmitry> - Do we want to store the generated files openly in the root
Dmitry>   directories of each project?

Yeah.  Either way is fine by me, since the projects I work on already
have TAGS in their .gitignore.

Dmitry> Storing them with "garbled" names somewhere in /tmp or the XDG cache
Dmitry> risks having to fully regenerate the indexes at least every time
Dmitry> the machine reboots.

Does anyone really configure their system this way?  XDG warns about
relying on the cache, but in practice I think that is a warning for
developers -- the model being that the user should be able to delete the
cache at any time.  At least on the systems I've used, the cache is
persistent in practice.

Dmitry> Is eliminating the delay worth the
Dmitry> code complexity and increased memory usage?

It should certainly be an option.  Memory is just no barrier on typical
machines these days.

Dmitry> In general, doing updates when Emacs is idle and/or asynchronously are
Dmitry> quality-of-life changes that can come later after we improve
Dmitry> correctness (i.e. make sure the index is up to date even after
Dmitry> external changes).

Please try it on a non-trivial project before committing to something.
IME a lot of these things work fine for simple projects, but when I want
to use them on gdb or gcc, they are unusably slow.

Anyway, thanks for working on this.  Lack of incremental update is why I
stopped using etags, in favor of ggtags.  I hope someday to switch back,
since etags has some features that GNU Global lacks.

Tom




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 16:49         ` Eli Zaretskii
@ 2021-01-10 16:58           ` Tom Tromey
  2021-01-10 17:56           ` Dmitry Gutov
  1 sibling, 0 replies; 36+ messages in thread
From: Tom Tromey @ 2021-01-10 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, Tom Tromey, emacs-devel, john, dgutov

>> I tend to think that conceptually each buffer should point to its
>> corresponding tags table.

Eli> What about buffers that don't belong to any project?  M-. generally
Eli> works (or should work) in any buffer.

I basically never use it that way, but instead just from buffers that
have some associated project (normally a source file, but also sometimes
vc-dir).

Tom




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 16:49         ` Eli Zaretskii
  2021-01-10 16:58           ` Tom Tromey
@ 2021-01-10 17:56           ` Dmitry Gutov
  2021-01-10 18:14             ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 17:56 UTC (permalink / raw)
  To: Eli Zaretskii, Tom Tromey; +Cc: philipk, john, emacs-devel

On 10.01.2021 18:49, Eli Zaretskii wrote:

>> I tend to think that conceptually each buffer should point to its
>> corresponding tags table.
> 
> What about buffers that don't belong to any project?  M-. generally
> works (or should work) in any buffer.

That's a different workflow, one that we support already.

I don't know how to automate it better. If the buffer doesn't belong to 
any project, which tags table would we use for it? We'll have to prompt 
the user, and that's what we do already by default.

Some kind of automation could be made if, instead of using 
project-current straight away, etags-regen could add some 
defcustom/hooks/etc which could return some value that implements a 
generalized notion of a project for etags-regen's specific purpose, with 
file list, the "where to put TAGS" directory and inclusion logic. Which 
we would then use.

I don't know how many users would go to all that trouble, however, when 
they could just go on with using 'M-x visit-tags-table'.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 17:56           ` Dmitry Gutov
@ 2021-01-10 18:14             ` Eli Zaretskii
  2021-01-10 23:13               ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-10 18:14 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, john, emacs-devel

> Cc: emacs-devel@gnu.org, philipk@posteo.net, john@yates-sheets.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sun, 10 Jan 2021 19:56:34 +0200
> 
> I don't know how to automate it better. If the buffer doesn't belong to 
> any project, which tags table would we use for it? We'll have to prompt 
> the user, and that's what we do already by default.

I thought this prompt will not be supported anymore.  If we are
keeping it as fallback when the current buffer doesn't specify which
tags table to use, then I'm okay with that.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 16:56           ` Tom Tromey
@ 2021-01-10 19:39             ` Tom Tromey
  2021-01-10 23:09               ` Dmitry Gutov
  2021-01-10 23:36             ` Dmitry Gutov
  1 sibling, 1 reply; 36+ messages in thread
From: Tom Tromey @ 2021-01-10 19:39 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel, philipk, john, Dmitry Gutov

Tom> Please try it on a non-trivial project before committing to something.
Tom> IME a lot of these things work fine for simple projects, but when I want
Tom> to use them on gdb or gcc, they are unusably slow.

I built this branch & tried it today.  I don't know if this machine is
just super fast or what, but it seems to perform well enough to use on
gdb.  Thank you again for working on this.

Tom




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 19:39             ` Tom Tromey
@ 2021-01-10 23:09               ` Dmitry Gutov
  0 siblings, 0 replies; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 23:09 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, john, philipk, emacs-devel

On 10.01.2021 21:39, Tom Tromey wrote:
> Tom> Please try it on a non-trivial project before committing to something.
> Tom> IME a lot of these things work fine for simple projects, but when I want
> Tom> to use them on gdb or gcc, they are unusably slow.
> 
> I built this branch & tried it today.  I don't know if this machine is
> just super fast or what, but it seems to perform well enough to use on
> gdb.  Thank you again for working on this.

I'm glad.

How big is the project?

What do

   (benchmark 1 '(project-files (project-current)))

   and

   (length (project-files (project-current)))

say?

I take it both the initial generation and, of course, the after-save-hook 
update are fast enough? I optimized the latter on mozilla-central.

That just leaves the periodic updates logic as the main remaining big 
hurdle.




* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 18:14             ` Eli Zaretskii
@ 2021-01-10 23:13               ` Dmitry Gutov
  2021-01-11 14:53                 ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 23:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, john, emacs-devel

On 10.01.2021 20:14, Eli Zaretskii wrote:
>> I don't know how to automate it better. If the buffer doesn't belong to
>> any project, which tags table would we use for it? We'll have to prompt
>> the user, and that's what we do already by default.
> I thought this prompt will not be supported anymore.  If we are
> keeping it as fallback when the current buffer doesn't specify which
> tags table to use, then I'm okay with that.

For now it's clearly supported with the default value of 
etags-regen-mode (which is 'off'). Or if you 'M-x visit-tags-table' 
before having one auto-generated.

Having them work together would be better, I guess, but what should we 
do exactly?

Would we prompt for the tags table just once upon encountering a 
project-less buffer, and then use it for all such buffers in the current 
session?



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 16:56           ` Tom Tromey
  2021-01-10 19:39             ` Tom Tromey
@ 2021-01-10 23:36             ` Dmitry Gutov
  2021-01-10 23:50               ` Dmitry Gutov
  1 sibling, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 23:36 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, john, philipk, emacs-devel

On 10.01.2021 18:56, Tom Tromey wrote:
> Dmitry> - Do we want to store the generated files openly in the root
> Dmitry>   directories of each project?
> 
> Yeah.  Either way is fine by me, since the projects I work on already
> have TAGS in their .gitignore.

One big advantage of this project for me personally would be that 
users who never got into etags at all (considering it too complicated, 
for one reason or another) would have navigation work automatically OOtB, 
in multiple languages. Or with a minor mode enabled, at least.

> Dmitry> Storing them with "garbled" names somewhere in /tmp or XDG cache
> Dmitry> risks having to fully regenerate the indexes at least every time
> Dmitry> the machine reboots.
> 
> Does anyone really configure their system this way?  XDG warns about
> relying on the cache, but in practice I think that is a warning for
> developers -- the model being that the user should be able to delete the
> cache at any time.  At least on the systems I've used, the cache is
> persistent in practice.

I have no idea, actually. Just figured that something like "cache" 
should be emptied out periodically.

If it usually lives on, we could do away with having a {project -> tags 
file} alist in memory, serializing it on disk when Emacs is killed.
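
A minimal sketch of that idea, assuming the mapping is a plain alist of 
(PROJECT-ROOT . TAGS-FILE) pairs that is printed to a file and read back 
later. Every `my/...' name and the cache file location are hypothetical, 
not something from the branch:

```elisp
;; Hypothetical names throughout; not part of the etags-regen branch.
(defvar my/project-tags-alist nil
  "Alist mapping project root directories to their tags file names.")

(defvar my/project-tags-cache-file
  (expand-file-name "project-tags-alist.eld" user-emacs-directory)
  "File in which `my/project-tags-alist' is kept between sessions.")

(defun my/project-tags-save ()
  "Write `my/project-tags-alist' to `my/project-tags-cache-file'."
  (with-temp-file my/project-tags-cache-file
    ;; Make sure the printed form is never abbreviated.
    (let ((print-length nil)
          (print-level nil))
      (prin1 my/project-tags-alist (current-buffer)))))

(defun my/project-tags-load ()
  "Read `my/project-tags-alist' back from disk, if the file exists."
  (when (file-readable-p my/project-tags-cache-file)
    (with-temp-buffer
      (insert-file-contents my/project-tags-cache-file)
      (setq my/project-tags-alist (read (current-buffer))))))
```

Adding `my/project-tags-save' to `kill-emacs-hook' would do the 
serialization once, at exit, and `my/project-tags-load' could run the 
first time a tags table is requested.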

> Dmitry> In general, doing updates when Emacs is idle and/or asynchronously are
> Dmitry> quality-of-life changes that can come later after we improve
> Dmitry> correctness (i.e. make sure the index is up to date even after
> Dmitry> external changes).
> 
> Please try it on a non-trivial project before committing to something.
> IME a lot of these things work fine for simple projects, but when I want
> to use them on gdb or gcc, they are unusably slow.

I have several projects of moderate size (Emacs itself and some work 
ones, approximately similar) and I regularly try it on a checkout of the 
Mozilla codebase (which is close to 200K files).

It seems to work all right on the former, and less well on the latter 
(initial generation takes 10+ seconds; I did optimize subsequent updates 
for it, but even searching for a single file name in a big enough tags 
file can take half a second).

I have some projects in the middle (e.g. Ruby is 20K files), but I don't 
know which sizes are prevalent among our users, or even among folks 
here. So I'm looking for feedback.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 23:36             ` Dmitry Gutov
@ 2021-01-10 23:50               ` Dmitry Gutov
  2021-01-11 14:56                 ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-10 23:50 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Eli Zaretskii, emacs-devel, philipk, john

On 11.01.2021 01:36, Dmitry Gutov wrote:
> but even searching for a single file name in a big enough tags file can 
> take half a second).

Correction: it's actually writing the updated file to disk which takes 
half a second. This call:

   (write-region (point-min) (point-max) buffer-file-name nil 'silent)

I wonder if *that* could be done asynchronously.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 23:13               ` Dmitry Gutov
@ 2021-01-11 14:53                 ` Eli Zaretskii
  2021-01-12  1:49                   ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-11 14:53 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, john, emacs-devel

> Cc: tom@tromey.com, emacs-devel@gnu.org, philipk@posteo.net,
>  john@yates-sheets.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Mon, 11 Jan 2021 01:13:17 +0200
> 
> On 10.01.2021 20:14, Eli Zaretskii wrote:
> >> I don't know how to automate it better. If the buffer doesn't belong to
> >> any project, which tags table would we use for it? We'll have to prompt
> >> the user, and that's what we do already by default.
> > I thought this prompt will not be supported anymore.  If we are
> > keeping it as fallback when the current buffer doesn't specify which
> > tags table to use, then I'm okay with that.
> 
> For now it's clearly supported with the default value of 
> etags-regen-mode (which is 'off').

I meant with it being ON, of course.

> Or if you 'M-x visit-tags-table' before having one auto-generated.

I meant both the generation and the usage.  Can we prompt the user in
both cases if the current buffer doesn't tell us where the tags table
should be?

> Would we prompt for the tags table just once upon encountering a 
> project-less buffer, and then use it for all such buffers in the current 
> session?

I think we should use it for buffers that don't have their local
value, yes.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-10 23:50               ` Dmitry Gutov
@ 2021-01-11 14:56                 ` Eli Zaretskii
  2021-01-12  1:33                   ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-11 14:56 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> From: Dmitry Gutov <dgutov@yandex.ru>
> Cc: Eli Zaretskii <eliz@gnu.org>, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> Date: Mon, 11 Jan 2021 01:50:13 +0200
> 
> Correction: it's actually writing the updated file to disk which takes 
> half a second. This call:
> 
>    (write-region (point-min) (point-max) buffer-file-name nil 'silent)
> 
> I wonder if *that* could be done asynchronously.

What kind of asynchronicity did you have in mind?

And I'm probably missing something, because I don't understand how
Emacs is involved in updating the tags table.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-11 14:56                 ` Eli Zaretskii
@ 2021-01-12  1:33                   ` Dmitry Gutov
  2021-01-12  4:21                     ` Stefan Monnier
  2021-01-12 15:08                     ` Eli Zaretskii
  0 siblings, 2 replies; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12  1:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 11.01.2021 16:56, Eli Zaretskii wrote:

>> Correction: it's actually writing the updated file to disk which takes
>> half a second. This call:
>>
>>     (write-region (point-min) (point-max) buffer-file-name nil 'silent)
>>
>> I wonder if *that* could be done asynchronously.
> 
> What kind of asynchronicity did you have in mind?

One where the Lisp code doesn't have to wait for the disk write to 
complete. It would help to be able to set up a callback for when the 
write finishes, though.

> And I'm probably missing something, because I don't understand how
> Emacs is involved in updating the tags table.

It's part of the secret sauce for the quick incremental updates: if 
etags writes to disk, even just to update one file's index, we'll have 
to revert-buffer, and the bigger the tags file is, the longer the revert 
will take. Basically, O(project-size).

To go around that, when updating a file, we delete its existing entry 
inside the buffer visiting the tags file, and then call etags directing 
its output to the end of the same buffer. This way we avoid reverting 
the buffer, although the completion table still needs to be regenerated.
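
The in-buffer update described above could be sketched roughly like 
this. It is only an illustration, not the code in the branch: the 
`my/...' names are hypothetical, it assumes the usual TAGS layout where 
each file's section starts with a form feed followed by a 
"FILENAME,SIZE" line, and it assumes an `etags' executable on PATH that 
accepts "-o -" to write to standard output.

```elisp
(defun my/tags-delete-file-entry (file)
  "Delete the section for FILE from the TAGS-format current buffer.
Return non-nil if a section was found and removed."
  (save-excursion
    (goto-char (point-min))
    ;; Each section header looks like "\f\nFILENAME,SIZE\n".
    (when (search-forward (concat "\f\n" file ",") nil t)
      (let ((start (match-beginning 0)))
        ;; The section runs up to the next form feed, or to the end.
        (delete-region start
                       (if (search-forward "\f" nil t)
                           (match-beginning 0)
                         (point-max)))
        t))))

(defun my/tags-update-file (file)
  "Re-index FILE into the TAGS-format current buffer without reverting it.
FILE must be spelled the way it appears in the tags table."
  (my/tags-delete-file-entry file)
  (goto-char (point-max))
  ;; Destination t inserts the process output at point.
  (call-process "etags" nil t nil "-o" "-" file))
```

The header match is deliberately naive: a file whose name is another 
name plus a comma would confuse it, so real code would need a stricter 
pattern.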

To synchronize the buffer contents with disk, though, we need that 
write-region. *If* we want to synchronize it, of course, e.g. to be able 
to open the file in some future Emacs session. Or I guess we could write 
to that file only once later, in kill-emacs-hook.

BTW, writing it to disk becomes 10x faster if the tags file is visited 
literally. I wonder if that mode could work well for most users. Some 
quick testing showed that it's functional, though I'm guessing it won't 
work for identifiers containing unicode characters.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-11 14:53                 ` Eli Zaretskii
@ 2021-01-12  1:49                   ` Dmitry Gutov
  2021-01-12 15:09                     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12  1:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, john, emacs-devel

On 11.01.2021 16:53, Eli Zaretskii wrote:

>> Or if you 'M-x visit-tags-table' before having one auto-generated.
> 
> I meant both the generation and the usage.  Can we prompt the user in
> both cases if the current buffer doesn't tell us where the tags table
> should be?

Okay, would we prompt for an existing, bespoke TAGS file, or for a 
project directory to auto-generate tags from?

The latter case means we'd basically mark that project as the "default" 
for the purposes of etags-regen until the end of the current session.

It wouldn't include tags from any of the project-less files, though. But 
if you were only thinking about non-file-visiting buffers, that's not a 
problem.

>> Would we prompt for the tags table just once upon encountering a
>> project-less buffer, and then use it for all such buffers in the current
>> session?
> 
> I think we should use it for buffers that don't have their local
> value, yes.

All right, see above.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12  1:33                   ` Dmitry Gutov
@ 2021-01-12  4:21                     ` Stefan Monnier
  2021-01-12 16:59                       ` Dmitry Gutov
  2021-01-12 15:08                     ` Eli Zaretskii
  1 sibling, 1 reply; 36+ messages in thread
From: Stefan Monnier @ 2021-01-12  4:21 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, tom, philipk, john, emacs-devel

> BTW, writing it to disk becomes 10x faster if the tags file is visited
> literally. I wonder if that mode could work well for most users. Some quick
> testing showed that it's functional, though I'm guessing it won't work for
> identifiers containing unicode characters.

How 'bout saving it in `utf-8-emacs`?


        Stefan




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12  1:33                   ` Dmitry Gutov
  2021-01-12  4:21                     ` Stefan Monnier
@ 2021-01-12 15:08                     ` Eli Zaretskii
  2021-01-12 16:48                       ` Dmitry Gutov
  1 sibling, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-12 15:08 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 12 Jan 2021 03:33:10 +0200
> 
> >>     (write-region (point-min) (point-max) buffer-file-name nil 'silent)
> >>
> >> I wonder if *that* could be done asynchronously.
> > 
> > What kind of asynchronicity did you have in mind?
> 
> One where the Lisp code doesn't have to wait for the disk write to 
> complete.

We don't have such facilities, because they would need locking the
region (actually, probably the entire buffer) for changes while it
runs.

> > And I'm probably missing something, because I don't understand how
> > Emacs is involved in updating the tags table.
> 
> It's part of the secret sauce for the quick incremental updates: if 
> etags writes to disk, even just to update one file's index, we'll have 
> to revert-buffer, and the bigger the tags file is, the longer the revert 
> will take. Basically, O(project-size).

But still faster than write-region, no?  Did you try it the other way
around: let etags write to a file, and then revert, or even just
replace some region that you wanted updated (with
insert-file-contents)?

Anyway, TAGS and etags.el were designed to be very tolerant to
changes, so you shouldn't need to update very often.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12  1:49                   ` Dmitry Gutov
@ 2021-01-12 15:09                     ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-12 15:09 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, john, emacs-devel

> Cc: tom@tromey.com, emacs-devel@gnu.org, philipk@posteo.net,
>  john@yates-sheets.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 12 Jan 2021 03:49:49 +0200
> 
> On 11.01.2021 16:53, Eli Zaretskii wrote:
> 
> >> Or if you 'M-x visit-tags-table' before having one auto-generated.
> > 
> > I meant both the generation and the usage.  Can we prompt the user in
> > both cases if the current buffer doesn't tell us where the tags table
> > should be?
> 
> Okay, would we prompt for an existing, bespoke TAGS file, or for a 
> project directory to auto-generate tags from?
> 
> The latter case means we'd basically mark that project as the "default" 
> for the purposes of etags-regen until the end of the current session.
> 
> It wouldn't include tags from any of the project-less files, though. But 
> if you were only thinking about non-file-visiting buffers, that's not a 
> problem.

I was indeed thinking about buffers that don't visit files.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 15:08                     ` Eli Zaretskii
@ 2021-01-12 16:48                       ` Dmitry Gutov
  2021-01-12 17:15                         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12 16:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 12.01.2021 17:08, Eli Zaretskii wrote:
>> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 12 Jan 2021 03:33:10 +0200
>>
>>>>      (write-region (point-min) (point-max) buffer-file-name nil 'silent)
>>>>
>>>> I wonder if *that* could be done asynchronously.
>>>
>>> What kind of asynchronicity did you have in mind?
>>
>> One where the Lisp code doesn't have to wait for the disk write to
>> complete.
> 
> We don't have such facilities, because they would need locking the
> region (actually, probably the entire buffer) for changes while it
> runs.

I'd be fine with that (or if this feature required the use of a separate 
thread, with the buffer locked to it in the meantime).

But perhaps we could go with something simpler.

>>> And I'm probably missing something, because I don't understand how
>>> Emacs is involved in updating the tags table.
>>
>> It's part of the secret sauce for the quick incremental updates: if
>> etags writes to disk, even just to update one file's index, we'll have
>> to revert-buffer, and the bigger the tags file is, the longer the revert
>> will take. Basically, O(project-size).
> 
> But still faster than write-region, no?  Did you try it the other way
> around: let etags write to a file, and then revert, or even just
> replace some region that you wanted updated (with
> insert-file-contents)?

With a 45 MB tags file visited:

;; If visited normally:
(benchmark 1 '(revert-buffer t t)) ; => 1.49 s
(benchmark 1 '(write-region 1 (point-max) buffer-file-name)) ; => 0.45 s

;; If visited literally:
(benchmark 1 '(revert-buffer t t)) ; => 1.37 s
(benchmark 1 '(write-region 1 (point-max) buffer-file-name)) ; => 0.06 s

(insert-file-contents "...") takes 0.15 s in this example, which would 
make it usable if etags implemented the same logic that we currently 
have in Elisp (I tried bringing Tom's etags branch up to date, but 
the performance wasn't good). Also, we can choose not to synchronize to 
disk, or do it much less often, whereas we'd have to call 
insert-file-contents during every update.

> Anyway, TAGS and etags.el were designed to be very tolerant to
> changes, so you shouldn't need to update very often.

I want it to update reliably; maybe not too often (depending on a 
project), but if a user switches to a different Git branch, they should 
be confident that they'll see the changes reflected in the index soon. 
Even (and especially) if those are big changes.

Right now only the update-on-save feature is in there, but I have to 
tell you, having a newly written function in the index right away 
(without having to invoke any commands or switch to the terminal) is 
pretty nice.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12  4:21                     ` Stefan Monnier
@ 2021-01-12 16:59                       ` Dmitry Gutov
  2021-01-12 17:24                         ` Stefan Monnier
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12 16:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, tom, philipk, john, emacs-devel

Hi Stefan,

On 12.01.2021 06:21, Stefan Monnier wrote:
>> BTW, writing it to disk becomes 10x faster if the tags file is visited
>> literally. I wonder if that mode could work well for most users. Some quick
>> testing showed that it's functional, though I'm guessing it won't work for
>> identifiers containing unicode characters.
> 
> How 'bout saving it in `utf-8-emacs`?

Am I doing it right?

(let ((coding-system-for-write 'utf-8-emacs))
   (write-region (point-min) (point-max) buffer-file-name))

shows only a 5-20% improvement in speed.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 16:48                       ` Dmitry Gutov
@ 2021-01-12 17:15                         ` Eli Zaretskii
  2021-01-12 17:32                           ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-12 17:15 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 12 Jan 2021 18:48:24 +0200
> 
> (insert-file-contents "...") takes 0.15 in this example, which would 
> make it possible to use if etags implemented the same logic that we do 
> in Elisp currently

Which logic is that? what does it do?

> > Anyway, TAGS and etags.el were designed to be very tolerant to
> > changes, so you shouldn't need to update very often.
> 
> I want it to update reliably; maybe not too often (depending on a 
> project), but if a user switches to a different Git branch, they should 
> be confident that they'll see the changes reflected in the index soon. 
> Even (and especially) if those are big changes.

What about regenerating etags only if M-. etc. fail to find an
identifier?

> Right now only the update-on-save feature is in there, but I have to 
> tell you, having a newly written function in the index right away 
> (without having to invoke any commands or switch to the terminal) is 
> pretty nice.

Of course, but then the user will need to be prepared to pay the
price.  TANSTAAFL.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 16:59                       ` Dmitry Gutov
@ 2021-01-12 17:24                         ` Stefan Monnier
  0 siblings, 0 replies; 36+ messages in thread
From: Stefan Monnier @ 2021-01-12 17:24 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Eli Zaretskii, tom, philipk, john, emacs-devel

>> How 'bout saving it in `utf-8-emacs`?
>
> Am I doing it right?
>
> (let ((coding-system-for-write 'utf-8-emacs))
>   (write-region (point-min) (point-max) buffer-file-name))
>
> shows only a 5-20% improvement in speed.

How disappointing :-(


        Stefan




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 17:15                         ` Eli Zaretskii
@ 2021-01-12 17:32                           ` Dmitry Gutov
  2021-01-12 17:55                             ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12 17:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 12.01.2021 19:15, Eli Zaretskii wrote:
>> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 12 Jan 2021 18:48:24 +0200
>>
>> (insert-file-contents "...") takes 0.15 in this example, which would
>> make it possible to use if etags implemented the same logic that we do
>> in Elisp currently
> 
> Which logic is that? what does it do?

It removes the existing entry for the file from the index, and scans the 
file again (with etags) to add the updated entry for it at the end.

>>> Anyway, TAGS and etags.el were designed to be very tolerant to
>>> changes, so you shouldn't need to update very often.
>>
>> I want it to update reliably; maybe not too often (depending on a
>> project), but if a user switches to a different Git branch, they should
>> be confident that they'll see the changes reflected in the index soon.
>> Even (and especially) if those are big changes.
> 
> What about regenerating etags only if M-. etc. fail to find an
> identifier?

There are downsides.

You will have to press M-. twice sometimes with that approach. 
Completion will routinely fail to show some tags that should be present. 
Also, if the rescan takes a while, I'm not sure you'd want to do that on 
every 'M-.' failure.

But this could be an optional mode of operation.

>> Right now only the update-on-save feature is in there, but I have to
>> tell you, having a newly written function in the index right away
>> (without having to invoke any commands or switch to the terminal) is
>> pretty nice.
> 
> Of course, but then the user will need to be prepared to pay the
> price.  TANSTAAFL.

I think the current state of the feature has already reduced a few 
prices. And we should be able to go further.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 17:32                           ` Dmitry Gutov
@ 2021-01-12 17:55                             ` Eli Zaretskii
  2021-01-12 22:26                               ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-12 17:55 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 12 Jan 2021 19:32:18 +0200
> 
> > What about regenerating etags only if M-. etc. fail to find an
> > identifier?
> 
> There are downsides.
> 
> You will have to press M-. twice sometimes with that approach. 

Why twice? I thought about regenerating automatically once the search
fails, then retrying the search with an updated tags table.
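
As a rough sketch of that flow, with the lookup and the regeneration 
passed in as functions so that nothing here depends on how etags.el 
does either (`my/lookup-with-regen' is a hypothetical name, not an 
existing function):

```elisp
(defun my/lookup-with-regen (lookup regenerate identifier)
  "Call LOOKUP on IDENTIFIER; if it fails, REGENERATE once and retry.
LOOKUP takes an identifier and returns non-nil on success.
REGENERATE takes no arguments and refreshes the tags table."
  (or (funcall lookup identifier)
      (progn
        (funcall regenerate)
        (funcall lookup identifier))))
```

The retry happens at most once per call, so an identifier that really 
is unknown costs exactly one regeneration.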

> Completion will routinely fail to show some tags that should be present. 
> Also, if the rescan takes a while, I'm not sure you'd want to do that on 
> every 'M-.' failure.

You could also regenerate when the user switches to another branch.
That's not a very frequent thing to do.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 17:55                             ` Eli Zaretskii
@ 2021-01-12 22:26                               ` Dmitry Gutov
  2021-01-13 15:01                                 ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-12 22:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 12.01.2021 19:55, Eli Zaretskii wrote:
>> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Tue, 12 Jan 2021 19:32:18 +0200
>>
>>> What about regenerating etags only if M-. etc. fail to find an
>>> identifier?
>>
>> There are downsides.
>>
>> You will have to press M-. twice sometimes with that approach.
> 
> Why twice? I thought about regenerating automatically once the search
> fails, then retrying the search with an updated tags table.

Hm, I guess we can do that.

Still, I try to navigate to unknown symbols often enough. It can be an 
unfamiliar class, I don't know if it's defined here or in some 
dependency library... If every time I did that Emacs froze for, say, a 
second, and I didn't know the reasoning for it, my opinion of the editor 
would get pretty low.

So it might be a fine approach, but you have to know what it's doing and 
why. So, off by default.

>> Completion will routinely fail to show some tags that should be present.
>> Also, if the rescan takes a while, I'm not sure you'd want to do that on
>> every 'M-.' failure.
> 
> You could also regenerate when the user switches to another branch.
> That's not a very frequent thing to do.

That can be part of it. So far the options seem to be:

- Brute-force it with project-files and 'stat', checking to see which 
files were added, which removed, and which ones have modtime more recent 
than the previous update.

- Do something clever with Git. I.e. track the current revision, see 
whether it changed and which files changed with it, and also check the 
modtimes of untracked files.

- inotify. Seems to be the most effort-intensive option. I've also seen 
the 'watchman' program (by Facebook) mentioned as pretty reliable, but 
depending on it is probably not great.
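
For the first option, the comparison step could look something like the 
sketch below. It is an illustration under assumptions, not code from the 
branch: the `my/...' names are hypothetical, the file list would come 
from something like project-files, and file attributes are read from 
Lisp rather than by piping through 'stat'.

```elisp
(defun my/files-snapshot (files)
  "Return an alist of (FILE . MTIME) for each existing file in FILES."
  (delq nil
        (mapcar (lambda (file)
                  (let ((attrs (file-attributes file)))
                    (when attrs
                      (cons file
                            (file-attribute-modification-time attrs)))))
                files)))

(defun my/snapshot-diff (old new)
  "Compare snapshots OLD and NEW, as returned by `my/files-snapshot'.
Return a list (ADDED REMOVED MODIFIED) of file name lists."
  (let ((seen (make-hash-table :test #'equal))
        added removed modified)
    (dolist (entry old)
      (puthash (car entry) (cdr entry) seen))
    (dolist (entry new)
      (let ((old-mtime (gethash (car entry) seen 'missing)))
        (cond ((eq old-mtime 'missing)
               (push (car entry) added))
              ;; Only a strictly newer mtime counts as modified.
              ((time-less-p old-mtime (cdr entry))
               (push (car entry) modified)))
        (remhash (car entry) seen)))
    ;; Whatever is left was in OLD but not in NEW.
    (maphash (lambda (file _mtime) (push file removed)) seen)
    (list (nreverse added) removed (nreverse modified))))
```

Files in the ADDED and MODIFIED lists would be re-indexed and those in 
REMOVED dropped from the tags table; the cost of building the snapshot 
itself is what would cause the stalls on big projects.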

Anyway, if you have any suggestions regarding how to write the file 
quicker (and why utf-8-emacs coding didn't help), that would be great.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-12 22:26                               ` Dmitry Gutov
@ 2021-01-13 15:01                                 ` Eli Zaretskii
  2021-01-13 15:52                                   ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-13 15:01 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 13 Jan 2021 00:26:50 +0200
> 
> Anyway, if you have any suggestions regarding how to write the file 
> quicker (and why utf-8-emacs coding didn't help), that would be great.

Almost all the identifiers are ASCII, right?  So maybe optimize 99.9%
of use cases by storing such tags tables in a unibyte buffer, read
with insert-file-contents-literally?
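
Something along these lines, perhaps, as a sketch only. The `my/...' 
names are hypothetical, and the check reads the file an extra time, 
which a real implementation would want to avoid:

```elisp
(defun my/file-ascii-only-p (file)
  "Return non-nil if FILE contains only ASCII bytes."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    ;; Skip over bytes 0..127; anything left over is non-ASCII.
    (skip-chars-forward "\0-\177")
    (eobp)))

(defun my/visit-tags-file (file)
  "Visit FILE literally when it is pure ASCII, normally otherwise."
  ;; The third argument of `find-file-noselect' is RAWFILE.
  (find-file-noselect file nil (my/file-ascii-only-p file)))
```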

As for why utf-8-emacs didn't help: I'm not really sure why Stefan
thought it will.  I mean, look at the code: it still encodes, just
differently.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-13 15:01                                 ` Eli Zaretskii
@ 2021-01-13 15:52                                   ` Dmitry Gutov
  2021-01-13 15:58                                     ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-13 15:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 13.01.2021 17:01, Eli Zaretskii wrote:
>> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>>   emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Wed, 13 Jan 2021 00:26:50 +0200
>>
>> Anyway, if you have any suggestions regarding how to write the file
>> quicker (and why utf-8-emacs coding didn't help), that would be great.
> 
> Almost all the identifiers are ASCII, right?  So maybe optimize 99.9%
> of use cases by storing such tags tables in a unibyte buffer, read
> with insert-file-contents-literally?

All right, and that option is probably handled well enough already by 
the user choosing (l) in the prompt when the tags file is very big.

> As for why utf-8-emacs didn't help: I'm not really sure why Stefan
> thought it will.  I mean, look at the code: it still encodes, just
> differently.

My (apparently faulty) intuition was that if utf-8-emacs is the memory 
representation of buffer text, converting it into that encoding can be 
faster because it could be done by copying from memory rather than 
having to do the work of recoding every character.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-13 15:52                                   ` Dmitry Gutov
@ 2021-01-13 15:58                                     ` Eli Zaretskii
  2021-01-16  3:57                                       ` Dmitry Gutov
  0 siblings, 1 reply; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-13 15:58 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 13 Jan 2021 17:52:16 +0200
> 
> > Almost all the identifiers are ASCII, right?  So maybe optimize 99.9%
> > of use cases by storing such tags tables in a unibyte buffer, read
> > with insert-file-contents-literally?
> 
> All right, and that option is probably handled well enough already by 
> the user choosing (l) in the prompt when the tags file is very big.

Yes, but my idea was to do that automatically.  After all, the size
threshold beyond which we prompt the user is customizable, so it could
be very large.

> > As for why utf-8-emacs didn't help: I'm not really sure why Stefan
> > thought it will.  I mean, look at the code: it still encodes, just
> > differently.
> 
> My (apparently faulty) intuition was that if utf-8-emacs is the memory 
> representation of buffer text, converting it into that encoding can be 
> faster because it could be done by copying from memory rather than 
> having to do the work of recoding every character.

We don't recode characters when they are valid UTF-8 sequences, but
you forget the raw bytes: they are converted from internal multibyte
representation to single bytes, and that requires walking the buffer
one character at a time.

IOW, utf-8-emacs is the same as utf-8 for this purpose.
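
A small illustration of the raw-byte point, assuming utf-8-emacs writes 
a raw byte back out as a single byte as described above (the function 
name is hypothetical):

```elisp
(defun my/raw-byte-sizes ()
  "Return (INTERNAL ENCODED) byte counts for one raw byte, 0xFF."
  (let* ((raw (string-to-multibyte "\377"))
         (encoded (encode-coding-string raw 'utf-8-emacs)))
    (list (string-bytes raw)        ; size in the internal representation
          (string-bytes encoded)))) ; size after encoding
```

Going by the explanation above, the first number should be 2 and the 
second 1, which is why the encoder cannot simply copy the buffer memory.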



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Automatic (e)tags generation and incremental updates
  2021-01-13 15:58                                     ` Eli Zaretskii
@ 2021-01-16  3:57                                       ` Dmitry Gutov
  2021-01-16  7:34                                         ` Eli Zaretskii
  0 siblings, 1 reply; 36+ messages in thread
From: Dmitry Gutov @ 2021-01-16  3:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, tom, emacs-devel, john

On 13.01.2021 17:58, Eli Zaretskii wrote:

>>> Almost all the identifiers are ASCII, right?  So maybe optimize 99.9%
>>> of use cases by storing such tags tables in a unibyte buffer, read
>>> with insert-file-contents-literally?
>>
>> All right, and that option is probably handled well enough already by
>> the user choosing (l) in the prompt when the tags file is very big.
> 
> Yes, but my idea was to do that automatically.  After all, the size
> threshold beyond which we prompt the user is customizable, so it could
> be very large.

Even so, this mode of operation removes a feature. How frequently it's 
used, I have no idea, but it's better to have full functionality by 
default. There must be a reason why all those languages added support 
for Unicode characters in identifiers.

For the time being, I just disabled synchronization to disk, given that 
we don't yet know how to refresh an existing file anyway.

>> My (apparently faulty) intuition was that if utf-8-emacs is the memory
>> representation of buffer text, converting it into that encoding can be
>> faster because it could be done by copying from memory rather than
>> having to do the work of recoding every character.
> 
> We don't recode characters when they are valid UTF-8 sequences, but
> you forget the raw bytes: they are converted from internal multibyte
> representation to single bytes, and that requires walking the buffer
> one character at a time.
> 
> IOW, utf-8-emacs is the same as utf-8 for this purpose.

So utf-8-emacs is not the same as "internal multibyte representation"?





* Re: Automatic (e)tags generation and incremental updates
  2021-01-16  3:57                                       ` Dmitry Gutov
@ 2021-01-16  7:34                                         ` Eli Zaretskii
  0 siblings, 0 replies; 36+ messages in thread
From: Eli Zaretskii @ 2021-01-16  7:34 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: philipk, tom, emacs-devel, john

> Cc: tom@tromey.com, john@yates-sheets.org, philipk@posteo.net,
>  emacs-devel@gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Sat, 16 Jan 2021 05:57:21 +0200
> 
> > We don't recode characters when they are valid UTF-8 sequences, but
> > you forget the raw bytes: they are converted from internal multibyte
> > representation to single bytes, and that requires walking the buffer
> > one character at a time.
> > 
> > IOW, utf-8-emacs is the same as utf-8 for this purpose.
> 
> So utf-8-emacs is not the same as "internal multibyte representation"?

No, not according to my reading of the code.  (The telltale sign is
that "C-h C" tells you utf-8-emacs has the usual 3 EOL variants,
something that makes no sense for the internal representation.)  If
something like "dump internal representation" coding-system is needed,
we will have to add it, I think.




end of thread, other threads:[~2021-01-16  7:34 UTC | newest]

Thread overview: 36+ messages
2020-12-14  3:36 Automatic (e)tags generation and incremental updates Dmitry Gutov
2021-01-07  3:46 ` Dmitry Gutov
2021-01-07 14:15   ` Eli Zaretskii
2021-01-07 15:56     ` Dmitry Gutov
2021-01-07 16:17       ` Stefan Kangas
2021-01-09 21:49       ` Tom Tromey
2021-01-10 13:53         ` Dmitry Gutov
2021-01-10 16:56           ` Tom Tromey
2021-01-10 19:39             ` Tom Tromey
2021-01-10 23:09               ` Dmitry Gutov
2021-01-10 23:36             ` Dmitry Gutov
2021-01-10 23:50               ` Dmitry Gutov
2021-01-11 14:56                 ` Eli Zaretskii
2021-01-12  1:33                   ` Dmitry Gutov
2021-01-12  4:21                     ` Stefan Monnier
2021-01-12 16:59                       ` Dmitry Gutov
2021-01-12 17:24                         ` Stefan Monnier
2021-01-12 15:08                     ` Eli Zaretskii
2021-01-12 16:48                       ` Dmitry Gutov
2021-01-12 17:15                         ` Eli Zaretskii
2021-01-12 17:32                           ` Dmitry Gutov
2021-01-12 17:55                             ` Eli Zaretskii
2021-01-12 22:26                               ` Dmitry Gutov
2021-01-13 15:01                                 ` Eli Zaretskii
2021-01-13 15:52                                   ` Dmitry Gutov
2021-01-13 15:58                                     ` Eli Zaretskii
2021-01-16  3:57                                       ` Dmitry Gutov
2021-01-16  7:34                                         ` Eli Zaretskii
2021-01-10 16:49         ` Eli Zaretskii
2021-01-10 16:58           ` Tom Tromey
2021-01-10 17:56           ` Dmitry Gutov
2021-01-10 18:14             ` Eli Zaretskii
2021-01-10 23:13               ` Dmitry Gutov
2021-01-11 14:53                 ` Eli Zaretskii
2021-01-12  1:49                   ` Dmitry Gutov
2021-01-12 15:09                     ` Eli Zaretskii
