* BUG 20703 further evidence
From: Sam Halliday @ 2016-01-13 17:54 UTC
To: help-gnu-emacs

Hi all,

I have been seeing a problem that is described in this bug report

  https://debbugs.gnu.org/db/20/20703.html

I have applied the suggested patch to etags-tags-completion-table
(copied below in completeness for your convenience) and trapped an
error case.

I'm triggering the error in an extremely long line of code (46,000
characters!). I presume somebody programmatically generated the line
and pasted it into the source. A workaround could be to simply filter
such lines at the ctag building or loading stage, just something that
deletes "long" lines, whatever that may mean. Probably 500 characters
is long enough!

I could also look at adding maximum sizes to my regexes in ctags, but
that really isn't a general solution because many ctags patterns do
not have such limits.

(defun etags-tags-completion-table () ; Doc string?
  (let ((table (make-vector 511 0))
        (progress-reporter
         (make-progress-reporter
          (format "Making tags completion table for %s..." buffer-file-name)
          (point-min) (point-max))))
    (save-excursion
      (goto-char (point-min))
      ;; This monster regexp matches an etags tag line.
      ;;   \1 is the string to match;
      ;;   \2 is not interesting;
      ;;   \3 is the guessed tag name; XXX guess should be better eg DEFUN
      ;;   \4 is not interesting;
      ;;   \5 is the explicitly-specified tag name.
      ;;   \6 is the line to start searching at;
      ;;   \7 is the char to start searching at.
      (condition-case err
          (while (re-search-forward
                  "^\\(\\([^\177]+[^-a-zA-Z0-9_+*$:\177]+\\)?\
\\([-a-zA-Z0-9_+*$?:]+\\)[^-a-zA-Z0-9_+*$?:\177]*\\)\177\
\\(\\([^\n\001]+\\)\001\\)?\\([0-9]+\\)?,\\([0-9]+\\)?\n"
                  nil t)
            (intern (prog1 (if (match-beginning 5)
                               ;; There is an explicit tag name.
                               (buffer-substring (match-beginning 5) (match-end 5))
                             ;; No explicit tag name.  Best guess.
                             (buffer-substring (match-beginning 3) (match-end 3)))
                      (progress-reporter-update progress-reporter (point)))
                    table))
        (error
         (message "error happened near %d" (point))
         (error (error-message-string err)))))
    table))
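To see which entries of a TAGS file are affected before filtering anything, a small diagnostic along these lines may help. This is only a sketch: `my-tags-long-lines` is a hypothetical name (it is not part of etags.el), and it assumes the TAGS file is already loaded in the current buffer.

```elisp
;; Hypothetical helper, not part of etags.el.
(defun my-tags-long-lines (limit)
  "Return a list of (LINE-NUMBER . LENGTH) for lines longer than LIMIT.
Scans the current buffer, which is assumed to hold a TAGS file."
  (let ((result nil)
        (line 1))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; Length of the current line, excluding the newline.
        (let ((len (- (line-end-position) (line-beginning-position))))
          (when (> len limit)
            (push (cons line len) result)))
        (forward-line 1)
        (setq line (1+ line))))
    ;; Entries were pushed in reverse; return them in buffer order.
    (nreverse result)))
```

Calling `(my-tags-long-lines 500)` in the TAGS buffer uses the 500-character threshold suggested above; any other threshold works the same way.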
* Re: BUG 20703 further evidence
From: Dmitry Gutov @ 2016-01-13 21:25 UTC
To: Sam Halliday, help-gnu-emacs; +Cc: 20703

Hi Sam,

On 01/13/2016 08:54 PM, Sam Halliday wrote:

> I have been seeing a problem that is described in this bug report
>
> https://debbugs.gnu.org/db/20/20703.html
>
> I have applied the suggested patch to etags-tags-completion-table
> (copied below in completeness for your convenience) and trapped an
> error case.

You should try the current version in emacs-25, it's smaller and
faster than previously, although it also probably fails at long-enough
lines.

> I'm triggering the error in an extremely long line of code (46,000
> characters!). I presume somebody programmatically generated the line
> and pasted it into the source. A workaround could be to simply filter
> such lines at the ctag building or loading stage, just something that
> deletes "long" lines, whatever that may mean. Probably 500 characters
> is long enough!
>
> I could also look at adding maximum sizes to my regexes in ctags, but
> that really isn't a general solution because many ctags patterns do
> not have such limits.

I can think of some other possible solutions:

- External pre-processor that removes lines that are too long.

- Extra step, together with a custom variable, in visit-tags-table,
  that goes through the opened files and does the same.

- re-search-forward with limit, as implemented in the patch below
  (against emacs-25), that might work against problematic files like
  that (I haven't tested it).

I don't really know if we should install it, though, because it adds a
performance overhead of ~10%. And I don't know if this problem is
common enough.

Because another way to combat it is at the source: through judicious
application of --exclude argument. As a bonus, the generation phase
will become faster as well (sometimes dramatically).

Should we add a validation phase to visit-tags-table instead? Like,
one that would say "your TAGS files contains obviously malformed
entries from file XXX.min.js, go back and ignore it"?

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 2db7220..9a663d4 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1252,8 +1252,9 @@ etags-file-of-tag
           str
         (expand-file-name str (file-truename default-directory))))))

+(defvar etags--table-line-limit 500)

-(defun etags-tags-completion-table () ; Doc string?
+(defun etags-tags-completion-table () ; Doc string?
   (let (table
         (progress-reporter
          (make-progress-reporter
@@ -1263,10 +1264,13 @@ etags-tags-completion-table
       (goto-char (point-min))
       ;; This regexp matches an explicit tag name or the place where
       ;; it would start.
-      (while (re-search-forward
-              "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
-              nil t)
-        (push (prog1 (if (match-beginning 1)
+      (while (not (eobp))
+        (if (not (re-search-forward
+                  "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
+                  ;; Avoid lines that are too long (bug#20703).
+                  (+ (point) etags--table-line-limit) t))
+            (forward-line 1)
+          (push (prog1 (if (match-beginning 1)
                            ;; There is an explicit tag name.
                            (buffer-substring (match-beginning 1) (match-end 1))
                          ;; No explicit tag name.  Backtrack a little,
@@ -1277,7 +1281,7 @@ etags-tags-completion-table
                              (buffer-substring (point) (match-beginning 0))
                            (goto-char (match-end 0))))
                   (progress-reporter-update progress-reporter (point)))
-                table)))
+                table))))
     table))

 (defun etags-snarf-tag (&optional use-explicit) ; Doc string?
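The validation phase floated above could look roughly like the following. This is a sketch under stated assumptions, not code from etags.el: `my-tags-files-with-long-lines` is a hypothetical name, and the function relies on the usual TAGS layout in which each file section starts with a form feed on its own line, followed by a "FILENAME,SIZE" header line.

```elisp
;; Hypothetical helper, not part of etags.el.
(defun my-tags-files-with-long-lines (limit)
  "Return names of source files whose TAGS entries exceed LIMIT chars.
Scans the current buffer, which is assumed to hold a TAGS file."
  (let ((files nil)
        (current nil))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (cond
         ;; A form feed on its own line starts a new file section; the
         ;; line after it is the "FILENAME,SIZE" header.
         ((looking-at "\f$")
          (forward-line 1)
          (when (looking-at "\\(.*\\),")
            (setq current (match-string 1))))
         ;; An over-long tag line inside a known section: remember the
         ;; file it came from, once.
         ((and current
               (> (- (line-end-position) (line-beginning-position)) limit)
               (not (member current files)))
          (push current files)))
        (forward-line 1)))
    (nreverse files)))
```

The returned names are candidates for an --exclude rule at generation time; whether a scan like this is cheap enough to run on every visit-tags-table is not something this sketch measures.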
* Re: bug#20703: BUG 20703 further evidence
From: Lars Ingebrigtsen @ 2020-08-25 9:13 UTC
To: Dmitry Gutov; +Cc: Sam Halliday, 20703, help-gnu-emacs

Dmitry Gutov <dgutov@yandex.ru> writes:

>> I'm triggering the error in an extremely long line of code (46,000
>> characters!).

[...]

> - re-search-forward with limit, as implemented in the patch below
>   (against emacs-25), that might work against problematic files like
>   that (I haven't tested it).
>
> I don't really know if we should install it, though, because it adds a
> performance overhead of ~10%. And I don't know if this problem is
> common enough.

I think this is a use case (46K long lines) that's really obscure, and
a 10% performance hit wouldn't be appropriate.

> Because another way to combat it is at the source: through judicious
> application of --exclude argument. As a bonus, the generation phase
> will become faster as well (sometimes dramatically).
>
> Should we add a validation phase to visit-tags-table instead? Like,
> one that would say "your TAGS files contains obviously malformed
> entries from file XXX.min.js, go back and ignore it"?

If that can be done efficiently, then that sounds like a good idea.
Otherwise, perhaps we should just say that etags just doesn't support
46K long line source files and close this report as a wontfix?

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* bug#20703: BUG 20703 further evidence
From: Drew Adams @ 2020-08-25 14:54 UTC
To: Lars Ingebrigtsen, Dmitry Gutov; +Cc: Sam Halliday, 20703

Is there really a need to cc help-gnu-emacs@gnu.org?
* bug#20703: BUG 20703 further evidence
From: Lars Ingebrigtsen @ 2020-10-11 3:08 UTC
To: Dmitry Gutov; +Cc: Sam Halliday, 20703, help-gnu-emacs

Lars Ingebrigtsen <larsi@gnus.org> writes:

> If that can be done efficiently, then that sounds like a good idea.
> Otherwise, perhaps we should just say that etags just doesn't support
> 46K long line source files and close this report as a wontfix?

No comments in six weeks, so I'm closing this as a wontfix.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* Re: BUG 20703 further evidence
From: Sam Halliday @ 2016-01-13 21:36 UTC
To: help-gnu-emacs

Thanks Dmitry,

For Emacs 25 we have the option to be smarter, but since I'm on Emacs
24 I am currently in the market for an evil hack :-) (although,
copying the emacs-25 faster implementation might not be a bad idea as
well, this is a particularly slow part of using Emacs).

The approach that sounds most sensible for my use case sounds like
just excluding that one file from indexing, because I can do that from
my .ctags. I actually hadn't thought of it until you mentioned it!

I was thinking along the lines of a function that deletes all the long
lines from a TAGS file, part of a validation / cleanup phase. If you
have a recipe in mind for that, it would be pretty useful.

Could you please copy out your proposed changes in full? I won't be
applying them against their sources, I'll just put them in my scratch
and execute in the running instance.

Best regards,
Sam

On Wednesday, 13 January 2016 21:25:43 UTC, Dmitry Gutov wrote:
> [...]
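A cleanup function of the kind asked for here might be sketched as follows. The name `my-tags-delete-long-lines` is hypothetical and nothing like it ships with etags.el; the sketch assumes the TAGS buffer is current and writable. Note that deleting lines leaves the SIZE field in each section's "FILENAME,SIZE" header stale, which this sketch does not attempt to correct.

```elisp
;; Hypothetical helper, not part of etags.el.
(defun my-tags-delete-long-lines (limit)
  "Delete every line longer than LIMIT characters in the current buffer.
Return the number of lines deleted."
  (let ((deleted 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (if (> (- (line-end-position) (line-beginning-position)) limit)
            (progn
              ;; Remove the line together with its trailing newline;
              ;; point is then at the start of the following line.
              (delete-region (line-beginning-position)
                             (line-beginning-position 2))
              (setq deleted (1+ deleted)))
          (forward-line 1))))
    deleted))
```

One way to try it is to switch to the TAGS buffer, evaluate `(my-tags-delete-long-lines 500)`, and then rebuild the completion table; saving the modified TAGS file is optional and only matters if the cleanup should persist.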
* Re: BUG 20703 further evidence
From: Dmitry Gutov @ 2016-01-13 21:50 UTC
To: Sam Halliday, help-gnu-emacs

On 01/14/2016 12:36 AM, Sam Halliday wrote:

> The approach that sounds most sensible for my use case sounds like
> just excluding that one file from indexing, because I can do that from
> my .ctags. I actually hadn't thought of it until you mentioned it!

I've suggested it before, in a comment to the bug in question:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20703#26

> I was thinking along the lines of a function that deletes all the long
> lines from a TAGS file, part of a validation / cleanup phase. If you
> have a recipe in mind for that, it would be pretty useful.

That code still has to be written. Maybe after 25.1?

> Could you please copy out your proposed changes in full? I won't be
> applying them against their sources, I'll just put them in my scratch
> and execute in the running instance.

Try replacing these definitions:

(defvar etags--table-line-limit 500)

(defun etags-tags-completion-table () ; Doc string?
  (let (table
        (progress-reporter
         (make-progress-reporter
          (format "Making tags completion table for %s..." buffer-file-name)
          (point-min) (point-max))))
    (save-excursion
      (goto-char (point-min))
      ;; This regexp matches an explicit tag name or the place where
      ;; it would start.
      (while (not (eobp))
        (if (not (re-search-forward
                  "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
                  ;; Avoid lines that are too long (bug#20703).
                  (+ (point) etags--table-line-limit) t))
            (forward-line 1)
          (push (prog1 (if (match-beginning 1)
                           ;; There is an explicit tag name.
                           (buffer-substring (match-beginning 1) (match-end 1))
                         ;; No explicit tag name.  Backtrack a little,
                         ;; and look for the implicit one.
                         (goto-char (match-beginning 0))
                         (skip-chars-backward "^\f\t\n\r()=,; ")
                         (prog1
                             (buffer-substring (point) (match-beginning 0))
                           (goto-char (match-end 0))))
                  (progress-reporter-update progress-reporter (point)))
                table))))
    table))

(defun tags-completion-table ()
  "Build `tags-completion-table' on demand.
The tags included in the completion table are those in the current
tags table and its (recursively) included tags tables."
  (or tags-completion-table
      ;; No cached value for this buffer.
      (condition-case ()
          (let (current-table combined-table)
            (message "Making tags completion table for %s..." buffer-file-name)
            (save-excursion
              ;; Iterate over the current list of tags tables.
              (while (visit-tags-table-buffer (and combined-table t))
                ;; Find possible completions in this table.
                (setq current-table (funcall tags-completion-table-function))
                ;; Merge this buffer's completions into the combined table.
                (if combined-table
                    (mapatoms
                     (lambda (sym)
                       (intern (symbol-name sym) combined-table))
                     current-table)
                  (setq combined-table current-table))))
            (message "Making tags completion table for %s...done"
                     buffer-file-name)
            ;; Cache the result in a buffer-local variable.
            (setq tags-completion-table combined-table))
        (quit (message "Tags completion table construction aborted.")
              (setq tags-completion-table nil)))))