unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#20703: BUG 20703 further evidence
       [not found] <5ab4af6b-5b7d-40f9-b49f-2d8cc6926e9f@googlegroups.com>
@ 2016-01-13 21:25 ` Dmitry Gutov
       [not found] ` <5696C0CC.9010300@yandex.ru>
  1 sibling, 0 replies; 4+ messages in thread
From: Dmitry Gutov @ 2016-01-13 21:25 UTC
  To: Sam Halliday, help-gnu-emacs; +Cc: 20703

Hi Sam,

On 01/13/2016 08:54 PM, Sam Halliday wrote:

> I have been seeing a problem that is described in this bug report
>
>    https://debbugs.gnu.org/db/20/20703.html
>
> I have applied the suggested patch to etags-tags-completion-table (copied below in completeness for your convenience) and trapped an error case.

You should try the current version in emacs-25; it's smaller and faster
than before, although it will probably also fail on long enough lines.

> I'm triggering the error in an extremely long line of code (46,000 characters!). I presume somebody programmatically generated the line and pasted it into the source. A workaround could be to simply filter such lines at the ctag building or loading stage, just something that deletes "long" lines, whatever that may mean. Probably 500 characters is long enough!
>
> I could also look at adding maximum sizes to my regexes in ctags, but that really isn't a general solution because many ctags patterns do not have such limits.

I can think of some other possible solutions:

- External pre-processor that removes lines that are too long.

- Extra step, together with a custom variable, in visit-tags-table, that 
goes through the opened files and does the same.

- re-search-forward with limit, as implemented in the patch below 
(against emacs-25), that might work against problematic files like that 
(I haven't tested it).

I don't really know if we should install it, though, because it adds a 
performance overhead of ~10%. And I don't know if this problem is common 
enough.

Another way to combat it is at the source, through judicious
application of the --exclude argument. As a bonus, the generation phase
will become faster as well (sometimes dramatically).
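
Where an --exclude pattern is awkward to write, the same filtering at the
source could be scripted. What follows is only a minimal Python sketch:
the helper name files_without_long_lines and the 500-byte default (the
figure suggested above) are assumptions for illustration, not part of
etags or ctags. It only selects files and leaves tag generation to the
usual tool.

```python
import os

def files_without_long_lines(root, limit=500):
    """Yield paths under `root` in which no line exceeds `limit` bytes.

    Hypothetical helper: the name and the 500-byte default are
    illustrative, not part of etags or ctags.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Iterating a binary file yields newline-terminated lines.
                    if all(len(line.rstrip(b"\r\n")) <= limit for line in f):
                        yield path
            except OSError:
                # Unreadable files are skipped rather than aborting the walk.
                continue
```

The resulting names, printed one per line, could then be piped to a tag
generator that accepts a file list on standard input ("etags -", for
instance).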

Should we add a validation phase to visit-tags-table instead? Like, one
that would say "your TAGS file contains obviously malformed entries
from file XXX.min.js, go back and ignore it"?
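
To make the validation idea concrete, here is a rough sketch of such a
check, written in Python outside Emacs rather than as part of
visit-tags-table. It assumes the usual etags layout (each section starts
with a form feed and newline, then a "FILE,SIZE" header line, then one
tag per line). The function name sections_with_long_lines and the
500-byte limit are hypothetical.

```python
def sections_with_long_lines(tags, limit=500):
    """Return names of TAGS sections containing a line longer than `limit` bytes.

    Assumes the usual etags layout: each section starts with a form feed
    and newline, then a "FILE,SIZE" header line, then one tag per line.
    """
    offenders = []
    # The text before the first form feed is empty in a well-formed file.
    for section in tags.split(b"\x0c\n")[1:]:
        header, _, body = section.partition(b"\n")
        # The file name may itself contain commas; the size follows the last one.
        name = header.rpartition(b",")[0]
        if any(len(line) > limit for line in body.split(b"\n")):
            offenders.append(name.decode(errors="replace"))
    return offenders
```

Each returned name is a candidate for a warning of the kind quoted
above, telling the user which file to exclude when regenerating TAGS.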

diff --git a/lisp/progmodes/etags.el b/lisp/progmodes/etags.el
index 2db7220..9a663d4 100644
--- a/lisp/progmodes/etags.el
+++ b/lisp/progmodes/etags.el
@@ -1252,8 +1252,9 @@ etags-file-of-tag
  	  str
  	(expand-file-name str (file-truename default-directory))))))

+(defvar etags--table-line-limit 500)

-(defun etags-tags-completion-table () ; Doc string?
+(defun etags-tags-completion-table ()   ; Doc string?
    (let (table
  	(progress-reporter
  	 (make-progress-reporter
@@ -1263,10 +1264,13 @@ etags-tags-completion-table
        (goto-char (point-min))
        ;; This regexp matches an explicit tag name or the place where
        ;; it would start.
-      (while (re-search-forward
-              "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
-	      nil t)
-	(push	(prog1 (if (match-beginning 1)
+      (while (not (eobp))
+        (if (not (re-search-forward
+                  "[\f\t\n\r()=,; ]?\177\\\(?:\\([^\n\001]+\\)\001\\)?"
+                  ;; Avoid lines that are too long (bug#20703).
+                  (+ (point) etags--table-line-limit) t))
+            (forward-line 1)
+          (push (prog1 (if (match-beginning 1)
  			   ;; There is an explicit tag name.
  			   (buffer-substring (match-beginning 1) (match-end 1))
  			 ;; No explicit tag name.  Backtrack a little,
@@ -1277,7 +1281,7 @@ etags-tags-completion-table
                               (buffer-substring (point) (match-beginning 0))
                             (goto-char (match-end 0))))
  		  (progress-reporter-update progress-reporter (point)))
-		table)))
+		table))))
      table))

  (defun etags-snarf-tag (&optional use-explicit) ; Doc string?

* bug#20703: BUG 20703 further evidence
       [not found] ` <5696C0CC.9010300@yandex.ru>
@ 2020-08-25  9:13   ` Lars Ingebrigtsen
       [not found]   ` <875z97dr6j.fsf@gnus.org>
  1 sibling, 0 replies; 4+ messages in thread
From: Lars Ingebrigtsen @ 2020-08-25  9:13 UTC
  To: Dmitry Gutov; +Cc: Sam Halliday, 20703, help-gnu-emacs

Dmitry Gutov <dgutov@yandex.ru> writes:

>> I'm triggering the error in an extremely long line of code (46,000
>> characters!).

[...]

> - re-search-forward with limit, as implemented in the patch below
>   (against emacs-25), that might work against problematic files like
>   that (I haven't tested it).
>
> I don't really know if we should install it, though, because it adds a
> performance overhead of ~10%. And I don't know if this problem is
> common enough.

I think this is a use case (46K long lines) that's really obscure, and a
10% performance hit wouldn't be appropriate.

> Because another way to combat it is at the source: through judicious
> application of --exclude argument. As a bonus, the generation phase
> will become faster as well (sometimes dramatically).
>
> Should we add a validation phase to visit-tags-table instead? Like,
> one that would say "your TAGS file contains obviously malformed
> entries from file XXX.min.js, go back and ignore it"?

If that can be done efficiently, then that sounds like a good idea.
Otherwise, perhaps we should just say that etags just doesn't support
46K long line source files and close this report as a wontfix?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

* bug#20703: BUG 20703 further evidence
       [not found]   ` <875z97dr6j.fsf@gnus.org>
@ 2020-08-25 14:54     ` Drew Adams
  2020-10-11  3:08     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 4+ messages in thread
From: Drew Adams @ 2020-08-25 14:54 UTC
  To: Lars Ingebrigtsen, Dmitry Gutov; +Cc: Sam Halliday, 20703

Is there really a need to cc help-gnu-emacs@gnu.org?

* bug#20703: BUG 20703 further evidence
       [not found]   ` <875z97dr6j.fsf@gnus.org>
  2020-08-25 14:54     ` Drew Adams
@ 2020-10-11  3:08     ` Lars Ingebrigtsen
  1 sibling, 0 replies; 4+ messages in thread
From: Lars Ingebrigtsen @ 2020-10-11  3:08 UTC
  To: Dmitry Gutov; +Cc: Sam Halliday, 20703, help-gnu-emacs

Lars Ingebrigtsen <larsi@gnus.org> writes:

> If that can be done efficiently, then that sounds like a good idea.
> Otherwise, perhaps we should just say that etags just doesn't support
> 46K long line source files and close this report as a wontfix?

No comments in six weeks, so I'm closing this as a wontfix.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
