Inefficient code in xml.el

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Inefficient code in xml.el
@ 2005-06-06 11:51 Kim F. Storm
  2005-06-06 12:02 ` David Kastrup
  0 siblings, 1 reply; 4+ messages in thread
From: Kim F. Storm @ 2005-06-06 11:51 UTC (permalink / raw)
  Cc: emacs-devel


I noticed that xml.el has several occurences of code like this:

	   ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
				"\\)[ \t\n\r]*\\(" xml-entity-value-re
				"\\)[ \t\n\r]*>"))
	    (let ((name  (buffer-substring (nth 2 (match-data))
					   (nth 3 (match-data))))
		  (value (buffer-substring (+ (nth 4 (match-data)) 1)
					   (- (nth 5 (match-data)) 1))))
	      (goto-char (nth 1 (match-data)))


Using (match-data) like that is VERY, VERY inefficient.

Use either match-beginning/match-end, or match-string instead.


-- 
Kim F. Storm <storm@cua.dk> http://www.cua.dk

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inefficient code in xml.el
  2005-06-06 11:51 Kim F. Storm
@ 2005-06-06 12:02 ` David Kastrup
  0 siblings, 0 replies; 4+ messages in thread
From: David Kastrup @ 2005-06-06 12:02 UTC (permalink / raw)
  Cc: Mark A. Hershberger, emacs-devel

storm@cua.dk (Kim F. Storm) writes:

> I noticed that xml.el has several occurences of code like this:
>
> 	   ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
> 				"\\)[ \t\n\r]*\\(" xml-entity-value-re
> 				"\\)[ \t\n\r]*>"))
> 	    (let ((name  (buffer-substring (nth 2 (match-data))
> 					   (nth 3 (match-data))))
> 		  (value (buffer-substring (+ (nth 4 (match-data)) 1)
> 					   (- (nth 5 (match-data)) 1))))
> 	      (goto-char (nth 1 (match-data)))
>
>
> Using (match-data) like that is VERY, VERY inefficient.

Uh, that is only half the story.  It is not just devastatingly
inefficient by itself.  Every single such call creates a whole slew of
markers, and those slow down _any_ text manipulation in the buffer
_afterwards_ until they get garbage-collected at some indeterminate
point of time in the future.

The code passage above creates 30 new markers _every_ time it is run.
All of these are maintained for every insertion/deletion in the buffer
until garbage collection finally removes them.

> Use either match-beginning/match-end, or match-string instead.

Yes, yes, YES!

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Inefficient code in xml.el
@ 2005-06-06 12:41 klaus.berndl
  2005-06-06 17:18 ` Kevin Rodgers
  0 siblings, 1 reply; 4+ messages in thread
From: klaus.berndl @ 2005-06-06 12:41 UTC (permalink / raw)
  Cc: mah, emacs-devel

Well, i get the point, but how an elisp-programmer should know this?!

The manual says:

 - Function: match-data
     This function returns a newly constructed list containing all the
     information on what text the last search matched.  Element zero is
     the position of the beginning of the match for the whole
     expression; element one is the position of the end of the match
     for the expression.  The next two elements are the positions of
     the beginning and end of the match for the first subexpression,
     and so on.  In general, element number 2N corresponds to
     `(match-beginning N)'; and element number 2N + 1 corresponds to
     `(match-end N)'.

     All the elements are markers or `nil' if matching was done on a
     buffer, and all are integers or `nil' if matching was done on a
     string with `string-match'.

     As always, there must be no possibility of intervening searches
     between the call to a search function and the call to `match-data'
     that is intended to access the match data for that search.

          (match-data)
               =>  (#<marker at 9 in foo>
                    #<marker at 17 in foo>
                    #<marker at 13 in foo>
                    #<marker at 17 in foo>)

What about to mention such inefficiency-problems in the documentation?!
The manual only says "... Corresponds to ....". IMHO the documentation
must mention the performnace-topic if it is so important you wrote in
these postings!

Ciao,
Klaus


David Kastrup wrote:
> storm@cua.dk (Kim F. Storm) writes:
> 
>> I noticed that xml.el has several occurences of code like this:
>> 
>> 	   ((looking-at (concat "<!ENTITY[ \t\n\r]*\\(" xml-name-re
>> 				"\\)[ \t\n\r]*\\(" xml-entity-value-re
>> 				"\\)[ \t\n\r]*>"))
>> 	    (let ((name  (buffer-substring (nth 2 (match-data))
>> 					   (nth 3 (match-data))))
>> 		  (value (buffer-substring (+ (nth 4 (match-data)) 1)
>> 					   (- (nth 5 (match-data)) 1))))
>> 	      (goto-char (nth 1 (match-data)))
>> 
>> 
>> Using (match-data) like that is VERY, VERY inefficient.
> 
> Uh, that is only half the story.  It is not just devastatingly
> inefficient by itself.  Every single such call creates a whole slew of
> markers, and those slow down _any_ text manipulation in the buffer
> _afterwards_ until they get garbage-collected at some indeterminate
> point of time in the future.
> 
> The code passage above creates 30 new markers _every_ time it is run.
> All of these are maintained for every insertion/deletion in the buffer
> until garbage collection finally removes them.
> 
>> Use either match-beginning/match-end, or match-string instead.
> 
> Yes, yes, YES!

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inefficient code in xml.el
  2005-06-06 12:41 Inefficient code in xml.el klaus.berndl
@ 2005-06-06 17:18 ` Kevin Rodgers
  0 siblings, 0 replies; 4+ messages in thread
From: Kevin Rodgers @ 2005-06-06 17:18 UTC (permalink / raw)


klaus.berndl@sdm.de wrote:
 > Well, i get the point, but how an elisp-programmer should know this?!
 >
 > The manual says:
 >
 >  - Function: match-data
 >      This function returns a newly constructed list containing all the
 >      information on what text the last search matched.  Element zero is
 >      the position of the beginning of the match for the whole
 >      expression; element one is the position of the end of the match
 >      for the expression.  The next two elements are the positions of
 >      the beginning and end of the match for the first subexpression,
 >      and so on.  In general, element number 2N corresponds to
 >      `(match-beginning N)'; and element number 2N + 1 corresponds to
 >      `(match-end N)'.
 >
 >      All the elements are markers or `nil' if matching was done on a
 >      buffer, and all are integers or `nil' if matching was done on a
 >      string with `string-match'.
 >
 >      As always, there must be no possibility of intervening searches
 >      between the call to a search function and the call to `match-data'
 >      that is intended to access the match data for that search.
 >
 >           (match-data)
 >                =>  (#<marker at 9 in foo>
 >                     #<marker at 17 in foo>
 >                     #<marker at 13 in foo>
 >                     #<marker at 17 in foo>)
 >
 > What about to mention such inefficiency-problems in the documentation?!
 > The manual only says "... Corresponds to ....". IMHO the documentation
 > must mention the performnace-topic if it is so important you wrote in
 > these postings!

Just change the first sentence to:

	This function returns a newly constructed list of newly
	allocated markers, containing all the information on what text
	the last search matched.

A reference to the Markers node there might be useful.  And this bit
from the Markers node should be emphasized:

	Insertion and deletion in a buffer must check all the markers
	and relocate them if necessary.  This slows processing in a
	buffer with a large number of markers.

-- 
Kevin Rodgers

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-06-06 17:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-06 12:41 Inefficient code in xml.el klaus.berndl
2005-06-06 17:18 ` Kevin Rodgers
  -- strict thread matches above, loose matches on Subject: below --
2005-06-06 11:51 Kim F. Storm
2005-06-06 12:02 ` David Kastrup

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).