unofficial mirror of help-gnu-emacs@gnu.org
* elisp optimization question
@ 2008-05-08 18:03 brad clawsie
  2008-05-08 22:14 ` Lennart Borgman (gmail)
  0 siblings, 1 reply; 8+ messages in thread
From: brad clawsie @ 2008-05-08 18:03 UTC (permalink / raw)
  To: help-gnu-emacs

hi, i use the following function to translate unicode and other
entities found on the web into ascii that i can view in emacs-w3m. i
am concerned that each search and replace as done in my example is
inefficient, is there a better way to do this? i.e., is there a better
way to group search/replace pairs? thanks in advance!

(defun w3m-filter-brad (url)
  (goto-char (point-min))
  (while (re-search-forward "&#187;" nil t)
    (replace-match "&gt;&gt;"))
  (goto-char (point-min))
  (while (re-search-forward "&#8217;" nil t)
    (replace-match "'"))
  (goto-char (point-min))
  (while (re-search-forward "“" nil t)
    (replace-match "\""))
  (goto-char (point-min))
  (while (re-search-forward "”" nil t)
    (replace-match "\""))
  (goto-char (point-min))
  (while (re-search-forward "&#8212;" nil t)
    (replace-match "-"))
  (goto-char (point-min))
  (while (re-search-forward "«" nil t)
    (replace-match "<"))
  (goto-char (point-min))
  (while (re-search-forward "»" nil t)
    (replace-match ">"))
  (goto-char (point-min))
  (while (re-search-forward "&ouml;" nil t)
    (replace-match "o"))
  )





* Re: elisp optimization question
  2008-05-08 18:03 brad clawsie
@ 2008-05-08 22:14 ` Lennart Borgman (gmail)
  2008-05-09  1:42   ` Kevin Rodgers
  0 siblings, 1 reply; 8+ messages in thread
From: Lennart Borgman (gmail) @ 2008-05-08 22:14 UTC (permalink / raw)
  To: brad clawsie; +Cc: help-gnu-emacs

brad clawsie wrote:
> hi, i use the following function to translate unicode and other
> entities found on the web into ascii that i can view in emacs-w3m. i
> am concerned that each search and replace as done in my example is
> inefficient, is there a better way to do this? i.e., is there a better
> way to group search/replace pairs? thanks in advance!


Written the way you have it, you do not need re-search-forward;
search-forward is enough, since you are searching for literal strings,
not regular expressions.

Another way to make it faster would perhaps be to build a single regular
expression with regexp-opt and then look up whatever it matched.
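
To illustrate the first point, here is a minimal sketch of the difference
between the two search functions. The helper name my-count-matches and the
sample text are made up for this example; only search-forward,
re-search-forward and regexp-quote are standard.

```elisp
(defun my-count-matches (search-fn pattern text)
  "Count how often SEARCH-FN finds PATTERN in TEXT.
Hypothetical helper, used only for this illustration."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((n 0))
      (while (funcall search-fn pattern nil t)
        (setq n (1+ n)))
      n)))

;; "." is a regexp wildcard, so the regexp search also finds "1x5".
(my-count-matches #'re-search-forward "1.5" "1.5 and 1x5")

;; The literal search only finds the text "1.5" itself.
(my-count-matches #'search-forward "1.5" "1.5 and 1x5")

;; regexp-quote makes a regexp search behave literally again.
(my-count-matches #'re-search-forward (regexp-quote "1.5") "1.5 and 1x5")
```

For strings without regexp special characters, such as the ones in the
function above, both searches find the same text.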





* Re: elisp optimization question
       [not found] <mailman.11345.1210283700.18990.help-gnu-emacs@gnu.org>
@ 2008-05-09  0:00 ` harven
  2008-05-09  1:45   ` Kevin Rodgers
       [not found]   ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
  2008-05-09  0:36 ` Xah
  1 sibling, 2 replies; 8+ messages in thread
From: harven @ 2008-05-09  0:00 UTC (permalink / raw)
  To: help-gnu-emacs

hi,
you can save some typing by using an alist. Here is what i use to
convert accented letters into html and back.

(defun accent-html (prefix)
 "Accented letter translation     é -> &eacute.
  With an argument,  reverse    é <- &eacute.
  Works on the whole buffer"
 (interactive "P")
 (save-excursion
   (let ((association
          '(("É" . "&Eacute;") ("á" . "&aacute;")  ("à" . "&agrave;")
            ("â" . "&acirc;")  ("ä" . "&auml;")    ("ã" . "&atilde;")
            ("é" . "&eacute;") ("è" . "&egrave;")  ("ê" . "&ecirc;")
            ("ë" . "&euml;")   ("í" . "&iacute;")  ("ì" . "&igrave;")
            ("î" . "&icirc;")  ("ï" . "&iuml;")    ("ñ" . "&ntilde;")
            ("ó" . "&oacute;") ("ò" . "&ograve;")  ("ô" . "&ocirc;")
            ("ö" . "&ouml;")   ("õ" . "&otilde;")  ("ú" . "&uacute;")
            ("ù" . "&ugrave;") ("û" . "&ucirc;")   ("ü" . "&uuml;")
            ("ç" . "&ccedil;")))
        (case-fold-search nil))
   (dolist (paire association)
     (when prefix
       (setq paire (cons (cdr paire) (car paire))))
     (goto-char (point-min))
     (while (search-forward (car paire) nil t)
         (replace-match (cdr paire) nil t))))))

This is not more efficient than your own defun. If you only want to
translate single characters, the function subst-char-in-region is a
primitive that saves a while loop and is probably faster.
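
A minimal sketch of the subst-char-in-region idea. It swaps one character
for another, so it cannot turn a multi-character entity such as &eacute;
into a letter. The function name my-underscores-to-dashes and the choice of
characters are made up for illustration; both characters are plain ASCII on
purpose, to sidestep any multibyte concerns.

```elisp
(defun my-underscores-to-dashes ()
  "Replace every underscore in the current buffer with a dash.
Hypothetical example function, not part of the code above."
  ;; (subst-char-in-region START END FROMCHAR TOCHAR) scans the region
  ;; in C, so no explicit while loop is needed on the Lisp side.
  (subst-char-in-region (point-min) (point-max) ?_ ?-))
```

Called in a buffer containing "a_b_c", this should leave "a-b-c".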



* Re: elisp optimization question
       [not found] <mailman.11345.1210283700.18990.help-gnu-emacs@gnu.org>
  2008-05-09  0:00 ` elisp optimization question harven
@ 2008-05-09  0:36 ` Xah
  1 sibling, 0 replies; 8+ messages in thread
From: Xah @ 2008-05-09  0:36 UTC (permalink / raw)
  To: help-gnu-emacs

On May 8, 11:03 am, brad clawsie <claw...@fastmail.fm> wrote:
> hi, i use the following function to translate unicode and other
> entities found on the web into ascii that i can view in emacs-w3m. i
> am concerned that each search and replace as done in my example is
> inefficient, is there a better way to do this? i.e., is there a better
> way to group search/replace pairs? thanks in advance!
>
> (defun w3m-filter-brad (url)
>   (goto-char (point-min))
>   (while (re-search-forward "&#187;" nil t)
>     (replace-match "&gt;&gt;"))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8217;" nil t)
>     (replace-match "'"))
...
>   )

I had a similar problem and also thought about the efficiency and
different implementation issues.

Here's an alternative implementation. The idea is that instead of
working on the buffer, you grab the text into a string, do the
replacement on the string, then put it back in the buffer. I haven't
tested whether it is faster, but i think David Kastrup mentioned in the
past that working on strings is slower.

(defun fold (f x li)
  "Recursively apply (f x i), where i is the ith element in the list li.
For example, (fold f x '(1 2)) returns (f (f x 1) 2)."
  (let ((x2 x))
    (dolist (ele li)
      (setq x2 (funcall f x2 ele)))
    x2))

(defun replace-string-pairs (str pairs)
  "Replace the string str repeatedly by the list pairs.
Example: (replace-string-pairs \"yes or no\"
'((\"yes\" \"no\") (\"no\" \"n\")))
 ⇒  \"n or n\""
  (fold (lambda (x y)
          (replace-regexp-in-string (nth 0 y) (nth 1 y) x))
        str pairs))

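note that replace-regexp-in-string treats the search string as a regexp.
replace-string would take it literally, but that command works on a
buffer, not on a string; for strings, wrap the search text in
regexp-quote.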

--------------------

Also, the following are 3 different implementations.

The first is the same as yours except that it works on a region, by first
calling narrow-to-region. The second avoids narrow-to-region by grabbing
the region as a string and working on the string. Since i heard that
working on strings is slower, and since i want to avoid
narrow-to-region, i thought of using a temp buffer instead. That's the
third solution, which i believe to be the best.

However, at the time either the 2nd or the 3rd solution had a bug, so
i switched back to the first. I haven't had time to investigate what
was the problem.


(defun replace-string-pairs-region (start end mylist)
  "Replace string pairs in region.
Example syntax:
 (replace-string-pairs-region start end
  '((\"alpha\" \"α\") (\"beta\" \"β\")))
The search string and replace string are all literal."
  (save-restriction
    (narrow-to-region start end)
    (mapc
     (lambda (arg)
       (goto-char (point-min))
       (while (search-forward (car arg) nil t)
         (replace-match (cadr arg) t t)))
     mylist)))

(defun replace-string-pairs-region2 (start end mylist)
  "Replace string pairs in region.
Same as replace-string-pairs-region but with different implementation.
This implementation does not use narrow-to-region or save-restriction.
Is cleaner in a sense."
  (let ((mystr (buffer-substring start end)))
    (mapc
     (lambda (x)
       ;; regexp-quote plus the LITERAL flag keep both strings literal.
       (setq mystr (replace-regexp-in-string
                    (regexp-quote (car x)) (cadr x) mystr t t)))
     mylist)
    ;; Insert at the start of the region, not wherever point happens to be.
    (goto-char start)
    (delete-region start end)
    (insert mystr)))

(defun replace-string-pairs-region3 (start end mylist)
  "Replace string pairs in region.
Same as replace-string-pairs-region but with different
implementation: the work is done in a temporary buffer."
  (let ((mystr (buffer-substring start end)))
    (setq mystr
          (with-temp-buffer
            (insert mystr)
            (mapc
             (lambda (arg)
               (goto-char (point-min))
               (while (search-forward (car arg) nil t)
                 (replace-match (cadr arg) t t)))
             mylist)
            ;; Read the result back before the temp buffer goes away.
            (buffer-string)))
    ;; Insert at the start of the region, not wherever point happens to be.
    (goto-char start)
    (delete-region start end)
    (insert mystr)))

  Xah
  xah@xahlee.org
∑ http://xahlee.org/

* Re: elisp optimization question
  2008-05-08 22:14 ` Lennart Borgman (gmail)
@ 2008-05-09  1:42   ` Kevin Rodgers
  0 siblings, 0 replies; 8+ messages in thread
From: Kevin Rodgers @ 2008-05-09  1:42 UTC (permalink / raw)
  To: help-gnu-emacs

Lennart Borgman (gmail) wrote:
> Another way to make it faster would perhaps be to make one regular 
> expression with regexp-opt and then check the match.

That's a good suggestion, and it led me to look into regexp-opt for the
first time.  But how do I get it to capture just the variant part of the
matched strings in "\\( ... \\)" i.e. excluding any common prefix or
suffix?

E.g.

(regexp-opt '("&#187;" "&#8217;")) =>
"&#\\(?:\\(?:18\\|821\\)7;\\)"

(regexp-opt '("&#187;" "&#8217;") t) =>
"\\(&#\\(?:\\(?:18\\|821\\)7;\\)\\)"

But what I'd like it to return is "&#\\(\\(?:18\\|821\\)7;\\)"
so that (match-string 1) would return just "187" or "8217".
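
As far as I know regexp-opt has no option for that, so one workaround is
to write the capturing regexp by hand and look the captured number up in a
table. This is only a sketch under that assumption; the names
my-entity-codes and my-replace-numeric-entities are made up for
illustration, and the replacement values are taken from the code earlier
in the thread.

```elisp
(defvar my-entity-codes
  '(("187" . "&gt;&gt;") ("8217" . "'") ("8212" . "-"))
  "Hypothetical table mapping numeric entity codes to replacements.")

(defun my-replace-numeric-entities ()
  "Replace known numeric entities in the current buffer.
Group 1 of the regexp captures only the digits between \"&#\" and \";\"."
  (goto-char (point-min))
  (while (re-search-forward "&#\\([0-9]+\\);" nil t)
    (let ((replacement (cdr (assoc (match-string 1) my-entity-codes))))
      ;; Entities that are not in the table are left untouched.
      (when replacement
        (replace-match replacement t t)))))
```

Unlike a regexp-opt result, this regexp matches every numeric entity, so
the table lookup has to cope with codes it does not know.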

-- 
Kevin Rodgers
Denver, Colorado, USA






* Re: elisp optimization question
  2008-05-09  0:00 ` elisp optimization question harven
@ 2008-05-09  1:45   ` Kevin Rodgers
       [not found]   ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Kevin Rodgers @ 2008-05-09  1:45 UTC (permalink / raw)
  To: help-gnu-emacs

harven wrote:
> hi,
> you can save some typing by using an alist. Here is what i use to
> convert
> accented-letters into html and back.

Even faster than an alist is a hash table.

-- 
Kevin Rodgers
Denver, Colorado, USA






* Re: elisp optimization question
       [not found]   ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
@ 2008-05-09  9:59     ` Rupert Swarbrick
  2008-05-09 21:43       ` harven
  0 siblings, 1 reply; 8+ messages in thread
From: Rupert Swarbrick @ 2008-05-09  9:59 UTC (permalink / raw)
  To: help-gnu-emacs

Kevin Rodgers <kevin.d.rodgers@gmail.com> writes:

> harven wrote:
>> hi,
>> you can save some typing by using an alist. Here is what i use to
>> convert
>> accented-letters into html and back.
>
> Even faster than an alist is a hash table.
>
Huh? In this code, he's iterating over the alist (which is pretty fast
- there's only a small, fixed number of items). For each element of
this alist, he's doing a search/replace. Each of those is expensive.

The data structure he uses for association is thus completely
irrelevant. Not sure that this is the best approach, but your
criticism definitely doesn't hold.
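
One way to check where the time goes is to measure it. The following is
only a timing sketch: every my-bench-* name and the synthetic buffer
contents are invented for this example, and the numbers will vary by
machine and Emacs version. benchmark-run comes from the standard
benchmark library.

```elisp
(require 'benchmark)

(defvar my-bench-pairs
  '(("&#187;" . "&gt;&gt;") ("&#8217;" . "'") ("&ouml;" . "o"))
  "Hypothetical replacement pairs used only for this timing sketch.")

(defun my-bench-multi-pass ()
  "Scan the whole buffer once per pair."
  (dolist (pair my-bench-pairs)
    (goto-char (point-min))
    (while (search-forward (car pair) nil t)
      (replace-match (cdr pair) t t))))

(defun my-bench-single-pass ()
  "Scan the buffer once with a combined regexp."
  (let ((case-fold-search nil)
        (re (regexp-opt (mapcar #'car my-bench-pairs))))
    (goto-char (point-min))
    (while (re-search-forward re nil t)
      (replace-match (cdr (assoc (match-string 0) my-bench-pairs)) t t))))

(defun my-bench-time (function-to-time)
  "Return elapsed seconds for FUNCTION-TO-TIME on a synthetic buffer."
  (with-temp-buffer
    (dotimes (_ 2000)
      (insert "caf&ouml; &#187; it&#8217;s plain text\n"))
    ;; benchmark-run returns (ELAPSED GC-COUNT GC-ELAPSED).
    (car (benchmark-run 1 (funcall function-to-time)))))

;; Compare, for example:
;; (my-bench-time #'my-bench-multi-pass)
;; (my-bench-time #'my-bench-single-pass)
```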

I wonder whether one could use that alist to "build" a regexp which
you could use with replace-regexp: you could use the \, syntax to add
lisp code to the stuff run.

Rupert



* Re: elisp optimization question
  2008-05-09  9:59     ` Rupert Swarbrick
@ 2008-05-09 21:43       ` harven
  0 siblings, 0 replies; 8+ messages in thread
From: harven @ 2008-05-09 21:43 UTC (permalink / raw)
  To: help-gnu-emacs

> I wonder whether one could use that alist to "build" a regexp which
> you could use with replace-regexp: you could use the \, syntax to add
> lisp code to the stuff run.
>
> Rupert

Here is a short command that takes advantage of the advice in the
previous posts.

(setq my-alist '(
        ("&#187;" . "&gt;&gt;") ("&ouml;" . "o") ("&#8212;" . "-")))

(setq html-regexp (regexp-opt (mapcar 'car my-alist)))

(defun w3m-filter ()
  (interactive)
  ;; Match case-sensitively, so that every match is a key of my-alist
  ;; and the assoc lookup below never returns nil.
  (let ((case-fold-search nil))
    (goto-char (point-min))
    (while (re-search-forward html-regexp nil t)
      (replace-match
       (cdr (assoc (match-string 0) my-alist)) nil t))))
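
With a single pass like this, the table is consulted once per match, which
is where the hash table suggested earlier in the thread could matter for a
large table. A variation on the command above, as a sketch only: the names
my-entity-table, my-hash-keys and my-w3m-filter-hash are made up, and for
three entries the difference from an alist is unlikely to be measurable.

```elisp
(defvar my-entity-table
  (let ((table (make-hash-table :test 'equal)))
    (dolist (pair '(("&#187;" . "&gt;&gt;") ("&ouml;" . "o") ("&#8212;" . "-")))
      (puthash (car pair) (cdr pair) table))
    table)
  "Hypothetical hash table mapping entity strings to replacements.")

(defun my-hash-keys (table)
  "Return a list of the keys of TABLE."
  (let (keys)
    (maphash (lambda (key _value) (push key keys)) table)
    keys))

(defun my-w3m-filter-hash ()
  "Replace every key of `my-entity-table' in the buffer in one pass."
  (let ((case-fold-search nil)
        (re (regexp-opt (my-hash-keys my-entity-table))))
    (goto-char (point-min))
    (while (re-search-forward re nil t)
      ;; The :test 'equal table compares strings by content.
      (replace-match (gethash (match-string 0) my-entity-table) t t))))
```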

I don't know how to pass the value of the html-regexp variable
interactively to the M-% command, though.

It's a bit strange to use a regexp here. The tree structure given by a
keymap would be better, I think. If a keymap inserted an invalid prefix
key sequence instead of reporting an error, we could simply read the
html file through a keymap binding the "&#8212;" key sequence to "-",
etc.
