* elisp optimization question
@ 2008-05-08 18:03 brad clawsie
2008-05-08 22:14 ` Lennart Borgman (gmail)
0 siblings, 1 reply; 8+ messages in thread
From: brad clawsie @ 2008-05-08 18:03 UTC (permalink / raw)
To: help-gnu-emacs
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
hi, i use the following function to translate unicode and other
entities found on the web into ascii that i can view in emacs-w3m. i
am concerned that each search and replace as done in my example is
inefficient, is there a better way to do this? i.e., is there a better
way to group search/replace pairs? thanks in advance!
(defun w3m-filter-brad (url)
  (goto-char (point-min))
  (while (re-search-forward "&#187;" nil t)
    (replace-match ">>"))
  (goto-char (point-min))
  (while (re-search-forward "&#8217;" nil t)
    (replace-match "'"))
  (goto-char (point-min))
  (while (re-search-forward "&#8220;" nil t)
    (replace-match "\""))
  (goto-char (point-min))
  (while (re-search-forward "&#8221;" nil t)
    (replace-match "\""))
  (goto-char (point-min))
  (while (re-search-forward "&#8212;" nil t)
    (replace-match "-"))
  (goto-char (point-min))
  (while (re-search-forward "&laquo;" nil t)
    (replace-match "<"))
  (goto-char (point-min))
  (while (re-search-forward "&raquo;" nil t)
    (replace-match ">"))
  (goto-char (point-min))
  (while (re-search-forward "&ouml;" nil t)
    (replace-match "o")))
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)
iEYEARECAAYFAkgjQIwACgkQxRg3RkRK91MO8gCgqJHsYhE/3bUERIeVztOkABUI
xy0An3rk59o/OCHfaOlSVmM3zBdTgUXQ
=lwIH
-----END PGP SIGNATURE-----
* Re: elisp optimization question
2008-05-08 18:03 brad clawsie
@ 2008-05-08 22:14 ` Lennart Borgman (gmail)
2008-05-09 1:42 ` Kevin Rodgers
0 siblings, 1 reply; 8+ messages in thread
From: Lennart Borgman (gmail) @ 2008-05-08 22:14 UTC (permalink / raw)
To: brad clawsie; +Cc: help-gnu-emacs
brad clawsie wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> hi, i use the following function to translate unicode and other
> entities found on the web into ascii that i can view in emacs-w3m. i
> am concerned that each search and replace as done in my example is
> inefficient, is there a better way to do this? i.e., is there a better
> way to group search/replace pairs? thanks in advance!
>
> (defun w3m-filter-brad (url)
>   (goto-char (point-min))
>   (while (re-search-forward "&#187;" nil t)
>     (replace-match ">>"))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8217;" nil t)
>     (replace-match "'"))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8220;" nil t)
>     (replace-match "\""))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8221;" nil t)
>     (replace-match "\""))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8212;" nil t)
>     (replace-match "-"))
>   (goto-char (point-min))
>   (while (re-search-forward "&laquo;" nil t)
>     (replace-match "<"))
>   (goto-char (point-min))
>   (while (re-search-forward "&raquo;" nil t)
>     (replace-match ">"))
>   (goto-char (point-min))
>   (while (re-search-forward "&ouml;" nil t)
>     (replace-match "o")))
Written this way, you do not need re-search-forward; search-forward is
enough, since you are searching for literal strings, not regular
expressions.
Another way to make it faster would perhaps be to build one regular
expression with regexp-opt and then check which string matched.
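To make the difference between the two search functions concrete, here is a small sketch. The helper name my-count-matches is made up for this illustration and is not part of Emacs or emacs-w3m.

```elisp
(defun my-count-matches (search-fn needle text)
  "Count how often SEARCH-FN finds NEEDLE in TEXT (hypothetical helper)."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0))
      ;; Both search functions take (STRING BOUND NOERROR).
      (while (funcall search-fn needle nil t)
        (setq count (1+ count)))
      count)))

;; "." is a wildcard for re-search-forward, so "a.b" and "axb" both match.
(my-count-matches #'re-search-forward "a.b" "a.b axb")  ; => 2
;; search-forward treats the dot as a plain character.
(my-count-matches #'search-forward "a.b" "a.b axb")     ; => 1
```

For strings such as "&#187;" the two functions find the same text, but the literal search never needs regexp-quote.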
* Re: elisp optimization question
[not found] <mailman.11345.1210283700.18990.help-gnu-emacs@gnu.org>
@ 2008-05-09 0:00 ` harven
2008-05-09 1:45 ` Kevin Rodgers
[not found] ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
2008-05-09 0:36 ` Xah
1 sibling, 2 replies; 8+ messages in thread
From: harven @ 2008-05-09 0:00 UTC (permalink / raw)
To: help-gnu-emacs
hi,
you can save some typing by using an alist. Here is what i use to
convert accented letters into html and back.
(defun accent-html (prefix)
  "Accented letter translation é -> &eacute;.
With an argument, reverse é <- &eacute;.
Works on the whole buffer"
  (interactive "P")
  (save-excursion
    (let ((association
           '(("É" . "&Eacute;") ("á" . "&aacute;") ("à" . "&agrave;")
             ("â" . "&acirc;") ("ä" . "&auml;") ("ã" . "&atilde;")
             ("é" . "&eacute;") ("è" . "&egrave;") ("ê" . "&ecirc;")
             ("ë" . "&euml;") ("í" . "&iacute;") ("ì" . "&igrave;")
             ("î" . "&icirc;") ("ï" . "&iuml;") ("ñ" . "&ntilde;")
             ("ó" . "&oacute;") ("ò" . "&ograve;") ("ô" . "&ocirc;")
             ("ö" . "&ouml;") ("õ" . "&otilde;") ("ú" . "&uacute;")
             ("ù" . "&ugrave;") ("û" . "&ucirc;") ("ü" . "&uuml;")
             ("ç" . "&ccedil;")))
          (case-fold-search nil))
      (dolist (paire association)
        (when prefix
          (setq paire (cons (cdr paire) (car paire))))
        (goto-char (point-min))
        (while (search-forward (car paire) nil t)
          (replace-match (cdr paire) nil t))))))
This is not more efficient than your own defun. If you only want to
translate single characters, the function subst-char-in-region is a
primitive that saves a while loop and is probably faster.
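A minimal sketch of the subst-char-in-region idea follows. The helper my-subst-chars and its pair list are invented for the example. One caveat, as far as I recall: in a multibyte buffer the two characters must occupy the same number of bytes, so a mapping such as ö to o may be rejected with an error. The sketch therefore sticks to ASCII pairs; check the behaviour in your own Emacs before relying on it for non-ASCII input.

```elisp
(defun my-subst-chars (text pairs)
  "Return TEXT with each (FROM . TO) character pair in PAIRS applied.
Hypothetical helper; PAIRS holds characters, not strings."
  (with-temp-buffer
    (insert text)
    (dolist (pair pairs)
      ;; One primitive call per pair, no explicit search loop.
      (subst-char-in-region (point-min) (point-max)
                            (car pair) (cdr pair)))
    (buffer-string)))

;; Underscore becomes hyphen, tab becomes space.
(my-subst-chars "a_b\tc" '((?_ . ?-) (?\t . ?\s)))  ; => "a-b c"
```

This only covers one-character-to-one-character translations; an entity like "&#187;" is a multi-character string and still needs a search and replace.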
* Re: elisp optimization question
[not found] <mailman.11345.1210283700.18990.help-gnu-emacs@gnu.org>
2008-05-09 0:00 ` elisp optimization question harven
@ 2008-05-09 0:36 ` Xah
1 sibling, 0 replies; 8+ messages in thread
From: Xah @ 2008-05-09 0:36 UTC (permalink / raw)
To: help-gnu-emacs
On May 8, 11:03 am, brad clawsie <claw...@fastmail.fm> wrote:
> hi, i use the following function to translate unicode and other
> entities found on the web into ascii that i can view in emacs-w3m. i
> am concerned that each search and replace as done in my example is
> inefficient, is there a better way to do this? i.e., is there a better
> way to group search/replace pairs? thanks in advance!
>
> (defun w3m-filter-brad (url)
>   (goto-char (point-min))
>   (while (re-search-forward "&#187;" nil t)
>     (replace-match ">>"))
>   (goto-char (point-min))
>   (while (re-search-forward "&#8217;" nil t)
>     (replace-match "'"))
...
>   )
I had a similar problem and also thought about efficiency and the
different ways to implement it.
Here's an alternative implementation. The idea is that instead of
working on the buffer, you grab the text into a string, do the
replacements on the string, then put the result back in the buffer. I
haven't tested whether it is faster, but i think David Kastrup
mentioned in the past that working on strings is slower.
(defun fold (f x li)
  "Recursively apply (f x i), where i is the ith element in the list li.
For example, (fold f x '(1 2)) returns (f (f x 1) 2)."
  (let ((li2 li) (ele) (x2 x))
    (while (setq ele (pop li2))
      (setq x2 (funcall f x2 ele)))
    x2))

(defun replace-string-pairs (str pairs)
  "Replace the string STR repeatedly by the list PAIRS.
Example: (replace-string-pairs \"yes or no\"
 '((\"yes\" \"no\") (\"no\" \"n\")))
⇒ \"n or n\""
  (fold (lambda (x y)
          (replace-regexp-in-string (nth 0 y) (nth 1 y) x))
        str pairs))
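Since the pairs are literal strings, you might wrap each pattern in
regexp-quote before passing it to replace-regexp-in-string;
replace-string itself works on a buffer, not on a string.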
--------------------
Also, the following are 3 different implementations.
The first is the same as yours except that it works on a region, by
first calling narrow-to-region. The second avoids narrow-to-region by
grabbing the region as a string and working on the string. Since i
heard that working on strings is slower, and since i wanted to avoid
narrow-to-region, i thought of using a temp buffer instead. That's the
third solution, which i believe to be the best.
However, at the time either the 2nd or the 3rd solution had a bug, so
i switched back to the first. I haven't had time to investigate what
was the problem.
(defun replace-string-pairs-region (start end mylist)
  "Replace string pairs in region.
Example syntax:
 (replace-string-pairs-region start end '((\"alpha\" \"α\") (\"beta\" \"β\")))
The search string and replace string are all literal."
  (save-restriction
    (narrow-to-region start end)
    (mapc
     (lambda (arg)
       (goto-char (point-min))
       (while (search-forward (car arg) nil t)
         (replace-match (cadr arg) t t)))
     mylist)))
(defun replace-string-pairs-region2 (start end mylist)
  "Replace string pairs in region.
Same as replace-string-pairs-region but with different implementation.
This implementation does not use narrow-to-region or save-restriction.
Is cleaner in a sense."
  (let ((mystr (buffer-substring start end)))
    (mapc
     (lambda (x)
       ;; regexp-quote plus the LITERAL flag keep both strings literal.
       (setq mystr (replace-regexp-in-string
                    (regexp-quote (car x)) (cadr x) mystr t t)))
     mylist)
    (delete-region start end)
    ;; Insert where the region was; point may be elsewhere.
    (goto-char (min start end))
    (insert mystr)))
(defun replace-string-pairs-region3 (start end mylist)
  "Replace string pairs in region.
Same as replace-string-pairs-region but with different implementation."
  (let ((mystr (buffer-substring start end)))
    ;; with-temp-buffer creates the scratch buffer and kills it afterwards.
    (with-temp-buffer
      (insert mystr)
      (mapc
       (lambda (arg)
         (goto-char (point-min))
         (while (search-forward (car arg) nil t)
           (replace-match (cadr arg) t t)))
       mylist)
      ;; Read the result back before the temp buffer goes away.
      (setq mystr (buffer-string)))
    (delete-region start end)
    ;; Insert where the region was; point may be elsewhere.
    (goto-char (min start end))
    (insert mystr)))
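Whether the string route or the buffer route is faster can be measured rather than guessed. The sketch below uses the benchmark-run macro from benchmark.el; the three my-... functions are hypothetical stand-ins, not the functions above, and the repetition count of 100 is arbitrary.

```elisp
(require 'benchmark)

(defun my-replace-in-buffer (text from to)
  "Replace literal FROM with TO in TEXT using a temp buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (search-forward from nil t)
      (replace-match to t t))
    (buffer-string)))

(defun my-replace-in-string (text from to)
  "Replace literal FROM with TO in TEXT using string functions only."
  (replace-regexp-in-string (regexp-quote from) to text t t))

(defun my-compare-timings (text from to)
  "Return (BUFFER-SECONDS . STRING-SECONDS) for 100 runs of each approach."
  ;; benchmark-run returns (ELAPSED GC-COUNT GC-TIME); keep ELAPSED only.
  (cons (car (benchmark-run 100 (my-replace-in-buffer text from to)))
        (car (benchmark-run 100 (my-replace-in-string text from to)))))
```

Calling (my-compare-timings some-large-string "&#187;" ">>") on text of realistic size should show which variant wins on a given Emacs build; the numbers will differ between machines and versions.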
Xah
xah@xahlee.org
∑ http://xahlee.org/
☄
* Re: elisp optimization question
2008-05-08 22:14 ` Lennart Borgman (gmail)
@ 2008-05-09 1:42 ` Kevin Rodgers
0 siblings, 0 replies; 8+ messages in thread
From: Kevin Rodgers @ 2008-05-09 1:42 UTC (permalink / raw)
To: help-gnu-emacs
Lennart Borgman (gmail) wrote:
> Another way to make it faster would perhaps be to make one regular
> expression with regexp-opt and then check the match.
That's a good suggestion, and it led me to look into regexp-opt for the
first time. But how do I get it to capture just the variant part of the
matched strings in "\\( ... \\)" i.e. excluding any common prefix or
suffix?
E.g.
(regexp-opt '("&#187;" "&#8217;")) =>
"&#\\(?:\\(?:18\\|821\\)7;\\)"
(regexp-opt '("&#187;" "&#8217;") t) =>
"\\(&#\\(?:\\(?:18\\|821\\)7;\\)\\)"
But what I'd like it to return is "&#\\(\\(?:18\\|821\\)7;\\)"
so that (match-string 1) would return just "187" or "8217".
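One possible workaround, sketched under the assumption that every entity shares the fixed "&#" prefix and ";" suffix: hand regexp-opt only the variable parts and add the common text yourself. The variable and function names below are made up for the example.

```elisp
(defvar my-entity-codes '("187" "8217")
  "Numeric parts of the entities, without the \"&#\" prefix and \";\" suffix.")

(defvar my-entity-regexp
  ;; With PAREN = t, regexp-opt wraps its result in one capturing group,
  ;; so group 1 covers only the code.
  (concat "&#" (regexp-opt my-entity-codes t) ";")
  "Regexp matching a known numeric entity; group 1 is the code.")

(defun my-entity-code-in (text)
  "Return the numeric code of the first known entity in TEXT, or nil."
  (when (string-match my-entity-regexp text)
    (match-string 1 text)))

(my-entity-code-in "it&#8217;s")  ; => "8217"
```

This does not make regexp-opt itself factor out the prefix; it simply moves the factoring into the caller.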
--
Kevin Rodgers
Denver, Colorado, USA
* Re: elisp optimization question
2008-05-09 0:00 ` elisp optimization question harven
@ 2008-05-09 1:45 ` Kevin Rodgers
[not found] ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 8+ messages in thread
From: Kevin Rodgers @ 2008-05-09 1:45 UTC (permalink / raw)
To: help-gnu-emacs
harven wrote:
> hi,
> you can save some typing by using an alist. Here is what i use to
> convert
> accented-letters into html and back.
>
> (defun accent-html (prefix)
>   "Accented letter translation é -> &eacute;.
> With an argument, reverse é <- &eacute;.
> Works on the whole buffer"
>   (interactive "P")
>   (save-excursion
>     (let ((association
>            '(("É" . "&Eacute;") ("á" . "&aacute;") ("à" . "&agrave;")
>              ("â" . "&acirc;") ("ä" . "&auml;") ("ã" . "&atilde;")
>              ("é" . "&eacute;") ("è" . "&egrave;") ("ê" . "&ecirc;")
>              ("ë" . "&euml;") ("í" . "&iacute;") ("ì" . "&igrave;")
>              ("î" . "&icirc;") ("ï" . "&iuml;") ("ñ" . "&ntilde;")
>              ("ó" . "&oacute;") ("ò" . "&ograve;") ("ô" . "&ocirc;")
>              ("ö" . "&ouml;") ("õ" . "&otilde;") ("ú" . "&uacute;")
>              ("ù" . "&ugrave;") ("û" . "&ucirc;") ("ü" . "&uuml;")
>              ("ç" . "&ccedil;")))
>           (case-fold-search nil))
>       (dolist (paire association)
>         (when prefix
>           (setq paire (cons (cdr paire) (car paire))))
>         (goto-char (point-min))
>         (while (search-forward (car paire) nil t)
>           (replace-match (cdr paire) nil t))))))
Even faster than an alist is a hash table.
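A hash table pays off when replacements are looked up by key, for instance when one combined regexp finds an entity and its replacement must then be fetched. A rough sketch, with a hypothetical two-entry table and made-up names:

```elisp
(defvar my-entity-table
  (let ((table (make-hash-table :test 'equal)))
    ;; Illustrative entries only; :test 'equal is needed for string keys.
    (dolist (pair '(("&#187;" . ">>") ("&#8217;" . "'")))
      (puthash (car pair) (cdr pair) table))
    table)
  "Hash table mapping entity strings to ASCII replacements.")

(defun my-replace-entities-hash (text)
  "Return TEXT with every key of `my-entity-table' replaced by its value."
  (let (keys)
    (maphash (lambda (key _value) (push key keys)) my-entity-table)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Match case exactly so the matched text is always a table key.
      (let ((case-fold-search nil)
            (regexp (regexp-opt keys)))
        (while (re-search-forward regexp nil t)
          (replace-match (gethash (match-string 0) my-entity-table) t t)))
      (buffer-string))))

(my-replace-entities-hash "a &#187; b&#8217;s")  ; => "a >> b's"
```

With only a handful of entries the lookup cost is negligible either way; the gain, if any, comes from scanning the buffer once instead of once per pair.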
--
Kevin Rodgers
Denver, Colorado, USA
* Re: elisp optimization question
[not found] ` <mailman.11354.1210297808.18990.help-gnu-emacs@gnu.org>
@ 2008-05-09 9:59 ` Rupert Swarbrick
2008-05-09 21:43 ` harven
0 siblings, 1 reply; 8+ messages in thread
From: Rupert Swarbrick @ 2008-05-09 9:59 UTC (permalink / raw)
To: help-gnu-emacs
Kevin Rodgers <kevin.d.rodgers@gmail.com> writes:
> harven wrote:
>> hi,
>> you can save some typing by using an alist. Here is what i use to
>> convert
>> accented-letters into html and back.
>>
>> (defun accent-html (prefix)
>>   "Accented letter translation é -> &eacute;.
>> With an argument, reverse é <- &eacute;.
>> Works on the whole buffer"
>>   (interactive "P")
>>   (save-excursion
>>     (let ((association
>>            '(("É" . "&Eacute;") ("á" . "&aacute;") ("à" . "&agrave;")
>>              ("â" . "&acirc;") ("ä" . "&auml;") ("ã" . "&atilde;")
>>              ("é" . "&eacute;") ("è" . "&egrave;") ("ê" . "&ecirc;")
>>              ("ë" . "&euml;") ("í" . "&iacute;") ("ì" . "&igrave;")
>>              ("î" . "&icirc;") ("ï" . "&iuml;") ("ñ" . "&ntilde;")
>>              ("ó" . "&oacute;") ("ò" . "&ograve;") ("ô" . "&ocirc;")
>>              ("ö" . "&ouml;") ("õ" . "&otilde;") ("ú" . "&uacute;")
>>              ("ù" . "&ugrave;") ("û" . "&ucirc;") ("ü" . "&uuml;")
>>              ("ç" . "&ccedil;")))
>>           (case-fold-search nil))
>>       (dolist (paire association)
>>         (when prefix
>>           (setq paire (cons (cdr paire) (car paire))))
>>         (goto-char (point-min))
>>         (while (search-forward (car paire) nil t)
>>           (replace-match (cdr paire) nil t))))))
>
> Even faster than an alist is a hash table.
>
Huh? In this code, he's iterating over the alist (which is pretty fast
- there's only a small, fixed number of items). For each element of
this alist, he's doing a search/replace. Each of those is expensive.
The data structure he uses for association is thus completely
irrelevant. Not sure that this is the best approach, but your
criticism definitely doesn't hold.
I wonder whether one could use that alist to "build" a regexp which
you could use with replace-regexp: you could use the \, syntax to run
lisp code in the replacement.
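The \, syntax belongs to the interactive replace commands. From Lisp, a similar effect is available because replace-regexp-in-string accepts a function as its replacement argument. A sketch, with an invented function name and sample alist:

```elisp
(defun my-replace-pairs-in-string (pairs text)
  "Return TEXT with each (FROM . TO) string pair in PAIRS applied in one pass.
Hypothetical helper built on `replace-regexp-in-string'."
  ;; Match case exactly so every match is a key of PAIRS.
  (let ((case-fold-search nil))
    (replace-regexp-in-string
     (regexp-opt (mapcar #'car pairs))
     ;; The function receives the matched text and returns its replacement.
     (lambda (match) (cdr (assoc match pairs)))
     text t t)))

(my-replace-pairs-in-string '(("&#187;" . ">>") ("&ouml;" . "o"))
                            "K&ouml;ln &#187;")  ; => "Koln >>"
```

The alist lookup here runs once per match, so the cost of the data structure only matters if the text contains very many entities.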
Rupert
* Re: elisp optimization question
2008-05-09 9:59 ` Rupert Swarbrick
@ 2008-05-09 21:43 ` harven
0 siblings, 0 replies; 8+ messages in thread
From: harven @ 2008-05-09 21:43 UTC (permalink / raw)
To: help-gnu-emacs
> I wonder whether one could use that alist to "build" a regexp which
> you could use with replace-regexp: you could use the \, syntax to run
> lisp code in the replacement.
>
> Rupert
Here is a short command that takes advantage of the advice in the
previous posts.
(setq my-alist '(("&#187;" . ">>") ("&ouml;" . "o") ("&#8212;" . "-")))
(setq html-regexp (regexp-opt (mapcar 'car my-alist)))
(defun w3m-filter ()
  (interactive)
  (goto-char (point-min))
  ;; Match case exactly so that every match is a key of my-alist.
  (let ((case-fold-search nil))
    (while (re-search-forward html-regexp nil t)
      (replace-match
       (cdr (assoc (match-string 0) my-alist)) nil t))))
I don't know how to pass the value of the html-regexp variable
interactively to the M-% command, though.
It's a bit strange to use a regexp here. The tree structure given by a
keymap would be better, I think. If the keymap inserted an invalid
prefix key sequence instead of reporting an error, we could simply
read the html file with a keymap binding the "&#8212;" key sequence to
"-", etc.