From: Jean Louis <bugs@gnu.support>
To: Help GNU Emacs <help-gnu-emacs@gnu.org>
Subject: Any faster way to find frequency of words?
Date: Sun, 09 May 2021 17:38:05 +0300
Message-ID: <courier.000000006097F3E3.00004125@stw1.rcdrun.com>
I would like to know whether there is a better, faster way in Emacs
Lisp to find the frequency of words.
The purpose is to create clickable HTML tag clouds, similar to image
tag clouds. I will invoke Perl from Emacs to generate the cloud, but
first I have to analyze the text.
(setq text "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam
lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam
viverra nec consectetur ante hendrerit. Maecenas congue ligula ac quam
viverra nec consectetur ante hendrerit..")
(defun text-alphabetic-only (text)
  "Return TEXT with every non-alphabetic character replaced by a space."
  (replace-regexp-in-string "[^[:alpha:]]" " " text))
(defun word-frequency (text &optional length)
  "Return word frequency as a hash table from TEXT.
Count only words longer than LENGTH characters (default 2)."
  (let* ((hash (make-hash-table :test 'equal))
         (minimum (or length 2))
         (text (text-alphabetic-only text))
         (words (split-string text " " t " ")))
    (mapc (lambda (word)
            (when (> (length word) minimum)
              (let ((word (downcase word)))
                ;; `gethash' returns the default 0 for a word not seen yet.
                (puthash word (1+ (gethash word hash 0)) hash))))
          words)
    hash))
(word-frequency text) ⇒ #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("lorem" 1 "ipsum" 2 "dolor" 1 "sit" 2 "amet" 2 "consectetur" 3 "adipiscing" 1 "elit" 1 "donec" 1 "diam" 1 "lectus" 1 "sed" 1 "mauris" 1 "maecenas" 2 "congue" 2 "ligula" 2 "quam" 2 "viverra" 2 "nec" 2 "ante" 2 "hendrerit" 2))
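One possible alternative, given only as a sketch: scan the text once in a temporary buffer with `re-search-forward` instead of first building a cleaned string and a list of words. Whether this is actually faster has not been measured here. The names `word-frequency-scan`, `word-frequency-sorted` and `min-length` are hypothetical and do not come from the message above. The second helper turns the hash table into an alist ordered by count, which may be a convenient shape to hand to the tag cloud generator.

```elisp
(defun word-frequency-scan (text &optional min-length)
  "Return a hash table mapping downcased words in TEXT to their counts.
Hypothetical sketch.  Only words of at least MIN-LENGTH characters
\(default 3) are counted."
  (let ((hash (make-hash-table :test 'equal))
        (min-length (or min-length 3)))
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Each match is one maximal run of alphabetic characters.
      (while (re-search-forward "[[:alpha:]]+" nil t)
        (let ((word (downcase (match-string 0))))
          (when (>= (length word) min-length)
            ;; The default 0 covers words not seen yet.
            (puthash word (1+ (gethash word hash 0)) hash)))))
    hash))

(defun word-frequency-sorted (hash)
  "Return an alist of (WORD . COUNT) from HASH, most frequent first.
Hypothetical helper.  Ties are ordered alphabetically."
  (let (pairs)
    (maphash (lambda (word count) (push (cons word count) pairs)) hash)
    (sort pairs (lambda (a b)
                  (or (> (cdr a) (cdr b))
                      (and (= (cdr a) (cdr b))
                           (string< (car a) (car b))))))))
```

The two can be combined as (word-frequency-sorted (word-frequency-scan text)).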