all messages for Emacs-related lists mirrored at yhetil.org
* Any faster way to find frequency of words?
@ 2021-05-09 14:38 Jean Louis
From: Jean Louis @ 2021-05-09 14:38 UTC
  To: Help GNU Emacs

I would like to know whether there is a better, faster way to compute
word frequencies in Emacs Lisp.

The purpose is to create clickable HTML tag clouds, similar to image tag
clouds. I will invoke Perl from Emacs to generate the cloud itself, but
for that I have to analyze the text first.

(setq text "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a diam
lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac quam
viverra nec consectetur ante hendrerit. Maecenas congue ligula ac quam
viverra nec consectetur ante hendrerit..")

(defun text-alphabetic-only (text)
  "Return TEXT with every non-alphabetic character replaced by a space."
  (replace-regexp-in-string "[^[:alpha:]]" " " text))

(defun word-frequency (text &optional length)
  "Return a hash table mapping each word in TEXT to its frequency.
Words are downcased before counting.  Words shorter than LENGTH
characters (default 3) are ignored."
  (let ((hash (make-hash-table :test 'equal))
        (min-length (or length 3))
        (words (split-string (text-alphabetic-only text) " " t)))
    (dolist (word words hash)
      (when (>= (length word) min-length)
        (let ((word (downcase word)))
          ;; `gethash' accepts a default value, so a missing word counts as 0.
          (puthash word (1+ (gethash word hash 0)) hash))))))

(word-frequency text) ⇒ #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("lorem" 1 "ipsum" 2 "dolor" 1 "sit" 2 "amet" 2 "consectetur" 3 "adipiscing" 1 "elit" 1 "donec" 1 "diam" 1 "lectus" 1 "sed" 1 "mauris" 1 "maecenas" 2 "congue" 2 "ligula" 2 "quam" 2 "viverra" 2 "nec" 2 "ante" 2 "hendrerit" 2))
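
A tag cloud needs the words ranked, and a hash table has no useful
order. The sketch below shows one possible way to turn the table into a
list sorted by count. The name `word-frequency-sorted' is only an
illustration, not an existing Emacs function, and the alphabetical
tie-breaking is an assumption made to keep the output deterministic.

```elisp
;; Sketch only: `word-frequency-sorted' is a hypothetical helper name,
;; not part of Emacs.
(defun word-frequency-sorted (hash)
  "Return an alist of (WORD . COUNT) from HASH, most frequent first.
Words with equal counts are ordered alphabetically."
  (let (alist)
    ;; Collect every key/value pair of the hash table into an alist.
    (maphash (lambda (word count) (push (cons word count) alist)) hash)
    ;; Sort by descending count, then by ascending word.
    (sort alist
          (lambda (a b)
            (or (> (cdr a) (cdr b))
                (and (= (cdr a) (cdr b))
                     (string< (car a) (car b))))))))
```

With the sample text above, (word-frequency-sorted (word-frequency text))
should start with ("consectetur" . 3), since that is the only word that
occurs three times.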




