From: Udyant Wig <udyantw@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Most used words in current buffer
Date: Wed, 18 Jul 2018 15:06:56 +0530
Message-ID: <pin1no$l8e$1@dont-email.me>
In-Reply-To: <861sc1iu1m.fsf@zoho.com>
On 07/18/2018 12:11 AM, Emanuel Berg wrote:
> Do it!
>
> But if you can let go of the Elisp requirement here are some examples
> how to do it with everyday GNU/Unix tools:
>
> https://unix.stackexchange.com/questions/41479/find-n-most-frequent-words-in-a-file
I went ahead and did it. I obtained many solutions, in fact. Only
today did I check the link above.
First, of the solutions in Emacs Lisp, this one came out as the
quickest:
---
(require 'cl-lib)

(defun buffer-most-used-words-1 (n)
  "Make a list of the N most used words in buffer."
  (let ((counts (make-hash-table :test #'equal))
        (words (split-string (buffer-string)))
        sorted-counts)
    ;; Count each word case-insensitively.
    (dolist (word words)
      (let ((count (gethash (downcase word) counts 0)))
        (puthash (downcase word) (1+ count) counts)))
    ;; Collect (WORD COUNT) pairs, then sort them by descending count.
    (cl-loop for word being the hash-keys of counts
             using (hash-values count)
             do
             (push (list word count) sorted-counts)
             finally (setf sorted-counts (cl-sort sorted-counts #'>
                                                  :key #'cl-second)))
    (mapcar #'cl-first (cl-subseq sorted-counts 0 n))))
---
Briefly, it obtains a list of the strings in the buffer, hashes them,
puts the words and their counts in a list, sorts it, and lists the first
N words. (I had also written solutions (1) using alists; (2) using the
handy AVL tree library I found among the Emacs Lisp files in the Emacs
distribution; and (3) reading the words directly and hashing them. None
beat the above.)
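For illustration only, variant (3) could look something like the
following minimal sketch. It is not the code that was timed; the name
`my-buffer-word-counts' is hypothetical, and the regexp is assumed to
mirror the default separators of `split-string'.

```elisp
(defun my-buffer-word-counts ()
  "Return a hash table mapping each downcased word in the buffer to its count.
Hypothetical helper: scans the buffer with `re-search-forward'
instead of building a list with `split-string'."
  (let ((counts (make-hash-table :test #'equal)))
    (save-excursion
      (goto-char (point-min))
      ;; A word here is any run of non-whitespace characters.
      (while (re-search-forward "[^ \f\t\n\r\v]+" nil t)
        (let ((word (downcase (match-string-no-properties 0))))
          (puthash word (1+ (gethash word counts 0)) counts))))
    counts))
```

This avoids the intermediate list of strings, although, as noted, it did
not turn out to be faster in practice.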
The function is suffixed with '-1' because it is the core of
another, interactive function, which takes the above generated list and
displays it nicely in another buffer.
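A rough sketch of what such a wrapper might look like follows. The
command name `my-show-most-used-words' and the buffer name are
assumptions made for the example, and the real command formats its
output differently; it also assumes `buffer-most-used-words-1' is
already defined.

```elisp
(defun my-show-most-used-words (n)
  "Display the N most used words of the current buffer in another buffer.
Hypothetical wrapper; assumes `buffer-most-used-words-1' is defined."
  (interactive "nNumber of words: ")
  ;; Compute the list while the original buffer is still current.
  (let ((words (buffer-most-used-words-1 n)))
    (with-current-buffer (get-buffer-create "*Most used words*")
      (erase-buffer)
      ;; One word per line, most frequent first.
      (dolist (word words)
        (insert word "\n"))
      (goto-char (point-min))
      (display-buffer (current-buffer)))))
```

With that loaded, M-x my-show-most-used-words RET 10 RET would list the
ten most used words, one per line.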
I was curious about possible solutions in other languages. I wrote
programs in both Common Lisp and Python, based on the essential hash
table approach. While a lot faster than the Emacs Lisp solution above,
they were left behind by this old Awk solution (also using hashing) I
found in the classic /The Unix Programming Environment/ by Kernighan and
Pike:
---
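#!/bin/sh
# Print the 10 most frequent whitespace-separated words in the input.
# "sort -k2,2nr" is the current spelling of the obsolete "sort +1 -nr".
awk '	{ for (i = 1; i <= NF; i++) num[$i]++ }
END	{ for (word in num) print word, num[word] }
' "$@" | sort -k2,2nr | head -10 | awk '{ print $1 }'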
---
I appended the final awk stage to the pipeline so that it prints only
the words, without the counts. I wrapped the script in an Emacs command
that displays the words in another buffer, just like my original Emacs
Lisp solution above.
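As a sketch of how such a command could be written (not necessarily how
mine is), the buffer text can be piped through the script with
`shell-command-on-region'. The variable `my-most-used-words-command',
its value "wordfreq" (a made-up name for the script above, assumed to be
on PATH), and the buffer name are all hypothetical.

```elisp
(defvar my-most-used-words-command "wordfreq"
  "Shell command that reads text on stdin and prints the most used words.
Hypothetical: assumes the awk script is installed under this name.")

(defun my-show-most-used-words-awk ()
  "Pipe the current buffer through `my-most-used-words-command'.
The command's output is placed in the buffer *Most used words (awk)*."
  (interactive)
  ;; With no file arguments, awk reads the buffer text from stdin.
  (shell-command-on-region (point-min) (point-max)
                           my-most-used-words-command
                           "*Most used words (awk)*"))
```

Short output may be shown in the echo area rather than in a window, but
it is still available in the named buffer.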
Udyant Wig
--
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
-- Arthur Quiller-Couch